Asset ID: 1-72-1439858.1
Update Date: 2012-06-28
Solution Type: Problem Resolution Sure Solution

Solution 1439858.1: Sun Storage 7000 Unified Storage System: All SMB (Windows) shares inaccessible
Related Items
- Sun Storage 7310 Unified Storage System
- Sun Storage 7410 Unified Storage System
- Sun ZFS Storage 7120
- Sun Storage 7110 Unified Storage System
- Sun ZFS Storage 7320
- Sun ZFS Storage 7420
- Sun Storage 7210 Unified Storage System
Related Categories
- PLA-Support>Sun Systems>DISK>NAS>SN-DK: 7xxx NAS
- .Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage
Created from <SR 3-3740498461>
Applies to:
Sun Storage 7110 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7310 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7320 - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7420 - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7410 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
7000 Appliance OS (Fishworks)
Symptoms
Attempting to access any share on the appliance prompts users for username/password or fails with access denied.
Cause
1. Lost connection with Active Directory Domain Controller.
2. Failure of SMB server service.
Solution
Once connected to an Active Directory Domain, the ZFS Storage Appliance (ZFSSA) maintains a connection with the Domain Controller that was used for the Domain join operation. The connection is used to provide pass-through authentication for members of the Domain as they access resources on the ZFSSA.
If this connection is broken, pass-through authentication fails, and the ZFSSA will be unable to authenticate clients, usually causing them to be prompted for a user name and password that cannot be satisfied by any Domain credentials.
Therefore, the first step in diagnosing an authentication problem is to check the state of this connection. You can do this in either the BUI or the CLI administration interface by navigating to Configuration / Services / AD. If the appliance is actively connected to a Domain Controller, the screen shows the selected AD server with its host name and IP address. If it is not actively connected to a DC, the screen indicates that no server was found. Both cases are shown in the examples below (from the BUI):
Actively Connected:
    Mode: Domain
    Domain: naslab2k8.west.oracle.com
    Selected Domain Controller: naslab03.naslab2k8.oracle.sun.com (192.168.2.22)
Not Connected:
    Mode: Domain
    Domain: naslab2k8.west.sun.com
    Selected Domain Controller: None
    --OR--
    Selected Domain Controller: (0.0.0.0)
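If you check many appliances, you may want to automate the check described above. The following minimal sketch assumes that you have copied the status text from Configuration / Services / AD in the format shown in the examples. The function name `dc_connected` is hypothetical and is not part of the appliance software. The function treats "None" or an all-zero address on the "Selected Domain Controller" line as "not connected".

```python
import re


def dc_connected(status_text: str) -> bool:
    """Return True if AD status text (copied as shown above) names an active DC.

    Hypothetical helper: assumes a "Selected Domain Controller: <name> (<ip>)"
    line like the BUI examples in this document.
    """
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Selected Domain Controller":
            value = value.strip()
            # "None" (or an empty value) means no DC is selected.
            if value in ("", "None"):
                return False
            # An all-zero address, e.g. "(0.0.0.0)", also means not connected.
            match = re.search(r"\(([\d.]+)\)", value)
            if match and match.group(1) == "0.0.0.0":
                return False
            return True
    # No "Selected Domain Controller" line at all: treat as not connected.
    return False


example = """Mode: Domain
Domain: naslab2k8.west.oracle.com
Selected Domain Controller: naslab03.naslab2k8.oracle.sun.com (192.168.2.22)"""
print(dc_connected(example))  # True
```

A result of False points to the rejoin procedure that follows. A result of True points to the SMB server service instead.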
The connection can be lost because of a software upgrade, temporary or permanent unavailability of the AD server, or a DNS or other network outage. The 2011.1 system software is better able to reconnect when this occurs, but manual intervention may still be required.
The solution is to rejoin the AD Domain manually. Follow these steps from the same Configuration / Services / AD screen:
- Click "Join Workgroup"
- Enter a workgroup name
- Confirm
- Click "Join Domain"
- Enter admin credentials and rejoin the domain.
If you need assistance with these steps, or if the system is unable to join successfully, refer to <Document:1402353.1>.
If, on the other hand, the system has an active connection to AD and all users are still unable to authenticate, the problem most likely lies with the SMB server service.
At this point, it is best to engage a ZFSSA Technical Support engineer by opening a Service Request. The engineer responsible for your Service Request will either provide instructions for collecting advanced diagnostic data or arrange to connect to the system remotely and collect the data themselves.
Support note:
First check /var/ak/logs/debug.sys for a message similar to the following:
Jul 23 09:07:28 server-1 smbd[5293]: [ID 216626 daemon.debug] smbrdr[43]: reply mismatch (117)
Jul 23 09:07:28 server-1 smbd[5293]: [ID 216626 daemon.debug] smbrdr[117]: reply mismatch (43)
Jul 23 09:07:28 server-1 smbd[5293]: [ID 898164 daemon.debug] smbrdr_tree_connectx: REPLY_MESSAGE_MISMATCH
If these messages are seen, the problem is almost certainly <Sunbug:6975798> P3 utility/cifs CIFS server cannot recover from DC reply mismatch in libsmbrdr.
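The following minimal sketch shows how the log check above could be scripted. It assumes that you are working on a local copy of /var/ak/logs/debug.sys. The helper names `find_reply_mismatch` and `scan_debug_log` are hypothetical. The signature strings are taken verbatim from the smbd messages quoted above.

```python
# Substrings taken from the smbd messages quoted above (bug 6975798).
SIGNATURES = ("reply mismatch", "REPLY_MESSAGE_MISMATCH")


def find_reply_mismatch(lines):
    """Return the smbd log lines that match the reply-mismatch signature."""
    hits = []
    for line in lines:
        # Only consider smbd messages, then look for either signature string.
        if "smbd[" in line and any(sig in line for sig in SIGNATURES):
            hits.append(line.rstrip("\n"))
    return hits


def scan_debug_log(path):
    """Scan a local copy of debug.sys; the path is supplied by the caller."""
    with open(path, errors="replace") as f:
        return find_reply_mismatch(f)
```

For example, `scan_debug_log("debug.sys")` (a hypothetical local file name) returns the matching lines. An empty list means that the "reply mismatch" signature was not found.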
Binary fixes are available for recent versions of 2010.Q3. If the customer is on 2010.Q3 and has no immediate plans to upgrade, leave them on this version and request the binary relief, because we do not yet have sufficient evidence that 2011.1 fixes the problem entirely. However, the problem should not be present on 2011.1; if you encounter it there, get RPE/backline involved.
If the "reply mismatch" messages are not seen, the problem is likely one of the issues currently escalated to RPE. In this case, first verify that the service is online and then obtain a crashdump, because the problem is not well understood at this time.
If the customer is unwilling to reboot, get a gcore of smbd (gcore -g `pgrep smbd`) and a diagnostic, and suggest that they get a crashdump if there's a future occurrence.
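The decision flow of this support note can be summarized as a small function. The following is a minimal sketch that restates the note and adds no new guidance. The function name `support_action`, its parameters, and the release-label format ("2010.Q3", "2011.1") are assumptions made for illustration. Combinations that the note does not address return an explicit "not specified" result rather than a guessed action.

```python
def support_action(mismatch_seen: bool, release: str,
                   upgrade_planned: bool, reboot_ok: bool) -> str:
    """Map the support note's findings to its recommended next step.

    Hypothetical helper; release labels like "2010.Q3" or "2011.1" are assumed.
    """
    if mismatch_seen:
        if release.startswith("2011.1"):
            # The bug should not be present on 2011.1, so escalate.
            return "engage RPE/backline"
        if release.startswith("2010.Q3") and not upgrade_planned:
            # Stay on 2010.Q3 and request the binary fix.
            return "request binary relief for bug 6975798"
        # The note does not spell out the remaining combinations.
        return "bug 6975798 likely; next step not specified in this note"
    if reboot_ok:
        return "verify the SMB service is online, then obtain a crashdump"
    return "collect a gcore of smbd and a diagnostic; request a crashdump next time"
```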
If there is no time to wait for support and there is an urgent need to bring the service back online immediately, there are two options.
First, restart the SMB server service. To do this, navigate to Configuration/Services in the BUI and click the circular arrows icon next to the SMB service. This will restart the service and most likely bring it back online.
Note, however, that restarting the service also makes it nearly impossible for ZFSSA Technical Support to determine the cause of the event.
The other option is to perform what is called a Diagnostic Reboot. From any BUI screen, click the circular arrows icon at the top left of the screen, just beneath the Oracle logo. In the confirmation box that appears, select the "Gather Diagnostics" checkbox and then confirm the reboot.
This reboot takes a bit longer than a normal reboot, as the diagnostic data must be read, compressed and written. The time that this operation requires is directly related to the amount of memory in the system, and may take several minutes on large configurations.
References
<NOTE:1402353.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Active Directory Issues
<BUG:6975798> - CIFS SERVER CANNOT RECOVER FROM DC REPLY MISMATCH IN LIBSMBRDR