Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution 1001334.1 : StorEdge A1000, A3000, A3500 requirements for proper operation.
PreviouslyPublishedAs 201796 Product Sun StorageTek A1000 Array Sun StorageTek A3000 Sun StorageTek A3500 Array Part
Impact

This FIN has critical impact on all A3X00/A3500FC/A1000 configurations on all Sun Ultra Enterprise platforms using all versions of Solaris/SunOS. Ultra systems experiencing the faults documented in this FAB can be down for extended periods or until LUN 0 is installed.

The rmlog.log and messages files should be checked for errors, as there have been numerous instances of hosts being shut down while the resolution daemon was recovering failed I/Os. In these cases, the indications are that under heavy I/O, recovery of a failed block may not happen for an hour and 20 minutes. The customer will likely have rebooted the host before then, starting the problem over again.

The customer reboots the host on the assumption that the resolution daemon is hung, hoping that after the reboot the daemon will re-initiate and complete the recovery process. Unfortunately, a host reboot is no substitute for an optimal LUN 0. After the reboot, if there is heavy I/O, the recovery takes much longer, and the customer is likely to reboot the host again hoping this will fix the problem, but the symptoms persist.

Symptoms

The problems associated with the deletion of LUN 0 include the inability of the resolution daemon to perform properly during I/O failures, or substantial delays in doing so. A test conducted during a customer escalation verified that the addition of an optimal LUN 0 allowed the controllers, the resolution daemon and the associated components providing fail-over capability to function properly.

LSI/Symbios have confirmed that the removal of LUN 0 is not a valid or supported configuration. While RM6 does allow a user to do this, the removal of LUN 0 will cause unpredictable behavior, including communication problems with the array (through both the GUI and the CLI) and data loss due to random LUN failures.

The GUI command to delete a LUN can be applied to LUN 0.
The CLI command raidutil can delete LUN 0 with either "raidutil -D all" or "raidutil -D 0". At this point the system is vulnerable to losing communication with the host if either a SCSI bus reset is generated for any reason or, when using a fibre-channel connection, a LIP is generated. Users often want to resize LUN 0 since the factory default is only 10MB. This involves deleting it and recreating it, which opens up a "no LUN 0" problem window.

The Release Notes Addendum for 6.22 is incorrect in stating that LUN 0 can be deleted after new LUNs are created and are optimal. This has been clearly documented in Bug Id 4296354. Also documented in this bug are the rules of the SCSI-2 and SCSI-3 specifications that are broken, since these specifications do not allow for the absence of LUN 0.

Update for FIN I0573-2
-----------------------
In this -2 revision, the following has been updated from FIN I0573-1:

1) The sixth paragraph has been added to the PROBLEM DESCRIPTION, as shown below.

   The GUI command to delete a LUN can be applied to LUN 0. The CLI command raidutil can delete LUN 0 with either "raidutil -D all" or "raidutil -D 0". At this point the system is vulnerable to losing communication with the host if either a SCSI bus reset is generated for any reason or, when using a fibre-channel connection, a LIP is generated. Users often want to resize LUN 0 since the factory default is only 10MB. This involves deleting it and recreating it, which opens up a "no LUN 0" problem window.

2) The 4th through 9th paragraphs have been added to the CORRECTIVE ACTION section. They describe the commands to be avoided, how to remake LUN 0, and how to recover and reset the entire array.
BugId: 4313266
ESC: 524844
MANUAL:
   805-7758-11 - Sun StorEdge RAID Manager 6.22 Release Notes for A1000, A3x00, and A3500FC
   805-7756-10 - Installation and Support Guide for Solaris
   806-0478-10 - Sun StorEdge RAID Manager 6.22 User's Guide
   806-3721-10 - Sun StorEdge RAID Manager 6.22 Release Notes Addendum

Resolution

Enterprise Customers and authorized Field Service Representatives may avoid the above-mentioned problem by following the recommendations below.

If a host exhibits delays or an inability to recover from I/O faults or re-balance LUNs, look for the presence (or absence) of an optimal LUN 0. On systems having no LUN 0, run RM6 to add an optimal LUN 0 to the configuration. On systems without disk space available, consultation with the customer will be required to architect a workaround that allows for the addition of LUN 0, on a time and materials basis.

The problem can be avoided by not deleting LUN 0. This is not typically a viable solution, however, because the factory LUN 0 is a RAID 0 device: keeping it either creates a single point of failure or wastes a lot of disk space. Because all the LUNs in a Drive Group must be the same RAID level, the customer either doesn't use the remaining capacity of the Drive Group or creates additional RAID 0 LUNs.

LUN 0 comes from the factory on all arrays as a 10MB RAID 0 device, which is not a useful size, nor is it intended to be. While the size is an arbitrary choice, the RAID level must be 0 because, if only one disk were visible to the RDAC controllers, that is the only RAID level they could create (other RAID levels require more than one disk by definition). Customers should try to give themselves better RAS by recreating LUN 0 as a RAID 1, 3, or 5 LUN. Since LUN 0 is typically resized only as part of an initial install, the exposure from recreating LUN 0 with a different RAID level is minute.
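The check for the presence of LUN 0 recommended above could be scripted roughly as follows. This is only a sketch: it assumes a text listing with one line per array of the form "<device> <serial> LUNS: <numbers>", which is an assumption about the layout and should be verified against real output before use. The function name, the regular expression and the sample text are all hypothetical. It reports only whether LUN 0 is listed, not whether it is optimal.

```python
import re

# Assumed line layout (not taken from this document):
#   <device> <serial> LUNS: <n> <n> ...
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+LUNS:\s*([\d\s]*)$")


def arrays_missing_lun0(listing: str) -> list[str]:
    """Return the device names of arrays whose LUN list does not include 0."""
    missing = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not fit the assumed layout
        device, _serial, lun_field = match.groups()
        luns = {int(n) for n in lun_field.split()}
        if 0 not in luns:
            missing.append(device)
    return missing


# Hypothetical sample text, invented for illustration only.
SAMPLE = """\
c1t5d0 SERIAL0001 LUNS: 0 1 2
c2t4d1 SERIAL0002 LUNS: 1 3
"""

if __name__ == "__main__":
    for device in arrays_missing_lun0(SAMPLE):
        print(f"{device}: no LUN 0 listed, add an optimal LUN 0")
```

Any array reported by such a check would still need to be confirmed in RM6, since only RM6 shows whether LUN 0 is in an optimal state.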
One can delete and recreate any LUN, including LUN 0, in a single command line: "raidutil -D 0 -n 0". This is not really an atomic operation, however, as there is one internal operation to delete LUN 0 and then another to recreate it.

In order to remake LUN 0, make sure that another optimal LUN exists on the controller, A or B, which owns LUN 0. By default LUN 0 will be on controller A, unless it's been explicitly moved. "lad" or "rdacutil -i cXtXd0" will show you where the controllers are in the system and which controller owns the LUNs for an array. It's much safer to do the LUN 0 deletion and recreation when there is no other activity on the array or on its SCSI bus or FC loop.

If the entire array is to be re-organized, use the GUI Reset Configuration command under Configuration->File->Reset Configuration. It leaves a default LUN 0 on controller A. When using the CLI version, "raidutil -c path -X", make sure you always use the path to a controller with at least one LUN on it; see bug 4281850.

"raidutil -D all" should never be used. In extreme cases, there is a serial port command (Syswipe) that will completely clean up the configuration, losing all data.

If the array should get into a state where there is no LUN 0, powering the array off and back on will cause it to go through Start of Day (SOD) processing, which creates a default LUN 0. In this case, only the controller modules need to be power cycled, not all the trays. A host reboot will not accomplish the same thing, unless one is running RM 6.1.1 or earlier.

The Release Notes Addendum to RM 6.22, 806-3721-10, describes safe procedures for creating a new LUN 0. It also contains a procedure for restoring communication to a FC array in the rare case that communication is lost during the above operations.
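The rules above for raidutil command lines and for remaking LUN 0 can be summarised in a small pre-flight check. The sketch below is illustrative only: it recognises just the "-D" and "-n" options quoted in this FIN, the function names and result labels are invented here, and the LUN records (number, owning controller, status string) are assumed to have been gathered by hand from RM6. It does not run raidutil or talk to an array.

```python
import shlex


def classify_raidutil(command: str) -> str:
    """Label a raidutil command line according to the rules in this FIN."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "raidutil":
        return "not-raidutil"

    def value_after(flag):
        # Return the token that follows the flag, or None if it is absent.
        if flag in tokens:
            index = tokens.index(flag)
            if index + 1 < len(tokens):
                return tokens[index + 1]
        return None

    deleted = value_after("-D")
    created = value_after("-n")
    if deleted == "all":
        return "forbidden"  # "raidutil -D all" should never be used
    if deleted == "0":
        # Delete plus recreate is still two internal steps, not atomic.
        return "recreates-lun0" if created == "0" else "removes-lun0"
    return "no-lun0-deletion"


def another_optimal_lun_exists(luns, owner: str) -> bool:
    """luns: iterable of (lun_number, controller, status) tuples.

    True if the controller that owns LUN 0 also owns another optimal LUN.
    The status string "Optimal" is an assumed spelling.
    """
    return any(
        number != 0 and controller == owner and status == "Optimal"
        for number, controller, status in luns
    )


if __name__ == "__main__":
    # Hypothetical inventory: LUN 0 and LUN 1 on controller A, LUN 2 on B.
    inventory = [(0, "A", "Optimal"), (1, "A", "Optimal"), (2, "B", "Optimal")]
    print(classify_raidutil("raidutil -D 0 -n 0"))
    print(another_optimal_lun_exists(inventory, owner="A"))
```

A command labelled "recreates-lun0" would be attempted only when the second check passes and the array, and its SCSI bus or FC loop, is otherwise idle.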
The Release Notes Addendum, 806-3721-10, is available internally at:
http://download.oracle.com/docs/cd/E19957-01/806-3721-10/806-3721-10.pdf

While this FIN initially targets the RM6-managed arrays A1000/A3x00/A3500FC, there is a high probability that this problem affects other or all of Sun's LUN-based storage arrays. Sun is testing its other LUN-based arrays to determine if any violate the SCSI-2 or SCSI-3 specification rule which requires the presence of an optimal LUN 0 even after additional LUNs are created.

Comments

While leaving LUN 0 "alone" as a RAID 0 LUN is OK, it is not without its own set of risks. First of all, the customer will not want to just "throw away" the remaining capacity of the Drive Group (total capacity minus 10MB) and will probably want to build other LUNs in the remaining capacity. As such, they would have to be the same RAID level. The use of a RAID 0 LUN *anywhere* in a RAID Module opens the customer up to a single point of failure.

Date: 18-APR-2006
Previously Published As 100074
Internal Eng Business Unit Group
KE Authors
Internal Kasp FAB Legacy ID 100074, I0573-2 (FIN)
Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date:
Avoidance: Patch
Responsible Manager: null
Original Admin Info: null
Internal SA-FAB Eng Submission StorEdge A1000, A3000, A3500,
Product_uuid
2a792916-0a18-11d6-8d0a-c3d03933af3c|Sun StorageTek A1000 Array
2a7ca41a-0a18-11d6-82f2-e96014c515ea|Sun StorageTek A3000
2a8022d4-0a18-11d6-8043-ee5a180fdb7f|Sun StorageTek A3500 Array
Attachments
This solution has no attachment