Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1001334.1
Update Date: 2011-02-22
Keywords:

Solution Type  FAB (standard)

Solution 1001334.1: StorEdge A1000, A3000, A3500 requirements for proper operation.


Related Items
  • Sun Storage A3000 Array
  • Sun Netra st A1000 Array
  • Sun Storage A3500 FC Array
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

Previously Published As
201796


Product
Sun StorageTek A1000 Array
Sun StorageTek A3000
Sun StorageTek A3500 Array

Part
  • Part No: 825-3869-02
  • Part Description: MNL Set, SUN RSM ARRAY 2000
  • Part Model: -
Part
  • Part No: 798-0188-01
  • Part Description: SS, CD ASSY, RAID Manager 6.1
  • Part Model: -
Part
  • Part No: 798-0522-01
  • Part Description: CD ASSY, RAID Manager 6.1.1 (2+)
  • Part Model: -
Part
  • Part No: 798-0522-02
  • Part Description: CD ASSY, RAID Manager 6.1.1 (2A)
  • Part Model: -
Part
  • Part No: 798-0522-03
  • Part Description: CD ASSY, RAID Manager 6.1.1 UPDATE 2
  • Part Model: -
Part
  • Part No: 704-6708-10
  • Part Description: CD, SUN STOREDGE RAID Manager 6.22
  • Part Model: -
Xoption
  • Xoption Number: -
  • Xoption Description: StorEdge A1000
Xoption
  • Xoption Number: -
  • Xoption Description: StorEdge A3000
Xoption
  • Xoption Number: -
  • Xoption Description: StorEdge A3500
Xoption
  • Xoption Number: -
  • Xoption Description: StorEdge A3500FC
Xoption
  • Xoption Number: 6530A
  • Xoption Description: Sun RSM Array 63GB 15X4GB
Xoption
  • Xoption Number: 6531A
  • Xoption Description: Sun RSM Array 147GB 7X4GB
Xoption
  • Xoption Number: 6532A
  • Xoption Description: A3000 15*4.2GB/7200 FWSCSI
Xoption
  • Xoption Number: 6533A
  • Xoption Description: RSM2000 35*4.2GB/7200 FWSCSI
Xoption
  • Xoption Number: 6534A
  • Xoption Description: A3000 15*9.1GB/7200 FWSCSI
Xoption
  • Xoption Number: 6535A
  • Xoption Description: A3000 35*9.1GB/7200 FWSCSI
Xoption
  • Xoption Number: SG-XARY122A-16G
  • Xoption Description: 16GB StorEdge A1000
Xoption
  • Xoption Number: SG-XARY122A-50G
  • Xoption Description: 50GB StorEdge A1000
Xoption
  • Xoption Number: SG-XARY124A-36G
  • Xoption Description: 36GB StorEdge A1000
Xoption
  • Xoption Number: SG-XARY124A-109G
  • Xoption Description: 109GB StorEdge A1000
Xoption
  • Xoption Number: SG-XARY126A-72G
  • Xoption Description: 72GB StorEdge A1000
Xoption
  • Xoption Number: SG-XARY126A-144G
  • Xoption Description: 144GB StorEdge A1000
Xoption
  • Xoption Number: SG-XARY135A-72G
  • Xoption Description: 72GB StorEdge A1000 For Rack
Xoption
  • Xoption Number: SG-XARY131A-16G
  • Xoption Description: 16GB StorEdge A1000 For Rack
Xoption
  • Xoption Number: SG-XARY133A-36G
  • Xoption Description: 36GB StorEdge A1000 For Rack
Xoption
  • Xoption Number: SG-XARY351A-180G
  • Xoption Description: A3500 1 Cont Mod/5 Trays/18GB
Xoption
  • Xoption Number: SG-XARY351A-360G
  • Xoption Description: A3500 1 Cont Mod/5 Trays/36GB

Impact
This FIN has critical impact on all A3x00/A3500FC/A1000 configurations on
all Sun Ultra Enterprise platforms running any version of Solaris/SunOS.
Ultra systems experiencing the faults documented in this FAB can be down
for extended periods, or until an optimal LUN 0 is installed.

The rmlog.log and system messages files should be checked for errors, as
there have been numerous instances of hosts being shut down while the
resolution daemon was recovering failed I/Os.  In the case observed, the
indications are that under heavy I/O, recovery of a failed block may not
complete for an hour and twenty minutes.  The customer will likely have
rebooted the host before then, starting the problem over again.
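
Before concluding that the resolution daemon is hung, the logs can be
scanned for recovery activity.  The following is a minimal sketch; the
rmlog.log path shown (/var/osa/rmlog.log) is an assumed default location
for RAID Manager and may differ on a given installation.

    # Look for recent resolution daemon and I/O recovery entries
    # (the search patterns are illustrative only).
    grep -i "resolution" /var/osa/rmlog.log | tail -20
    grep -i "error" /var/osa/rmlog.log | tail -20
    grep -i "scsi" /var/adm/messages | tail -20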

Customers reboot the host because they believe the resolution daemon is
hung, hoping that after the reboot the daemon will re-initiate and
complete the recovery process.  Unfortunately, a host reboot is no
substitute for the missing optimal LUN 0.  After the reboot, if there is
heavy I/O the recovery takes even longer, and the customer is likely to
reboot the host again hoping this will fix the problem, only to see the
same symptom recur.


Symptoms
The problems associated with the deletion of LUN 0 include the
inability (or substantial delays) for the resolution daemon to perform
properly during I/O failures.  A test conducted during a customer
escalation verified that the addition of an optimal LUN 0 allowed the
controllers, resolution daemon and associated components providing for
fail-over capability to function properly.

LSI/Symbios has confirmed that the removal of LUN 0 is not a valid or
supported configuration.  While RM6 does allow a user to do this, the
removal of LUN 0 will cause unpredictable behavior, including
communication problems with the array (through both the GUI and CLI) and
data loss due to random LUN failures.

The GUI command to delete a LUN can be applied to LUN 0.  The CLI
command raidutil can delete LUN 0 with either "raidutil -D all" or
"raidutil -D 0".  At this point the system is vulnerable to losing
communication with the host if a SCSI bus reset is generated for any
reason, or if a LIP is generated when using a fibre-channel connection.
Users often want to resize LUN 0, since the factory default is only 10MB.
This involves deleting it and recreating it, which opens up a "no LUN 0"
problem window.
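
For reference, these are the CLI invocations described above that remove
LUN 0 and open this exposure window.  They are shown only to make the
operations to avoid explicit; the controller device name c1t5d0 is a
hypothetical example.

    # Both of the following remove LUN 0 and leave the array exposed until
    # an optimal LUN 0 exists again -- avoid them, especially on a live array.
    raidutil -c c1t5d0 -D 0      # deletes LUN 0
    raidutil -c c1t5d0 -D all    # deletes every LUN, including LUN 0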

The Release Notes Addendum for 6.22 is incorrect in stating that LUN 0
can be deleted once new LUNs have been created and are optimal.  This has
been clearly documented in Bug ID 4296354, which also documents the
SCSI 2 and 3 specification rules that are broken, as those specifications
do not allow for the absence of LUN 0.

Update for FIN I0573-2:
-----------------------
In this -2 revision, the following has been updated relative to FIN I0573-1:

1) The sixth paragraph, shown below, has been added to the PROBLEM
DESCRIPTION.

The GUI command to delete a LUN can be applied to LUN 0.  The CLI
command raidutil can delete LUN 0 with either "raidutil -D all" or
"raidutil -D 0".  At this point the system is vulnerable to losing
communication with the host if a SCSI bus reset is generated for any
reason, or if a LIP is generated when using a fibre-channel connection.
Users often want to resize LUN 0, since the factory default is only
10MB.  This involves deleting it and recreating it, which opens up a
"no LUN 0" problem window.

2) The 4th through 9th paragraphs have been added to the CORRECTIVE
ACTION section, describing the commands to be avoided, how to remake
LUN 0, and how to recover and reset the entire array.

BugId:  4313266
ESC:    524844
MANUAL: 805-7758-11 - Sun StorEdge RAID Manager 6.22 Release Notes for
                      A1000, A3x00, and A3500FC
        805-7756-10 - Installation and Support Guide for Solaris
        806-0478-10 - Sun StorEdge RAID Manager 6.22 User's Guide
        806-3721-10 - Sun StorEdge RAID Manager 6.22 Release Notes
                      Addendum

Resolution
Enterprise Customers and authorized Field Service Representatives may
avoid the above-mentioned problem by following the recommendations
below.

If a host exhibits delays or an inability to recover from I/O faults or
to re-balance LUNs, check for the presence (or absence) of an optimal
LUN 0.

On systems having no LUN 0, run RM6 to add an optimal LUN 0 to the
configuration.  On systems with no disk space available, consult with
the customer to architect a workaround that allows LUN 0 to be added,
on a time and materials basis.

The problem can be avoided by simply not deleting LUN 0.  This is not
typically a viable solution, in that it either creates a single point of
failure or wastes a lot of disk space.  Because all the LUNs in a Drive
Group must be the same RAID level, the customer either leaves the
remaining capacity of the Drive Group unused or creates additional
RAID 0 LUNs.  LUN 0 comes from the factory on all arrays as a 10MB
RAID 0 device, which is not a useful size, nor is it intended to be.
While the size is an arbitrary choice, the RAID level must be 0 because
if only one disk was visible to the RDAC controllers, that is the only
RAID level they could create (other RAID levels require more than one
disk by definition).

Customers should give themselves better RAS by recreating LUN 0 as a
RAID 1, 3, or 5 LUN, as sketched below.  Since the only time LUN 0 is
typically resized is as part of an initial install, the exposure from
recreating LUN 0 with a different RAID level is minute.
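
As a hedged sketch of this recommendation, the single-command form shown
in the next paragraph could be extended with a RAID level option.  The
device name c1t5d0 and the -l (RAID level) option are assumptions, so
take the exact syntax from the safe procedure in the RM 6.22 Release
Notes Addendum (806-3721-10) rather than from this sketch.

    # Illustrative only: recreate LUN 0 as RAID 5 rather than the factory
    # RAID 0.  The -l option and the device name are assumptions; verify
    # against 806-3721-10 before use.
    raidutil -c c1t5d0 -D 0 -n 0 -l 5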

One can delete and recreate any LUN, including LUN 0, in a single
command line ("raidutil -D 0 -n 0"), but this is not really an atomic
operation: there is one internal operation to delete LUN 0 and then
another to recreate it.

In order to remake LUN 0, make sure that another optimal LUN exists on
the controller, A or B, which owns LUN 0.  The default LUN 0 will be on
controller A unless it has been explicitly moved.  "lad" or "rdacutil -i
cXtXd0" will show where the controllers are in the system and which
controller owns the LUNs for an array.  It is much safer to do the LUN 0
deletion and recreation when there is no other activity on the array and
on its SCSI bus or FC loop.
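
A minimal sketch of the checks described above, assuming a hypothetical
controller device name (c1t5d0); the output formats of lad and rdacutil
vary by release, so this only illustrates the sequence.

    # List the arrays and their controller device names.
    lad
    # Show which controller (A or B) currently owns the LUNs for the array.
    rdacutil -i c1t5d0
    # Once another optimal LUN is confirmed on the controller that owns
    # LUN 0, and the array and its SCSI bus or FC loop are quiet, delete
    # and immediately recreate LUN 0 (this is not an atomic operation).
    raidutil -c c1t5d0 -D 0 -n 0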

If the entire array is to be re-organized, use the GUI Reset
Configuration command under Configuration->File->Reset Configuration;
it leaves a default LUN 0 on controller A.  When using the CLI version,
"raidutil -c path -X", make sure to always use the path to a controller
with at least one LUN on it (see Bug ID 4281850).  "raidutil -D all"
should never be used.  In extreme cases, there is a serial port command
(Syswipe) that will completely clean out the configuration, losing all
data.
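
A sketch of the CLI reset described above, with a hypothetical controller
device name; remember that resetting the configuration destroys all data
on the array.

    # Reset the entire array configuration; a default LUN 0 is left on
    # controller A.  Point -c at a controller path that has at least one
    # LUN on it (see Bug ID 4281850).  The device name c1t5d0 is an
    # example only.
    raidutil -c c1t5d0 -X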

If the array should get into a state where there is no LUN 0, powering
the array off and back on will cause it to go through Start of Day (SOD)
processing, which creates a default LUN 0.  In this case, only the
controller modules need to be power-cycled, not all the trays.  A host
reboot will not accomplish the same thing, unless one is running RM 6.1.1
or earlier.

The Release Notes Addendum to RM 6.22, 806-3721-10, describes safe
procedures for creating a new LUN 0.  It also contains a procedure for
restoring communication to a FC array in the rare case communication is
lost during the above operations.  The addendum is available internally at:

  http://download.oracle.com/docs/cd/E19957-01/806-3721-10/806-3721-10.pdf

While this FIN initially targets the RM6-managed arrays A1000/A3x00/
A3500FC, there is a high probability that this problem affects some or
all of Sun's other LUN-based storage arrays.  Sun is testing its other
LUN-based arrays to determine whether any violate the SCSI 2 or 3
specification rule which requires the presence of an optimal LUN 0 even
after additional LUNs are created.

Comments
While leaving LUN 0 "alone" as a RAID 0 LUN is OK, it is not without
its own set of risks.  First of all, customers will not want to simply
throw away the remaining capacity of the Drive Group (total capacity
minus 10MB) and will probably want to build other LUNs in the remaining
capacity; as such, those LUNs would have to be the same RAID level.  The
use of a RAID 0 LUN *anywhere* in a RAID Module opens the customer up
to a single point of failure.

Modification History
Date: 18-APR-2006
  • Updated URL to RM 6.22 release notes

 


Previously Published As
100074
Internal Eng Business Unit Group
KE Authors
Internal Kasp FAB Legacy ID
100074, I0573-2 (FIN)
Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date:
Avoidance: Patch
Responsible Manager: null
Original Admin Info: null
Internal SA-FAB Eng Submission
StorEdge A1000, A3000, A3500,
Product_uuid
2a792916-0a18-11d6-8d0a-c3d03933af3c|Sun StorageTek A1000 Array
2a7ca41a-0a18-11d6-82f2-e96014c515ea|Sun StorageTek A3000
2a8022d4-0a18-11d6-8043-ee5a180fdb7f|Sun StorageTek A3500 Array

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.