Document Audience: | INTERNAL |
Document ID: | I0551-1 |
Title: | Boot process and controller on-line process may take hours in systems with large StorEdge A3000, A3500 or A1000 configs |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2002-10-31 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0551-1
Synopsis: Boot process and controller on-line process may take hours in systems with large StorEdge A3000, A3500 or A1000 configs
Create Date: Oct/31/02
Keywords:
Boot process and controller on-line process may take hours in systems with large StorEdge A3000, A3500 or A1000 configs
Top FIN/FCO Report: No
Products Reference: Large StorEdge A1000, A3000 or A3500 Configurations
Product Category: Storage / Sw Admin;
Product Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- ANYSYS - System Platform Independent -
X-Options Affected
------------------
- A1000 ALL StorEdge A1000 -
- A3000 ALL StorEdge A3000 -
- A3500 ALL StorEdge A3500 -
Parts Affected:
Part Number Description Model
----------- ----------- -----
798-1036-01 CD Assy RAID MGR 6.1.1 -
704-6708-10 CD SUN STOREDGE RAID MGR 6.22 -
704-7937-05 CD SUN STOREDGE RAID Manager 6.22.1 -
References:
BugId: 4238051 - sd: (sparc) device probe extremely slow when multiple
LUNs per target.
4630273 - installing rm6 on Solaris 9 leads to a long pause at
system boot.
ESC: 523848 - E10k/A3500 cluster - SCSI-maintenance - system
performance/hang afterwards.
539493 - bug 4238051 /sd.conf entries cause excessive boot times
w/A3500 and EMC
Issue Description:
By default, an installed A3x00/A3500FC supports eight different LUNs
on the same target before moving on to the next target. The default
targets of the RAID controllers are 4 and 5.
As part of the RAID Manager 6 installation, the /kernel/drv/sd.conf
file is modified. A RAID Manager section is added for targets 0-15
and LUNs 1-7, as shown below:
# BEGIN RAID Manager additional LUN entries
# DO NOT EDIT from BEGIN above to END below...
name="sd" class="scsi"
target=0 lun=1;
name="sd" class="scsi"
target=0 lun=2;
name="sd" class="scsi"
target=0 lun=3;
name="sd" class="scsi"
target=0 lun=4;
name="sd" class="scsi"
target=0 lun=5;
name="sd" class="scsi"
target=0 lun=6;
name="sd" class="scsi"
target=0 lun=7;
.
.
.
(Middle portion omitted for readability)
.
.
.
name="sd" class="scsi"
target=15 lun=1;
name="sd" class="scsi"
target=15 lun=2;
name="sd" class="scsi"
target=15 lun=3;
name="sd" class="scsi"
target=15 lun=4;
name="sd" class="scsi"
target=15 lun=5;
name="sd" class="scsi"
target=15 lun=6;
name="sd" class="scsi"
target=15 lun=7;
# END RAID Manager additional lun entries
There are three problems with this:
1. During the boot process of the node, the sd driver times out on
every non-existent LUN. If multiple A3x00s or A1000s are attached,
a reboot cycle takes at least an hour when the first 16 LUNs are
supported. If 16-32 LUNs have to be supported on the A3x00 or the
A1000, the reboot cycle takes even longer to complete.
2. When a StorEdge A3500 or A3000 controller is brought back online,
a drvconfig is run. If the sd.conf file contains extra (unused)
targets and LUNs, rescanning the device tree for them may take even
longer to complete than a reboot on an Enterprise 10000 system.
3. On Solaris 9, probing all the device nodes that RM6 places under
rdnexus is slowed by a kernel change in the power management code, as
described in bug 4630273. The new code spends about 1 second scanning
every third rdriver instance in the Solaris device tree. The duration
of the slowdown can be estimated with some simple math:
X = total number of rdnexus entries in /kernel/drv/rdnexus.conf
    Example rdnexus entry:
    name="rdnexus" parent="pseudo" instance=0;
V = total number of rdriver generic module entries in
    /kernel/drv/rdriver.conf
    Example rdriver generic module entry:
    name="rdriver" parent="rdnexus" target=4 lun=0;
N = total number of rdriver instances in the Solaris device tree
T = slowdown duration in seconds
N = X*V
T = N/3
The V value can increase as a result of running the add16lun.sh or
add32lun.sh scripts, changing sd.conf, running genscsiconf(1m),
changing the target ID on the Rdac controller, or attaching other
types of arrays. The rdriver generic module entries are always
symmetric, i.e. each target has the same number of LUNs, so V can be
calculated by multiplying the number of LUNs supported by the number
of devices (targets) listed in rdriver.conf. For example, for an A1000
with 8-LUN support, X=64, and if the default target IDs and rmparam
are used, V=8*3 for targets 0, 4 and 5 listed in rdriver.conf, so
T=64*8*3/3=512 seconds.
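The arithmetic above can be sketched in a few lines of Python. This is
only an illustration of the N = X*V and T = N/3 relations given in this
notice; the function and parameter names are hypothetical and are not
part of RM6 or Solaris.

```python
def rdriver_entries(luns_per_target, targets_listed):
    """V: number of rdriver.conf entries, assuming the symmetric layout
    described in the notice (every target lists the same number of LUNs)."""
    return luns_per_target * targets_listed


def slowdown_seconds(rdnexus_entries, rdriver_entry_count):
    """T = N/3 with N = X*V, using the notice's figure of roughly
    1 second per three rdriver instances."""
    n_instances = rdnexus_entries * rdriver_entry_count
    return n_instances / 3


# Example from the notice: 8 LUNs on targets 0, 4 and 5, 64 rdnexus entries.
v = rdriver_entries(luns_per_target=8, targets_listed=3)
print(slowdown_seconds(64, v))  # 512.0
```

For the 32-LUN A1000 test configuration reported in this notice (V=96),
the same formula gives 2048 seconds (about 34 minutes) with 64 rdnexus
entries and 512 seconds (about 8.5 minutes) with 16 entries, which is
roughly in line with the gap between the Solaris 9 and Solaris 8 boot
times measured there.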
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
Corrective Action:
Enterprise Customers and Authorized Enterprise Field Service
Representatives may avoid the A3x00 and A1000 boot delays described
above by following the recommendation below.
The step-by-step procedure below shortens the reboot time, as well as
the time needed to bring a RAID controller from an "offline" state
back to an "online" state. Its main purpose is to make sure that
drvconfig does not have to probe non-existent LUNs.
Note: By default, A3x00 RAID controllers are set to targets 4 and 5.
1. From the command prompt, type the following:
# /etc/raid/bin/lad
c1t5d0s0 1T92401270 LUNS: 0 1 2
c2t5d0s0 1T92401348 LUNS: 0 1 2
c3t4d3s0 1T92600542 LUNS: 3 4 5
c4t5d0s0 1T92400129 LUNS: 0 1 2
c7t4d3s0 1T92401081 LUNS: 3 4 5
c8t4d3s0 1T92401082 LUNS: 3 4 5
In the /etc/raid/bin/lad output above, every RAID controller is set to
either target 4 (e.g. c3[t4]d3s0) or target 5 (e.g. c1[t5]d0s0). These
are the target entries that need to be retained in the
/kernel/drv/sd.conf file. Target numbers may vary between
configurations.
2. Using a text editor, edit the /kernel/drv/sd.conf file.
In the "RAID Manager" section, delete the entries for ALL targets
except 4 and 5, which the A3x00 RAID controllers use.
3. The new sd.conf for the "RAID Manager" section should look
like this after the modification:
# BEGIN RAID Manager additional LUN entries
# DO NOT EDIT from BEGIN above to END below...
name="sd" class="scsi"
target=4 lun=1;
name="sd" class="scsi"
target=4 lun=2;
name="sd" class="scsi"
target=4 lun=3;
name="sd" class="scsi"
target=4 lun=4;
name="sd" class="scsi"
target=4 lun=5;
name="sd" class="scsi"
target=4 lun=6;
name="sd" class="scsi"
target=4 lun=7;
name="sd" class="scsi"
target=5 lun=1;
name="sd" class="scsi"
target=5 lun=2;
name="sd" class="scsi"
target=5 lun=3;
name="sd" class="scsi"
target=5 lun=4;
name="sd" class="scsi"
target=5 lun=5;
name="sd" class="scsi"
target=5 lun=6;
name="sd" class="scsi"
target=5 lun=7;
# END RAID Manager additional LUN entries
Note: Targets 4 and 5 LUN 0 are not shown because they have
already been defined in the sd.conf near the top.
4. Save the file and reboot. The boot cycle time should now be
shorter. A reconfiguration reboot is not necessary for the A3x00(s)
or A1000(s).
NOTE: Not every A1000 is set to targets 4 and 5 by default. Make sure
that the entries for the targets actually in use go into the "RAID
Manager" section of sd.conf. Otherwise, after rebooting, the RM6
software will not see your A1000 RAID Module.
This procedure may affect third-party multi-LUN devices. If the
system has third-party multi-LUN devices, verify their target ID
settings and make sure you do not disable them with this procedure.
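As an illustration of steps 1 through 3, the following Python sketch
extracts the target numbers from lad output and drops every other
target from the "RAID Manager" section of an sd.conf text. It is a
sketch under stated assumptions, not a supported tool: the function
names are hypothetical, the entries are assumed to use the layout shown
in this notice (each entry terminated by ";"), and it is meant to be
run against a copy of sd.conf rather than the live file.

```python
import re

# Device names such as c3t4d3s0; the captured group is the target number.
DEVICE = re.compile(r"\bc\d+t(\d+)d\d+s\d+\b")
TARGET = re.compile(r"\btarget=(\d+)\b")
BEGIN_PREFIX = "# BEGIN RAID Manager"
END_PREFIX = "# END RAID Manager"


def targets_from_lad(lad_output):
    """Return the set of target numbers that appear in lad output."""
    return {int(m.group(1)) for m in DEVICE.finditer(lad_output)}


def prune_raid_manager_section(sd_conf_text, keep_targets):
    """Drop RAID Manager entries whose target is not in keep_targets.

    Lines outside the BEGIN/END markers and comment lines are kept as-is.
    """
    out, pending, inside = [], [], False
    for line in sd_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(BEGIN_PREFIX):
            inside = True
            out.append(line)
        elif stripped.startswith(END_PREFIX):
            inside = False
            out.extend(pending)  # keep any unterminated leftovers untouched
            pending = []
            out.append(line)
        elif not inside or not stripped or stripped.startswith("#"):
            out.append(line)
        else:
            pending.append(line)
            if stripped.endswith(";"):  # end of one sd.conf entry
                match = TARGET.search(" ".join(pending))
                if match is None or int(match.group(1)) in keep_targets:
                    out.extend(pending)
                pending = []
    out.extend(pending)
    return "\n".join(out) + "\n"
```

If third-party multi-LUN devices are present, their target IDs would
need to be added to the set passed as keep_targets, for example
targets_from_lad(lad_text) | {2} for a hypothetical device on target 2.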
The slowdown in boot time, especially on Solaris 9 systems, can be
reduced by removing rdnexus entries from /kernel/drv/rdnexus.conf.
The number of rdnexus entries that can be removed depends on the
individual system, but the general guidelines are as follows:
. Use 'ls /devices/pseudo | grep rdnexus' to check the number of
rdnexus nodes in use.
. Leave enough entries for future expansion. Each rdnexus node
represents an HBA port connected to a Rdac controller. In general,
16 rdnexus entries in /kernel/drv/rdnexus.conf are sufficient for
systems with 4 arrays or fewer.
. Remove rdnexus entries starting from the highest instance number.
. Test results show a clear improvement in boot time.
System configuration:
A1000 with 32-LUN support, V=96
Solaris 8 with 64 rdnexus entries: boot time = 3 minutes
Solaris 9 with 64 rdnexus entries: boot time = 39 minutes
Solaris 9 with 16 rdnexus entries: boot time = 13 minutes
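The rdnexus guidelines above can be sketched as a small Python helper
that keeps only the lowest-numbered entries. This is a hedged
illustration: the function name and parameters are hypothetical, each
entry is assumed to sit on its own line in the form shown earlier in
this notice (name="rdnexus" parent="pseudo" instance=N;), and it should
be tried on a copy of rdnexus.conf.

```python
import re

RDNEXUS = re.compile(r'name="rdnexus"\s+parent="pseudo"\s+instance=(\d+);')


def trim_rdnexus(conf_text, keep, in_use=0):
    """Keep the `keep` lowest-numbered rdnexus entries.

    Entries are dropped starting from the highest instance number.
    `in_use` is the node count reported by
    'ls /devices/pseudo | grep rdnexus'; trimming below it is refused.
    """
    if keep < in_use:
        raise ValueError("keep must not be lower than the nodes in use")
    instances = sorted(int(m.group(1)) for m in RDNEXUS.finditer(conf_text))
    allowed = set(instances[:keep])
    kept_lines = []
    for line in conf_text.splitlines():
        match = RDNEXUS.search(line)
        # Non-entry lines (comments, blanks) are passed through unchanged.
        if match is None or int(match.group(1)) in allowed:
            kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"
```

For a file with 64 entries, trim_rdnexus(text, keep=16, in_use=4) would
leave instances 0 through 15, matching the 16-entry guideline above.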
Comments:
--------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------