Document Audience: | INTERNAL |
Document ID: | I1125-1 |
Title: | StorEdge 6120 and 6320 Arrays with firmware prior to 3.1.x may experience downtime or data integrity problems if double disk errors or back-end loop hangs occur. |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2004-11-17 |
___________________________________________________
*** Sun Confidential: Internal Use and Authorized VARs Only ***
__________________________________________________________________
This message including any attachments is confidential information
of Sun Microsystems, Inc. Disclosure, copying or distribution is
prohibited without permission of Sun. If you are not the intended
recipient, please reply to the sender and then delete this message.
__________________________________________________________________
FIELD INFORMATION NOTICE
(For Authorized Distribution by Sun Services)
FIN #: I1125-1
Synopsis: StorEdge 6120 and 6320 Arrays with firmware prior to 3.1.x may experience downtime or data integrity problems if double disk errors or back-end loop hangs occur.
Create Date: Oct/29/04
SunAlert: No
Top FIN/FCO Report: Yes
Products Reference: Sun StorEdge 6120/6320 Arrays
Product Category: Storage / SW Admin
Product Affected:
Systems Affected:
-----------------
Mkt_ID   Platform   Model   Description                   Serial Number
------   --------   -----   -----------                   -------------
  -      Anysys       -     System Platform Independent         -
X-options Affected:
--------- --------
Mkt_ID   Platform   Model   Description                   Serial Number
------   --------   -----   -----------                   -------------
  -      SE6120     ALL     Sun StorEdge 6120 Array             -
  -      SE6320     ALL     Sun StorEdge 6320 Array             -
Parts Affected:
----------------------
Part Number   Description                   Model
-----------   -----------                   -----
540-5559-XX   ASSY T4 2.5 CONTRLR W/SLED      -
References:
BugId: Refer to Patch README file for a complete list of fixed bugs.
PatchId: 115179-11: 6120 Array FW patch ID.
         116655-02: 6120 Mgmt Software Patch ID.
         115589-06: 6320 SP Software Patch ID.
         115179-11: 6320 Firmware Patch ID.
         114591-20: 6320 StorADE Patch ID.
         116819-01: 6320 NTC Firmware Patch ID.
         117106-01: 6320 FBR Patch ID.
         116656-02: 6320 Management Software.
         113193-04: 6320 Patchpro Patch ID.
Manual: 819-0051-10: Disk Scrubbing Sun StorEdge Arrays.
        817-0201-19: Product Release Notes for 6120.
        816-7880-19: Product Release Notes for 6320.
URL: https://spe.sun.com/spx/control/Login
     http://ns-qcc.ebay/HealthCheck/index.html
     http://sunsolve.sun.com/handbook_pub/Systems/6120/docs.html
     http://sunsolve.central.sun.com/handbook_internal/Systems/6120/docs.html
     http://sunsolve.sun.com/handbook_pub/Systems/6320/docs.html
     http://sunsolve.central.sun.com/handbook_internal/Systems/6320/docs.html
     http://sdpsweb.central/FIN_FCO/FIN/FINI1125-1_dir/SPE/Cust_List.sxc
     http://sdpsweb.central/FIN_FCO/FIN/FINI1125-1_dir/SPE/Cust_Letter.sxw
Issue Description:
-------------------------------------------------------------------------
| Change History: |
| =============== |
| Oct/29/04 Updated Step 1 and Step 2 in Corrective Action. |
| |
| Bug 5065023. |
-------------------------------------------------------------------------
Sun StorEdge 6120/6320 Arrays with firmware (FW) versions earlier than 3.1.x
may experience system downtime and data integrity problems if double disk
errors or back-end loop hangs occur.
The configurations affected by this issue are as follows:
. Any StorEdge 6120/6320 array running 3.0.x, 3.1.0, 3.1.1, or 3.1.2 FW.
Note that although 3.1.0, 3.1.1, 3.1.2 and 3.1.3 already include the disk
scrubber and BEFIT features, quality improvements in 3.1.4 and later make
it beneficial to upgrade those systems as well.
To determine the firmware version for an array, log on to the array, and
run the "ver" command at the array prompt.
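As a rough illustration, the version reported by "ver" can be compared
against the recommended 3.1.4 level along the following lines. This is a
Python sketch only: the function names are hypothetical, and it assumes the
release has already been read off as a dotted numeric string such as
"3.1.2". The exact layout of the "ver" output is not reproduced here.

```python
# Hypothetical helper: classify a firmware release string against 3.1.4.
# Assumes a dotted numeric release string such as "3.1.2".

RECOMMENDED = (3, 1, 4)

def parse_version(text):
    """Turn '3.1.2' into the tuple (3, 1, 2)."""
    return tuple(int(part) for part in text.strip().split("."))

def needs_upgrade(version_text):
    """True when the firmware is older than the recommended 3.1.4."""
    return parse_version(version_text) < RECOMMENDED

if __name__ == "__main__":
    for release in ("3.0.7", "3.1.2", "3.1.4"):
        print(release, "upgrade" if needs_upgrade(release) else "ok")
```

Tuples are compared element by element, so 3.1.2 sorts below 3.1.4 without
any string tricks.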
These issues are addressed by the following software patches.
. For Storage 6120 Arrays:
    Array FW patch ID:        115179-11
    Mgmt Software Patch ID:   116655-02
. For Storage 6320 Arrays:
    SP Software Patch ID:     115589-06
    Firmware Patch ID:        115179-11
    StorADE Patch ID:         114591-20
    NTC Firmware Patch ID:    116819-01
    FBR Patch ID:             117106-01
    Management Software:      116656-02
    Patchpro Patch ID:        113193-04
The new array firmware provides these features.
Back End Fault Isolation:
-------------------------
BEFIT monitors the back end loops of the array and isolates
failed components which may hang both backend loops. As an
example, if a drive fails and hangs both loops, BEFIT would
detect the loops being hung and start the process of isolating
each drive to find the faulty one. Once found, that drive will
be taken offline and the system will continue to run.
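The isolation idea described above can be pictured with a toy model. The
Python sketch below is NOT the array firmware's algorithm; the drive names
and the loop_is_hung() check are hypothetical stand-ins used only to show
the "bypass one drive at a time" approach.

```python
# Toy model of back-end fault isolation. Not the firmware's real logic:
# loop_is_hung() and the drive names are hypothetical stand-ins.

def loop_is_hung(active_drives, faulty):
    """The loop stays hung while any faulty drive is still attached."""
    return any(drive in faulty for drive in active_drives)

def isolate_faulty_drive(drives, faulty):
    """Bypass one drive at a time; return the drive whose removal frees the loop."""
    if not loop_is_hung(drives, faulty):
        return None  # loop is healthy, nothing to isolate
    for candidate in drives:
        remaining = [d for d in drives if d != candidate]
        if not loop_is_hung(remaining, faulty):
            return candidate  # this drive would be taken offline
    return None  # more than one faulty component; outside this sketch

if __name__ == "__main__":
    drives = ["u1d%d" % n for n in range(1, 15)]
    print(isolate_faulty_drive(drives, faulty={"u1d8"}))
```

The sketch handles only a single faulty drive; it makes no claim about how
the real feature treats multiple simultaneous faults.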
Disk Scrubber:
--------------
Disk scrubber is a software component that continually checks
storage array disk drives for latent media read errors and fixes
them automatically. Disk scrubbing is important for maintaining
the high availability of array volumes: disks are
electromechanical devices and can develop media read errors over
time. The best practice is to run the Disk Scrubber continuously
in the background so that media errors are fixed as they are found.
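To make the idea concrete, here is a toy scrub pass in Python. It is a
sketch under stated assumptions, not the array's implementation: the
XOR-parity stripe layout, the use of None to mark an unreadable block, and
all function names are hypothetical.

```python
# Toy scrub pass. The XOR-parity layout and all names are assumptions made
# for illustration; the array's real scrubber is not shown here.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub(stripes):
    """Each stripe is a list of blocks (data blocks plus one parity block).
    None marks a block that returned a media read error.
    Returns the number of blocks rebuilt."""
    repaired = 0
    for stripe in stripes:
        bad = [i for i, block in enumerate(stripe) if block is None]
        if len(bad) == 1:
            survivors = [b for b in stripe if b is not None]
            stripe[bad[0]] = xor_blocks(survivors)  # rebuild and rewrite
            repaired += 1
        # Two or more unreadable blocks in one stripe cannot be rebuilt here.
    return repaired
```

In this model a stripe with a single unreadable block is repaired, while a
stripe with two is left alone, which is why clearing latent errors before a
second fault appears matters.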
Explicit LUN Failover:
----------------------
ELF provides a way for a multipathing host driver to manage
LUN ownership so that the ping-pong effect that exists in
Implicit LUN Failover implementations can be eliminated.
Implicit LUN failover will be disabled when Explicit LUN
failover is used. Explicit LUN failover and Implicit LUN
failover are mutually exclusive.
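The ping-pong effect can be illustrated with a small model. This Python
sketch assumes that, under implicit failover, LUN ownership moves to
whichever controller receives the I/O; the controller names and the I/O
pattern are hypothetical and the model is deliberately simplified.

```python
# Toy model of LUN ownership. Controller names and the I/O pattern are
# hypothetical; real failover behavior is more involved.

def count_ownership_moves(io_sequence, implicit):
    """io_sequence lists the controller each I/O arrives on.
    With implicit failover, ownership follows the I/O; otherwise it stays
    where it is until the host driver explicitly moves it."""
    owner = io_sequence[0]
    moves = 0
    for controller in io_sequence:
        if implicit and controller != owner:
            owner = controller
            moves += 1
    return moves

if __name__ == "__main__":
    # I/O to the same LUN alternating between the two controllers.
    pattern = ["master", "alternate"] * 4
    print(count_ownership_moves(pattern, implicit=True))
    print(count_ownership_moves(pattern, implicit=False))
```

With alternating I/O, the implicit case moves ownership on nearly every
request, while the explicit case leaves it in place.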
Other Changes:
--------------
. Better I/O Queue depth management
. 4-node cluster support
. Codebase was C-styled and lint cleaned up
. The latest disk drive firmware included.
To review the root cause of this issue, refer to the Disk Scrubber
White Paper and the related documents listed below.
. Manual P/N 819-0051-10 - Disk Scrubbing Sun StorEdge Arrays.
817-0201-19 - Product Release Notes for 6120.
816-7880-19 - Product Release Notes for 6320.
. Sun System Handbook URL for 6120 on:
http://sunsolve.sun.com/handbook_pub/Systems/6120/docs.html
http://sunsolve.central.sun.com/handbook_internal/Systems/6120/docs.html
. Sun System Handbook URL for 6320 on:
http://sunsolve.sun.com/handbook_pub/Systems/6320/docs.html
http://sunsolve.central.sun.com/handbook_internal/Systems/6320/docs.html
Implementation:
---
| X | MANDATORY (Fully Proactive)
---
---
| | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned issue.
Follow the steps below to upgrade array, disk, and system firmware to the
recommended levels. Please use the following Customer List to identify
sites which may be affected.
http://sdpsweb.central/FIN_FCO/FIN/FINI1125-1_dir/SPE/Cust_List.sxc
Note that this Customer List is shared with FIN I1121-1 and includes
StorEdge T3+/3900/6900 systems related to that FIN.
Use the following Customer Letter as needed to communicate this issue
to customers.
http://sdpsweb.central/FIN_FCO/FIN/FINI1125-1_dir/SPE/Cust_Letter.sxw
STEP 1:
For 6120 Arrays and 6120-HA and 6320 Arrays:
--------------------------------------------
NOTE: 6320 vol verifies may be done through StorADE.
Prior to the upgrade of controller and disk firmware, a vol verify
must be performed on every volume in the array. The Field does not
have to be present during the verifications but is responsible for
their successful completion. This step is required if you are running
3.0.x firmware or lower. Follow these guidelines during the
verification process.
A. Do not use the "fix" modifier. If dual disk errors
are seen, this can result in the unmounting of the volume and
unscheduled down time. The verify operation is for reporting
purposes only.
B. Depending on host I/O activity, the optional "rate"
modifier (rate 1 - 8) may be applied.
C. Appropriate time needs to be given to complete all
verifies on all volumes. Depending on host activity, drive
sizes, raid types, etc, the verifies can take several days.
D. There must be a complete interrogation of the syslog
to confirm that all verify operations have completed
successfully. Here is a verify "start" and "end" entry pair with
no disk errors in between.
Oct 09 07:09:17 MASD[1]: N: Vol verify (t1pool0) started
Oct 09 13:12:14 MASD[1]: N: Vol verify (t1pool0) ended
If errors are encountered, syslog may contain entries such as
the following:
13:21:12 t3b3 sh78[1]: N: Volume vol1 verification started
13:33:50 t3b3 ISR1[1]: W: u1d8 SCSI Disk Error Occurred
13:33:50 t3b3 ISR1[1]: W: Sense Key = 0x3, Asc = 0x11, Ascq = 0x0
13:33:50 t3b3 ISR1[1]: W: Sense Data Description =
Unrecovered Read Error
Any such errors MUST be addressed before the upgrade continues.
Appropriate repairs, drive swaps, LUN rebuilds, etc. need to be
done through the normal support channels used by the accounts.
The Field CANNOT proceed with the upgrades until such
verifications (and repairs) are confirmed by the attending Sun
representative.
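The syslog review can be partly automated. The Python sketch below is one
possible approach, not a supported tool. It assumes one verify runs at a
time, that start and end entries use the "Vol verify (name) started/ended"
wording shown above, and that disk errors appear as "W:" entries containing
"Disk Error"; other message formats would need their own patterns. The
function name is hypothetical.

```python
# Sketch of a syslog check for vol verify results. Patterns are based only
# on the sample entries in this notice; other formats are not handled.
import re

START = re.compile(r"Vol verify \((\S+)\) started")
END = re.compile(r"Vol verify \((\S+)\) ended")
ERROR = re.compile(r"\bW: .*Disk Error")

def check_verifies(lines):
    """Return {volume: status}; status is 'clean', 'errors' or 'incomplete'."""
    status = {}
    running = None  # volume currently being verified, if any
    for line in lines:
        match = START.search(line)
        if match:
            running = match.group(1)
            status[running] = "incomplete"
            continue
        match = END.search(line)
        if match and match.group(1) in status:
            if status[match.group(1)] == "incomplete":
                status[match.group(1)] = "clean"
            running = None
            continue
        if running and ERROR.search(line):
            status[running] = "errors"
    return status
```

A volume reported as 'incomplete' has a start entry with no matching end
entry, so the verify should not be treated as finished. The result is an
aid to the manual review, not a replacement for it.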
STEP 2: Upgrade drive FW to the latest level if not running
at the minimum levels specified below.
Drive Model    Minimum FW    FW Patch
-----------    ----------    ---------
ST336752FC     0205          109962-14
ST336753FC     0149          116748-03
MAS3367FC      0701          116816-01
ST373307FC     0207          114708-04
MAP3735FC      1401          116514-05
DK32EJ72FC     FQ0C          116464-01
ST3146807FC    0207          114709-04
MAP3147FC      1401          116815-03
DK32J14FC      2Q0A          116465-01
ST373453FC     0349          113673-01
MAS3735FC      0701          116817-01
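The table above lends itself to a simple lookup. The Python sketch below is
illustrative only: the function name is hypothetical, and it assumes that
all-numeric firmware levels can be compared as integers. This notice does
not state an ordering for non-numeric levels such as FQ0C, so those are
flagged for a manual check rather than compared.

```python
# Minimum drive firmware levels and patches, copied from the table above.
MINIMUMS = {
    "ST336752FC": ("0205", "109962-14"),
    "ST336753FC": ("0149", "116748-03"),
    "MAS3367FC": ("0701", "116816-01"),
    "ST373307FC": ("0207", "114708-04"),
    "MAP3735FC": ("1401", "116514-05"),
    "DK32EJ72FC": ("FQ0C", "116464-01"),
    "ST3146807FC": ("0207", "114709-04"),
    "MAP3147FC": ("1401", "116815-03"),
    "DK32J14FC": ("2Q0A", "116465-01"),
    "ST373453FC": ("0349", "113673-01"),
    "MAS3735FC": ("0701", "116817-01"),
}

def check_drive(model, installed_fw):
    """Return (verdict, patch). Verdict is 'ok', 'upgrade', 'manual check'
    or 'unknown model'. Numeric comparison is an assumption of this sketch."""
    if model not in MINIMUMS:
        return ("unknown model", None)
    minimum, patch = MINIMUMS[model]
    if installed_fw == minimum:
        return ("ok", None)
    if installed_fw.isdigit() and minimum.isdigit():
        if int(installed_fw) >= int(minimum):
            return ("ok", None)
        return ("upgrade", patch)
    # No ordering is stated for non-numeric levels; leave it to a person.
    return ("manual check", patch)
```

For example, check_drive("ST336752FC", "0204") reports that patch 109962-14
is needed, while a drive model missing from the table is reported as
unknown rather than assumed to be compliant.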
For 6120 Arrays:
----------------
STEP 3: Upgrade Array FW to 3.1.4 or later.
STEP 4: Reboot the array.
STEP 5: Upgrade host-side Management Software, if installed.
For 6120-HA:
------------
STEP 3: Fail over the master controller to the alternate controller.
STEP 4: Upgrade the master controller.
STEP 5: Fail back the alternate controller to the master controller.
STEP 6: Upgrade the alternate controller.
STEP 7: Upgrade host-side Management Software, if installed.
For 6320:
---------
STEP 3: Run vol verify via StorADE on each of the arrays if your
system is at MIRE 1.1.x or lower. Correct any disk errors found
by vol verify.
STEP 4: Upgrade the system to MIRE 1.2.3.
Refer to the following reference material as needed.
819-0051-10: Disk Scrubbing Sun StorEdge Arrays.
817-0201-19: Product Release Notes for 6120.
816-7880-19: Product Release Notes for 6320.
HealthCheck website URL:
http://ns-qcc.ebay/HealthCheck/index.html
Comments:
Please refer to Patch README file for a complete list of fixed bugs.
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Sun Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Sun Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Sun Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunSolve Internal Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
_______________
* Access the top level URL of https://spe.sun.com
FIN/FCO Homepage Access:
_________________________
* Access the top level URL of http://sdpsweb.Central/FIN_FCO/index.html
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
To submit either a FIN or FCO, refer to the following URLs for templates
and instructions:
* For FCO: http://pronto.central/fco.html
* For FIN: http://pronto.central/fin.html
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------