---------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0966-1
Synopsis: Best practices guidelines are available for StorEdge T3/T3+ arrays which encounter "disk error 03" messages.
Create Date: May/07/03
SunAlert: No
Top FIN/FCO Report: No
Products Reference: StorEdge T3/T3+ Array
Product Category: Storage / Service
Product Affected:
Systems Affected:
-----------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- Anysys ALL System Platform Independent -
X-Options Affected:
-------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- T3 ALL T3 StorEdge Array -
- T3+ ALL T3+ StorEdge Array -
Part Number Description Model
----------- ----------- -----
- - -
Sun StorEdge T3/T3+ "disk error 03" messages are an indication that an
operation within the array did not complete successfully. Failure to
properly investigate the cause of these errors could result in
incorrect resolution for T3/T3+ array issues, leading to unnecessary
system downtime.
This issue affects any Sun StorEdge T3 (T3A) or T3+ (T3B) array which
has experienced an operational error, indicated by "disk error 03"
displayed in the syslog file.
A "disk error 3" message is an informational notice. It indicates that
additional investigation is required and that further action may be
necessary. That action could include monitoring the health of the
array or replacing parts. These errors are logged to the array
syslog file; sample entries appear under OPERATOR GUIDELINES below.
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.
It is important that StorEdge T3/T3+ disk errors are monitored on a
regular basis and that proper action be taken based on the guidelines
provided below:
1. Monitor disk errors in syslog.
Periodically run "vol verify" to minimize the possibility of
encountering media-related double faults during a data reconstruction.
The recommended period is once a month. T3/T3+ firmware versions
1.18.1 and 2.1.3 incorporate enhanced "vol verify" functionality to
facilitate this. 1.18.1 and 2.1.3 versions of the firmware also
include more intelligence in disk error handling. These versions
disable disks for certain error conditions that indicate that the disk
is failing or is about to fail.
2. Sense error codes and the corresponding recommendations:
Note that "REPLACE" in the following table means:
For T3 firmware version 1.18.1 and above, and for T3+ firmware version
2.1.3 and above, the system will automatically disable the drive and
the drive will be ready for removal.
For firmware versions earlier than the above, manual checking of error
codes in the syslog file and manual disabling of the drive is required
before it can be replaced.
=====================================================================
| | |
|Sense Error Codes | Action Required |
|(with exceptions) | |
|==================+==================================================|
| 01/5d/xx | REPLACE. |
| | The failure of this drive is imminent. It is |
| | recommended to back up the data from this drive, |
| | if possible, and replace the drive as soon as |
| | possible. |
| | |
|------------------+--------------------------------------------------|
| 02/04/01 | Validate (Is it in process of becoming ready?) |
| | Run "fru stat" twice with a one minute interval.|
| | If the drive is still in the "NOT READY" state, |
| | REPLACE the drive. |
| For any other | |
| 02/xx/xx | REPLACE |
|------------------+--------------------------------------------------|
| 03/11/xx | If in a RAID-1 or RAID-5 configuration, the RAID |
| | controller (or manager) will reconstruct the |
| | data from the remaining disk in the volume and |
| | write it back to the failed LBA. This will cause|
| | the drive to automatically replace the failed |
| | LBA. If RAID-0, or not in a RAID config, replace |
| | the drive. |
| For any other | |
| 03/xx/xx | REPLACE |
|------------------+--------------------------------------------------|
| | |
| 04/xx/xx | REPLACE. Hardware failure. |
=====================================================================
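
The table and the firmware note above can be condensed into a small
lookup. The following Python sketch is illustrative only and is not
part of any Sun tool: the function names, the returned strings, and the
"NOT LISTED" fallback for codes that the table does not cover are all
assumptions made for this example.

```python
# Hypothetical helper: maps a sense code triple (key/asc/ascq) to the
# action given in the table above. All names here are illustrative.

# Minimum firmware that disables a failing drive automatically,
# taken from the note preceding the table.
AUTO_DISABLE_MIN = {"T3": (1, 18, 1), "T3+": (2, 1, 3)}


def parse_version(text):
    """Turn '1.18.1' into (1, 18, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def auto_disables(model, firmware):
    """True if this firmware disables the drive itself on REPLACE."""
    return parse_version(firmware) >= AUTO_DISABLE_MIN[model]


def recommend(key, asc, ascq, raid_level=None):
    """Return the table's action for one sense code triple.

    raid_level is 0, 1, 5, or None when the disk is not in a RAID volume.
    """
    if key == 0x01 and asc == 0x5D:
        return "REPLACE"          # 01/5d/xx: failure is imminent
    if key == 0x02:
        if asc == 0x04 and ascq == 0x01:
            return "VALIDATE"     # 02/04/01: run "fru stat" twice, 1 min apart
        return "REPLACE"          # any other 02/xx/xx
    if key == 0x03:
        if asc == 0x11 and raid_level in (1, 5):
            return "RECONSTRUCT"  # controller rewrites the failed LBA
        return "REPLACE"          # 03/11/xx without redundancy, other 03/xx/xx
    if key == 0x04:
        return "REPLACE"          # 04/xx/xx: hardware failure
    return "NOT LISTED"           # assumption: code is outside the table
```

For example, recommend(0x03, 0x11, 0x00, raid_level=5) yields
"RECONSTRUCT", while the same code on a RAID-0 volume yields "REPLACE".
When the result is "REPLACE", auto_disables() indicates whether the
drive still has to be disabled manually before removal.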
OPERATOR GUIDELINES:
====================
In general, a "disk error 0x3" displayed in the T3 syslog file is
handled and recovered by the T3 firmware itself. However, multiple
"disk error 0x3" messages might indicate that one of the FRUs is not
working properly or is defective. That determination cannot be made
without checking the health of the whole system.
"disk error 3" messages in the syslog are generally preceded by the
sense error codes, which specify the reason for the failure of a
particular disk.
Example 1:
Jul 12 23:52:11 ISR1[1]: W: u1d6 SCSI Disk Error Occurred
(path = 0x0)
Jul 12 23:52:11 ISR1[1]: W: Sense Key = 0x3, Asc = 0x11,
Ascq = 0x0
Jul 12 23:52:11 ISR1[1]: W: Sense Data Description = Unrecovered
Read Error
Jul 12 23:52:11 ISR1[1]: W: Valid Information = 0x68fb4
Jul 12 23:52:11 ISR1[1]: N: u1d6 SVD_DONE: Command Error = 0x3
Jul 12 23:52:11 ISR1[1]: N: u1d6 sid 148 stype 1001 disk error 3
This error, Sense Key = 0x3, Asc = 0x11, is recoverable when the disk
belongs to a RAID-1 or RAID-5 volume (see the 03/11/xx entry in the
table above). No operator action is required in that case.
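
The sense data in a log excerpt such as Example 1 can be pulled out
with a regular expression. The Python sketch below is a minimal
illustration, assuming the message wording shown in the example; the
function names are hypothetical. It tolerates the line wrapping seen
above by letting whitespace, including newlines, separate the fields.

```python
import re

# Matches a disk ID followed by the nearest "Sense Key/Asc/Ascq" triple.
# Assumes the wording shown in Example 1; real logs may differ.
SENSE_RE = re.compile(
    r"(u\d+d\d+) SCSI Disk Error Occurred.*?"
    r"Sense Key\s*=\s*(0x[0-9a-f]+),\s*"
    r"Asc\s*=\s*(0x[0-9a-f]+),\s*"
    r"Ascq\s*=\s*(0x[0-9a-f]+)",
    re.DOTALL | re.IGNORECASE,
)


def extract_sense_codes(syslog_text):
    """Return a list of (disk, key, asc, ascq) tuples with integer codes."""
    return [
        (disk, int(key, 16), int(asc, 16), int(ascq, 16))
        for disk, key, asc, ascq in SENSE_RE.findall(syslog_text)
    ]


def format_code(key, asc, ascq):
    """Render a triple in the key/asc/ascq notation used by the table."""
    return "%02x/%02x/%02x" % (key, asc, ascq)
```

Applied to the Example 1 text, extract_sense_codes() returns one entry
for u1d6, which format_code() renders as 03/11/00. Because the pattern
pairs each "SCSI Disk Error Occurred" line with the next sense triple
it finds, a log entry that is missing its sense line could be paired
with the wrong disk; a production parser would need to guard against
that.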
Example 2:
Jul 24 07:50:20 ISR1[1]: N: u2d8 SCSI Disk Error Occurred
(path = 0x1)
Jul 24 07:50:20 ISR1[1]: N: Sense Key = 0x1, Asc = 0x5d,
Ascq = 0x0
Jul 24 07:50:20 ISR1[1]: N: Sense Data Description = Failure
Prediction Threshold Exceeded
Jul 24 07:50:20 ISR1[1]: N: u1d6 SVD_DONE: Command Error = 0x3
Jul 24 07:50:20 ISR1[1]: N: u1d6 sid 148 stype 1001 disk error 3
In this case, many "disk error 3" messages may be expected.
This error, Sense Key = 0x1, Asc = 0x5d, indicates that failure of the
drive is imminent (see the 01/5d/xx entry in the table above).
Replace the disk.
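
Since repeated messages against one disk carry more weight than a
single occurrence, it can help to tally them per disk. The Python
sketch below is a minimal illustration that assumes the "disk error 3"
line format shown in the examples above; the function name is
hypothetical and no threshold is implied.

```python
import re
from collections import Counter

# Matches lines such as "... N: u1d6 sid 148 stype 1001 disk error 3".
DISK_ERROR_RE = re.compile(r"\b(u\d+d\d+) sid \d+ stype \d+ disk error 3\b")


def count_disk_error_3(syslog_text):
    """Return a Counter mapping each disk ID to its 'disk error 3' count."""
    return Counter(DISK_ERROR_RE.findall(syslog_text))
```

The resulting counts only show where errors cluster. Whether a disk
should be replaced still depends on the sense codes and on the health
of the whole system.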
Example 3:
While I/O is in progress, "disk error 0x3" messages may be displayed
repeatedly when a disk goes bad and is disabled by the T3 firmware.
Note state 4D for u1d1 in the "vol stat" output below, and the
corresponding fault/disabled entry in the "fru stat" output. In this
case, replace the bad/disabled disk.
hws26-118:/etc:<113>vol stat
v1 u1d1 u1d2 u1d3 u1d9
mounted 4D 0 0 0
v2 u2d1 u2d2 u2d3
mounted 0 0 0
OR
hws26-118:/etc:<117>fru stat
CTLR STATUS STATE ROLE PARTNER TEMP
------ ------- ---------- ---------- ------- ----
u1ctr ready enabled master u2ctr 30.5
u2ctr ready enabled alt master u1ctr 30.5
----------------------------------------------------------------------
| DISK | STATUS | STATE | ROLE | PORT1 | PORT2 | TEMP | VOLUME |
| | | | | | | | |
|======+========+==========+===========+=======+=======+======+========|
| u1d1 | fault | disabled | data disk | bypass| bypass| - | v1 |
| u1d2 | ready | enabled | data disk | ready | ready | 34 | v1 |
| u1d3 | ready | enabled | data disk | ready | ready | 38 | v1 |
| u1d4 | ready | enabled | unassigned| ready | ready | 30 | - |
| u1d5 | ready | enabled | unassigned| ready | ready | 34 | - |
| u1d6 | ready | enabled | unassigned| ready | ready | 38 | - |
| u1d7 | ready | enabled | unassigned| ready | ready | 37 | - |
| u1d8 | ready | enabled | unassigned| ready | ready | 36 | - |
| u1d9 | ready | enabled | standby | ready | ready | 30 | v1 |
----------------------------------------------------------------------
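
The check performed by eye in Example 3 can also be scripted against
captured command output. The Python sketch below is illustrative only.
It assumes the "vol stat" layout and the pipe-delimited "fru stat" disk
table exactly as printed in this notice, and volume names of the form
v1, v2; the function names are hypothetical.

```python
import re

DISK_ID = re.compile(r"u\d+d\d+")


def nonzero_vol_states(vol_stat_text):
    """Return {disk: state} for every drive whose 'vol stat' state is not 0."""
    lines = [ln.split() for ln in vol_stat_text.splitlines() if ln.strip()]
    flagged = {}
    # Each volume is a pair of lines: names first, then status and states.
    for header, states in zip(lines, lines[1:]):
        if not re.fullmatch(r"v\d+", header[0]):
            continue  # not a volume header line
        for disk, state in zip(header[1:], states[1:]):
            if state != "0":
                flagged[disk] = state
    return flagged


def failed_disks(fru_stat_text):
    """Return (disk, status, state) for disks that are not ready/enabled."""
    failed = []
    for line in fru_stat_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Disk rows have 8 columns and start with an ID such as u1d1.
        if len(cells) != 8 or not DISK_ID.fullmatch(cells[0]):
            continue
        disk, status, state = cells[0], cells[1], cells[2]
        if status != "ready" or state != "enabled":
            failed.append((disk, status, state))
    return failed
```

With the output shown in Example 3, nonzero_vol_states() flags u1d1
with state 4D and failed_disks() reports u1d1 as fault/disabled, which
matches the disk to be replaced.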
i) In case of MANDATORY FINs, Sun Services will attempt to contact
all affected customers to recommend implementation of the FIN.
ii) For CONTROLLED PROACTIVE FINs, Sun Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Sun Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.central/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://spe.sun.com
--------------------------------------------------------------------------