Document Audience: | INTERNAL |
Document ID: | I0691-1 |
Title: | New disk drive initialization problem if replaced failed disk drive |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2001-07-30 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0691-1
Synopsis: New disk drive initialization problem if replaced failed disk drive
Create Date: Jun/27/01
Keywords:
New disk drive initialization problem if replaced failed disk drive
SunAlert: No
Top FIN/FCO Report: No
Products Reference: Disk Drives on StorEdge T3 Array
Product Category: Storage / Service
Product Affected:
Mkt_ID   Platform   Model   Description                     Serial Number
------   --------   -----   -----------                     -------------
Systems Affected
----------------
-        Anysys     ALL     System Platform Independent     -
X-Options Affected
------------------
-        T3         ALL     StorEdge T3 Array               -
X6713A   -          -       FC-AL 18.2GB 10KRPM 1" Disk     -
X6714A   -          -       FC-AL 36.4GB 10KRPM 1.6" Disk   -
X6716A   -          -       FC-AL 18.2GB 10KRPM 1.6" Disk   -
X6717A   -          -       FC-AL 72.8GB 10KRPM 1.6" Disk   -
Parts Affected:
Part Number Description Model
----------- ----------- -----
540-4440-01 18GB Assembly/FRU -
540-4367-01 36GB Assembly/FRU -
540-4519-01 73GB Assembly/FRU -
390-0053-01 Seagate ST318304FC 18GB - Disk Seagate A726
390-0056-01 Seagate ST336704FC 36GB - Disk Seagate A726
390-0036-01 Seagate ST173404FC 73GB - Disk Seagate A727
References:
BugId: 4407776 - T3 not properly initialize disk paths after disk hot
swap.
URL: http://hes.west/nws/products/T3/tools/t3path_chk
Issue Description:
Sun StorEdge T3 arrays contain dual-ported FC-AL disk drives. The drive
ports are connected to the T3 back-end loops via the loop cards (Loop 1
and Loop 2) and are capable of receiving I/O from either path (Path 0
or Path 1). These paths are displayed by T3 CLI commands and monitoring
software using different conventions. The paths are mapped as follows:
Loop 1 = Path 0 = PPATH and Loop 2 = Path 1 = APATH.
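The three naming conventions can be kept straight with a small lookup
table. The following is only an illustrative sketch: the dictionary
layout and the function name describe_path are assumptions made for
this example and are not part of any T3 tool.

```python
# Naming conventions for the two back-end paths, as stated in this FIN.
# The dictionary layout and function name are illustrative only.
PATH_NAMES = {
    0: {"loop": "Loop 1", "path": "Path 0", "cli": "PPATH"},
    1: {"loop": "Loop 2", "path": "Path 1", "cli": "APATH"},
}


def describe_path(path_id: int) -> str:
    """Return all three names for a back-end path number (0 or 1)."""
    names = PATH_NAMES[path_id]
    return f'{names["loop"]} = {names["path"]} = {names["cli"]}'


print(describe_path(0))
print(describe_path(1))
```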
A single drive failure and subsequent disk replacement that encounters
bug 4407776 does not have a major impact on the operation of the Sun
StorEdge T3 array. Functionally, the new disk works as a drive with a
failed APATH, and will process all I/O via its PPATH connection. If
the remaining port fails or there are problems that cause Loop 1 to
fail (prior to a reset), the disk will be disabled and it will appear
as a failed drive.
If two failed disk drives in the same LUN are replaced over time and
both replacements encounter this bug, Loop 1 (PPATH or Path 0) is the
only remaining path to those two drives. If a path failure then occurs
on Loop 1, the LUN will unmount and the data on that LUN will be
unavailable. The LUN will remain off-line until the path problem is
fixed and the T3 is manually reset. It may also be necessary to remove
and recreate the LUN or perform other complex recovery actions to
ensure data integrity.
In the previous scenario, if the cache mode on the T3 is in writebehind
when the path failure occurs, there is the possibility that write data
staged in cache will not be written to disk before the LUN is taken
off-line. This will result in data loss.
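The exposure described above can be checked once it is known which
disks are running on a single path. The sketch below is a hypothetical
helper, not part of any Sun tool. It assumes the caller supplies the
LUN-to-disk layout and the set of single-path disks, and it extends
the two-drive case described above to "two or more". The LUN names
and disk assignments in the example are invented for illustration.

```python
def luns_at_risk(lun_members: dict[str, list[str]],
                 single_path_disks: set[str]) -> dict[str, list[str]]:
    """Return the LUNs that contain two or more single-path disks.

    lun_members maps a LUN name to the disks it is built from.
    single_path_disks holds the disks whose APATH was never initialized.
    """
    at_risk = {}
    for lun, disks in lun_members.items():
        affected = [disk for disk in disks if disk in single_path_disks]
        if len(affected) >= 2:
            at_risk[lun] = affected
    return at_risk


# Hypothetical layout: names and membership are made up for the example.
layout = {
    "lun0": ["u1d1", "u1d2", "u1d3"],
    "lun1": ["u2d7", "u2d8", "u2d9"],
}
single_path = {"u1d2", "u2d7", "u2d8"}

print(luns_at_risk(layout, single_path))
```

A LUN returned by this check would go off-line on a Loop 1 failure,
and with writebehind cache enabled it would also be exposed to the
data loss described above.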
There are no obvious symptoms that indicate this problem has been
encountered. Monitoring software will show no faults on the array, and
most diagnostic commands will report that the array is healthy.
All Sun StorEdge T3 Array configurations are affected by this bug.
The problem can be discovered by examining the output of the following
command executed from the T3 CLI:
T3:/:<1>.disk pathstat u[1|2]d1-9
This command executed on a T3 partner group which exhibited the problem
shows the following:
T3:/:<1>.disk pathstat u1d1-9
DISK   PPATH   APATH    CPATH   PATH_POLICY   FAIL_POLICY
---------------------------------------------------------
u1d1   [0 U]   [1 U]    APATH   APATH         PATH
u1d2   [0 U]   [1 U]    APATH   APATH         PATH
u1d3   [0 U]   [1 U]    APATH   APATH         PATH
u1d4   [0 U]   [1 U]    PPATH   PPATH         PATH
u1d5   [0 U]   [1 U]    PPATH   PPATH         PATH
u1d6   [0 U]   [1 U]    PPATH   PPATH         PATH
u1d7   [0 U]   [1 U]    PPATH   PPATH         PATH
u1d8   [0 U]   [1 U]    PPATH   PPATH         PATH
u1d9   [0 U]   [1 U]    PPATH   PPATH         PATH
pass
T3:/:<1>.disk pathstat u2d1-9
DISK   PPATH   APATH    CPATH   PATH_POLICY   FAIL_POLICY
---------------------------------------------------------
u2d1   [0 U]   [1 U]    APATH   APATH         PATH
u2d2   [0 U]   [1 U]    APATH   APATH         PATH
u2d3   [0 U]   [1 U]    APATH   APATH         PATH
u2d4   [0 U]   [1 U]    PPATH   PPATH         PATH
u2d5   [0 U]   [1 U]    PPATH   PPATH         PATH
u2d6   [0 U]   [1 U]    PPATH   PPATH         PATH
u2d7   [0 U]   [1 U]    PPATH   PPATH         PATH
u2d8   [0 U]   [-1 U]   PPATH   PPATH         PATH
u2d9   [0 U]   [1 U]    PPATH   PPATH         PATH
pass
Note the '-1' in the output for the APATH column of disk u2d8. If any
disk shows a '-1' in the ".disk pathstat" output, it can be assumed
that disk has encountered this bug. Any path exhibiting this behavior
is unavailable as a failover path for the affected drive.
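The check for '-1' can be automated against captured ".disk pathstat"
output. The following is a minimal sketch and is not the t3path_chk
tool. It assumes the output has already been saved as text and that
each data row follows the layout shown above. The function name and
the regular expression are assumptions made for this example.

```python
import re

# Matches one data row of ".disk pathstat" output, for example:
#   u2d8 [0 U] [-1 U] PPATH PPATH PATH
ROW = re.compile(
    r"^(?P<disk>u\d+d\d+)\s+"
    r"\[\s*(?P<ppath>-?\d+)\s+\S+\s*\]\s+"
    r"\[\s*(?P<apath>-?\d+)\s+\S+\s*\]"
)


def find_uninitialized_paths(pathstat_output: str) -> list[tuple[str, str]]:
    """Return (disk, column) pairs whose path number is reported as -1."""
    hits = []
    for line in pathstat_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, prompt, or "pass" line
        for column in ("ppath", "apath"):
            if int(match.group(column)) == -1:
                hits.append((match.group("disk"), column.upper()))
    return hits


SAMPLE = """\
DISK PPATH APATH CPATH PATH_POLICY FAIL_POLICY
u2d7 [0 U] [1 U] PPATH PPATH PATH
u2d8 [0 U] [-1 U] PPATH PPATH PATH
u2d9 [0 U] [1 U] PPATH PPATH PATH
pass
"""

print(find_uninitialized_paths(SAMPLE))  # [('u2d8', 'APATH')]
```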
A tool called "t3path_chk" has been developed to help identify this
problem. It can be run automatically from a cron job against multiple
T3 arrays. To obtain the tool and instructions for its use, see:
http://hes.west/nws/products/T3/tools/t3path_chk
Root cause: The Sun StorEdge T3 Array controller firmware does not
initialize both paths to a new disk with the new disk's WWN (World
Wide Name) following a disk hot swap. During a controller boot
cycle, the T3 firmware initializes both paths to all existing disks by
their WWN. The firmware functions that handle disk initialization do
not operate properly on a hot swapped disk with a different WWN than
the one found at boot time. As a result, the new disk will not have
its APATH initialized.
The fix for this bug will be included in version 1.18 of the Sun
StorEdge T3 Array controller firmware, scheduled for release in
August 2001.
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
Corrective Action:
An Authorized Enterprise Field Service Representative can avoid the
problems described above by following the recommendations below.
Following a disk drive replacement, the system should be allowed to
complete its reconstruction to the new drive and return to a fully
redundant FRU state. During the next maintenance window, the T3
should be reset to reinitialize the disk paths to the new drive.
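The sequencing of this recommendation can be summarized as a small
decision helper. This is only a sketch of the ordering described
above. The function name, its two inputs, and the returned strings are
hypothetical, and the code does not issue any T3 commands.

```python
def next_action(reconstruction_complete: bool,
                in_maintenance_window: bool) -> str:
    """Suggest the next step after a disk drive replacement."""
    if not reconstruction_complete:
        # The array must first return to a fully redundant FRU state.
        return "wait: let reconstruction to the new drive finish"
    if not in_maintenance_window:
        # A reset interrupts service, so defer it to planned downtime.
        return "schedule: reset the T3 in the next maintenance window"
    return "reset: reset the T3 to reinitialize the disk paths"


print(next_action(reconstruction_complete=False, in_maintenance_window=True))
print(next_action(reconstruction_complete=True, in_maintenance_window=False))
print(next_action(reconstruction_complete=True, in_maintenance_window=True))
```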
Comments:
----------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
---------------------------------------------------------------------------