Asset ID: 1-77-1484031.1
Update Date: 2012-08-27
Keywords:
Solution Type: Sun Alert Sure
Solution 1484031.1: Sun Storage 6140/6180/6540/6580/6780, FLX380, 2500/2500M2 Arrays Running Certain Firmware levels of 7.x (and lower) Might Suspend Media Scan Without Warning
Related Items
- Sun Storage 6540 Array
- Sun Storage 6580 Array
- Sun Storage 2501-M2 Array
- Sun Software - Generic
- Sun Storage 6140 Array
- Sun Storage 6780 Array
- Sun Storage Flexline 380 Array
- Sun Storage 2501 Array
- Sun Hardware - Generic
- Sun Storage 6180 Array
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
- .Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved
Applies to:
Sun Storage 6140 Array
Sun Storage 6540 Array
Sun Storage 6580 Array
Sun Storage 6780 Array
Sun Storage Flexline 380 Array
Information in this document applies to any platform.
___________________________________
SUNBUG:7125089
Date of Resolved Release: 15-Aug-2012
___________________________________
Description
On certain Sun Storage platforms running firmware 7.x or lower, Media Scan (disk scrubbing) can become suspended without notification. If a drive later fails, the resulting reconstruction may also fail because of one or more unreadable sectors on the source drive. Those sectors would normally have been repaired by a Virtual Disk Driver (VDD) repair operation during a data scrubbing cycle.
Disk scrubbing is a background process performed by the array controllers to provide error detection on the drive media. Its advantage is that it can find media errors before they disrupt normal drive reads and writes. More importantly, it reduces the chance of double disk failures and data loss caused by unreadable sectors during reconstruction.
Occurrence
This issue can occur on the following platforms:
- Sun Storage 6540, 6140 and FLX380 with 7.x Firmware 07.60.56.10 and lower
- Sun Storage 6180, 6580, 6780 and 2500 M2 with 7.x Firmware 07.80.51.10 and lower
- Sun Storage 2500 with 7.x Firmware 07.35.67.10 and lower
Note: No other storage arrays or systems are affected by this issue.
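The platform list above can be turned into a small check. The following is a minimal bash sketch, not an Oracle-supplied tool: the function names (ver_cmp, affected_ceiling, is_affected) and the platform labels passed as the first argument (for example 2500M2) are assumptions made for illustration. It treats a firmware level as affected when it is at or below the level listed for its platform group.

```shell
#!/usr/bin/env bash
# Sketch only: function names and platform labels are illustrative.

# Compare two dotted firmware levels (e.g. 07.60.56.10).
# Prints -1, 0 or 1 when the first is lower than, equal to or higher
# than the second.
ver_cmp() {
    local IFS=.
    local -a a=($1) b=($2)
    local i x y
    for i in 0 1 2 3; do
        # 10# forces base 10 so fields such as "08" are not read as octal
        x=$((10#${a[i]:-0}))
        y=$((10#${b[i]:-0}))
        if (( x < y )); then printf '%d\n' -1; return; fi
        if (( x > y )); then printf '%d\n' 1; return; fi
    done
    printf '%d\n' 0
}

# Highest affected firmware level per platform group, from the list above.
affected_ceiling() {
    case "$1" in
        6540|6140|FLX380)      echo 07.60.56.10 ;;
        6180|6580|6780|2500M2) echo 07.80.51.10 ;;
        2500)                  echo 07.35.67.10 ;;
        *)                     return 1 ;;
    esac
}

# Exit status: 0 = affected, 1 = not affected, 2 = platform not listed.
is_affected() {
    local ceiling
    ceiling=$(affected_ceiling "$1") || return 2
    [ "$(ver_cmp "$2" "$ceiling")" -le 0 ]
}
```

For example, `is_affected 6780 07.80.51.10 && echo "affected"` prints "affected", because 07.80.51.10 is the highest affected level listed for the 6780.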
To determine the current firmware level on the array, use one of the following methods:
A) with Common Array Manager (CAM) BUI:
1) Open CAM
2) Ensure the array in question is registered (which has to be done for SSCS as well)
3) Select "Storage System" in the left pane (which is actually the default screen when CAM comes up)
4) Check the "Firmware Revision" column in the main screen
B) with CAM CLI:
# sscs list storage-system <arrayname>
Where 'sscs' is under:
/* Solaris: /opt/SUNWstkcam/bin/
/* Linux: /opt/sun/cam/bin/
/* Windows: C:\Program Files\Sun\Common Array Manager\bin
Example:
# sscs list storage-system st6780-tvp-540-a
Name: st6780-tvp-540-a
ID: 600A0B8000477B5C000000004FC8A6B5
Type: 6780
Version: 07.80.51.10
Vendor: SUN Microsystems
Model: Sun Storage 6780 System
Capacity: 4.330 TB
Available Capacity: 1.789 TB
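When the firmware level has to be read by a script rather than by eye, the "Label: value" lines shown in the example can be filtered with awk. The sketch below assumes the output keeps that one-field-per-line layout and that a POSIX awk is available (on Solaris this may mean nawk rather than /usr/bin/awk). The function name sscs_field is hypothetical and is not part of CAM.

```shell
#!/usr/bin/env bash
# Sketch only: sscs_field is an illustrative helper, not a CAM command.
# Reads "Label: value" lines on stdin and prints the value of the first
# line whose label matches $1 exactly.
sscs_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")
            if (pos == 0) next
            label = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            # trim surrounding blanks from both parts
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (label == key && !found) { print value; found = 1 }
        }'
}
```

Usage, run from the sscs directory: `./sscs list storage-system <arrayname> | sscs_field Version`. Matching the whole label keeps "Capacity" from also matching "Available Capacity".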
Symptoms
Media scan for data scrubbing is suspended without warning and might fail to complete, resulting in possible data integrity issues.
Methods to Verify if Media Scan is Running:
1) Collect supportdata (refer to document 1002514.1 for how to collect supportdata).
2) Unzip supportdata.
3) A storage array is affected by this issue if both of the following conditions hold:
3a) the majorEventLog.txt file from supportdata does not show an event of type 2022
(Description: "Media scan (scrub) started") within the last 30 days.
Example :
Date/Time: Mon Jul 23 16:22:24 BST 2012
Sequence number: 13710
Event type: 2022
Event category: Notification
Priority: Informational
Description: Media scan (scrub) started
Event specific codes: 0/0/0
Component type: Volume
Component location: Volume_2_01
Logged by: Controller in slot A
Raw data:
4d 45 4c 48 03 00 00 00 8e 35 00 00 00 00 00 00
22 20 4d 00 30 6c 0d 50 04 00 00 10 00 00 00 00
04 00 00 00 04 00 00 00 0d 00 00 00 0d 00 00 00
16 00 00 00 00 56 00 6f 00 6c 00 75 00 6d 00 65
00 5f 00 32 00 5f 00 30 00 31 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 00 01 14 00 00 00 10 00 13 06
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
and
3b) the vdmShowRAIDVolList output (included in stateCaptureData.txt) does not show "MEDIA SCAN"
in the last column, "ExclOp", for any volume. On dual-controller arrays, check this
output for both controllers.
The array is not affected by this issue if any volume shows MEDIA SCAN.
Example of an affected storage array:
Executing vdmShowRAIDVolList(0,0,0,0,0,0,0,0,0,0) on controller A:
Total RAIDVolumes: 8
                                     Curr   Prim
RVAddr      Ssid    State       Pcs  Owner  Owner  VG#  PI  ExclOp
==================================================================
0x0f7665b4  000000  RV_OPTIMAL  3    This   This   1    0   NONE
0x0f774388  000001  RV_OPTIMAL  3    Alt    Alt    2    0   NONE
0x0f773188  000002  RV_OPTIMAL  3    This   This   2    0   NONE
0x0f773a74  000003  RV_OPTIMAL  3    Alt    Alt    3    0   NONE
0x0f7767e4  000004  RV_OPTIMAL  5    This   This   4    0   NONE
0x0f772c88  000005  RV_OPTIMAL  5    Alt    Alt    4    0   NONE
0x0f77876c  000006  RV_OPTIMAL  5    This   This   5    0   NONE
0x0f779998  000007  RV_OPTIMAL  5    Alt    Alt    5    0   NONE
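Checks 3a and 3b can be roughed out with awk against the unzipped supportdata files. This is a sketch under assumptions: the function names are made up for illustration, majorEventLog.txt is assumed to use the "Date/Time:" and "Event type:" line layout shown in the example event, and volume rows in the vdmShowRAIDVolList output are assumed to start with a 0x RVAddr value. The first function only lists the scrub start times; deciding whether the most recent one falls within the last 30 days is left to the reader.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative.

# 3a: print the Date/Time of every event of type 2022
# ("Media scan (scrub) started") found in the event log file $1.
scrub_start_times() {
    awk '
        /^[ \t]*Date\/Time:/ {
            stamp = $0
            sub(/^[ \t]*Date\/Time:[ \t]*/, "", stamp)
        }
        /^[ \t]*Event type:[ \t]*2022[ \t\r]*$/ { print stamp }
    ' "$1"
}

# 3b: count volume rows in file $1 that mention MEDIA SCAN.
# Rows are recognised by a leading 0x address; other lines are ignored.
media_scan_rows() {
    awk '
        /^[ \t]*0x[0-9a-fA-F]+[ \t]/ && /MEDIA SCAN/ { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Usage: `scrub_start_times majorEventLog.txt | tail -1` shows the last logged scrub start, and `media_scan_rows stateCaptureData.txt` prints 0 when no volume row shows MEDIA SCAN. Because stateCaptureData.txt holds output from both controllers and from other commands, a non-zero count should still be confirmed by reading the vdmShowRAIDVolList section itself.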
Workaround
The current workaround is to suspend media scan and then re-enable it on a per-volume basis. This action does not require downtime and can be done during normal production, using the CAM BUI, the CAM CLI (sscs) or SANtricity.
To accomplish this, do the following:
A) with CAM BUI:
1) Open CAM
2) Select the array in question under "Storage System" in the left panel
3) Select the volumes
4) Select one volume and change the "Disk Scrubbing Enabled:" from true to false
5) Click the save button in the bottom right corner.
6) Repeat steps 4-5 for all configured volumes
7) Select one volume and change the "Disk Scrubbing Enabled:" from false to true
8) Click the save button in the bottom right corner.
9) Repeat steps 7-8 for all configured volumes
B) with CAM CLI:
Go to the following directories based on your OS:
/* Solaris: /opt/SUNWstkcam/bin/
/* Linux: /opt/sun/cam/bin/
/* Windows: C:\Program Files\Sun\Common Array Manager\bin
1) sscs list -a <arrayname> volume
2) For each listed volume, run:
sscs modify -a <arrayname> -k disable volume <volume_name>
3) For each listed volume, run:
sscs modify -a <arrayname> -k enable volume <volume_name>
C) with CAM CLI script (Solaris and Linux only)
If the storage array has a large number of volumes configured, options A and B could take an extended amount of time. The following shell commands disable and then re-enable disk scrubbing on all configured volumes.
Go to the following directory based on your OS:
/* Solaris: /opt/SUNWstkcam/bin/
/* Linux: /opt/sun/cam/bin/
# bash
# array=<myarray>
# for vol in `./sscs list -a $array volume | awk '{print $2}'`; do ./sscs modify -a $array -k disable volume $vol; sleep 2; done
# for vol in `./sscs list -a $array volume | awk '{print $2}'`; do ./sscs modify -a $array -k enable volume $vol; sleep 2; done
NOTE: Do NOT forget to replace <myarray> in the array= assignment with the name of the storage array.
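Before running the loops against a production array, it can help to preview the exact commands they will issue. The following bash sketch wraps the same "sscs modify -a <arrayname> -k disable|enable volume <volume_name>" commands in a function with a dry-run switch. The names SSCS, DRY_RUN, PAUSE, run and toggle_scrub are assumptions introduced here, not part of CAM. A failed command is reported and the loop continues, so that a single error does not leave the remaining volumes with scrubbing disabled.

```shell
#!/usr/bin/env bash
# Sketch only: variable and function names are illustrative.
SSCS=${SSCS:-./sscs}    # path to sscs; run from the CAM bin directory
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them
PAUSE=${PAUSE:-2}       # seconds to wait after each command

# Print the command in dry-run mode, otherwise execute it and pause.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
        return 0
    fi
    "$@"
    local rc=$?
    sleep "$PAUSE"
    return "$rc"
}

# toggle_scrub <arrayname> <volume_name>...
# Disables scrubbing on every named volume, then enables it again.
toggle_scrub() {
    local array=$1 mode vol rc=0
    shift
    for mode in disable enable; do
        for vol in "$@"; do
            if ! run "$SSCS" modify -a "$array" -k "$mode" volume "$vol"; then
                echo "warning: $mode failed for volume $vol" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}
```

With the default DRY_RUN=1, `toggle_scrub <myarray> Volume_2_01` prints one disable and one enable command without contacting the array. Setting DRY_RUN=0 executes them. The volume names can be taken from `./sscs list -a <myarray> volume`, as in the loops above.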
D) with CAM CLI advanced script
An advanced bash(1) script for Solaris and Linux is available via Oracle support.
This issue is addressed in the following firmware:
- Sun Storage 6540, 6140 and FLX380 Firmware 07.60.63.10 and later
- Sun Storage 6180, 6580, 6780 and 2500 M2 Firmware 07.80.62.10 and later
- Sun Storage 2500 with 7.x Firmware 07.35.72.10 and later
The firmware is currently provided in the following patches for CAM 6.9:
- 147660-03 Solaris
- 147661-02 Windows
- 147662-02 Linux
Patches
<SUNPATCH:147660-03>
<SUNPATCH:147661-02>
<SUNPATCH:147662-02>
History
15-Aug-2012: Document released, issue Resolved
21-Aug-2012: Added Storage 6180 array to Products affected
27-Aug-2012: Updated Title to include 6180 array - no other change in content
The advanced bash script in Workaround D) can be internally downloaded from:
http://tsc-storage.us.oracle.com/products/CAM/tools.html
Root Cause: When the last media scan agent completed, a media scan agent that was in the
process of being deleted was still treated as active. Because more than one media scan
agent was detected, media scans were not restarted.
Questions regarding this document should be addressed to
[email protected], with a copy to the
responsible engineer listed below.
Internal Contributor/Submitter: [email protected], [email protected]
Internal Eng Responsible Engineer: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: Storage
Internal Escalation ID: 3-5595034191
Internal Resolution Patches:147660-03, 147661-02, 147662-02
References
SUNBUG:7125089