Asset ID: 1-72-1312148.1
Update Date: 2012-07-12
Solution Type: Problem Resolution Sure
Solution 1312148.1: Sun Storage 25xx and 6180 Arrays: Troubleshooting Battery Failures During Learn Cycle
Related Items
- Sun Storage 2540-M2 Array
- Sun Storage 2540 Array
- Sun Storage 2510 Array
- Sun Storage 6180 Array
- Sun Storage 2530-M2 Array
- Sun Storage 2530 Array
Applies to:
Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2540-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2540 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2510 Array - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
Symptoms
Sun Storage 25xx arrays running controller firmware revision 07.35.xx.xx report the following learn cycle notifications and cache battery failures:
Notifications:
A:Fri Oct 15 16:46:42 MEST 2010 : 639 : 0/0/0 : 7310 : Notification : Battery : tray85 : Learn Cycle Started
B:Fri Oct 15 16:46:20 MEST 2010 : 640 : 0/0/0 : 7310 : Notification : Battery : tray85 : Learn Cycle Started
...
B:Fri Oct 15 23:17:37 MEST 2010 : 647 : 0/0/0 : 730E : Notification : Battery : tray85 : Battery capacity is sufficient
OR
Sun Storage 25xx-M2 or 6180 arrays report the following learn cycle notifications and cache battery failures:
Notifications:
B:Mon Jan 16 15:35:34 CET 2012 : 1048 : 0/0/0 : 7310 : Notification : Battery : Tray.99.Controller.B.Battery.B : Learn Cycle Started
A:Mon Jan 16 15:35:34 CET 2012 : 1049 : 0/0/0 : 7310 : Notification : Battery : Tray.99.Controller.A.Battery.A : Learn Cycle Started
B:Mon Jan 16 19:03:03 CET 2012 : 1050 : 0/0/0 : 7302 : Notification : Battery : Tray.99.Controller.B.Battery.B : Battery Capacity Low
...
A:Mon Jan 16 20:38:26 CET 2012 : 1053 : 0/0/0 : 210C : Notification : Battery : Tray.99.Controller.A.Battery.A : Controller cache battery failed
...
B:Mon Jan 16 20:38:35 CET 2012 : 1056 : 0/0/0 : 730F : Notification : Battery : Tray.99.Controller.B.Battery.B : Incomplete Learn Cycle
A:Mon Jan 16 20:38:37 CET 2012 : 1057 : 0/0/0 : 730F : Notification : Battery : Tray.99.Controller.A.Battery.A : Incomplete Learn Cycle
In both cases, the notifications are followed by these critical faults:
Reference: <Document: 1021057.1> How to verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 and J4000 Critical Faults via the User Interface.
- Common Array Manager alarm code xx.66.1006: A cache backup battery has failed.
- Common Array Manager alarm code xx.66.1040: A controller cache backup battery has failed.
Cause
For the 2500 Array only:
<Bug:6987616> "Exmoor smart battery gets failed during a learn cycle due to i2c bus errors": a defect in the controller firmware causes SMART battery failures during the learn cycle.
The failure is observed only on Sun Storage 2500 arrays running 07.35.xx.xx firmware. It is caused by i2c bus faults that the controller firmware raises falsely.
The issue above only affects the 2510, 2530, and 2540 array models. No other models, including 2540-M2 and 2530-M2, are impacted by this fault. No other firmware releases are impacted by this fault.
For the 25xx-M2 or the 6180 Array:
<Bug: 7132366> "2500-m2/6180: Battery Failures for Incomplete Learn Cycle": batteries on controllers running firmware earlier than version 7.80.51.10 fail with an Incomplete Learn Cycle.
OR:
<Bug: 7132372> "2500-m2/6180: rev07 battery failed during learn cycle": revision 7 batteries on controllers running firmware earlier than version 7.80.51.10 fail during the learn cycle.
AND:
<Bug: 7108031> "Battery Failures for Incomplete Learn Cycle": write cache on StorageTek 6180 and 25xx-M2 arrays becomes disabled while the battery learn cycle is in progress.
Solution
Each battery should be analyzed individually for this fault. The 2500 systems can have one or two batteries depending on the controller configuration. The 25xx-M2 and the 6180 arrays have two batteries.
1. Verify that you have a critical fault of Battery Failure (xx.66.1006)
Reference: <Document 1021057.1> Verify Sun StorageTek[TM] 2500, Sun Storage[TM] 6000, and Sun Storage J4000 Critical Faults via the User Interface.
- Common Array Manager alarm code xx.66.1006: A cache backup battery has failed.
- Common Array Manager alarm code xx.66.1040: A controller cache backup battery has failed.
- If there are no critical faults for Battery Failure, you may have a different issue. Refer to <Document 1021054.1> Troubleshooting Sun Storage Array Battery Faults.
- If there is a Battery Failure fault as shown above for the 2500 Array, continue to Step 2.
- If there is a Battery Failure fault as shown above for the 2500-M2 or the 6180 Array, continue to Step 4.
2. Verify array firmware
Reference: <Document 1021067.1> Verify Storage[TM] Array Firmware via the User Interface.
- If the firmware is 06.xx.xx.xx, you may have a different issue. Refer to <Document 1021054.1> Troubleshooting Sun Storage Array Battery Faults.
- If the firmware is 07.35.xx.xx and below 07.35.67.10, then continue to Step 3.
- If the firmware is 07.35.67.10 or above, you can stop here. Your array is not impacted by the issue in scope of this document.
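The firmware check in Step 2 can be sketched in Python as follows. This is a minimal illustration, not an Oracle tool: the function names are hypothetical, and it assumes the firmware revision is a dotted string of four numeric fields such as 07.35.55.10.

```python
from typing import Tuple


def parse_firmware(version: str) -> Tuple[int, ...]:
    """Split a dotted firmware string such as '07.35.55.10' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def classify_firmware(version: str) -> str:
    """Return the Step 2 outcome for a 2500 array (hypothetical helper)."""
    fw = parse_firmware(version)
    if fw >= (7, 35, 67, 10):
        # 07.35.67.10 or above: not impacted by the issue in this document.
        return "not impacted"
    if fw[:2] == (7, 35):
        # 07.35.xx.xx below 07.35.67.10: go on to check LastLearnStart.
        return "continue to Step 3"
    if fw[0] == 6:
        # 06.xx.xx.xx: a different issue, see Document 1021054.1.
        return "different issue"
    # Any other revision is not described by Step 2.
    return "not covered by Step 2"


print(classify_firmware("07.35.55.10"))
```

Comparing tuples of integers avoids the mistakes that plain string comparison makes when fields have different widths.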
3. Verify the LastLearnStart date from the stateCaptureData output
- Create a stateCaptureData file for your array:
service -d -c save -t state -p -o stateCaptureData.dmp
The service command is located in the following directory:
Solaris: /opt/SUNWsefms/bin/
Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin
Linux: /opt/sun/private/fms/bin
- List your alarms and get the date of the fault from the alarm. Refer to <Document 1021057.1> Verify Sun StorageTek[TM] 2500, Sun Storage[TM] 6000, and Sun Storage J4000 Critical Faults via the User Interface.
Example:
Alarm ID : alarm9
Description: A cache backup battery has failed Tray.85.Battery.A
Severity : Critical
Element : t85bat1
GridCode : 70.66.1006
Date : 2010-08-23 22:54:25
- Get the LastLearnStart date from the battery for the controller specified, by opening the stateCaptureData.dmp file you created, and searching for the keyword "LastLearnStart":
Example:
Controller = CTLR_A
Local Battery Slot = 1
DOMI Agent = Initialized
Bmgr WakeUp Time = 07/19/2011 13:04:15
LastLearnStart Time = 08/24/2010 08:27:23
NOTE: Each controller reports an entry for each battery, so an array with two RAID controllers has a total of four entries in the file (two for each battery). One entry for the battery in question, from either controller, is sufficient for this check.
- If the stateCaptureData file does not contain a LastLearnStart entry, go to Step 4.
- If the difference between the alarm date and the LastLearnStart time for the battery slot is less than or equal to 24 hours (as in the example above), go to Step 5.
- If the difference between the alarm date and the LastLearnStart time for the battery slot is greater than 24 hours, contact Oracle to have the battery replaced.
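The 24-hour comparison in Step 3 can be worked through with a short Python sketch. The function names are hypothetical. The sketch assumes that both timestamps are in the same time zone, that the alarm date uses the YYYY-MM-DD format and LastLearnStart uses MM/DD/YYYY as in the examples above, and that the absolute difference is what matters.

```python
from datetime import datetime, timedelta


def learn_gap(alarm_date: str, last_learn_start: str) -> timedelta:
    """Absolute time between the alarm and the LastLearnStart entry."""
    alarm = datetime.strptime(alarm_date, "%Y-%m-%d %H:%M:%S")
    learn = datetime.strptime(last_learn_start, "%m/%d/%Y %H:%M:%S")
    return abs(alarm - learn)


def within_24_hours(alarm_date: str, last_learn_start: str) -> bool:
    """True when the gap is 24 hours or less (Step 3 then points to Step 5)."""
    return learn_gap(alarm_date, last_learn_start) <= timedelta(hours=24)


# Values taken from the alarm and stateCaptureData examples in Step 3.
gap = learn_gap("2010-08-23 22:54:25", "08/24/2010 08:27:23")
print(gap)  # 9:32:58
print(within_24_hours("2010-08-23 22:54:25", "08/24/2010 08:27:23"))  # True
```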
4. For the 2500 Array, verify the event sequence 7310, followed by 730E, followed by 210C.
For the 2500-M2 Array or the 6180 Array, verify the event sequence 7310, followed by 7302, followed by 210A, followed by 210C and 730F.
Sun Storage Common Array Manager (CAM)
Browser:
- Expand Storage Arrays in the left menu pane.
- Expand your storage array name in the left menu pane.
- Expand Troubleshooting in the left menu pane.
- Click on Events.
- In the right pane, click the -|-> icon. Its tooltip reads Advanced Filter.
- Set Event to Log Events.
- Set Event Type to Component.
- Set Read the last X Kbytes From Log File to 100.
- Set String Filter to Battery.
- Click on the Details of any event that is shown.
- Review the Description Field.
- Get the value of the array log event ID from the description.
NOTE: The String Filter value is case-sensitive.
Example:
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed
SSCS CLI:
- Get the list of events:
sscs list -d <array_name> -t LogEvent -f Battery event
NOTE: The -f option is case-sensitive.
The sscs command is located in the following directory:
Solaris: /opt/SUNWstkcam/bin/
Linux: /opt/sun/cam/bin/
Windows: C:\Program Files\Sun\Common Array Manager\bin
- Get the event details:
sscs list -d <array_name> event <event_id>
- Get the value of the array log event ID from the description:
Example:
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed
For the 2500 Array:
- If event 0x210C follows 0x730E, continue to Step 5.
- If event 0x210C does not follow a 0x730E event, contact Oracle to have the battery replaced.
For the 2500-M2 or 6180 Array:
- If events 0x210C and 0x730F follow 0x7302, contact Oracle to further troubleshoot this problem, as there is no current workaround for this issue.
- If events 0x210C and 0x730F do not follow a 0x7302 event, contact Oracle to have the battery replaced.
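The event sequence check in Step 4 can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes that the Description lines are supplied in chronological order, and that "follows" means the later event appears anywhere after the earlier one, not necessarily directly after it.

```python
import re

# Matches the array log event ID in a Description line, e.g. "[ID 0x730E]".
EVENT_ID = re.compile(r"\[ID 0x([0-9A-Fa-f]{4})\]")


def event_ids(descriptions):
    """Extract event IDs (upper-case hex, no 0x prefix) in the order given."""
    ids = []
    for line in descriptions:
        match = EVENT_ID.search(line)
        if match:
            ids.append(match.group(1).upper())
    return ids


def follows(ids, earlier, later):
    """True if `later` appears somewhere after the first `earlier`."""
    if earlier not in ids:
        return False
    return later in ids[ids.index(earlier) + 1:]


def next_action_2500(ids):
    """Step 4 outcome for a 2500 array."""
    if follows(ids, "730E", "210C"):
        return "continue to Step 5"
    return "contact Oracle to have the battery replaced"


def next_action_m2_or_6180(ids):
    """Step 4 outcome for a 2500-M2 or 6180 array."""
    if follows(ids, "7302", "210C") and follows(ids, "7302", "730F"):
        return "contact Oracle to further troubleshoot this problem"
    return "contact Oracle to have the battery replaced"


sample = [
    "Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started",
    "Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient",
    "Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed",
]
print(next_action_2500(event_ids(sample)))
```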
5. Reset the controller for the failed battery
Based on the information gathered in the previous steps, the array is affected by <Bug:6987616> "Exmoor smart battery gets failed during a learn cycle due to i2c bus errors" in the controller firmware.
The battery failure is caused by i2c bus faults that the controller firmware raises falsely.
To clear the i2c bus faults, reset the controller.
Browser:
- Select Physical Devices -> select Controllers.
- Select Reset Controller for the controller reporting the Cache Battery Failure.
SSCS:
sscs reset -a <array_name> controller [A or B]
For example, based on the following alarm, reset Controller A:
Alarm ID : alarm9
Description: A cache backup battery has failed Tray.85.Battery.A
Severity : Critical
Element : t85bat1
GridCode : 70.66.1006
Date : 2010-08-23 22:54:25
NOTE: For SIMPLEX arrays (single controller), an outage is required, as the data path will be unavailable during the reset.
- If the battery failure clears after resetting the controller, no further work is required.
- If the battery failure does not clear, contact Oracle to have the battery replaced.
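Choosing which controller to reset from the alarm Description can be sketched as follows. The function name is hypothetical. The sketch assumes that the component name in the Description ends in Battery.A or Battery.B, and that the letter identifies the controller to reset, as in the example alarm above.

```python
import re
from typing import Optional


def controller_for_alarm(description: str) -> Optional[str]:
    """Return 'A' or 'B' from a Description such as '...Tray.85.Battery.A'."""
    match = re.search(r"Battery\.([AB])\b", description)
    return match.group(1) if match else None


print(controller_for_alarm("A cache backup battery has failed Tray.85.Battery.A"))  # A
```

If the function returns None, the Description does not match the assumed pattern and the controller should be identified manually from the alarm details.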