Asset ID: | 1-75-1021054.1 |
Update Date: | 2012-10-01 |
Keywords: | |
Solution Type: Troubleshooting Sure Solution
1021054.1: Troubleshooting Sun Storage[TM] Array SMART Battery Faults
Related Items
- Sun Storage 6580 Array
- Sun Storage 2540-M2 Array
- Sun Storage 2510 Array
- Sun Storage 6180 Array
- Sun Storage 2540 Array
- Sun Storage 6780 Array
- Sun Storage 2530-M2 Array
- Sun Storage 2530 Array
Related Categories
- PLA-Support>Sun Systems>DISK>Arrays>SN-DK: ST25xx
- .Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays
Previously Published As: 270028
Applies to:
Sun Storage 2510 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6780 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2540-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6580 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
All Platforms
Purpose
To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Storage Disk 2000, 3000, 6000 RAID Arrays & JBODs Community.
The purpose of this document is to help users identify problems with batteries in the StorageTek 2500, Sun Storage 2500-M2 or Sun Storage 6x80 arrays. These batteries provide power to the controller's data cache in the event of a power outage. If you have a Sun Storage Flexline 210, 240, 280, 380 or Sun StorageTek 6130, 6140 or 6540 array, please refer to <Document 1392919.1> Troubleshooting Sun Storage[TM] Array non-SMART Battery Faults.
Symptoms Include
The following table contains the most common faults seen by Common Array Manager (CAM) or SANtricity:
Grid ID | CAM Critical Fault | SANtricity Critical Fault |
xx.66.1005 | Battery Near Expiration | BATTERY_NEAR_EXPIRATION |
xx.66.1006 | There has been a failure in the ICC battery pack | FAILED_BATTERY |
xx.66.1039 | Controller Cache Battery Near Expiration | NON_FRU_BATTERY_NEAR_EXPIRATION or INTEGRATED_BATTERY_NEAR_EXPIRATION |
xx.66.1040 | A controller cache backup battery has failed | NON_FRU_FAILED_BATTERY or FAILED_INTEGRATED_BATTERY |
xx.66.1091 | Battery Tray.xx.Battery.xx has transitioned to an unknown state | BATTERY_UNKNOWN_STATE |
xx.66.1101 | There has been a failure in the ICC battery pack | FAILED_BATTERY_SYSTEM |
xx.66.1176 | Battery has a full charge capacity below the replacement capacity threshold | BATTERY_REPLACEMENT_REQUIRED |
xx.66.1254 | Battery has expired | EXPIRED_BATTERY |
xx.66.1255 | Battery has expired | EXPIRED_INTEGRATED_BATTERY |
xx.66.1261 | Battery is over temperature | BATTERY_OVERTEMP |
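As an illustration only, the fault table above can be held in a small lookup so that a grid ID copied from an alarm maps to the CAM fault text. The sketch below is not part of CAM or SANtricity; the function name `lookup_battery_fault` is hypothetical, and it assumes the leading "xx" of the grid ID is a single dot-separated component that can be ignored.

```python
# CAM critical fault text keyed by grid ID without the leading "xx" component.
# Entries are transcribed from the table in this article.
BATTERY_FAULTS = {
    "66.1005": "Battery Near Expiration",
    "66.1006": "There has been a failure in the ICC battery pack",
    "66.1039": "Controller Cache Battery Near Expiration",
    "66.1040": "A controller cache backup battery has failed",
    "66.1091": "Battery Tray.xx.Battery.xx has transitioned to an unknown state",
    "66.1101": "There has been a failure in the ICC battery pack",
    "66.1176": "Battery has a full charge capacity below the replacement capacity threshold",
    "66.1254": "Battery has expired",
    "66.1255": "Battery has expired",
    "66.1261": "Battery is over temperature",
}


def lookup_battery_fault(grid_id):
    """Return the CAM fault text for a grid ID, or None if it is not in the table."""
    # Assumption: the first dotted component stands in for the article's "xx".
    _, _, suffix = grid_id.partition(".")
    return BATTERY_FAULTS.get(suffix)
```

A grid ID that is not in the table returns None, which corresponds to a fault this document does not cover.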
Other possible conditions include:
- You just replaced the battery, but it still shows as failed.
- Amber LED lit on battery
- Amber LED lit on array
Batteries are monitored by two methods, an Expiration Timer and SMART (Self-Monitoring Analysis and Reporting Technology) battery technology. In some cases it is possible to have both methods active on the same array:
- Only the SMART battery technology is used if the array type is a 25x0-M2 or 6x80 with controller firmware version 7.77 or higher.
- Both the SMART battery technology and the Expiration Timer are in use if the firmware is 7.60 or lower (including 6.x).
The Expiration Timer is a simple counter whereas the newer SMART battery technology internally tests the ability of the batteries to hold a charge. Both are used to determine battery replacement but the Expiration Timer is also susceptible to outside conditions which can lead to reports of premature failures.
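The two rules above can be expressed as a short check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the array type is passed as one of the family strings used in this document ("25x0", "25x0-M2", "6x80"), and only the first two fields of the firmware version are compared. Versions between 7.60 and 7.77 are not described in this document, so the sketch returns None for them rather than guessing.

```python
def parse_version(text):
    """Turn a dotted firmware string such as '7.35.67.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def monitoring_methods(array_type, firmware):
    """Return the battery monitoring methods in use, or None if not covered here."""
    major_minor = parse_version(firmware)[:2]
    # Rule 1: 25x0-M2 or 6x80 at firmware 7.77 or higher uses SMART only.
    if array_type in ("25x0-M2", "6x80") and major_minor >= (7, 77):
        return ("SMART",)
    # Rule 2: firmware 7.60 or lower (including 6.x) uses both methods.
    if major_minor <= (7, 60):
        return ("SMART", "Expiration Timer")
    # Anything else is not described in this document.
    return None
```

For example, `monitoring_methods("25x0", "6.70.54.11")` reports both methods, which is why the Expiration Timer steps later in this document apply to that configuration.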
Oracle will no longer replace SMART batteries unless there is sufficient evidence that the battery has actually failed. For additional details on SMART battery technology, see <Document 1207186.1> SMART Battery Functionality in 2500 and 6000 Arrays.
Troubleshooting Steps
1. Verify the Array Type, Firmware Version and Fault.
Since the steps to resolve battery issues will differ based on the Hardware and Firmware involved, it is necessary to gather this information in order to determine the proper troubleshooting steps.
- To determine the Array type, see <Document 1021066.1> Verify Sun Storage[TM] Array Array type via the User Interface.
- To determine the Firmware version, see <Document 1021067.1> Verify Storage[TM] Array Firmware via the User Interface.
- To determine the Faults, see <Document 1021057.1> Verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 Critical faults via the User Interface.
If there are no faults but the battery is not optimal, go to step 9.
The following table lists the most common faults associated with batteries. If you have an array with redundant batteries and both batteries have a fault, each fault should be evaluated on its own. Sometimes a single remedy will fix multiple faults. If you have a single battery with multiple faults, go to step 9, contact Oracle support.
Critical Fault | Array Type | Firmware Version | Remedy |
Battery Near Expiration | 6x80 | < 7.77 | Go to step 6 |
Battery Near Expiration | 25x0 | Any | Go to step 6 |
Battery Expired | 25x0/6x80 | Any | Go to step 5 |
Over Temperature | 25x0/25x0-M2/6x80 | Any | Go to step 2 |
Replacement Required | 25x0/25x0-M2/6x80 | Any | Go to step 9 |
Battery Failed | 25x0 | >=7.35.67.10 | Go to step 9 |
Battery Failed | 25x0 | >=7.35.10.10 | Go to step 3 |
Battery Failed | 25x0 | 6.x | Go to step 4 |
Battery Failed | 25x0-M2/6x80 | >=7.80.51.10 | Go to step 9 |
Battery Failed | 25x0-M2/6x80 | >=7.77 | Go to step 3 |
Battery Failed | 6x80 | < 7.77 | Go to step 3 |
Unknown | 25x0/25x0-M2/6x80 | Any | Go to step 8 |
If you do not see your critical fault in the above list, proceed to step 2.
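The remedy table can also be read as a decision function. The following is a sketch only, with hypothetical function names, that assumes the fault is given using the table's own labels and the array type is one of "25x0", "25x0-M2" or "6x80". Where several firmware rows apply to the same array type, the highest matching threshold wins. A fault that is not in the table goes to step 2, as stated above; a listed fault on a combination the table does not cover returns None.

```python
def parse_version(text):
    """Turn a dotted firmware string such as '7.35.67.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


KNOWN_FAULTS = {
    "Battery Near Expiration", "Battery Expired", "Over Temperature",
    "Replacement Required", "Battery Failed", "Unknown",
}


def remedy_step(fault, array_type, firmware):
    """Return the step number from the remedy table, or None if the row is missing."""
    if fault not in KNOWN_FAULTS:
        return 2  # fault not in the list: proceed to step 2
    v = parse_version(firmware)
    if fault == "Over Temperature":
        return 2
    if fault == "Replacement Required":
        return 9
    if fault == "Unknown":
        return 8
    if fault == "Battery Expired":
        return 5 if array_type in ("25x0", "6x80") else None
    if fault == "Battery Near Expiration":
        if array_type == "25x0" or (array_type == "6x80" and v < (7, 77)):
            return 6
        return None
    # Remaining case: "Battery Failed"
    if array_type == "25x0":
        if v >= (7, 35, 67, 10):
            return 9
        if v >= (7, 35, 10, 10):
            return 3
        if v[0] == 6:
            return 4
    elif array_type in ("25x0-M2", "6x80"):
        if v >= (7, 80, 51, 10):
            return 9
        if v >= (7, 77):
            return 3
        if array_type == "6x80":
            return 3  # 6x80 below 7.77
    return None
```

For example, `remedy_step("Battery Failed", "25x0", "6.70.54.11")` returns 4, matching the 6.x row of the table.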
2. Battery Temperature is Out of Range.
There is a known <Bug:7123598> in the 6180 and 25x0-M2 arrays running 7.77 firmware which can falsely give you this error message. It will typically clear by itself but if it happens during an array check, it can be logged. A resolution for this issue can be found in <Document 1392313.1> Random "BBU Overheated" for Battery Backup Unit on Sun Storage 2500-M2 and 6180 Arrays.
If after implementing the fix for this issue the problem remains, go to step 9.
3. Check for Failures in Battery Learn Cycle.
Please refer to <Document 1312148.1> Troubleshooting 25xx and 6180 Storage[TM] Array Battery Failures During Learn Cycle. If this does not resolve the problem, proceed to step 4.
If array type is 25x0 running firmware version 6.x, proceed to step 4.
If the array type is 25x0-M2 or 6x80, proceed to step 5.
4. Check Life Remaining.
In the case of a 25x0 array running firmware version 6.x, SMART battery hardware is used, but down-rev (outdated) firmware can report a premature failure. Run the command below to obtain the Life Remaining value.
# sscs list -d myarray -t Battery fru Tray.85.Battery.A
Element Name : Tray.85.Battery.A
Element Status : Optimal
Enabled State : Enabled
FRU Number : 1T71200441PS
FRU Type : Battery
Firmware : N/A
Id : SUN.371-2482-01.1T71200441PS
IdentifyingNumber : 1T71200441PS
*Life Remaining : 288 Days*
ManufactureDate : Wed Feb 28 19:00:00 EST 2007
Model
- If the Life Remaining value is 0, go to step 6.
- If the Life Remaining value is between 1 and 1095, go to step 9.
- If the Life Remaining value is greater than 1095 or negative, go to step 5.
If the life remaining is greater than zero (0), please contact Oracle to have the battery replaced.
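If the sscs output is saved to a file, the Life Remaining value can be extracted and mapped to the next step with a few lines of script. This is a minimal sketch, not an Oracle tool: the function names are hypothetical, and it assumes the field is printed as in the example above ("Life Remaining : 288 Days", optionally wrapped in asterisks) and that a negative value would appear with a leading minus sign.

```python
import re


def life_remaining_days(sscs_output):
    """Extract the Life Remaining value (in days) from sscs battery output."""
    match = re.search(r"Life Remaining\s*:\s*(-?\d+)\s*Days", sscs_output)
    return int(match.group(1)) if match else None


def next_step_for_life_remaining(days):
    """Map the Life Remaining value to the next step listed in this document."""
    if days == 0:
        return 6
    if 1 <= days <= 1095:
        return 9
    # Greater than 1095 or negative: check the array system time.
    return 5
```

With the example output above, `life_remaining_days` returns 288 and `next_step_for_life_remaining(288)` returns 9.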
5. Confirm Array System Time is correct.
Batteries that have the Expiration Timer active are subject to premature failures if the array system time gets improperly set. Typically this is the result of a rogue NTP server. Use <Document 1021108.1> Verifying and Setting Sun Storage[TM] Array System Time to verify the array system time. If the system time is incorrect, search the majorEventLog.txt (from the supportdata bundle) to see if a rogue NTP server is the cause.
# grep NTP majorEventLog.txt
Description: Controller clocks set via NTP or SNTP
Description: Controller clocks set via NTP or SNTP
#
If you find any instances of the above, you can reset the array system time but the problem is likely to return unless the rogue NTP server is addressed. Reset the array system time and wait 5 minutes.
- If the critical fault clears, no further action is needed.
- If the critical fault remains and you recently replaced the battery, or the battery is expired, go to step 6.
- If the critical fault remains, the array system time is correct and the battery has not recently been replaced, go to step 9.
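For hosts without grep, the same log search and the three outcomes above can be sketched in a short script. The function names and parameters are hypothetical; the only thing taken from this document is the search for "NTP" in majorEventLog.txt and the three listed outcomes. A combination the list does not cover raises an error rather than guessing.

```python
def ntp_clock_events(log_text):
    """Return the lines of majorEventLog.txt text that mention NTP (like `grep NTP`)."""
    return [line.strip() for line in log_text.splitlines() if "NTP" in line]


def next_step_after_time_reset(fault_cleared, recently_replaced, expired, time_correct):
    """Return the next step after resetting the array time, or None if no action is needed."""
    if fault_cleared:
        return None  # no further action is needed
    if recently_replaced or expired:
        return 6     # reset the battery age
    if time_correct:
        return 9     # contact Oracle Support
    raise ValueError("combination not covered by this document")
```

To use it on a real log, read the file first, for example `ntp_clock_events(open("majorEventLog.txt").read())`; any returned lines suggest the controller clocks were set by an NTP or SNTP source.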
6. Reset the Battery Age.
Go to <Document 1021695.1> Resetting the Battery Age for a StorageTek[TM] 2500 and Sun Storage[TM] 6000 Array. If after resetting the battery age the problem remains, go to step 9.
Note: For 25x0 arrays running at least 7.35 firmware it is possible to edit the NVSRAM to negate the Expiration Timer. This edit will be permanent unless the array's NVSRAM is reloaded or upgraded.
/* `service` is under:
/* Solaris: /opt/SUNWsefms/bin/
/* Linux: /opt/sun/cam/private/fms/bin/
/* Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin\
# service -d <arrayname> -c read -q nvsram region=0xEE
# service -d <arrayname> -c set -q nvsram region=0xEE offset=0x2D value=0xFF
# service -d <arrayname> -c set -q nvsram region=0xEE offset=0x2E value=0xFF
# service -d <arrayname> -c read -q nvsram region=0xEE
Example:
# service -d myarray -c read -q nvsram region=0xEE
Executing the read command on myarray
Controller A Region Id = (238) REGION_USER_CONFIG_DATA
0000: 0000 c220 0000 0000 0050 0600 0000 0000 ... .....P......
0010: 0000 0000 0000 0000 f001 0000 8080 0000 ................
0020: 0000 0000 0000 0000 8c86 008a 0000 0000 ................
0030: 80be 9f41 1300 2000 0f00 1400 0000 0000 ...A.. .........
Controller B Region Id = (238) REGION_USER_CONFIG_DATA
0000: 0000 c220 0000 0000 0050 0600 0000 0000 ... .....P......
0010: 0000 0000 0000 0000 f001 0000 8080 0000 ................
0020: 0000 0000 0000 0000 8c86 008a 0000 0000 ................
0030: 80be 9f41 1300 2000 0f00 1400 0000 0000 ...A.. .........
Completion Status: Success
#
# service -d myarray -c set -q nvsram region=0xEE offset=0x2D value=0xFF
Executing the set command on myarray
Completion Status: Success
#
# service -d myarray -c set -q nvsram region=0xEE offset=0x2E value=0xFF
Executing the set command on myarray
Completion Status: Success
#
# service -d myarray -c read -q nvsram region=0xEE
Executing the read command on myarray
Controller A Region Id = (238) REGION_USER_CONFIG_DATA
0000: 0000 c220 0000 0000 0050 0600 0000 0000 ... .....P......
0010: 0000 0000 0000 0000 f001 0000 8080 0000 ................
0020: 0000 0000 0000 0000 8c86 008a 00ff ff00 ................
0030: 80be 9f41 1300 2000 0f00 1400 0000 0000 ...A.. .........
Controller B Region Id = (238) REGION_USER_CONFIG_DATA
0000: 0000 c220 0000 0000 0050 0600 0000 0000 ... .....P......
0010: 0000 0000 0000 0000 f001 0000 8080 0000 ................
0020: 0000 0000 0000 0000 8c86 008a 00ff ff00 ................
0030: 80be 9f41 1300 2000 0f00 1400 0000 0000 ...A.. .........
Completion Status: Success
#
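After the edit, both controllers should show ff at offsets 0x2D and 0x2E, as in the second dump above. The following sketch checks saved output of the read command for that condition. It does not run the service command or modify the array. The function names are hypothetical, and the parser assumes the layout shown in the example: a "Controller X Region Id" header per controller, followed by rows of a hexadecimal offset, eight 4-digit hex words, and an ASCII column.

```python
def parse_nvsram_dump(dump_text):
    """Return {controller: {offset: byte}} parsed from saved `read` output."""
    regions = {}
    current = None
    for line in dump_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Controller" and len(tokens) > 1:
            # Start of a new controller section, e.g. "Controller A Region Id = ..."
            current = regions.setdefault(tokens[1], {})
        elif current is not None and tokens[0].endswith(":") and len(tokens) >= 9:
            # Data row: offset, then eight hex words; any ASCII column is ignored.
            base = int(tokens[0][:-1], 16)
            row = bytes.fromhex("".join(tokens[1:9]))
            for index, value in enumerate(row):
                current[base + index] = value
    return regions


def expiration_timer_negated(dump_text):
    """True if every controller in the dump has 0xFF at offsets 0x2D and 0x2E."""
    regions = parse_nvsram_dump(dump_text)
    return bool(regions) and all(
        region.get(0x2D) == 0xFF and region.get(0x2E) == 0xFF
        for region in regions.values()
    )
```

Run against the first dump in the example, the check is false (both offsets read 00); against the second dump it is true for Controller A and Controller B.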
7. Reseat the Battery.
Note: Batteries in 25x0 and 25x0-M2 array controllers are not externally accessible and require that the controller be removed in order to reseat the battery. This will create an interruption of the datapath to that controller. Batteries in 6x80 array controllers may be reseated without interrupting the datapath.
A battery that failed prematurely may remain in the failed state until it is reseated. Reseating the battery resets the Battery Installation Date. Use the CAM Service Advisor or SANtricity Recovery Guru for the specific steps. Once this is completed, repeat the previous step (6) to reset the battery age.
8. Battery in an Unknown State.
Please see <Document 1283914.1> Troubleshooting Sun Storage[TM] Array Unknown Battery Status for further troubleshooting.
9. Contact Oracle Support.
Collect supportData:
- <Document 1002514.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] Common Array Manager.
- <Document 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.
Log a Service Request.