Document Audience: | INTERNAL |
Document ID: | A0183-1 |
Title: | Sun Storage T3 Power Cooling Unit Battery Packs may experience early life failures. |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Tue Jan 22 00:00:00 MST 2002 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD CHANGE ORDER
(For Authorized Distribution by SunService)
FCO #: A0183-1
Status: inactive
Synopsis: Sun Storage T3 Power Cooling Unit Battery Packs may experience early life failures.
Date: Jan/22/2002
Keywords:
Sun Storage T3 Power Cooling Unit Battery Packs may experience early life failures.
SunAlert: No
Top FIN/FCO Report: Yes
Products Reference: T3 Array Battery Pack
Product Category: Storage / Array
Product Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
------- --------
- Anysys - System Platform Independent -
X-Options Affected
--------- -------
- T3 ALL T3 StorEdge Array (See Below)
Parts Affected:
Part Number Description Model
----------- ----------- -----
300-1454-01 PWR Supply, Purple 1, NIMH
370-3956-01 Battery Pack, Purple 1, NIMH
(SCSI Devices)
Type Vendor Model SerialNumber(Min) SerialNumber(Max) Firmware
---- ------ ------- ------------------ ------------------ --------
N/A
References:
DPCO: 233.A RSL purge of affected PCU/Batteries
DPCO: 288 Defective battery purge
BugID: 4403799
ESC: 529253
ECO: WO_21095 Release of Battery FRU only
LEAP: 1565 T3 Array Battery Refresh Cycle Change
FIN: I0674-1
FIN: I0677-1
WWStopShip: WP001#15897
Manual: 806-1062-11 Sun StorEdge T3 Disk Tray Installation Operation and
Service Manual
Manual: 806-1063-11 Sun StorEdge T3 Disk Tray Installation Administration
Guide
Issue Description:
Sun StorEdge T3 Arrays shipped prior to December 31, 2000 may experience early
life failures of their Power Cooling Unit (PCU) Battery Packs.
Early life failures of the T3 Array (Purple) Power Cooling Unit (PCU) Battery
Packs are contributing to a higher than expected number of Power Cooling Unit
failures in the field. Sun Engineering has determined that approximately 70%
of all PCU failures are due to early life battery pack failures.
Affected batteries within the Power Cooling Unit were initially charged to only
30% capacity and subsequently sat on the shelf for a prolonged period. During
this time the batteries went into a deep discharge. A deep discharge can induce
the lowest capacity cell to reverse its polarity because of differences in the
capacity of each cell. Polarity reversal during a deep discharge leads to gas
pressure building within the battery until gas is released from the vent of the
positive terminal. Fluid leakage then occurs, and the battery suffers severe
damage.
The StorEdge T3 Array contains dual Power Cooling Units so that if one fails the
other Power Cooling Unit takes over, and the failure is reported in the syslog
file. When this happens the data cache is no longer used and instead of writing
to cache the system writes to disk. This is a safeguard against losing data in
the cache in the event of a subsequent power failure.
Following is an example of battery failure messages:
Jan 11 15:38:10 BATD[1]: N: Battery Refreshing cycle starts from this point.
Jan 11 15:38:11 LPCT[1]: N: u1pcu1: Refreshing battery
Jan 11 15:38:14 LPCT[1]: N: u2pcu1: Refreshing battery
Jan 11 15:38:17 LPCT[1]: N: u1pcu1: Battery not OK
** Expected message...system has detected drop in voltage from u1pcu1
as a result of the discharge test.
Jan 11 15:38:22 BATD[1]: N: u1pcu1: hold time was 12 seconds.
Jan 11 15:38:25 BATD[1]: W: u1pcu1: Replace battery, hold time low.
Jan 11 15:52:48 LPCT[1]: N: u2pcu1: Battery not OK
** u2pcu1 expected message....
Jan 11 15:52:50 BATD[1]: N: u2pcu1: hold time was 878 seconds.
** u2pcu1 passed the test as this is about a 14 minute hold time.
Jan 12 06:38:21 BATD[1]: W: u1pcu1: Replace battery, hold time low.
Jan 12 06:38:21 BATD[1]: N: u1pcu1 Battery took too long to recharge.
Jan 12 06:38:21 BATD[1]: N: u1pcu2:skips
battery refresh because the other PCU u1pcu1 : PCU1 hold time low
** Expected we would skip the u1pcu2 refresh test since u1pcu1
has failed.
Jan 12 06:38:23 LPCT[1]: N: u2pcu2: Refreshing battery
Jan 12 06:52:09 LPCT[1]: N: u2pcu2: Battery not OK
Jan 12 06:52:12 last message repeated 1 time
Jan 12 06:52:15 BATD[1]: N: u2pcu2: hold time was 833 seconds.
** u2pcu2 is ok based on this hold time.
Jan 12 07:02:47 pshc[1]: N: fru stat
Jan 12 07:03:27 pshc[1]: N: refresh -s
Jan 12 18:01:24 BATD[1]: N: Battery Refreshing cycle ends at this point.
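The messages above can be screened mechanically. The following is a minimal
Python sketch, not a Sun-supplied tool. It assumes syslog lines in exactly the
format shown in the example, and the name summarize_battery_log is
hypothetical. It collects the last reported hold time for each PCU and lists
the PCUs for which BATD logged a "Replace battery" warning. It does not apply
a hold time threshold of its own, because this FCO does not state one.

```python
import re

# Patterns follow the BATD message formats shown in the example log above.
HOLD_TIME = re.compile(r"BATD\[\d+\]: N: (u\d+pcu\d+): hold time was (\d+) seconds")
REPLACE = re.compile(r"BATD\[\d+\]: W: (u\d+pcu\d+): Replace battery")


def summarize_battery_log(lines):
    """Return (hold_times, replace) extracted from T3 syslog lines."""
    hold_times = {}  # PCU name -> last reported hold time in seconds
    replace = set()  # PCUs flagged with a "Replace battery" warning
    for line in lines:
        match = HOLD_TIME.search(line)
        if match:
            hold_times[match.group(1)] = int(match.group(2))
        match = REPLACE.search(line)
        if match:
            replace.add(match.group(1))
    return hold_times, replace


sample = [
    "Jan 11 15:38:22 BATD[1]: N: u1pcu1: hold time was 12 seconds.",
    "Jan 11 15:38:25 BATD[1]: W: u1pcu1: Replace battery, hold time low.",
    "Jan 11 15:52:50 BATD[1]: N: u2pcu1: hold time was 878 seconds.",
    "Jan 12 06:52:15 BATD[1]: N: u2pcu2: hold time was 833 seconds.",
]
hold_times, replace = summarize_battery_log(sample)
print(hold_times)
print(sorted(replace))
```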
Based on Engineering root cause analysis, there is a possibility that any
battery pack that was initially charged to 30% capacity and sat on the shelf for
a prolonged period of time may fail earlier than expected. The battery pack was
specified for a two year life cycle. Field reports indicate that Power Cooling
Units with suspect battery packs are failing between six months and one year.
All PCUs that fall within the following serial number range could demonstrate an
early life battery pack failure and, therefore, should be changed according to
this FCO:
Suspect Power Cooling Unit serial numbers:
Power Cooling Unit                     Battery Pack
Part Number    Serial Numbers          Date Code
-----------    ---------------         ------------
300-1454       001000 - 012509         before 0004
300-1454       012808 - 013279         9951-0002
300-1454       016694 - 018091         9951-0002
300-1454       013624 - 014915         0027
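As an illustration only, the serial number ranges in the table above can be
checked with a short Python sketch. The ranges are copied from the table; the
names SUSPECT_RANGES and is_suspect_serial are hypothetical. The sketch looks
only at the PCU serial number and ignores the Battery Pack date code column.

```python
# Suspect PCU serial number ranges (inclusive), copied from the table above.
SUSPECT_RANGES = [
    (1000, 12509),
    (12808, 13279),
    (16694, 18091),
    (13624, 14915),
]


def is_suspect_serial(serial):
    """Return True if a 300-1454 PCU serial number is in a suspect range."""
    value = int(serial)  # "003566" -> 3566; leading zeros are ignored
    return any(low <= value <= high for low, high in SUSPECT_RANGES)


print(is_suspect_serial("003566"))  # serial from the "id read" example
print(is_suspect_serial("012600"))  # falls in the gap between two ranges
```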
Corrective action was implemented in Manufacturing by purging all suspect
battery packs via Worldwide Purge WP001#15897 issued on April 25, 2001.
The Battery only FRU was made available via ECO WO_21095 on June 26, 2001.
Corrective action was put in place in Enterprise Services via DPCO 233.A
on June 1, 2001, LEAP 1565 on June 28, 2001, and DPCO 288 released on
November 30, 2001 for Defective battery purge.
Suspect PCU Identification:
Refer to the following StarOffice document to identify suspect
Power Cooling Units:
http://sdpsweb.EBay/FIN_FCO/FCO/FCO_A0183-1_Dir/t3_list_PCU_FCO.sdc
Note: To view document click on the above URL, then save to your local
disk using your Netscape 'file' button and select 'save as', then
open file locally using StarOffice.
Parts Affected:
July 22, 2002
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | UPON FAILURE
---
Replacement Time Estimate:
0.25 hours
Special Considerations:
This problem can be minimized by setting a longer interval between battery
refresh cycles (reference FIN I0677-1). Although resetting the refresh cycle
may extend the battery life, all suspect battery packs should be changed in
accordance with this FCO.
Corrective Action:
Upgrade suspect Power Cooling Units (Sun p/n 300-1454-01) by replacing the
Power Cooling Unit battery packs (370-3956-01). The battery packs can be
replaced by following the instructions below.
----------------------------------------------
PART 1 - StorEdge T3 PCU Replacement Procedure
----------------------------------------------
PRE-REQUISITES:
1.) First identify if the T3 contains PCUs within the suspect serial number
range. Refer to Part 5 below.
If YES, then continue on. If NO, stop. FCO implementation not required.
2.) Verify that all loop cables (for ES config) and MIAs are screwed down
tightly by using a small flathead screwdriver and tightening each loop
cable. Be very careful not to disconnect any loop cable. If you notice
a loop cable that is not screwed in at all, notify customer.
3.) Verify all controllers and loop cards are in their respective slots
securely by pushing on each card and verifying that all latches are
in the locked position.
4.) Verify that all PCUs are in their respective slots securely by
pushing on each PCU and verifying that the PCU latches are in the
locked position.
5.) Type "fru stat" to check ALL T3 FRUs are in a healthy state
and that their LEDs are in their normal state before proceeding.
6.) Type "date" and "tzset" to check that the date and
timezone are correct. If not, use the "date" and "tzset" commands to
set the date and timezone, respectively.
7.) Type "refresh -s" to check that no battery refresh is running before
proceeding. Also check that the "Next Refresh" is not scheduled to begin
shortly after this FCO is executed. If it is, reschedule the "Next Refresh"
to a later time (24 hours). Refer to the Field Service Manual or Part 3 below
for changing the refresh time (BAT_BEG) in the file /etc/schd.conf.
If battery status is reported as "Low", this is ok as the purpose of this
FCO is to replace it.
8.) Type "proc list" to check that no drive reconstruction is running before
proceeding.
9.) It is recommended that the customer be at T3 firmware 1.17b when executing
this procedure and that this be done during a maintenance window to minimize
disruption to customer operation.
10.) Have 1 or 2 spare T3 controllers and PCUs on site.
11.) Be aware of FIN I0745-1 relating to possible PCU midplane connector
damage. Have a spare chassis available if damage is discovered.
12.) Customer should be aware that performance will degrade during and
after execution of this FCO as new batteries will need to be charged
up after power on. Charging can take up to 12 hours (per battery) and
during this time write caching will be disabled.
PROCEDURES:
1.) Power off the PCU that is going to be replaced by pressing the power
switch.
NOTE - DO NOT POWER OFF MORE THAN ONE PCU AT A TIME FOR EITHER
ES OR WG CONFIGURATION. Powering off/removing a PCU will
cause the T3 cache to run in write-through mode. Make sure
that the AC LED (left) is AMBER and the PS LED (right)
is OFF.
2.) Disconnect power cord from the PCU.
3.) Push the PCU latches into the unlocked position and pull the unit out of the
disk tray. Wait 15 seconds and then verify that both controller online LEDs
are still GREEN. If any controller LED changes to non-solid GREEN (i.e.,
OFF/AMBER/Flashing AMBER) then immediately refer to the "PART 4 -
Troubleshooting" section (below) before continuing.
NOTE - DO NOT REPLACE MORE THAN ONE PCU AT A TIME FOR EITHER
ES OR WG CONFIGURATIONS.
CAUTION - Any PCU that is removed must be replaced within 30 minutes or the Sun
StorEdge T3 disk tray and all attached disk trays will automatically
shut down and power off.
CAUTION - For partner pair configurations make sure that the loop cables have
sufficient length to be spread apart so you can remove u1pcu1. Also
make sure that the loop cables, along with other cables connected to
the T3, are screwed in tightly so you do not inadvertently knock them
off during removal/insertion.
4.) Using a flashlight, inspect the left and right sides of the
PCU midplane connector for possible cracks or damage (per
FIN I0745-1). If damage is found, notify customer immediately
that the chassis must be replaced.
5.) Install new PCU. Wait 15 seconds and then verify that both
controller online LEDs are still GREEN. If any controller LED changes
to non-solid GREEN (i.e., OFF/AMBER/Flashing AMBER) immediately refer
to the "PART 4 - Troubleshooting" section below before continuing.
6.) Push the PCU latches into the locked position.
7.) Connect power cord to the PCU.
8.) Verify that the AC LED (left) is AMBER, indicating that AC power is present.
9.) Power on the PCU by pressing the power switch.
10.) Verify that both LEDs on the Power Cooling Unit are GREEN, indicating that
the unit is receiving power. Wait 15 seconds and then verify that both
controller online LEDs are still GREEN. If any controller LED changes to
AMBER immediately refer to the "PART 4 - Troubleshooting" section below before
continuing.
Note - The PS LED (right) may blink GREEN for a period of time (up to
12 hours per battery while it charges; write caching is disabled
during this time).
11.) Type "fru stat" to check if new PCU is recognized and functioning.
Battery might show up as "fault" as it is charging up.
Verify the Battery Warranty Date by typing "id read u(x)pcu(y)".
hostname:/:<1>id read u1pcu1
Revision : 0000
Manufacture Week : 00421999
Battery Install Week : 00222001 <----- week # when battery was installed
Battery Life Used : 0 days, 0 hours <----- usage since PCU inserted
Battery Life Span : 730 days, 12 hours
Serial Number : 003566 <----- used to identify suspect serial # range
Battery Warranty Date: 20010322172349 <----- date & time when PCU switch was turned on
Battery Internal Flag: 0x00000000
Vendor ID : TECTROL-CAN
Model ID : 300-1454-01(50)
12.) Follow PART 2 (below) to replace the battery on the PCU that was just
pulled. This PCU will then replace the remaining PCU in the T3.
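When many PCUs must be checked, the "id read" output shown in step 11 can be
captured and parsed. The following Python sketch is illustrative only. It
assumes the output has the "Name : value" layout of the example above (without
the explanatory arrows, which are annotations in this FCO), and it assumes
that "Battery Warranty Date" is laid out as yyyymmddhhmmss. The function names
are hypothetical.

```python
from datetime import datetime

SAMPLE = """hostname:/:<1>id read u1pcu1
Revision : 0000
Manufacture Week : 00421999
Battery Install Week : 00222001
Battery Life Used : 0 days, 0 hours
Battery Life Span : 730 days, 12 hours
Serial Number : 003566
Battery Warranty Date: 20010322172349
Battery Internal Flag: 0x00000000
Vendor ID : TECTROL-CAN
Model ID : 300-1454-01(50)
"""


def parse_id_read(output):
    """Parse 'Name : value' lines of "id read" output into a dict."""
    fields = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        # Field lines have a space after the colon. The CLI prompt line
        # ("hostname:/:<1>...") does not, so it is skipped.
        if sep and value.startswith(" "):
            fields[name.strip()] = value.strip()
    return fields


def warranty_datetime(fields):
    """Decode 'Battery Warranty Date', assumed to be yyyymmddhhmmss."""
    return datetime.strptime(fields["Battery Warranty Date"], "%Y%m%d%H%M%S")


fields = parse_id_read(SAMPLE)
print(fields["Serial Number"])
print(warranty_datetime(fields))
```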
--------------------------------------
PART 2 - Battery Replacement Procedure
--------------------------------------
PROCEDURES:
1.) Replacing the PCU battery:
a) Lay the PCU upside down on a work area.
b) Remove 4 Phillips screws (on top and right side) from the
battery plate.
CAUTION - BE CAREFUL NOT TO DROP ANY SCREWS INSIDE THE PCU.
THERE ARE HIGH POWERED COMPONENTS INSIDE THE PCU
THAT CAN BE SHORTED. IF YOU DROP A SCREW INSIDE
THE PCU MAKE SURE YOU RETRIEVE IT BEFORE CONTINUING.
c) With the fan faceplate facing you, lift the right side of the
battery slightly and gently pull the battery assembly to the
right until the battery connector (on left) is unseated.
d) Install new battery assembly, ensuring the connector is fully
inserted.
NOTE - IF ANY PIN IS BROKEN OR BENT, THE POWER COOLING UNIT MUST
BE REPLACED.
e) Secure the 4 Phillips screws to battery plate.
CAUTION - BE CAREFUL NOT TO DROP ANY SCREWS INSIDE THE PCU. THERE ARE HIGH
POWERED COMPONENTS INSIDE THE PCU THAT CAN BE SHORTED. IF YOU DROP A
SCREW INSIDE THE PCU THEN MAKE SURE YOU RETRIEVE IT BEFORE CONTINUING.
2.) PCU is now ready to replace the other PCU in the T3. Follow PART 1,
STEPS # 1-10, to replace the PCU.
3.) From the T3 CLI type ".bat -n u(x)pcu(y)", where u(x) is the unit # and
pcu(y) is the PCU location # of the just installed PCU. This will zero out
the 'Battery Warranty Date' field and set the 'Battery Install Week' based
on the T3 date setting. It also will zero out the 'Battery Internal Flag'
if it was set to 1, indicating low battery.
4.) Type ".id write busage u(x)pcu(y) 0". This will calculate the 'Battery
Warranty Date' and 'Battery Life Used'. This can be verified by typing
"id read u(x)pcu(y)".
5.) Repeat the above procedures to replace the battery on the removed
PCU and to install it back into a T3.
6.) Dispose of the battery in accordance with local laws.
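To reduce typing errors when several PCUs are serviced, the commands from
steps 3 and 4 above can be generated for a given unit and PCU location. This
Python sketch only builds the command strings quoted in this procedure; it
does not connect to the array. The function name post_install_commands is
hypothetical.

```python
def post_install_commands(unit, pcu):
    """Return the T3 CLI commands from steps 3 and 4 for one PCU, in order."""
    if unit < 1 or pcu < 1:
        raise ValueError("unit and PCU location numbers start at 1")
    fru = f"u{unit}pcu{pcu}"
    return [
        f".bat -n {fru}",             # step 3: reset battery fields
        f".id write busage {fru} 0",  # step 4: recalculate warranty date and life used
        f"id read {fru}",             # step 4: verify the result
    ]


for command in post_install_commands(1, 2):
    print(command)
```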
----------------------------------------------
PART 3 - Refresh Schedule Adjustment Procedure
----------------------------------------------
PRE-REQUISITE:
This procedure changes the battery refresh cycle from 14 to 28 days to prolong
the life of the battery. Only perform this procedure if the T3 refresh cycle
is currently set at 14 days. Some customers who are running T3 firmware 1.17b
would already be running at 28 days, while others at lower rev T3 firmware
might have proactively changed the battery refresh cycle to 28 days per FIN
I0677-1.
Always check "/etc/schd.conf" for the 28 day battery refresh cycle before
proceeding.
PROCEDURES:
1.) Check the T3 file "/etc/schd.conf" to see if it shows "BAT_CYC 14". If
yes, then continue with the rest of this procedure. If it shows "BAT_CYC
28" then refresh cycle is already set to 28 days and this procedure can be
skipped.
2.) Check to see if the T3 is running a battery refresh by typing "refresh
-s" from the T3 CLI. If yes, type "refresh -k" then go to the next
step.
3.) Verify that your date and timezone are set correctly on the T3 for your
particular area.
usage: tzset [[+|-]hhmm] (Just type "tzset" if you want to know what
it is currently set to).
usage: date [yyyymmddhhmm[.ss]] (just type "date" if you want to know
what it is currently set to).
4.) Ftp from a workstation to the T3 that will have its scheduler adjusted.
a) Get /etc/schd.conf.
b) Edit the file in the following way:
BAT_BEG MM-DD-YYYY,hh-mm-ss (optional)
BAT_CYC 14 <-----change to BAT_CYC 28
c) Ftp to the T3 and put the modified schd.conf file back to the
/etc directory.
5.) Type "refresh -i" (Re-Initialize scheduler)
6.) Type "refresh -s" (Check date of next refresh, should be 28 days)
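The edit in step 4b can also be made with a small script on the workstation
once schd.conf has been retrieved. The Python sketch below is a hedged
illustration: it assumes the file consists of whitespace-separated
keyword/value lines as shown in step 4b, the BAT_BEG value in the sample is
made up for the example, and set_refresh_cycle is a hypothetical name. The FTP
transfer itself is still done as described in steps 4a and 4c.

```python
def set_refresh_cycle(conf_text, days=28):
    """Return schd.conf text with the BAT_CYC value replaced by `days`."""
    out = []
    found = False
    for line in conf_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "BAT_CYC":
            out.append(f"BAT_CYC {days}")
            found = True
        else:
            out.append(line)  # leave every other line untouched
    if not found:
        raise ValueError("no BAT_CYC line found in schd.conf text")
    return "\n".join(out) + "\n"


# Illustrative file contents; the BAT_BEG date is not from a real array.
original = "BAT_BEG 01-22-2002,02-00-00\nBAT_CYC 14\n"
print(set_refresh_cycle(original), end="")
```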
--------------------------------------------
PART 4 - Troubleshooting
--------------------------------------------
During the removal, insertion, or switching on of the PCU, there is a very small
chance that the T3 (ES or WG config) will reboot, and in the case of ES config
one T3 controller can be disabled. When this happens, the controller LED will
change state from a solid GREEN to either OFF (reboot started), AMBER (booting),
or Flashing AMBER (disabled).
It is important to run the extractor after the T3 boots up and to get the reset
log of the disabled controller. The extractor will, by default, get the reset
log of the remaining live controller. Give engineering the extractor output and
reset log for analysis and note when the reboot occurred, i.e., at removal,
insertion, or power on.
Whether the disabled controller can be reused or not depends on any valid
information from the reset log. To get the reset log of the disabled
controller:
1) Remove the disabled controller from the T3.
2) Insert a new controller. Note: the new controller will boot up as alt
master role for ES config.
3) Take the removed controller back, install it in a spare T3 (single brick),
and let it boot up.
4) Via the telnet session or serial port, type "logger -dmprstlog" to dump the
reset log to the T3 syslog.
5) If the reset log shows a valid hardware problem (e.g., cache parity error)
around the time the PCU was replaced, the controller should be sent
back via CPAS.
Example:
Jul 18 20:15:26 pshc[1]: N: logger -dmprstlog
Jul 18 20:15:26 pshc[1]: W: u1ctr SysFail Reset (7001) was initiated at
Cache memory parity error detected 20010626 163740
                                   ^^^^^^^^ ^^^^^^
                                   yyyymmdd hr/min/sec
6) If the reset log shows other non-hardware related messages and the
time of occurrence is not around the time the PCU was replaced, then the
controller can be deemed to be good. The problem is more related to
firmware than hardware.
Example:
Jul 13 22:03:26 pshc[1]: W: u1ctr Exception Reset (2004) was initiated at
Instruction Access exception 20001103 175513
                             ^^^^^^^^ ^^^^^^
                             yyyymmdd hr/min/sec
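The timing comparison in steps 5 and 6 can be sketched in Python as follows.
This is an illustration only. It assumes each reset log entry has been joined
onto one line and ends with the "yyyymmdd hhmmss" stamp shown in the examples
above. The one hour window is an arbitrary assumption, since this FCO only
says "around the time the PCU was replaced", and the function names are
hypothetical. The sketch checks timing only; deciding whether the message
describes a hardware problem is left to the engineer.

```python
import re
from datetime import datetime, timedelta

# Trailing "yyyymmdd hhmmss" stamp, as in the reset log examples above.
STAMP = re.compile(r"(\d{8}) (\d{6})\s*$")


def reset_time(entry):
    """Extract the trailing timestamp from a reset log entry, or None."""
    match = STAMP.search(entry)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


def near_replacement(entry, replaced_at, window=timedelta(hours=1)):
    """True if the reset occurred within `window` of the PCU replacement."""
    when = reset_time(entry)
    return when is not None and abs(when - replaced_at) <= window


parity = ("u1ctr SysFail Reset (7001) was initiated at "
          "Cache memory parity error detected 20010626 163740")
exception = ("u1ctr Exception Reset (2004) was initiated at "
             "Instruction Access exception 20001103 175513")
replaced_at = datetime(2001, 6, 26, 16, 30)  # hypothetical replacement time
print(near_replacement(parity, replaced_at))
print(near_replacement(exception, replaced_at))
```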
-------------------------------------------
PART 5 - Identifying Suspected PCU/Battery
-------------------------------------------
All PCUs that fall within the following serial number range could demonstrate
early life battery pack failure and, therefore, should be changed according to
this FCO. Some PCU serial numbers within this range might already have a new
battery pack installed and so do not need to be replaced again. Steps to
identify if the battery pack needs to be replaced are outlined below.
Suspect Power Cooling Unit serial numbers:
Power Cooling Unit                     Battery Pack
Part Number    Serial Numbers          Date Code
-----------    ---------------         ------------
300-1454       001000 - 012509         before 0004
300-1454       012808 - 013279         9951-0002
300-1454       016694 - 018091         9951-0002
300-1454       013624 - 014915         0027
1) Before executing this FCO, use the "id read u#pcu#" command to obtain the
PCU serial number and determine whether it falls within the ranges indicated
above.
a) If NO, FCO implementation is not required.
b) If YES, go to step #2 to see if the battery has already been replaced.
2) To determine if a battery pack has been replaced, you will need to
look at the "Battery Install Week" from the "id read u#pcu#" command.
hostname:/:<1>id read u1pcu1
Revision : 0000
Manufacture Week : 00421999
Battery Install Week : 00222001 <--- week 22 = 5/28-6/1/01
Battery Life Used : 0 days, 0 hours
Battery Life Span : 730 days, 12 hours
Serial Number : 003566 <--- PCU serial #
Battery Warranty Date: 20010322172349
Battery Internal Flag: 0x00000000
Vendor ID : TECTROL-CAN
Model ID : 300-1454-01(50)
If "Battery Install Week" shows week 22 or later in the 2001 calendar year,
the battery has already been replaced with a known good battery pack.
Corrective action was implemented in manufacturing by purging all suspect
battery packs via Worldwide Purge WP001#15897 issued on April 25, 2001.
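The decision in steps 1 and 2 above can be expressed as a short Python sketch.
It is a sketch under stated assumptions, not an official tool: the 00WWYYYY
layout of the week fields is inferred from the example ("00222001" is week 22
of 2001), treating install weeks in years after 2001 as already replaced is an
assumption, and all function names are hypothetical. Whether the serial number
is in a suspect range is passed in as a flag.

```python
def decode_week_field(value):
    """Split a week field such as '00222001' into (year, week)."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"unexpected week field: {value!r}")
    return int(value[4:]), int(value[2:4])  # assumed layout: 00WWYYYY


def battery_already_replaced(install_week):
    """True if the battery was installed in week 22 of 2001 or later."""
    return decode_week_field(install_week) >= (2001, 22)


def needs_fco(serial_in_suspect_range, install_week):
    """Apply steps 1 and 2 above to one PCU."""
    return serial_in_suspect_range and not battery_already_replaced(install_week)


print(decode_week_field("00222001"))
print(needs_fco(True, "00222001"))  # suspect serial, battery already replaced
print(needs_fco(True, "00212001"))  # suspect serial, battery not yet replaced
```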
Comments:
Billing Type:
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on how the
system was initially installed.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
--------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of Mandatory FCOs, Enterprise Services will attempt to contact
all known customers to recommend the part upgrade.
ii) For controlled proactive swap FCOs, Enterprise Services mission critical
support teams will initiate proactive swap efforts for their respective
accounts, as required.
iii) For Replace upon Failure FCOs, Enterprise Services partners will implement
the necessary corrective actions as and when they are required.
--------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
______________
* Access the top level URL of http://sdpsweb.EBay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
____________________
Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.Central/.
* From there, follow the hyperlink path of "SunService Documentation" and
click on "FIN & FCO attachments", then choose the appropriate folder,
FIN or FCO. This will display supporting directories/files for FINs or
FCOs.
Internet Access:
_______________
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
________
Send questions or comments to [email protected]
---------------------------------------------------------------------------