Document fcos/A0183-1


FCO #: A0183-1

SYNOPSIS: Sun Storage T3 Power Cooling Unit Battery Packs may experience early
          life failures.

DATE: Jan/22/2002

KEYWORDS: Sun Storage T3 Power Cooling Unit Battery Packs may experience early
          life failures.


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                               FIELD CHANGE ORDER
                  (For Authorized Distribution by SunService)


SYNOPSIS: Sun Storage T3 Power Cooling Unit Battery Packs may
          experience early life failures.
                     
Sun Alert: No
 
TOP FIN/FCO REPORT: Yes
 
PRODUCT_REFERENCE:  T3 Array Battery Pack

PRODUCT CATEGORY: Storage / Array

PRODUCTS AFFECTED:

Mkt_ID  Platform  Model       Description                 Serial Number
------  --------  -----       -----------                 -------------
Systems Affected
------- --------
-       Anysys      -      System Platform Independent          -

X-Options Affected
--------- -------
-         T3          ALL     T3 StorEdge Array         (See Below)

AFFECTED PARTS:

Part Number     Description                                Model
-----------     -----------                                -----
300-1454-01     PWR Supply, Purple 1, NIMH          
370-3956-01     Battery Pack, Purple 1, NIMH

(SCSI Devices)
Type    Vendor    Model     SerialNumber(Min)    SerialNumber(Max)    Firmware
----    ------    -------   ------------------   ------------------   --------
N/A

REFERENCES :
  DPCO: 233.A  RSL purge of affected PCU/Batteries
  DPCO: 288 Defective battery purge
  BugID: 4403799  
  ESC: 529253 
  ECO: WO_21095 Release of Battery FRU only
  LEAP: 1565  T3 Array Battery Refresh Cycle Change
  FIN: I0674-1 
  FIN: I0677-1
  WWStopShip: WP001#15897 
  Manual: 806-1062-11 Sun StorEdge T3 Disk Tray Installation Operation and
                      Service Manual
  Manual: 806-1063-11 Sun StorEdge T3 Disk Tray Installation Administration
                      Guide

PROBLEM DESCRIPTION :

Sun StorEdge T3 Arrays shipped prior to December 31, 2000 may experience early
life failures of their Power Cooling Unit (PCU) Battery Packs.

Early life failures of the T3 Array (Purple) Power Cooling Unit (PCU) Battery
Packs are contributing to a higher than expected number of Power Cooling Unit
failures in the field.  Sun Engineering has determined that approximately 70%
of all PCU failures are due to early life battery pack failures.

Affected batteries within the Power Cooling Unit were initially charged to only
30% capacity and subsequently sat on the shelf for a prolonged period.  During
this time the batteries went into a deep discharge.  Because the cells differ
in capacity, a deep discharge can cause the lowest capacity cell to reverse its
polarity.  Polarity reversal during a deep discharge causes gas pressure to
build within the battery until gas is released from the vent of the positive
terminal.  Fluid leakage then occurs and the battery suffers severe damage.

The StorEdge T3 Array contains dual Power Cooling Units so that if one fails,
the other Power Cooling Unit takes over and the failure is reported in the
syslog file.  When this happens the data cache is no longer used, and instead
of writing to cache the system writes to disk.  This is a safeguard against
losing data in the cache in the event of a subsequent power failure.

Following is an example of battery failure messages:

 Jan 11 15:38:10  BATD[1]: N: Battery Refreshing cycle starts from this point.
 Jan 11 15:38:11  LPCT[1]: N: u1pcu1: Refreshing battery
 Jan 11 15:38:14  LPCT[1]: N: u2pcu1: Refreshing battery
 Jan 11 15:38:17  LPCT[1]: N: u1pcu1: Battery not OK

 ** Expected message...system has detected drop in voltage from u1pcu1
    as a result of the discharge test.

 Jan 11 15:38:22  BATD[1]: N: u1pcu1: hold time was 12 seconds.
 Jan 11 15:38:25  BATD[1]: W: u1pcu1: Replace battery, hold time low.
 Jan 11 15:52:48  LPCT[1]: N: u2pcu1: Battery not OK

 ** u2pcu1 expected message....

 Jan 11 15:52:50  BATD[1]: N: u2pcu1: hold time was 878 seconds.

 ** u2pcu1 passed the test as this is about a 14 minute hold time.

 Jan 12 06:38:21  BATD[1]: W: u1pcu1: Replace battery, hold time low.
 Jan 12 06:38:21  BATD[1]: N: u1pcu1 Battery took too long to recharge.
 Jan 12 06:38:21  BATD[1]: N: u1pcu2:skips battery refresh because the other PCU u1pcu1 : PCU1 hold time low

 ** Expected we would skip the u1pcu2 refresh test since u1pcu1
    has failed.

 Jan 12 06:38:23  LPCT[1]: N: u2pcu2: Refreshing battery
 Jan 12 06:52:09  LPCT[1]: N: u2pcu2: Battery not OK
 Jan 12 06:52:12  last message repeated 1 time
 Jan 12 06:52:15  BATD[1]: N: u2pcu2: hold time was 833 seconds.

 ** u2pcu2 is ok based on this hold time.

 Jan 12 07:02:47  pshc[1]: N: fru stat
 Jan 12 07:03:27  pshc[1]: N: refresh -s
 Jan 12 18:01:24  BATD[1]: N: Battery Refreshing cycle ends at this point.
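
The messages above can be scanned mechanically.  The following is a minimal
Python sketch, not a Sun tool; it assumes the BATD message wording matches
the example exactly (other firmware revisions may differ), and the function
name summarize_refresh is hypothetical.  It collects the reported hold time
for each PCU and the set of PCUs for which a "Replace battery" warning was
logged.

```python
import re

# Patterns taken from the BATD lines in the example above (assumption:
# the wording is the same on the system being examined).
HOLD_TIME = re.compile(r"BATD\[\d+\]: N: (u\d+pcu\d+): hold time was (\d+) seconds")
REPLACE = re.compile(r"BATD\[\d+\]: W: (u\d+pcu\d+): Replace battery")


def summarize_refresh(lines):
    """Return (hold_times, flagged) for a list of syslog lines.

    hold_times maps a PCU name to its last reported hold time in seconds;
    flagged is the set of PCUs with a "Replace battery" warning.
    """
    hold_times = {}
    flagged = set()
    for line in lines:
        match = HOLD_TIME.search(line)
        if match:
            hold_times[match.group(1)] = int(match.group(2))
        match = REPLACE.search(line)
        if match:
            flagged.add(match.group(1))
    return hold_times, flagged


sample = [
    "Jan 11 15:38:22  BATD[1]: N: u1pcu1: hold time was 12 seconds.",
    "Jan 11 15:38:25  BATD[1]: W: u1pcu1: Replace battery, hold time low.",
    "Jan 11 15:52:50  BATD[1]: N: u2pcu1: hold time was 878 seconds.",
    "Jan 12 06:52:15  BATD[1]: N: u2pcu2: hold time was 833 seconds.",
]
hold_times, flagged = summarize_refresh(sample)
print(hold_times)
print(sorted(flagged))
```

The sketch relies on the warning message rather than a numeric threshold,
since this FCO does not state the hold time below which a battery is
considered low.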

Based on Engineering root cause analysis, there is a possibility that any
battery pack that was initially charged to 30% capacity and sat on the shelf
for a prolonged period of time may fail earlier than expected.  The battery
pack was specified for a two year life cycle.  Field reports indicate that
Power Cooling Units with suspect battery packs are failing between six months
and one year.

All PCUs that fall within the following serial number range could demonstrate
an early life battery pack failure and, therefore, should be changed according
to this FCO:

         Suspect Power Cooling Unit serial numbers:

                        Power Cooling Unit  Battery Pack
        Part Number     Serial Numbers      Date Code

        300-1454        001000 - 012509    before 0004
        300-1454        012808 - 013279    9951-0002
        300-1454        016694 - 018091    9951-0002
        300-1454        013624 - 014915    0027
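
As an illustration of the serial number check, here is a minimal Python
sketch that tests a PCU serial number against the ranges in the table above.
The ranges are copied from the table and treated as inclusive; the names
SUSPECT_RANGES and is_suspect_serial are hypothetical.  The sketch looks only
at the PCU serial number and does not consider the battery pack date code
column.

```python
# Suspect PCU serial number ranges from the table above (inclusive).
SUSPECT_RANGES = [
    (1000, 12509),
    (12808, 13279),
    (16694, 18091),
    (13624, 14915),
]


def is_suspect_serial(serial):
    """Return True if a PCU serial number string falls in a suspect range."""
    value = int(serial)  # "003566" -> 3566
    return any(low <= value <= high for low, high in SUSPECT_RANGES)


print(is_suspect_serial("003566"))  # serial number from the "id read" example
print(is_suspect_serial("012600"))  # falls in the gap between two ranges
```

A serial number inside a suspect range may still carry a battery pack that
has already been replaced, so a positive result is only the first step of
the identification.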

Corrective action was implemented in Manufacturing by purging all suspect
battery packs via Worldwide Purge WP001#15897 issued on April 25, 2001.
The Battery only FRU was made available via ECO WO_21095 on June 26, 2001.
Corrective action was put in place in Enterprise Services via DPCO 233.A
on June 1, 2001, LEAP 1565 on June 28, 2001, and DPCO 288 released on 
November 30, 2001 for Defective battery purge.

Suspect PCU Identification:

 Reference the following StarOffice document in order to identify suspect
 Power Cooling Units:

 http://sdpsweb.EBay/FIN_FCO/FCO/FCO_A0183-1_Dir/t3_list_PCU_FCO.sdc

Note: To view the document, click on the above URL, then save it to your
      local disk using your Netscape 'file' button and selecting 'save as',
      then open the file locally using StarOffice.

IMPLEMENTATION :

 ---
|   |   MANDATORY (Fully Pro-Active)
 ---

 ---
| X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
 ---

 ---
|   |   UPON FAILURE
 ---

REPLACEMENT TIME ESTIMATE : 0.25 hours

PLANNED IMPLEMENTATION COMPLETION DATE: July 22, 2002

SPECIAL CONSIDERATION :

This problem can be minimized by setting a longer interval between battery
refresh cycles (reference FIN I0677-1).  Although resetting the refresh cycle
may extend the battery life, all suspect battery packs should be changed in
accordance with this FCO.

CORRECTIVE ACTION :

Upgrade suspect Power Cooling Units (Sun p/n 300-1454-01) by replacing the
Power Cooling Unit battery packs (370-3956-01).  The battery packs can be
replaced by following the instructions below.

----------------------------------------------
PART 1 - StorEdge T3 PCU Replacement Procedure
----------------------------------------------

PRE-REQUISITES:

1.) First identify if the T3 contains PCUs within the suspect serial number
    range.  Refer to Part 5 below.

    If YES, then continue on.  If NO, stop. FCO implementation not required.

2.) Verify that all loop cables (for ES config) and MIAs are screwed down
    tightly by using a small flathead screwdriver and tightening each loop
    cable. Be very careful not to disconnect any loop cable. If you notice
    a loop cable that is not screwed in at all, notify customer.

3.) Verify all controllers and loop cards are in their respective slots
    securely by pushing on each card and verifying that all latches are
    in the locked position.

4.) Verify that all PCUs are in their respective slots securely by
    pushing on each PCU and verifying that the PCU latches are in the
    locked position.

5.) Type "fru stat" to check ALL T3 FRUs are in a healthy state
    and that their LEDs are in their normal state before proceeding.

6.) Type "date" and "tzset" to check if the date and timezone are correct.
    If not, use the "date" and "tzset" commands to set the date and timezone,
    respectively.

7.) Type "refresh -s" to check that no battery refresh is running before
    proceeding.  Also check that the "Next Refresh" will not begin shortly
    after executing this FCO.  If it will, the "Next Refresh" should be
    re-scheduled to a later time (24 hours).  Refer to the Field Service
    Manual or Part 3 below for changing the refresh time (BAT_BEG) in the
    file /etc/schd.conf.

    If battery status is reported as "Low", this is ok as the purpose of
    this FCO is to replace it.

8.) Type "proc list" to check that no drive reconstruction is running before
    proceeding.

9.) It is recommended that the customer be at T3 firmware 1.17b when executing
    this procedure and that this be done during a maintenance window to
    minimize disruption to customer operation.

10.) Have 1 or 2 spare T3 controllers and PCUs on site.

11.) Be aware of FIN I0745-1 relating to possible PCU midplane connector
     damage.  Have a spare chassis available in case damage is discovered.

12.) Customer should be aware that performance will degrade during and
     after execution of this FCO as new batteries will need to be charged
     up after power on.  Charging can take up to 12 hours (per battery) and
     during this time write caching will be disabled.


PROCEDURES:

1.) Power off the PCU that is going to be replaced by pressing the power
    switch.

        NOTE - DO NOT POWER OFF MORE THAN ONE PCU AT A TIME FOR EITHER
               ES OR WG CONFIGURATION.  Powering off/removing a PCU will
               cause the T3 cache to run in write-through mode.  Make sure
               that the AC LED (left) is AMBER and the PS LED (right)
               is OFF.

2.) Disconnect power cord from the PCU.

3.) Push the PCU latches into the unlocked position and pull the unit out of
    the disk tray.  Wait 15 seconds and then verify that both controller
    online LEDs are still GREEN.  If any controller LED changes to non-solid
    GREEN (i.e. OFF/AMBER/Flashing AMBER) then immediately refer to the
    "PART 4 - Troubleshooting" section (below) before continuing.

        NOTE    - DO NOT REPLACE MORE THAN ONE PCU AT A TIME FOR EITHER
                  ES OR WG CONFIGURATIONS.

CAUTION - Any PCU that is removed must be replaced within 30 minutes or the Sun
          StorEdge T3 disk tray and all attached disk trays will automatically
          shut down and power off.

CAUTION - For partner pair configurations make sure that the loop cables have
          sufficient length to spread apart so you can remove u1pcu1.  Also
          make sure that the loop cables, along with other cables connected to
          the T3, are screwed in tightly so you do not inadvertently knock them
          off during removal/insertion.

4.) Using a flashlight, inspect the left and right sides of the
    PCU midplane connector for possible cracks or damage (per
    FIN I0745-1).  If damage is found, notify customer immediately
    that the chassis must be replaced.

5.) Install new PCU.  Wait 15 seconds and then verify that both
    controller online LEDs are still GREEN.  If any controller LED changes
    to non-solid GREEN (i.e. OFF/AMBER/Flashing AMBER) immediately refer
    to the "PART 4 - Troubleshooting" section below before continuing.

6.) Push the PCU latches into the locked position.

7.) Connect power cord to the PCU.

8.) Verify that the AC LED (left) is AMBER, indicating that AC power is
    present.

9.) Power on the PCU by pressing the power switch.

10.) Verify that both LEDs on the Power Cooling Unit are GREEN, indicating that
     the unit is receiving power.  Wait 15 seconds and then verify that both
     controller online LEDs are still GREEN.  If any controller LED changes to
     AMBER immediately refer to the "PART 4 - Troubleshooting" section below
     before continuing.

    Note - The PS LED (right) may blink GREEN for a period of time.
           (up to 12 hours for charging per battery while write caching
           is disabled)

11.) Type "fru stat" to check if new PCU is recognized and functioning.
     Battery might show up as "fault" as it is charging up.

      Verify the Battery Warranty Date by typing "id read u(x)pcu(y)".

   hostname:/:<1>id read u1pcu1
   Revision: 0000
   Manufacture Week: 00421999
   Battery Install Week : 00222001  <-- week # when battery was installed
   Battery Life Used    :   0 days, 0 hours  <-- usage since pcu inserted
   Battery Life Span    : 730 days, 12 hours
   Serial Number        : 003566  <-- used to identify suspected serial # range
   Battery Warranty Date: 20010322172349  <-- date & time when PCU switch
                                              turned on
   Battery Internal Flag: 0x00000000
   Vendor ID            : TECTROL-CAN
   Model ID             : 300-1454-01(50)

12.) Follow PART 2 (below) to replace the battery on the PCU that was just
     removed.  This PCU will then replace the remaining PCU in the T3.
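
For record keeping, the "id read" output shown in step 11 can be captured
and split into fields.  The following is a minimal Python sketch under two
assumptions: the captured text contains only the "Field : value" lines (no
shell prompt line and no arrow annotations), and the Battery Warranty Date
uses a yyyymmddhhmmss layout, which is inferred from the example value
rather than stated in this FCO.  The function name parse_id_read is
hypothetical.

```python
from datetime import datetime


def parse_id_read(text):
    """Split 'Field : value' lines of "id read" output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = """\
Revision: 0000
Manufacture Week: 00421999
Battery Install Week : 00222001
Battery Life Used    :   0 days, 0 hours
Battery Life Span    : 730 days, 12 hours
Serial Number        : 003566
Battery Warranty Date: 20010322172349
Battery Internal Flag: 0x00000000
Vendor ID            : TECTROL-CAN
Model ID             : 300-1454-01(50)
"""

fields = parse_id_read(sample)
# Assumed layout: yyyymmddhhmmss (inferred from the example value).
warranty = datetime.strptime(fields["Battery Warranty Date"], "%Y%m%d%H%M%S")
print(fields["Serial Number"], warranty.isoformat())
```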

--------------------------------------
PART 2 - Battery Replacement Procedure
--------------------------------------

PROCEDURES:

1.) Replacing the PCU battery:

     a) Lay the PCU upside down on a work area.

     b) Remove 4 Phillips screws (on top and right side) from the
        battery plate.

     CAUTION - BE CAREFUL NOT TO DROP ANY SCREWS INSIDE THE PCU.
               THERE ARE HIGH POWERED COMPONENTS INSIDE THE PCU
               THAT CAN BE SHORTED.  IF YOU DROP A SCREW INSIDE
               THE PCU MAKE SURE YOU RETRIEVE IT BEFORE CONTINUING.

     c) With the fan faceplate facing you, lift the right side of the
        battery slightly and gently pull the battery assembly to the
        right until the battery connector (on left) is unseated.

     d) Install new battery assembly, ensuring the connector is fully
        inserted.

        NOTE - IF ANY PIN IS BROKEN OR BENT, THE POWER COOLING UNIT MUST
               BE REPLACED.

     e) Secure the 4 Phillips screws to battery plate.

CAUTION - BE CAREFUL NOT TO DROP ANY SCREWS INSIDE THE PCU.  THERE ARE HIGH
          POWERED COMPONENTS INSIDE THE PCU THAT CAN BE SHORTED.  IF YOU DROP A
          SCREW INSIDE THE PCU THEN MAKE SURE YOU RETRIEVE IT BEFORE CONTINUING.

2.) PCU is now ready to replace the other PCU in the T3.  Follow PART 1,
    STEPS # 1-10, to replace the PCU.

3.) From the T3 CLI type ".bat -n u(x)pcu(y)", where u(x) is the unit # and
    pcu(y) is the PCU location # of the just installed PCU.  This will zero out
    the 'Battery Warranty Date' field and set the 'Battery Install Week' based
    on the T3 date setting.  It also will zero out the 'Battery Internal Flag'
    if it was set to 1, indicating low battery.

4.) Type ".id write busage u(x)pcu(y) 0".  This will calculate the 'Battery
    Warranty Date' and 'Battery Life Used'.  This can be verified by typing
    "id read u(x)pcu(y)".

5.) Repeat the above procedures to replace the battery on the removed
    PCU and to install it back into a T3.

6.) Dispose of the battery in accordance with local laws. 


----------------------------------------------
PART 3 - Refresh Schedule Adjustment Procedure
----------------------------------------------

PRE-REQUISITE:

This procedure changes the battery refresh cycle from 14 to 28 days to prolong
the life of the battery.  Only perform this procedure if the T3 refresh cycle
is currently set at 14 days.  Some customers who are running T3 firmware 1.17b
would already be running at 28 days, while others at lower rev T3 firmware
might have proactively changed the battery refresh cycle to 28 days per FIN
I0677-1.

Always check "/etc/schd.conf" for the 28 day battery refresh cycle before
proceeding.

PROCEDURES:

1.) Check the T3 file "/etc/schd.conf" to see if it shows "BAT_CYC 14".  If
    yes, then continue with the rest of this procedure.  If it shows
    "BAT_CYC 28" then the refresh cycle is already set to 28 days and this
    procedure can be skipped.

2.) Check to see if the T3 is running a battery refresh by typing "refresh
    -s" from the T3 CLI.  If yes, type "refresh -k" then go to the next
    step.

3.) Verify that your date and timezone are set correctly on the T3 for your 
    particular area.

    usage: tzset [[+|-]hhmm] (just type "tzset" if you want to know what
    it is currently set to).

    usage: date [yyyymmddhhmm[.ss]] (just type "date" if you want to know
    what it is currently set to).

4.) Ftp from a workstation to the T3 that will have its scheduler adjusted.

    a) Get /etc/schd.conf.

    b) Edit the file in the following way:

        BAT_BEG MM-DD-YYYY,hh-mm-ss (optional)
        BAT_CYC 14  <-----change to BAT_CYC 28

    c) Ftp to the T3 and put the modified schd.conf file back to the
       /etc directory.

5.) Type "refresh -i" (Re-Initialize scheduler)

6.) Type "refresh -s" (Check date of next refresh, should be 28 days)
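
The edit in step 4b can also be applied with a short script once schd.conf
has been fetched to the workstation.  The following is a minimal Python
sketch that rewrites only the BAT_CYC line and leaves every other line as it
was.  The function name set_refresh_cycle is hypothetical, the BAT_BEG value
in the sample is an invented placeholder in the MM-DD-YYYY,hh-mm-ss layout
shown above, and the FTP transfer itself is not covered.

```python
def set_refresh_cycle(conf_text, days=28):
    """Return schd.conf text with the BAT_CYC value replaced by `days`."""
    out = []
    for line in conf_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "BAT_CYC":
            line = f"BAT_CYC {days}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file content; only the two keywords come from this FCO.
original = "BAT_BEG 01-22-2002,02-00-00\nBAT_CYC 14\n"
updated = set_refresh_cycle(original)
print(updated, end="")
```

After putting the modified file back, steps 5 and 6 above still apply: the
scheduler has to be re-initialized and the next refresh date checked.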

--------------------------------------------
PART 4 - Troubleshooting
--------------------------------------------

During the removal, insertion, or switching on of the PCU, there is a very
small chance that the T3 (ES or WG config) will reboot, and in the case of ES
config one T3 controller can be disabled.  When this happens, the controller
LED will change state from a solid GREEN to either OFF (reboot started), AMBER
(booting), or Flashing AMBER (disabled).

It is important to run the extractor after the T3 boots up and to get the reset
log of the disabled controller.  The extractor will, by default, get the reset
log of the remaining live controller.  Give engineering the extractor output
and reset log for analysis and note when the reboot occurred, i.e. at removal,
insertion, or power on.

Whether the disabled controller can be reused or not depends on any valid
information from the reset log.  To get the reset log of the disabled
controller:

1) Remove the disabled controller from the T3.

2) Insert a new controller.  Note:  the new controller will boot up as alt
   master role for ES config.

3) Take the removed controller back, install it in a spare T3 (single brick),
   and let it boot up.

4) Via the telnet session or serial port, type "logger -dmprstlog" to dump
   the reset log to the T3 syslog.

5) If the reset log shows a valid hardware problem (e.g. cache parity error)
   around the time the PCU was replaced, the controller should be sent
   back via CPAS.

   Example:

   Jul 18 20:15:26 pshc[1]: N: logger -dmprstlog
   Jul 18 20:15:26 pshc[1]: W: u1ctr SysFail Reset (7001) was initiated at
   Cache memory parity error detected               20010626 163740
                                                    ^^^^^^^^ ^^^^^^
                                                   /        /
                                                  /        /
                                             yyyymmdd   hr/min/sec

6) If the reset log shows other non-hardware related messages and the time
   of occurrence is not around the time the PCU was replaced, then the
   controller can be deemed to be good.  The problem is more likely related
   to firmware than to hardware.

   Example:
   
   Jul 13 22:03:26 pshc[1]: W: u1ctr Exception Reset (2004) was initiated at
   Instruction Access exception                     20001103 175513
                                                    ^^^^^^^^ ^^^^^^
                                                   /        /
                                                  /        /
                                            yyyymmdd   hr/min/sec
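
The comparison in steps 5 and 6 depends on reading the reset log time stamp
correctly.  The following is a minimal Python sketch that parses the
"yyyymmdd hhmmss" value from the examples above and checks whether it falls
near the PCU replacement time.  This FCO does not define how close "around
the time" is, so the one hour window is an arbitrary assumption, as are the
function names and the replacement time used in the sample.

```python
from datetime import datetime, timedelta


def parse_reset_time(stamp):
    """Parse a reset log time such as '20010626 163740' (yyyymmdd hhmmss)."""
    return datetime.strptime(stamp, "%Y%m%d %H%M%S")


def near_replacement(stamp, replaced_at, window=timedelta(hours=1)):
    """Return True if the reset occurred within `window` of the replacement."""
    return abs(parse_reset_time(stamp) - replaced_at) <= window


# Hypothetical time at which the PCU was replaced.
replaced_at = datetime(2001, 6, 26, 16, 30)

print(near_replacement("20010626 163740", replaced_at))  # first example
print(near_replacement("20001103 175513", replaced_at))  # second example
```

The sketch only compares times.  Whether the logged event is a hardware
problem still has to be judged from the message text, as described in the
steps above.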



-------------------------------------------
PART 5 - Identifying Suspected PCU/Battery
-------------------------------------------

All PCUs that fall within the following serial number range could demonstrate
early life battery pack failure and, therefore, should be changed according to
this FCO.  Some PCU serial numbers within this range might already have a new
battery pack installed and so do not need to be replaced again.  Steps to
identify if the battery pack needs to be replaced are outlined below.

         Suspect Power Cooling Unit serial numbers:

                        Power Cooling Unit  Battery Pack
        PART NUMBER     Serial Numbers      Date Code

        300-1454        001000 - 012509    before 0004
        300-1454        012808 - 013279    9951-0002
        300-1454        016694 - 018091    9951-0002
        300-1454        013624 - 014915    0027

1)  Before executing this FCO, use the "id read u#pcu#" command to identify
    the PCU serial number and determine whether it falls within the range
    indicated above.

    a)  If NO, FCO implementation is not required.

    b)  If YES, go to step #2 to see if the battery has already been replaced.

2)  To determine if a battery pack has been replaced, you will need to
    look at the "Battery Install Week" from the "id read u#pcu#" command.

           hostname:/:<1>id read u1pcu1
           Revision             : 0000
           Manufacture Week     : 00421999
           Battery Install Week : 00222001  <--- week 22 = 5/28-6/1/01
           Battery Life Used    :   0 days, 0 hours
           Battery Life Span    : 730 days, 12 hours
           Serial Number        : 003566    <--- PCU serial #
           Battery Warranty Date: 20010322172349
           Battery Internal Flag: 0x00000000
           Vendor ID            : TECTROL-CAN
           Model ID             : 300-1454-01(50)

If "Battery Install Week" shows week 22 or later in the 2001 calendar year,
the battery has already been replaced with a known good battery pack.
Corrective action was implemented in manufacturing by purging all suspect
battery packs via Worldwide Purge WP001#15897 issued on April 25, 2001.
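
The week 22 rule can be expressed as a short check.  The following is a
minimal Python sketch that decodes the "Battery Install Week" field and
applies the rule.  It assumes the field layout is 00WWYYYY, which is
inferred from the example (00222001 = week 22 of 2001), and it assumes that
install weeks in years after 2001 also indicate a replaced battery.  The
function names are hypothetical.

```python
def decode_week(field):
    """Decode a week field such as '00222001' into (week, year)."""
    digits = field.strip()
    return int(digits[2:4]), int(digits[4:8])


def battery_already_replaced(install_week_field):
    """Apply the rule above: week 22 of 2001 or later means replaced."""
    week, year = decode_week(install_week_field)
    return (year, week) >= (2001, 22)


print(decode_week("00222001"))
print(battery_already_replaced("00222001"))  # install week from the example
print(battery_already_replaced("00421999"))  # week 42 of 1999
```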


COMMENTS :

BILLING TYPE:

 Warranty: Sun will provide parts at no charge under Warranty
           Service. On-Site Labor Rates are based on how the
           system was initially installed.

 Contract: Sun will provide parts at no charge. On-Site Labor Rates
           are based on the type of service contract.

 Non Contract: Sun will provide parts at no charge. Installation by
               Sun is available based on the On-Site Labor Rates
               defined in the Price List.

--------------------------------------------------------------------------
Implementation Footnote:
________________________

i)   In case of Mandatory FCOs, Enterprise Services will attempt to contact
      all known customers to recommend the part upgrade.

ii)  For controlled proactive swap FCOs, Enterprise Services mission critical
     support teams will initiate proactive swap efforts for their respective
     accounts, as required.

iii) For Replace upon Failure FCOs, Enterprise Services partners will implement
     the necessary corrective actions as and when they are required.

--------------------------------------------------------------------------

All released FINs and FCOs can be accessed using your favorite network
browser as follows:

SunWeb Access:
______________

* Access the top level URL of http://sdpsweb.EBay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.

SunSolve Online Access:
_______________________

* Access the SunSolve Online URL at http://sunsolve.Central/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
____________________

Supporting documents for FIN/FCOs can be found on Edist.  Edist can be
accessed internally at the following URL: http://edist.Central/.

* From there, follow the hyperlink path of "SunService Documentation" and
  click on "FIN & FCO attachments", then choose the appropriate folder,
  FIN or FCO.  This will display supporting directories/files for FINs or
  FCOs.

Internet Access:
_______________

* Access the top level URL of https://infoserver.Sun.COM

--------------------------------------------------------------------------
General:
________

Send questions or comments to [email protected]

---------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.