Document Audience: | INTERNAL |
Document ID: | A0230-1 |
Title: | B1600 systems configured with one or more Sun Fire B10n Content Loadbalancing blades may experience a hardware issue resulting in a "watchdog timeout" event. |
Copyright Notice: | Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Wed Jun 23 00:00:00 MDT 2004 |
___________________________________________________________________
*** Sun Confidential: Internal Use and Authorized VARs Only ***
__________________________________________________________________
This message including any attachments is confidential information
of Sun Microsystems, Inc. Disclosure, copying or distribution is
prohibited without permission of Sun. If you are not the intended
recipient, please reply to the sender and then delete this message.
__________________________________________________________________
FIELD CHANGE ORDER
(For Authorized Distribution by Enterprise Services)
FCO #: A0230-1
Status: inactive
Synopsis: B1600 systems configured with one or more Sun Fire B10n Content Loadbalancing blades may experience a hardware issue resulting in a "watchdog timeout" event.
Date: Jun/23/2004
SunAlert: No
Top FIN/FCO Report: No
Products Reference: Puma B10n Load Balancing Blade
Product Category: Server / System Component
Product Affected:
Systems Affected:
Mkt_ID Platform Model Description
------ -------- ----- -----------
- A44 ALL Sun Fire B1600
X-Options Affected:
Mkt_ID Platform Model Description
------ -------- ----- -----------
x8080A A44 All Sun Fire B10n Load Balancing Blade
Parts Affected:
Part Number Description
----------- -----------
540-5593-03(Or Less) Load Balancing Blade - B10n
(SCSI Devices)
Type Vendor Model SerialNumber(Min) SerialNumber(Max) Firmware
---- ------ ------- ----------------- ----------------- --------
N/A
References:
LEAP: 2557
ECO : WO_28736
Issue Description:
Sun Fire B1600 systems configured with one or more Sun Fire B10n Content
Loadbalancing blades may experience a hardware issue resulting in a "watchdog
timeout" event. If the blade is configured in a high availability failover
configuration, the standby unit will take over.
Error messages can be observed at either the SC console or the B10n console. At
the SC console, "watchdog timeout" events such as the following may be observed
if the Network Processor Unit (NPU) fails:
Login:LOM event: Offset: +0h2m31s host watchdog timeout modified
LOM event: Offset: +0h3m43s host FAULT: watchdog triggered
LOM event: Offset: +0h3m43s host reset
LOM event: Offset: +0h3m43s Svc_Reqd LED state change: ON
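When a saved SC console capture has to be checked for this symptom, the LOM event lines can be filtered mechanically. The following is a minimal sketch, not a Sun-supplied tool: the function name `watchdog_events` and the regular expression are assumptions derived only from the sample lines quoted in this FCO, and other LOM event formats may exist.

```python
import re

# Assumed line format, taken from the sample in this FCO:
#   LOM event: Offset: +0h3m43s host FAULT: watchdog triggered
LOM_EVENT = re.compile(r"LOM event: Offset: \+(\d+)h(\d+)m(\d+)s (.*)")

def watchdog_events(console_text):
    """Return (offset_seconds, message) for each LOM event mentioning a watchdog."""
    events = []
    for line in console_text.splitlines():
        # search() rather than match(), because a prompt such as "Login:"
        # can precede the event text on the same line.
        match = LOM_EVENT.search(line)
        if match is None:
            continue
        hours, minutes, seconds, message = match.groups()
        if "watchdog" in message:
            offset = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            events.append((offset, message.strip()))
    return events

sample = """Login:LOM event: Offset: +0h2m31s host watchdog timeout modified
LOM event: Offset: +0h3m43s host FAULT: watchdog triggered
LOM event: Offset: +0h3m43s host reset
LOM event: Offset: +0h3m43s Svc_Reqd LED state change: ON"""

for offset, message in watchdog_events(sample):
    print(offset, message)
```

The offset is converted to seconds so that events from several captures can be sorted or compared.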
The following console output shows an initialization error. The BSC
initialization fails, and an error message is then sent to the console for each
subsequent initialization failure until the SC issues a system reset:
*/ Copyright © 2003 Sun Microsystems, Inc.
Copyright 1984-2001 Wind River Systems, Inc.
Booting SunFire B10n Blade
Bootrom Build Date: Oct 16 2003, 20:00:53
Press any key to choose configuration file option...
0
Press any key to choose boot image...
0
auto-booting...
Booting Image /RFA0/BOOTIMAGE/boot_image_1 ...3134320
Initializing RDRAM ... Done
Initializing SDRAM ECC ... Done
Initializing BSC Interface ... ERROR[-1]:BSC Initialize failed
muxDevLoad failed for device entry 0!
muxDevLoad failed for device entry 1!
Invalid device "tffs=0,00"
Driver not initialized: Not starting applications
LOM event: Offset: +1h7m2s host reset
At the B10n console, the following messages may appear if the NPU is not
responding:
RDRAM: Rambus configuration failed
Initialization of Lookup Pool Failed
Initialization of Lookup table Failed
Driver not initialized: Not starting applications
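A B10n console capture can be screened for the failure strings quoted in this FCO with a simple substring check. This is an illustrative sketch only: the name `find_signatures` is hypothetical, and the signature list contains just the messages shown in this document, so it should not be treated as exhaustive.

```python
# Failure strings quoted in this FCO; illustrative, not an exhaustive list.
FAILURE_SIGNATURES = (
    "BSC Initialize failed",
    "muxDevLoad failed",
    "RDRAM: Rambus configuration failed",
    "Initialization of Lookup Pool Failed",
    "Initialization of Lookup table Failed",
    "Driver not initialized",
)

def find_signatures(console_text):
    """Return the known failure strings present in the capture, in list order."""
    return [sig for sig in FAILURE_SIGNATURES if sig in console_text]

sample = """RDRAM: Rambus configuration failed
Initialization of Lookup Pool Failed
Initialization of Lookup table Failed
Driver not initialized: Not starting applications"""

print(find_signatures(sample))
```

A match indicates only that the capture resembles the symptoms described here; the dash level of the blade still has to be confirmed before replacement.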
If POST or diagnostics are run, error messages such as those listed below may be
seen. Each message is specific to a device and identifies which module failed
while reading or writing a particular register (the name and address of the
register are displayed). The listing shows the diagnostic source lines that
print these errors:
diag_gmac.c:641: DiagPrintf("%16s: *ERROR*", basePtr[i].regName);
diag_misc.c:7488: DiagPrintf("ERROR READING BOARD CONFIGURATION DATA.\n");
diag_misc.c:8071: printf("***ERROR*** Bad SPD rev level %02d for
device %02d\n",
diag_omac.c:982: DiagPrintf("%16s: *ERROR*", regTablePtr[i].regName);
diag_pio.h:278: ERROR --> RTC NVRAM overflow!!!
diag_ppe.c:3754: DiagPrintf("ERROR: ICC load failed.\n");
diag_ppe.c:3765: DiagPrintf("ERROR: Overall ICC image
wouldn't load cleanly, aborting scan.\n");
diag_ppe.c:3784: DiagPrintf("ERROR: ICC load of PHINT failed.\n");
diag_rdram.c:176 DiagPrintf("\n\nERROR reading
device 0x%02X register 0x%02X (%s), aborting.\n",
diag_rdram.c:1802: DiagPrintf("\n\nERROR writing device 0x%02X
register 0x%02X (%s), aborting.\n",
diagnostics.c:981: {"ERRORLOG", diagErrorLog, FALSE},
diagnostics.c:1198: DiagPrintf("COMMAND COMPLETED, ERROR.\n");
diagnostics.c:1248: DiagPrintf("COMMAND COMPLETED, ERROR.\n");
diagnostics.c:1258: DiagPrintf("COMMAND COMPLETED, ERROR.\n");
diagnostics.c:1268: DiagPrintf("COMMAND COMPLETED, ERROR.\n");
diagnostics.c:1317: DiagPrintf(" DIAG> ERRORLOG [T|V|N|F]
Display (T|V), chk cnt (N) or flush (F).\n\n");
diagnostics.c:1547: DiagPrintf("COMMAND COMPLETED, ERROR.\n");
diagnostics.c:2449: DiagPrintf("ERROR: POST Exiting with Unkown
ChipType %d, 6 is expected.\n", diagApiRec.hostChipType);
errorlog.c:386: "ERROR: %02x%02x%02d%02d%02d %02d%02d %d %02d
%08x%08x %08x%08x %08x%08x %08x%08x",
errorlog.c:401: "ERROR: #%03d.%03d, %02d-%s-%02d,%02d:%02d, Agent %d,
%s.\n %s\n Parameters: %08x%08x, %08x%08x\n
%08x%08x, %08x%08x\n",
errorlog.c:415: sprintf(buffer, "ERROR: Agent %d, %s %s\n",
To determine the dash level of the blade, run the service controller console
command "showfru sX" (where X is the slot number of the B10n):
sc>showfru s6
SEGMENT: SD
/ManR/UNIX_Timestamp32: Thu Apr 17 23:24:13 UTC 2003
/ManR/Fru_Description: SUNW,Sun Fire B10n, IQ4, RD512MB, VR4, SD512MB
/ManR/Manufacture_Loc: Milpitas, CA, USA
/ManR/Sun_Part_No: 5405593
/ManR/Sun_Serial_No: 000005
/ManR/Vendor_Name: Solectron
/ManR/Initial_HW_Dash_Level: 01
/ManR/Initial_HW_Rev_Level: 01
/ManR/Fru_Shortname: SF B10n
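Where showfru output is collected from many chassis, the part number and dash level can be extracted with a short script. The sketch below assumes that every field of interest appears as a "/ManR/<name>: <value>" line, as in the sample, and that Initial_HW_Dash_Level corresponds to the dash suffix of the part number; a blade reworked after manufacture may differ. The function names are hypothetical.

```python
def parse_showfru(text):
    """Map each '/ManR/<field>: <value>' line of showfru output to field -> value."""
    fields = {}
    prefix = "/ManR/"
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Split on the first colon only; timestamps contain further colons.
        name, _, value = line[len(prefix):].partition(":")
        fields[name.strip()] = value.strip()
    return fields

def dashed_part_number(fields):
    """Build e.g. '540-5593-01' from Sun_Part_No '5405593' and dash level '01'."""
    base = fields["Sun_Part_No"]
    dash = fields["Initial_HW_Dash_Level"]
    return "%s-%s-%s" % (base[:3], base[3:], dash)

sample = """SEGMENT: SD
/ManR/UNIX_Timestamp32: Thu Apr 17 23:24:13 UTC 2003
/ManR/Sun_Part_No: 5405593
/ManR/Sun_Serial_No: 000005
/ManR/Initial_HW_Dash_Level: 01
"""

fields = parse_showfru(sample)
print(dashed_part_number(fields))
```

For the sample output shown in this FCO, the result is 540-5593-01, which falls within the affected range of 540-5593-03 or lower.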
An internal-only customer list, showing external and internal customers on
separate tabs, can be viewed at the following URL:
http://sdpsweb.central/FIN_FCO/FCO/FCO_A0230-1_Dir/CustomerList.sxc
Root cause analysis has determined that a critical component, the Network
Processor Unit (NPU) (Sun part number 100-7643-01), has a potential long-term
reliability failure mode, based on high temperature storage tests performed by
the manufacturer at 150 degrees Celsius.
Corrective action was made available on March 26, 2004 via ECO# WO_28736 by
releasing the new part 540-5593-04. Corrective action was made available in
Sun Services on March 31, 2004 via LEAP# 2557 by changing the Minimal
Acceptable Level (MAL) from 540-5593-03 to 540-5593-04.
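The MAL rule can be read as a comparison on the dash suffix of the part number. The following sketch shows one way to apply it to a list of spares or installed parts; the helper names `split_part` and `below_mal` are hypothetical, and the code assumes part numbers are written in the dashed form used in this FCO.

```python
MAL_PART = "540-5593-04"  # Minimal Acceptable Level stated in this FCO

def split_part(part_number):
    """Split '540-5593-03' into ('540-5593', 3)."""
    base, _, dash = part_number.rpartition("-")
    return base, int(dash)

def below_mal(part_number, mal=MAL_PART):
    """Return True if the part has the same base number and a lower dash level."""
    base, dash = split_part(part_number)
    mal_base, mal_dash = split_part(mal)
    if base != mal_base:
        raise ValueError("not a %s part: %s" % (mal_base, part_number))
    return dash < mal_dash

for part in ("540-5593-01", "540-5593-03", "540-5593-04"):
    print(part, "replace" if below_mal(part) else "ok")
```

Comparing the dash level as an integer avoids mistakes that a plain string comparison could introduce if a dash level is ever written without a leading zero.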
Parts Affected:
June 30, 2006
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | UPON FAILURE
---
Replacement Time Estimate:
0.5 hours
Special Considerations:
Please mark all Defective Material Tags with "FCO A0230-1 - Do Not Screen".
Corrective Action:
At all external customers on the Customer List referenced above, proactively do
the following:
- replace all 540-5593-03 (or below) with 540-5593-04 (or above)
On Sun internal systems, implement this FCO only reactively (Upon Failure).
IMPORTANT: Please follow the process steps below when replacing the
540-5593.
Installing an upgraded B10n board at the customer site
======================================================
The upgraded B10n blade has the following:
------------------------------------------
1. The 1.0 BSC firmware
2. Two B10n boot images - version 1.2.3 and 1.1.2_diag.
The default boot image is 1.2.3.
3. The B10n bootrom, version 1.2.3
To export the configuration from the old board:
-----------------------------------------------
1. Go to the /RFA0 directory
puma{admin}# cd /
2. Tar the CONFIG directory:
puma{admin}# tar lbconfig.tar CONFIG
3. Export the config tar file:
puma{admin}# export file
The FTP server address:
The source directory path: type [cr] to use current directory:
(null) source path, using current directory
The source file name: lbconfig.tar
The destination directory path:
The destination file name: lbconfig.tar
The user name:
The user password:
export file succeed!
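Before the old board is removed, it may be worth confirming on the FTP server that the exported file really contains the CONFIG directory. The sketch below is one possible check, under a loud assumption: that the B10n "tar" command writes a standard tar archive readable by Python's tarfile module, which this FCO does not state. The function names are hypothetical, and the sample archive is built in memory purely for demonstration.

```python
import io
import tarfile

def archive_has_config(fileobj):
    """Return True if the tar archive contains a top-level CONFIG entry."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive.getmembers():
            # Ignore empty and "." path components such as a leading "./".
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts and parts[0] == "CONFIG":
                return True
    return False

def build_sample_archive(name="CONFIG/example.cfg"):
    """Build a small in-memory tar archive holding one placeholder file."""
    buffer = io.BytesIO()
    data = b"placeholder"
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer

print(archive_has_config(build_sample_archive()))
```

For a real export, the file would be opened in binary mode, for example with open("lbconfig.tar", "rb"), and passed to archive_has_config in place of the in-memory sample.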
To import the configuration to the upgraded board:
--------------------------------------------------
1. Power off the old board and remove it. Plug in the upgraded board.
2. The board comes up with an empty configuration with the B10n 1.2
application image running.
3. Configure the network interface. Optionally, configure the management
VLAN (if applicable).
4. Go to the /RFA0 directory
puma{admin}# cd /
5. Import the old (1.0/1.1.x) configuration.
puma{admin}# import file
The FTP server address:
The source directory path:
The source file name: lbconfig.tar
The destination directory path:
(null) path, using current directory...
The destination file name: lbconfig.tar
The user name:
The user password:
import file succeed!
6. Untar the configuration file.
puma{admin}# untar lbconfig.tar
7. Reboot the B10n blade to get the imported configuration.
puma{admin}# reboot
NOTE: To run traffic with the B10n 1.2 application image, the blade server
module must be updated to version 1.2.
To update the B100s blade server module to version 1.2:
-------------------------------------------------------
1. Download the 1.2 version of the blade server module software from the
following site:
http://wwws.sun.com/software/download/network.html
2. Unzip the file:
# /usr/bin/unzip SunFire_B10n-1_2-Solaris-ServerModule.zip
3. Install the blade server module software packages:
# cd /Solaris_8/Packages
# pkgadd -d .
4. Restart the blade server module:
# /etc/init.d/clbctl stop
# /etc/init.d/clbctl start
Comments:
None
------------------------------------------------------------------------------
Billing Type:
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on how the
system was initially installed.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
--------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of Mandatory FCOs, Sun Services will attempt to contact
all known customers to recommend the part upgrade.
ii) For controlled proactive swap FCOs, Sun Services mission critical
support teams will initiate proactive swap efforts for their respective
accounts, as required.
iii) For Replace upon Failure FCOs, Sun Services partners will implement
the necessary corrective actions as and when they are required.
--------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunSolve Internal Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
_______________
* Access the top level URL of https://spe.sun.com
FIN/FCO Homepage Access:
_________________________
* Access the top level URL of http://sdpsweb.Central/FIN_FCO/index.html
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
To submit either a FIN or an FCO, refer to the following URLs for templates
and instructions:
* For FCO: http://pronto.central/fco.html
* For FIN: http://pronto.central/fin.html
--------------------------------------------------------------------------
General:
________
Send questions or comments to [email protected]
---------------------------------------------------------------------------