Document Audience: | INTERNAL |
Document ID: | A0155-1 |
Title: | Installation of certain SBus cards in slot 1 of E10000 having older I/O Mezzanine Boards has been found to cause unpredictable behavior, including undetected data corruption. |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Wed Jan 12 00:00:00 MST 2000 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD CHANGE ORDER
(For Authorized Distribution by SunService)
FCO #: A0155-1
Status: inactive
Synopsis: Installation of certain SBus cards in slot 1 of E10000 having older I/O Mezzanine Boards has been found to cause unpredictable behavior, including undetected data corruption.
Date: Jan/12/00
Keywords:
Installation of certain SBus cards in slot 1 of E10000 having older I/O Mezzanine Boards has been found to cause unpredictable behavior, including undetected data corruption
Top FIN/FCO Report: Yes
Products Reference: E10000 I/O Mezzanine Board
Product Category: Server / System Board
Product Affected:
Systems Affected:
Mkt_ID   Platform   Model   Description                    Serial Number
------   --------   -----   -----------                    -------------
  -      E10000     ALL     Sun Enterprise 10000 Server    (see comments)
  -      HPC10000   ALL     Sun HPC Server                 (see comments)
  -      SSP        ALL     System Service Processor       -
X-options Affected:
Mkt_ID   Platform   Model   Description                            Serial Number
------   --------   -----   -----------                            -------------
X2730A   -          -       Sun Enterprise 10000 SBus I/O Board    -
Parts Affected:
Part Number   Description                            Model
-----------   -----------                            -----
501-4349-XX   Sun Enterprise 10000 SBus I/O Board    -
501-6525-xx   Mislabeled E10000 SBus I/O Board       -
(SCSI Devices)
Type Vendor Model SerialNumber(Min) SerialNumber(Max) Firmware
---- ------ ------- ------------------ ------------------ --------
References:
ECO: WO_12425
DPCO: 157
Patch Number: 104853-05, 105684-07, 108345-01
BugId: 4046986, 4049704, 4243882, 4091053, 4157729, 4258577
FIN: I0405-3
DOC: 805-2917-14 Sun Enterprise 10000 System Service Guide
Issue Description:
Installation of a SunSwift, SunFastEthernet, Gigabit Ethernet, or Sun FC-AL
SBus card in slot 1 has been found to cause unpredictable behavior in an SBus
card installed in slot 0 (for example, SOC or (U)DWIS). Depending on the
type of SBus card in slot 0, this behavior can appear as resets, offlines,
and other reported errors, as well as data corruption that can go
undetected by the system.
SunSwift, SunFastEthernet, Gigabit Ethernet 1.1 or Sun FC-AL SBus cards,
including the SunSwift hme/fas combo card (X1018A), the Sun FastEthernet 2.0
(X1059A) card, the SBus Gigabit Ethernet card (X1045A) and the SOC+ FCAL SBus
card (X6730A) should not be installed in SBus slot 1 on an Enterprise 10000
system board which has a 501-4349-xx SBus I/O Mezzanine Board installed.
SunSwift, SunFastEthernet, Gigabit Ethernet 1.1, or Sun FC-AL SBus cards
should be installed ONLY in SBus slot 0 of E10000 system boards having the
501-4349-xx SBus I/O Mezzanine Board installed. See FIN I0405-3 for details.
The workaround for this has been to restrict the placement of the above
mentioned SBus cards to only slot 0 of the SBus. This becomes a problem if
the customer needs to install one of the restricted cards but critical devices
are already connected to all of the available slots labeled slot 0.
Example error message:
WARNING: /sbus@75,0/QLGC,isp@0,10000 (isp0):
ISP: Firmware cmd timeout
WARNING: /sbus@75,0/QLGC,isp@0,10000 (isp0):
Fatal error, resetting interface
isp0: State dump from isp registers and driver:
mailboxes(0-5): 0x4001, 0x4953, 0x5020, 0x2020, 0x1, 0x1
bus: isr= 0x6, icr= 0x0, conf0= 0x1, conf1= 0x0
cdma: count= 0, addr= 0x0, status= 0x2, conf= 0x0, fifo_status= 0x40
dma: count= 0, addr= 0x0, status= 0x2, conf= 0x0
risc: R0-R7= 0x1e, 0xa5e3, 0x1d, 0x3f32, 0x0, 0x457, 0x30 0x18
risc: R8-R15= 0x5b78, 0x4bd, 0x470, 0x1000, 0x472, 0x1000, 0x10 0x0
risc: PSR= 0xf000, IVR= 0x10ef, PCR=0x1000, RAR0=0x30, RAR1=0x5d7e
risc: LCR= 0x1, PC= 0x457, MTR=0xffff, EMB=0x0, SP=0x5cfe
request(in/out)= 29/1, response(in/out)= 27/27
request_ptr(current, base)= 0x71320780 (0x71320040)
response_ptr(current, base)= 0x71324700 (0x71324040)
period/offset: 25/8 25/8 25/8 25/8 25/8 25/8 25/8 12/8
period/offset: 12/8 12/8 12/8 12/8 12/8 12/8 12/8 12/8
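When reviewing a system log for the symptoms shown in the example above, a small filter can help. The following is only a sketch: it assumes the log has been saved to a file, that each symptom line directly follows its WARNING line as in the example, and that a POSIX awk is available. The function name list_isp_symptoms is hypothetical. These messages are not unique to this defect, so a match is a prompt for further checking, not a diagnosis.

```sh
#!/bin/sh
# Sketch only: count the isp symptom lines from the example above, per device.
# list_isp_symptoms FILE   (FILE is a saved copy of the system log)
list_isp_symptoms() {
    awk '
        {
            # Remember the device path that follows a "WARNING:" token.
            hit = 0
            for (i = 1; i < NF; i++)
                if ($i == "WARNING:") { dev = $(i + 1); hit = 1 }
            if (hit) next
        }
        /ISP: Firmware cmd timeout/ || /Fatal error, resetting interface/ {
            if (dev != "") count[dev]++
        }
        { dev = "" }   # a symptom only counts on the line right after a WARNING
        END { for (d in count) print count[d], d }
    ' "$1"
}
```

Run against a file holding the four WARNING/symptom lines of the example, the function prints a count of 2 for /sbus@75,0/QLGC,isp@0,10000.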
Patch #104853-05 for SSP 3.0, patch #105684-07 for SSP 3.1, and SSP 3.1.1
provide a check in OBP for unsafe SBus card placement of the SunSwift,
SunFastEthernet and Gigabit Ethernet 1.1 cards. The Sun FC-AL SBus card
is not checked. If an unsafe configuration is detected, this check prevents
the system from booting. SSP 3.2 does not incorporate this check.
Error messages of the following type are displayed if the software detects
an invalid configuration:
ERROR: sbus slot 1 on board 1 SYSIO 0 contains: SUNW,hme SUNW,fas
This configuration may cause data corruption.
(and)
Cannot boot: Configuration error.
sbus slot 1 on board 1 SYSIO 0 contains: SUNW,hme SUNW,fas
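The configuration error above names the board, SYSIO and slot involved, so the affected positions can be pulled out of a saved console log. The following is a minimal sketch that assumes the message wording is exactly as shown above and that the log has been saved to a file. The function name unsafe_slots is hypothetical.

```sh
#!/bin/sh
# Sketch only: summarize "sbus slot N on board N SYSIO N contains: ..." lines.
# unsafe_slots FILE   (FILE is a saved copy of the console output)
unsafe_slots() {
    # Rewrite each matching line as "board B SYSIO S slot N: devices",
    # then drop duplicates (the message is printed twice per occurrence).
    sed -n 's/^.*sbus slot \([0-9][0-9]*\) on board \([0-9][0-9]*\) SYSIO \([0-9][0-9]*\) contains: *\(.*\)$/board \2 SYSIO \3 slot \1: \4/p' "$1" |
        sort -u
}
```

For the example messages above, this yields the single line "board 1 SYSIO 0 slot 1: SUNW,hme SUNW,fas".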
Manufacturing phased in a new I/O Mezzanine Board (Sun p/n 501-4478-01) via
ECO# WO_12425 on January 26th, 1998.
Implementation:
---
| X | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | UPON FAILURE
---
Replacement Time Estimate:
125 - 170 mins (per 4 to 7 board E10K system)
[average is 2.5 hours for a 5 board system]
Special Considerations:
Implementation of this FCO will be carried out in a phased and prioritized
manner. Check with your Geo FCO representative for the procedure. It is very
important that you check with your Geo/Country representative for material
availability before scheduling FCO activity, and that you report completion
of all implementation activities to your Geo/Country representative.
Corrective Action:
NOTE! Only Sun Authorized Service personnel are authorized to perform
the following maintenance actions on E10000 systems. SSP 3.0
has been EOLed. SSPs should be upgraded to SSP 3.1 or higher.
1(a). Determine locations of the I/O Mezzanine boards to be replaced.
To determine which SBus I/O Mezz boards are installed in a running system,
use the board_id command on the main SSP. This may be done via remote access
if available at the customer's site. The command can be used safely on
running domains when a 10 second sleep is left between board_id commands.
Cut and save the script below (between the Cut Here markings) into a file,
make the file executable, and run it as user "ssp". All system boards should
be powered on, and a 10 second sleep between board_id commands is recommended.
--- Cut Here --- --- Cut Here --- --- Cut Here --- --- Cut Here ---
#!/bin/sh
#
# io_bd_rev.sh
# Check for 501-4349 or 501-6525 I/O boards installed in E10000.
#
# Run as user "ssp".
# All system boards should be powered on.
# A 10 second sleep between board_id commands is recommended.
#
# initialize
REV_CK1=0
REV_CK2=0
rm -f /tmp/rev_ck.out 1>/dev/null 2>/dev/null
# get the revisions
echo "Checking board \c"
for b in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
echo "$b...\c"
echo `uname -n`":SB:"$b": "`board_id -b io -n $b` >> /tmp/rev_ck.out
sleep 10
done
echo "done."
echo
# find the down-rev boards
REV_CK1=`grep -c "501-4349" /tmp/rev_ck.out`
REV_CK2=`grep -c "501-6525" /tmp/rev_ck.out`
# print out the list of down-rev boards
if [ `expr $REV_CK1 + $REV_CK2` -gt 0 ]; then
echo Down-rev I/O Boards:
grep "501-4349" /tmp/rev_ck.out | awk '{print $1, $3, $4, $5}'
grep "501-6525" /tmp/rev_ck.out | awk '{print $1, $3, $4, $5}'
else
echo No down-rev I/O Boards found.
fi
echo
echo Total number of down-rev I/O Boards: `expr $REV_CK1 + $REV_CK2`
# clean-up
rm /tmp/rev_ck.out
# end of io_bd_rev.sh
--- Cut Here --- --- Cut Here --- --- Cut Here --- --- Cut Here ---
The output of the above script will list the system boards with
down-rev Part Number (P/N 501-4349-XX or 501-6525-XX) I/O Mezz boards,
and will list the total of these Boards in the system, for example:
ssp2:domain2% ./io_bd_rev.sh
Checking board 0...1...2...3...4...5...6...7...8...9...10...11...12...13...14...15...done.
Down-rev I/O Boards:
ssp2:SB:0: Part Number 501-4349-04
ssp2:SB:1: Part Number 501-4349-04
ssp2:SB:8: Part Number 501-4349-04
ssp2:SB:15: Part Number 501-4349-50
ssp2:SB:9: Part Number 501-6525-04
ssp2:SB:10: Part Number 501-6525-04
Total number of down-rev I/O Boards: 6
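If the output of io_bd_rev.sh is saved to a file, the system board numbers to be scheduled for replacement can be extracted from it. The following is a minimal sketch, assuming the output lines have the "host:SB:N: Part Number ..." form shown in the example above and that a POSIX awk is available. The function name downrev_boards is hypothetical.

```sh
#!/bin/sh
# Sketch only: list the system board numbers that carry a down-rev I/O board.
# downrev_boards FILE   (FILE is saved output of io_bd_rev.sh)
downrev_boards() {
    # Fields are split on ":", so field 3 is the system board number.
    awk -F: '/:SB:[0-9]+:/ && /501-(4349|6525)-/ { print $3 }' "$1" |
        sort -n
}
```

For the example output above, the function prints 0, 1, 8, 9, 10 and 15, one per line, which matches the total of 6 reported by the script.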
1(b). The eepr command from within redx can be used with recent
recordstop or arbstop dumps (ones that reflect the current system
configuration). This method will require at least one dump file
from each domain in order to read the Part Number information from
all of the I/O Boards.
2. Order the required number of replacement I/O Boards and schedule
time with the customer for this maintenance.
3. Determine if the I/O Boards to be replaced are on system boards
that can be DR detached. For more details on requirements for DR,
see the following:
The E10000 User Guides at:
http://marvin.west/pubs/starfire_user/
Sections:
Dynamic Reconfiguration
Alternate Pathing
Solaris Installation and Release Notes (includes DR install)
The RAS Companion:
http://marvin.West.Sun.COM/pubs/ras_companion
4(a). If using DR to remove system boards.
4.1. On the SSP, add the following line to the .postrc file for
the domain. (Note that the .postrc file may be located in the
/export/home/ssp directory, where it will affect all domains, or it
may be located in the /var/opt/SUNWssp/etc// directory. For more
information enter: hpost -?postrc or: man postrc.)
level 64
4.2. Set up the system board for DR detach by switching any active
AP networks or disk paths, dissociating any mirrors, removing
and offlining disks, and other necessary tasks to make the system
board available for detach.
4.3. Start the DR process on the SSP, as user "ssp", either with
Hostview or by entering: dr on the command line.
4.4. Drain the system board.
dr> drain
4.5. Complete the detach of the system board.
dr> complete_detach
4.6. Power off the system board
ssp% power -off -sb
4.7. Remove the system board.
4.8. Replace P/N 501-4349-XX I/O Boards with P/N 501-4478-XX according
to the procedures in the Sun Enterprise 10000 System Service Guide,
P/N 805-2917-14.
Replacement of an I/O Mezzanine Board requires that the system
board be removed from a running domain and powered off. If the
system board cannot be DR detached, the domain will need to be
shut down. Approximate time needed to replace an I/O Board is
20 minutes, after the system board has been powered off.
4.9. Re-insert the system board, re-connect the I/O cables,
and power it on.
4.10. Init the attach of the system board.
dr> init_attach
4.11. After completion of init_attach, verify that all components are
configured into the domain, check the hpost logs for any failures
and repair as necessary.
4.12. Complete the attach of the system board.
dr> complete_attach
4.13. Repeat steps 4.2 - 4.12 for each system board that has an I/O
board to be replaced.
4.14. When all the I/O boards have been replaced, remove or comment
out the level 64 entry in the .postrc file.
4.15. Upgrade SSP software to eliminate boot restrictions as
follows:
SSP Version   OBP Patch
-----------   ---------
3.0           Upgrade to SSP 3.1 or higher
3.1           105684-08
3.1.1         108345-01
3.2           N/A
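Steps 4.1 and 4.14 above add and later remove the level 64 entry in the .postrc file. The following is a minimal sketch of doing both edits repeatably, assuming the entry is written as the exact line "level 64". The function names add_level64 and remove_level64 are hypothetical, and the path of the .postrc file must be supplied by the caller.

```sh
#!/bin/sh
# Sketch only: add or remove the exact line "level 64" in a .postrc file.

# add_level64 FILE: append the entry unless it is already present.
add_level64() {
    grep '^level 64$' "$1" >/dev/null 2>&1 || echo "level 64" >> "$1"
}

# remove_level64 FILE: keep a backup in FILE.bak, then drop the entry.
remove_level64() {
    cp "$1" "$1.bak" || return 1
    # grep -v exits non-zero when nothing is left to print; that is not
    # an error here, so the exit status is reset below.
    grep -v '^level 64$' "$1.bak" > "$1"
    return 0
}
```

Only the exact line added by add_level64 is removed; other entries in the file, including differently spaced level lines, are left untouched and should be reviewed by hand.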
4(b). If DR is not being used.
4.1. As user "root" on the domain, shut down the domain.
4.2. As user "ssp" on the SSP, power off all the boards in the domain.
ssp% power -off -sb
4.3. Replace P/N 501-4349-XX I/O Boards with P/N 501-4478-0X according
to the procedures in the Sun Enterprise 10000 System Service Guide,
P/N 805-2917-14.
4.4. As user "ssp" on the SSP, power on all the system boards in the
domain.
ssp% power -on -sb
4.5. Run a -l64 bringup on the domain, and bringup to OBP.
ssp% bringup -l64 -A off
4.6. After completion of the bringup, verify that all components are
configured into the domain, check the hpost logs for any failures
and repair as necessary.
4.7. Boot the system (Note OBP patch level and boot restrictions).
4.8. Repeat steps 4.1 - 4.7 for each domain that has I/O Boards to be
replaced.
4.9. When all the I/O boards have been replaced, remove or comment
out the level 64 entry in the .postrc file.
4.10. After all the domains have had the I/O Boards replaced,
upgrade SSP software to eliminate boot restrictions as follows:
SSP Version   OBP Patch
-----------   ---------
3.0           Upgrade to SSP 3.1 or higher
3.1           105684-08
3.1.1         108345-01
3.2           N/A
5. Return and scrap the 501-4349-XX and 501-6525-XX I/O Mezz boards.
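The SSP Version / OBP Patch tables in the corrective action steps above can be checked against the patches actually installed on the SSP. The following is a sketch under stated assumptions: it reads patch listing text on standard input in the "Patch: NNNNNN-RR ..." form printed by the Solaris showrev -p command, and it requires a POSIX awk. The function names required_patch and has_patch are hypothetical.

```sh
#!/bin/sh
# Sketch only: compare installed patches with the table in this FCO.

# required_patch SSP_VERSION: print "BASE MINREV" from the table, or
# nothing for versions with no patch entry (3.0: upgrade; 3.2: N/A).
required_patch() {
    case "$1" in
        3.1)   echo "105684 08" ;;
        3.1.1) echo "108345 01" ;;
        *)     echo "" ;;
    esac
}

# has_patch BASE MINREV: succeed if stdin lists BASE at MINREV or later.
has_patch() {
    awk -v base="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, p, "-")
            if (p[1] == base && p[2] + 0 >= min + 0) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example use on the SSP (not run here):
#   showrev -p | has_patch 105684 08 && echo "105684-08 or later installed"
```

A later revision of the same patch base is treated as satisfying the requirement; whether that holds for a given patch should be confirmed against the patch README.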
Comments:
A small number of 501-4349-xx I/O Boards were incorrectly programmed with
the Part Number 501-6525-xx. If this I/O Board part number is detected it
should be processed as part number 501-4349-xx.
Billing Type:
Warranty: Sun will provide parts and on-site labor at no charge
during normal working hours.
Contract: Sun will provide parts and on-site labor at no charge
during normal working hours.
Non Contract: Sun will provide parts and on-site labor at no charge
during normal working hours.
--------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of Mandatory FCOs, Enterprise Services will attempt to contact
all known customers to recommend the part upgrade.
ii) For controlled proactive swap FCOs, Enterprise Services mission critical
support teams will initiate proactive swap efforts for their respective
accounts, as required.
iii) For Replace upon Failure FCOs, Enterprise Services partners will implement
the necessary corrective actions as and when they are required.
--------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
______________
* Access the top level URL of http://sdpsweb.EBay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
____________________
Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.Central/.
* From there, follow the hyperlink path of "SunService Documentation" and
click on "FIN & FCO attachments", then choose the appropriate folder,
FIN or FCO. This will display supporting directories/files for FINs or
FCOs.
Internet Access:
_______________
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
________
Send questions or comments to [email protected]
---------------------------------------------------------------------------