Document Audience: | INTERNAL |
Document ID: | A0245-1 |
Title: | Sun Fire V440 and Netra 440 systems using a specific networking configuration may unexpectedly reset. |
Copyright Notice: | Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Fri Jan 14 00:00:00 MST 2005 |
----------------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
----------------------------------------------------------------------------
*** Sun Confidential: Internal Use and Authorized VARs Only ***
________________________________________________________________________
This message including any attachments is confidential information
of Sun Microsystems, Inc. Disclosure, copying or distribution is
prohibited without permission of Sun. If you are not the intended
recipient, please reply to the sender and then delete this message.
________________________________________________________________________
FIELD CHANGE ORDER
(For Authorized Distribution by Enterprise Services)
FCO #: A0245-1
Status: inactive
Synopsis: Sun Fire V440 and Netra 440 systems using a specific networking configuration may unexpectedly reset.
Date: Jan/14/05
SunAlert: 57618
Top FIN/FCO Report: Yes
Products Reference: Sun Fire V440, Netra 440
Product Category: Server/System Component
Product Affected:
Systems Affected:
Mkt_ID Platform Description
------ -------- -----------
- A42 Sun Fire V440
- N42 Netra 440
X-Options Affected:
Mkt_ID Platform Model Description
------ -------- ----- -----------
n/a
Parts Affected:
Part Number Description
----------- -----------
540-5418-06 (or below) Motherboard FRU, Sun Fire V440, A42
[501-6344-09 (or below) Motherboard] (see *Note below)
540-5919-05 (or below) Motherboard FRU, Netra 440, N42
[501-6344-09 (or below) Motherboard]
*Note: A number of Sun Fire V440 systems manufactured between November
1, 2004 and approximately January 21, 2005 have FRU information
programmed indicating they are a 501-6344-09 motherboard; however, they
have the fix integrated at the Fab level as follows:
540-5418-06 (FRU)
- 501-6344-09 with a deviation label "WO_30188"
- 270-6344-07 (Fab Level)
The systems above will have output from the prtfru(1M) command
indicating they are 501-6344-09. All motherboards with this revision
should be physically inspected for a sticker label indicating
"WO_30188". If this label is present, then this motherboard is not
affected and should not be replaced. Conversely, if the sticker label
is not present and the motherboard is identified as 501-6344-09, then
it is affected and should be replaced.
Identification of Non-affected Systems
--------------------------------------
The new board was phased into manufacturing starting November 1,
2004. This means that between November 1, 2004 and approximately
January 21, 2005 systems were built with a mixture of affected and
non-affected motherboards. Systems manufactured after approximately
January 21, 2005 are not affected.
Boards with the following dash level markings are not affected:
For Sun Fire V440:
540-5418-07 (FRU)
- 501-6344-10 (Manufacturing part number)
- 270-6344-07 (Fab level)
or
540-6336-01 (FRU)
- 501-6910-01 (Manufacturing part number)
- 270-6344-07 (Fab level)
For Netra 440:
540-5919-06 (or later) [F] Motherboard Assy, Netra 440, N42
-501-6910-01 (or later) Motherboard
To identify an affected system, use the prtfru(1M) command or
physically look at the motherboard part number information. Example
output from the prtfru(1M) command is shown below:
/frutree/chassis/MB?Label=MB
/frutree/chassis/MB?Label=MB/system-board (container)
SEGMENT: SD
/ManR
/ManR/Fru_Description: ASSY,A42,MOTHERBOARD
/ManR/Manufacture_Loc: Sriracha,Chonburi,Thailand
/ManR/Sun_Part_No: 5016344 <----------
/ManR/Sun_Serial_No:
/ManR/Vendor_Name: Celestica
/ManR/Initial_HW_Dash_Level: 10 <---------
/ManR/Initial_HW_Rev_Level: 51
/ManR/Fru_Shortname: A42_MB
/SpecPartNo: 885-0060-09
The lines noted above (<---- ) should be concatenated to form the full
manufacturing part number. In the example above, the motherboard part
number should be read as 501-6344-10 and therefore is not affected. If
the output of the prtfru(1M) command showed the motherboard part number
to be 501-6910-01 (or later), it would also not be affected.
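The concatenation and the affected/not-affected rules described above can
be sketched in a few lines of Python. This is an illustrative sketch only,
not a Sun-supplied tool: the function names are hypothetical, and it assumes
the text passed in is the prtfru(1M) output for the motherboard container
only. A 501-6344-09 result still requires physical inspection for the
"WO_30188" label, which is passed in as a flag.

```python
import re

def manufacturing_part_number(prtfru_text):
    """Join Sun_Part_No and Initial_HW_Dash_Level, e.g. 5016344 + 10 -> 501-6344-10.

    Hypothetical helper: assumes prtfru_text covers only the motherboard FRU.
    """
    part = re.search(r"/ManR/Sun_Part_No:\s*(\d{7})", prtfru_text)
    dash = re.search(r"/ManR/Initial_HW_Dash_Level:\s*(\d+)", prtfru_text)
    if not part or not dash:
        return None
    digits = part.group(1)
    return f"{digits[:3]}-{digits[3:]}-{int(dash.group(1)):02d}"

def classify(part_number, has_wo_30188_label=False):
    """Apply the rules stated in this FCO to a manufacturing part number."""
    base, _, dash = part_number.rpartition("-")
    dash = int(dash)
    if base == "501-6910":
        return "not affected"          # 501-6910-01 or later
    if base == "501-6344":
        if dash >= 10:
            return "not affected"      # 501-6344-10 or later
        if dash == 9 and has_wo_30188_label:
            return "not affected"      # deviation label found on the board
        return "affected"              # 501-6344-09 or below
    return "unknown"                   # part number not covered by this FCO

sample = """/ManR/Sun_Part_No: 5016344
/ManR/Initial_HW_Dash_Level: 10
"""
number = manufacturing_part_number(sample)
print(number, classify(number))
```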
References:
Sun Alert: 57618
URL for Sun Alert:
http://sunsolve.central.sun.com/search/document.do?assetkey=1-26-57618-1
BugID: 5039862
ESC: 551088
FIN: I1099
ECOs: WO_29854, WO_29913, WO_30189, WO_29866, and WO_30188 (deviation)
LEAPs: 2674, 2677, 2683, 2565
Issue Description:
A very small percentage of Sun Fire V440 and Netra 440 systems may
experience a "Fatal Reset" during system bus (data) transfers. This
condition occurs only when system bus signal activity coincides with
specific PCI bus signal activity on the first onboard Ethernet
interface, net0 (usually ce0).
A full description of this issue, port location picture, and slide
presentation are available at
http://onestop/qco/v440/index_v440.shtml
If the described issue occurs, the system will reset and the following
error message appears on the console.
Fatal Error Reset
SC Alert: Host System has Reset
The system then reboots. No core files are generated and the reset
output will not be logged to the "/var/adm/messages" file.
If it is suspected that the system is experiencing this issue, change
the OBP variables as follows to provide more verbose output in the
event of another occurrence.
Note: The OBP settings below are only recommended to verify whether
the system is experiencing this issue and should not be used long
term. Once the failure is verified then the parameters should be set
back to their original values (make a note of these before changing).
The settings below provide more verbose output:
diag-switch? true
post-trigger none
obdiag-trigger none
When the parameters above are set, the error message will include some
additional information indicating the reset reason as "PBM FATAL", with
a PCI IO-Bridge register output similar to:
ha019 console login:
Fatal Error Reset
SC Alert: Host System has Reset
@(#)OBP 4.10.10 2003/08/29 06:25 Sun Fire V440
Clearing TLBs
Loading Configuration
Membase: 0000.0033.0000.0000
MemSize: 0000.0000.4000.0000
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Scrubbing Tomatillo tags... 0 1
Block Scrubbing Done
Find dropin, Copying Done, Size 0000.0000.0000.5ca0
PC = 0000.07ff.f000.4c88
PC = 0000.0000.0000.4d28
Find dropin, (copied), Decompressing Done, Size 0000.0000.0006.6700
ttya initialized
System Reset: (PBM FATAL)
JBUS-PCI bridge
JBUS-PCI bridge
slave Error Register: 8000000000001000
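If console output is being captured to a file, the signature above can be
checked for mechanically. The following Python sketch is an assumption-laden
illustration (the function name is hypothetical, and it only looks for the
two strings shown in the example output: "Fatal Error Reset" followed later
by a "System Reset:" line containing "PBM FATAL"). It does not replace
review of the full output.

```python
def shows_pbm_fatal(console_text):
    """Return True if a 'Fatal Error Reset' is followed by a PBM FATAL reset reason."""
    lines = [line.strip() for line in console_text.splitlines()]
    try:
        start = lines.index("Fatal Error Reset")
    except ValueError:
        return False  # no fatal reset message in this capture
    # Only look at lines printed after the reset message.
    return any(
        line.startswith("System Reset:") and "PBM FATAL" in line
        for line in lines[start + 1:]
    )
```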
The root cause is increased signal noise related to inconsistent material
thickness on a portion of the system board. See Escalation 551088 for more
information, and see the slide presentation noted above.
The technical fix was a redesigned system board phased into production in
October 2004. The redesign includes an update to the layout of board
signals to eliminate noise transfers between signals.
Corrective action was made available by Sun manufacturing via the above
listed ECOs.
Corrective action was made available by Sun Services via the above
listed LEAPs.
Parts Affected:
January 31, 2006
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
--- (proactive for systems covered by Gold or above entitlement)
---
| | UPON FAILURE
---
Replacement Time Estimate:
1 hour or more
Special Considerations:
The dates below are estimates for when each timezone will be materially
ready to begin support of this FCO.
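PCI Ethernet card (part 501-5902):
Region          Estimated Availability
-------------   ----------------------
US/Canada       mid-January 2005
Latin America   mid-January 2005
EMEA            mid-January 2005
APAC/Japan      mid-January 2005
Sun Fire V440 motherboard:
Region          Estimated Availability
-------------   ----------------------
US/Canada       early-March 2005
Latin America   early-March 2005
EMEA            early-March 2005
APAC/Japan      early-March 2005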
The above dates will be updated as new information is made available.
Corrective Action:
IMPORTANT! Before starting any remediation activity, please review all
documentation via the onestop URL below for a full summary of the
available remediation strategies, including both Implementation and
Technical FAQs which may be updated from time to time.
In addition, the onestop URL has a global list of Sun Fire V440 and Netra 440
systems under Gold or Platinum service contract as of 31 Jan 2005. (Note: this
list might point to systems that are unaffected by the issue described in
FCO A0245)
http://onestop/qco/v440/index_v440.shtml
Hardware Remediation Strategies
-------------------------------
For this Controlled Proactive FCO, present the following two options for
hardware remediation. Both offer a permanent solution to the issue.
1) PCI GigaSwift Ethernet Card Installation (see details below)
2) System Board (Motherboard) Replacement (see details below)
The following applies ONLY to affected systems that are currently
covered by Sun Gold or above entitlement.
1) Your customer agrees to accept the PCI Ethernet card as the permanent
and final resolution.
2) Replace the system board as soon as one becomes available. (PCI card
option does not meet your customer's business need.) Your customer
needs a new motherboard as a priority.
3) Your customer accepts the PCI Ethernet card as an interim solution
and would like to be queued for receiving a new motherboard when
availability becomes less constrained.
4) Replace the motherboard only, but your customer agrees to be queued
on a low-priority basis.
As a reminder, Sun Alert 57618 provides a workaround: use only the
second Ethernet port, net1 (ce1), provided the system configuration and
applications require only a single network port.
Any proactive hardware replacement requests for systems not covered by
Gold or above entitlement should be directed to the FCO Tiger Team Lead
(John Schoenfeld) for approval.
*****
Please work with your GEO TZ FCO representatives for scheduling
priority motherboard replacements and keep track of your pending
deployment steps.
*****
PCI GigaSwift Ethernet Card Installation
----------------------------------------
If your customer has an available 66MHz slot, the following PCI Ethernet
card can be ordered at no cost (based on the schedule above). This PCI
Ethernet card provides a permanent solution and should be the first
fix suggested to the customer. Please note that this solution is subject
to customer agreement. Your customer may choose to use this card as a
temporary solution until a new motherboard is available.
* 501-5902 Sun GigaSwift Ethernet 1.0 UTP (Copper)
This card is tested and supported and provides full gigabit network
replacement functionality. Lower performance may be experienced if a
33MHz slot is used.
The schedule for availability for the above PCI Ethernet card is
provided above in the Special Considerations section.
When parts are available use normal logistics processes for ordering.
IMPORTANT INSTALLATION NOTES:
Before installing the PCI Ethernet card, it is highly recommended to
disable net0 (ce0).
To completely disable net0 (ce0) from the system, use the following
commands to install an NVRAM script at the OBP "ok" prompt:
ok nvedit
0: probe-all install-console banner
1: " /pci@1c,600000/network@2" $delete-device drop
2:
^C
Type "Ctrl-C" to exit nvedit as shown above. Then continue with:
ok nvstore
ok setenv use-nvramrc? true
use-nvramrc? = true
ok reset-all
After the system resets, net0 should not be visible to OBP (i.e. you
should not see a path to net0 [/pci@1c,600000/network@2] when you run
"show-devs" from OBP). Likewise, the net0 device should not be visible
to Solaris (e.g. in the output of the prtconf or prtpicl commands).
Due to Solaris instance numbering, if this is done before initial
Solaris installation, net1 may not be assigned to the ce1 instance but
instead to "ce0". This needs to be verified before the above PCI card
installation by examining the instance assignments and correlating them
to the physical device paths in the /etc/path_to_inst file.
For example use the grep(1) command to ensure the device path
"/pci@1f,700000/network@1" is matched to ce1 and not ce0 as shown below:
host% grep network /etc/path_to_inst
"/pci@1c,600000/network@2" 0 "ce"
"/pci@1f,700000/network@1" 1 "ce"
The second output line above shows the device path "/pci@1f,700000/network@1"
is matched to net1 (ce1).
This example shows the path_to_inst of a system that had Solaris
installed prior to disabling the net0 port, as it includes an entry for
the net0 port at physical address "/pci@1c,600000/network@2". A system
that had the onboard net0 port disabled prior to Solaris installation
would not have this device entry, and would have a net0 instance
assigned to a different port - either onboard net1 port or a PCI card
port. Match the physical device path with the instance number to
determine which net# (ce#) to use for network configuration and usage.
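The matching of physical device paths to instance numbers can also be
sketched in Python. This is an illustrative sketch, not a supported tool:
the function name is hypothetical, and it assumes each entry in
/etc/path_to_inst has the three-field form shown in the grep(1) output above
(quoted path, instance number, quoted driver name).

```python
import shlex

def parse_path_to_inst(text):
    """Map each physical device path to its driver instance name, e.g. 'ce1'."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = shlex.split(line)  # strips the surrounding double quotes
        if len(fields) != 3:
            continue  # not a path/instance/driver entry
        path, instance, driver = fields
        mapping[path] = f"{driver}{instance}"
    return mapping

example = '''"/pci@1c,600000/network@2" 0 "ce"
"/pci@1f,700000/network@1" 1 "ce"
'''
# Look up the instance assigned to the onboard net1 device path.
print(parse_path_to_inst(example).get("/pci@1f,700000/network@1"))
```

On a system where net0 was disabled before Solaris installation, the first
entry would be absent and the lookup would show which instance the net1
path actually received.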
Also before a Solaris reinstallation, take note of the Solaris logical
naming (instance numbers in the device tree), for "net0" and "ce0".
The net0 and net1 ports discussed above are the RJ45 ports physically
labeled "0" and "1" on the back of the system.
The best PCI slot in which to install the PCI Ethernet card is slot 5.
Slot 5 is a 66MHz-capable PCI slot that is on the same bus as net0.
You can insert the card into PCI slot 2 or 4, because both of these are
also 66MHz-capable slots, but the remaining slot (2 or 4) should NOT
contain a 33MHz card. Slots 2 and 4 share the same bus, and the bus
can only go as fast as the slowest card. Lower performance may be
experienced if one of the 33MHz slots (0, 1, or 3) is used.
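The slot preference described above can be expressed as a small decision
function. This Python sketch is illustrative only: the function name and
the "occupied" input (a mapping of slot number to the speed in MHz of the
card already installed there) are assumptions, and the slot numbering and
speeds are taken solely from the paragraph above.

```python
def pick_slot(occupied):
    """Suggest a PCI slot for the Ethernet card.

    occupied: dict of slot number -> speed (MHz) of the card already installed.
    Returns (slot, note); slot is None when no slot is free.
    """
    # Slot 5 is the preferred 66MHz-capable slot.
    if 5 not in occupied:
        return 5, "66MHz"
    # Slots 2 and 4 share a bus; it runs at 66MHz only if the other
    # slot is empty or holds a 66MHz card.
    for slot, neighbour in ((2, 4), (4, 2)):
        if slot not in occupied and occupied.get(neighbour, 66) >= 66:
            return slot, "66MHz"
    # Otherwise fall back to any free slot, which will run at 33MHz.
    for slot in (2, 4, 0, 1, 3):
        if slot not in occupied:
            return slot, "33MHz (lower performance may be experienced)"
    return None, "no free PCI slot"
```

For example, with slot 5 occupied and a 33MHz card in slot 4, the function
falls through to the 33MHz case rather than reporting slot 2 as a
full-speed choice.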
HINT: System administrators may want to physically mark or plug the
disabled first Ethernet port after disabling so that it is not used
in the future.
System Board (Motherboard) Replacement
--------------------------------------
Use the guidelines provided above in "Hardware Remediation Strategies"
before arranging to replace the motherboard.
This issue is addressed with the following parts or later revisions:
For Sun Fire V440:
540-5418-07 [F] Motherboard Assy, Sun Fire V440, A42
-501-6344-10 Motherboard
or
540-6336-01 [F] Motherboard Assy, Sun Fire V440, A42
-501-6910-01 Motherboard
For Netra 440:
540-5919-06 [F] Motherboard Assy, Netra 440, N42
-501-6910-01 Motherboard
Comments:
Below are 3 additional procedures to consider:
1) Sun Advanced Lights Out Manager 1.5 (ALOM):
On Sun Fire V440 and Netra 440 servers you must upgrade to ALOM
firmware 1.5 or later (501-6346-07) before installing the new
motherboard (540-6336). Older revisions of the ALOM firmware are not
able to identify the new motherboard part number (FRU-id) correctly
and as a result the system will fail to pass OBP self-test. See the
following URL:
http://www.sun.com/servers/alom.html
2) New motherboards (F540-6336/MB501-6910 *only*) have Diag-On
implemented via Patch 115846-05, Hardware/PROM: Sun Fire V440 and
Netra 440 Flash PROM Update - OBP 4.16.1.
For additional information visit http://www.sun.com/documentation
817-6957-10 OpenBoot PROM Enhancements for Diagnostic Operation.
3) If you are planning to service a Sun Fire V440 system, please read
FIN I1156-1 prior to your customer visit and take the appropriate
action. This FIN is not related to the issue described in this FCO.
However, it describes an important Sun field action when servicing
recently manufactured Sun Fire V440 systems.
________________________________________________________________________
NOTE: FCO Tracking Instructions for Radiance/SPWeb:
--------------------------------------------------
If a Radiance case involves the application of an FCO to solve a customer
issue, please complete the following steps in Radiance/SPWeb prior to
closing the case:
o Select "Field Change Order" in the REFERENCE TYPE field.
o Enter FCO ID number in the REFERENCE ID field.
For example: A0222-1.
If possible, include additional details in the REFERENCE SUMMARY field
(e.g. upgrade complete, customer declined, etc.)
________________________________________________________________________
Implementation Notes
--------------------
In case of "Mandatory" FCOs, Sun Services will attempt to contact
all known customers to recommend proactive implementation.
---
For "Controlled Proactive" FCOs, Sun Services mission critical
support teams will initiate proactive implementation efforts for
their respective accounts, as required. This implementation type
is used to proactively remediate systems covered by Gold or above
contract entitlement.
The CIC process must be used for proactive hardware replacement
requests for systems not covered by a Gold or above contract
entitlement when an FCO is classified as "Controlled Proactive".
---
For "Upon Failure" FCOs, Sun Services and partners will implement
the necessary corrective actions as the need arises.
The CIC process must be used for proactive hardware replacement
requests when an FCO is classified as "Upon Failure".
----------------------------------------------------------------
Billing Information
-------------------
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on specified
Warranty deliverables for the affected product.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on service contract deliverables.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
________________________________________________________________________
All FCO documents are accessible via Internal SunSolve. Type "sunsolve"
in a browser and follow the prompts to Search Collections.
For questions on this document, please email:
[email protected]
The FIN and FCO homepage is available at:
http://sdpsweb.central/FIN_FCO/index.html
For more information on how to submit an FCO, go to:
http://pronto.central/fco.html
To access the Service Partner Exchange, use:
https://spe.sun.com
________________________________________________________________________