Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1003321.1
Update Date: 2012-07-30
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003321.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: POST: IBIST Failures  


Related Items
  • Sun Fire E25K Server
  • Sun Fire 12K Server
  • Sun Fire E20K Server
  • Sun Fire 15K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • .Old GCS Categories>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
204608


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server - Version: Not Applicable and later    [Release: N/A and later]
Sun Fire E25K Server - Version: Not Applicable and later    [Release: N/A and later]
All Platforms

Symptoms

POST reports an IBIST failure. Some examples:

#1:

stage ibist: Interconnect BIST...
AXQ-RMX IBIST...
ERR: IBIST error: AXQ EX3 RMX C0 Exp 0x0aaaaaaaa Obs 0x03c345555 XOR 0x0969effff.
FAIL EXB EX3: IBIST failure
Primary service FRU is EXB EX3.
Secondary service FRU is CSB C0 or the logic centerplane.


#2:

stage ibist: Interconnect BIST...
ERR: IBIST error: DMX C1/D0 SDI EX4/S3 Error bits = 0x1555554.
FAIL EXB EX4: IBIST failure

Cause

IBIST is the interconnect built-in self-test, which exercises the link between two ASICs.

One of the ASICs acts as the master driving preset/programmable bit patterns, and the other ASIC receives the patterns and then echoes them back. If the echoed pattern received by the master does not match the original pattern, the test fails.

In the first example above, AXQ EX3 is the master and RMX0 is the slave. AXQ EX3 expects the pattern 0x0aaaaaaaa but receives 0x03c345555. The XOR of the two values, 0x0aaaaaaaa XOR 0x03c345555 = 0x0969effff, has a 1 in every bit position that is in error.
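The XOR arithmetic above can be checked with a short script. This is a minimal sketch, not a Sun or Oracle tool: the regular expression and the function names decode_ibist and failing_bits are assumptions made for illustration, and the pattern covers only the Exp/Obs/XOR layout seen in example #1.

```python
import re

# Log line copied from example #1 above.
LINE = "ERR: IBIST error: AXQ EX3 RMX C0 Exp 0x0aaaaaaaa Obs 0x03c345555 XOR 0x0969effff."

# Hypothetical pattern: matches only the Exp/Obs/XOR layout seen in example #1.
PATTERN = re.compile(r"Exp (0x[0-9a-fA-F]+) Obs (0x[0-9a-fA-F]+) XOR (0x[0-9a-fA-F]+)")


def decode_ibist(line):
    """Return (expected, observed, reported_xor) as integers, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


def failing_bits(expected, observed):
    """Return the bit positions (0 = least significant) where the patterns differ."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


exp, obs, reported = decode_ibist(LINE)
bits = failing_bits(exp, obs)

# The XOR recomputed from Exp and Obs should equal the value POST printed.
print(f"recomputed XOR = 0x{exp ^ obs:09x}, matches POST: {exp ^ obs == reported}")
print(f"{len(bits)} bits in error: {bits}")
```

For the line in example #1, the recomputed value agrees with the XOR field that POST reports, and the list of set bits shows which signal lines returned the wrong value.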

Solution

If this issue is seen, contact Oracle Support and raise a Service Request. Have a Solaris Explorer output file from the Main System Controller available when raising the Service Request.


Internal Instructions:

All IBIST failures within close proximity must be considered when deciding the appropriate FRU. If there is only a single failure, as in the examples above, it is logical to replace the FRU that POST names as primary: EXB EX3 in the first example.

However, if multiple IBIST failures are present, they must be considered holistically. For example, suppose SDI2 on four expanders all report IBIST failures against the same DMX. Taken together, these failures call the DMX (i.e., the centerplane) into question, because it is unlikely that multiple expanders would fail together.
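The reasoning in the preceding paragraph can be sketched as a grouping step: collect the failures, group them by the DMX they point at, and see whether one DMX is common to several expanders. This is an illustration only. The failure tuples below are invented rather than taken from a real POST log, the function name suspect_components is hypothetical, and the threshold of two expanders is an assumption, not a documented service rule.

```python
from collections import defaultdict

# Hypothetical, already-extracted failures as (expander, SDI, DMX) tuples.
# These values are invented for illustration, not taken from a real POST log.
FAILURES = [
    ("EX2", "SDI2", "DMX C0/D1"),
    ("EX5", "SDI2", "DMX C0/D1"),
    ("EX9", "SDI2", "DMX C0/D1"),
    ("EX12", "SDI2", "DMX C0/D1"),
]


def suspect_components(failures, min_expanders=2):
    """Group failures by DMX and name the component each group points to.

    min_expanders is an assumed threshold: when at least that many distinct
    expanders fail against one DMX, the shared DMX is the likelier suspect.
    """
    expanders_by_dmx = defaultdict(set)
    for expander, _sdi, dmx in failures:
        expanders_by_dmx[dmx].add(expander)

    suspects = []
    for dmx, expanders in expanders_by_dmx.items():
        if len(expanders) >= min_expanders:
            suspects.append((dmx, "shared DMX (centerplane) is suspect"))
        else:
            suspects.append((sorted(expanders)[0], "expander named by POST is suspect"))
    return suspects


for component, reason in suspect_components(FAILURES):
    print(f"{component}: {reason}")
```

With the four invented failures above, every entry points at the same DMX, so the sketch flags the DMX rather than any single expander. A lone failure would fall through to the expander that POST names.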

Finally, improper board seating or damaged centerplane pins are possible causes for IBIST failures. If a service action involving a suspect FRU was recently conducted, check for seating issues and/or pin/connector damage.

If the IBIST failure is an AXQ<-->RMX0 error and the customer is still running SMS 1.2, first confirm that POST SunPatch 112488-10 (or later) is applied to the system. Then upgrade to SMS 1.6 as soon as possible.

- Summary of part numbers and patch IDs
SMS 1.2 112488-10
- References and bug IDs
4704614

- Additional background information:
For details on what IBIST tests are available, refer to the online documentation in 'redx'.
redx>   ibist
Under no circumstances should IBIST be executed on a component supporting a running domain. It will crash all domains relying on that component. Furthermore, if IBIST is run manually, the component must be power cycled after completion to return the ASIC(s) to a known, clean state. Refer to bug 4743556 for an example of why.

- Keywords
15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K, POST, IBIST



Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.