Asset ID: | 1-72-1017539.1 |
Update Date: | 2012-07-30 |
Keywords: | |
Solution Type
Problem Resolution Sure
Solution 1017539.1: Sun Fire[TM] 12K/15K/E20K/E25K, Rio External Loopback Test failure
Related Items |
- Sun Fire E25K Server
- Sun Fire E20K Server
- Sun Fire 12K Server
- Sun Fire 15K Server
|
Related Categories |
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
- .Old GCS Categories>Sun Microsystems>Servers>High-End Servers
|
Previously Published As 228682
Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms
Symptoms
After an upgrade to System Management Services (SMS) 1.5 or higher, a "Rio External Loopback Test failure" may be reported on one or more HsPCI[+] IO boards during hpost (level >= 16).
Example:
Man Ether IO7 Rio External Loopback Test failure
Rio External Loopback Test Man Ether IO7 results
Post attempted to loopback 8 packets of 128 bytes between the
IO board and the SC.
The test failed after exhausting all 8 retries.
There were 9 hard failures, either a non data error occurred
or no packets were received by the IO board.
There were 0 failures where one or more of the packets was not
fully transferred.
The most packets transferred in any attempt was 0 and those
packets contained 0 bytes in error.
The mand driver reported that it received 0 packets and echoed 0
packets back, with 0 errors in and 0 errors out.
Man Ether Hub IO7 reports:
0 collision-free packets from the SC
1 collisions from the SC
72 collision-free packets from the IO board
0 collisions from the IO board
73 collision-free packets from the network
1 collisions on the interface
FAIL Man Ether IO7: EpiRIOR1_sc_tfunc(): Test FAILED
Primary service FRU is Slot IO7.
Secondary service FRU is EXB EX7, Current main SC, or the logic centerplane.
INTERNAL: xcpdn_parse_devname(Man Ether IO7) returns -1
[...]
I/O_Brds: IOC P1/Bus/Adapt IOC P0/Bus/Adapt
Slot Gen Type P1 B1/10 B0/10 P0 B1/eb10 B0/10 (e=ENet, b=BBC)
IO07: P hsPCI P p _p p _p P p fP_p p _p
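When several boards are involved, it can help to pull the affected slots out of a saved hpost log. The following is a minimal sketch, assuming the summary line always has the form shown in the example above ("Man Ether IO7 Rio External Loopback Test failure"); the function name and the regular expression are illustrative and not part of SMS.

```python
import re

# Matches the summary line printed for a failing board, for example:
# "Man Ether IO7 Rio External Loopback Test failure"
# Assumption: the line format is exactly as in the example output above.
FAILURE_RE = re.compile(r"Man Ether (IO\d+) Rio External Loopback Test failure")


def failing_io_slots(hpost_output):
    """Return the de-duplicated IO slots reported as failing, in slot order."""
    slots = set(FAILURE_RE.findall(hpost_output))
    # Sort on the numeric part so that IO7 comes before IO10.
    return sorted(slots, key=lambda name: int(name[2:]))


sample = """\
Man Ether IO7 Rio External Loopback Test failure
Rio External Loopback Test Man Ether IO7 results
FAIL Man Ether IO7: EpiRIOR1_sc_tfunc(): Test FAILED
"""

print(failing_io_slots(sample))  # ['IO7']
```

Only the summary line is matched; the "results" and "FAIL" lines for the same board are ignored, so each slot is counted once.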
Cause
Depending on the Slot 1 configuration of the domain, the failure may prevent the domain from booting or make some devices (disks, network) inaccessible.
Note: This does not mean that the HsPCI[+] board is malfunctioning from an IO perspective, only that a test failure has been detected on the RIO or hub on the IO board.
This may be due to a known behaviour of SMS 1.5 and above.
Solution
Contact your service provider for further analysis and guidance.
Product
Sun Fire E25K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire 12K Server
Internal Section
Troubleshooting steps:
- At the time of failover to the spare SC, the former main SC is reset. If the EIS checklist has been followed, the diag level of the MAIN SC is pmax-epmax and that of the SPARE is pmax-epvmax, which means that the extended tests should already have been executed. If for any reason the diag level is not as described, set it according to the EIS checklist and reboot the SC to run the extended tests.
- Detailed statistics about the RIO External Loopback Test can be obtained by adding the "report_mand_rio_stats" directive to the .postrc file of the associated domain(s) and running setkeyswitch again.
- The problem is considered persistent if the test fails for multiple HsPCI[+] IO boards and the same boards fail at every setkeyswitch.
- If the problem is persistent, relief can be provided by adding the "no_mand_access" directive to the .postrc file of the affected domain(s). This directive prevents POST from running the RIO External Loopback Test, thus avoiding the problem. This solution has been applied successfully for years on hundreds of systems.
- If hpost succeeds after the "no_mand_access" directive is applied, the issue is resolved.
- If the Loopback test still fails with "no_mand_access" in place, please escalate to the next support level.
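The persistence criterion in the steps above can be expressed as a small check. This is only a sketch of one reading of that criterion: it assumes that "multiple" means at least two boards, that at least two setkeyswitch attempts are needed before the failure can be called persistent, and that the failing slots of each attempt have already been collected. The function name is hypothetical.

```python
def is_persistent(runs):
    """Decide whether a loopback failure looks persistent.

    runs: one collection of failing slot names per setkeyswitch attempt,
          for example [{"IO3", "IO7"}, {"IO3", "IO7"}].
    """
    runs = [set(r) for r in runs]
    # Assumption: a single attempt is not enough to call the failure persistent.
    if len(runs) < 2:
        return False
    # Boards that failed in every attempt.
    common = set.intersection(*runs)
    # Assumption: "multiple" boards means two or more.
    return len(common) >= 2


print(is_persistent([{"IO3", "IO7"}, {"IO7", "IO3"}]))  # True
print(is_persistent([{"IO7"}, {"IO7"}]))                # False
```

A single board failing repeatedly, or different boards failing on each attempt, does not meet the criterion under this reading.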
A troubleshooting flowchart process is available here.
Relief/Workaround
Add the "no_mand_access" directive to the .postrc file of the associated domain(s) and run setkeyswitch again. This directive prevents POST from running this test.
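When the workaround has to be applied to several domains, the edit can be scripted so that the directive is never added twice. The sketch below assumes that a .postrc file holds one directive per line; the location of each domain's .postrc is not given in this document, so the path is supplied by the caller, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_directive(postrc_path, directive):
    """Append directive on its own line unless it is already present.

    Returns True if the file was changed, False otherwise.
    Assumption: .postrc contains one directive per line.
    """
    path = Path(postrc_path)
    text = path.read_text() if path.exists() else ""
    if directive in (line.strip() for line in text.splitlines()):
        return False
    # Keep existing content and make sure it ends with a newline first.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + directive + "\n")
    return True


# Demonstration against a scratch file rather than a real domain .postrc.
with tempfile.TemporaryDirectory() as scratch:
    demo = Path(scratch) / ".postrc"
    print(ensure_directive(demo, "no_mand_access"))  # True
    print(ensure_directive(demo, "no_mand_access"))  # False
```

The same helper can add "report_mand_rio_stats" when detailed statistics are wanted. In either case setkeyswitch must be run again afterwards for the directive to take effect.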
Internal Comments
The test consists of MAND echoing the packets it receives from LPOST. Hpost is responsible for instructing LPOST to drive the echo test and, when LPOST is done, for consolidating the MAND and LPOST test results.
References:
BugID 4467547 - POST should implement RIO I/O test functions
BugID 4399574 - MAND should provide RIO I/O functions for POST tests
BugID 6463931 - Rio External Loopback Test failure during SMS 1.5 hpost
See also the "report_mand_rio_stats" and "no_mand_access" .postrc directives (Sun Fire 15K POST .postrc Command Reference is available here).
Previously Published As 83174
Attachments
This solution has no attachment