Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1018832.1 : Sun Fire[TM] Midframe servers: POST fails during IOPOST, marking all I/O Boards (IBs) as bad.
Previously Published As: 230625

Symptoms

All I/O Boards (IBs) are marked as bad during IOPOST. This can be misleading when trying to identify the faulty FRU.

Resolution

Sometimes all I/O Boards (IBs) are marked as bad because the CPU running IOPOST is itself faulty. Unfortunately, this fault goes undetected by LPOST (the POST that tests the CPU itself). See the excerpt from the console logs below. Replacing the SB containing the faulty CPU will resolve the issue.

Note the following from the console logs :
-------------------------------------------
a) SB4/P0 is the processor running the IOPOST.
b) SB4/P0 marks IB6/P0 and IB6/P1, the two IO controllers on IB6, as "Failed".
c) SB4/P0 marks IB8/P0 and IB8/P1, the two IO controllers on IB8, as "Failed".
d) SB4/P0 is actually the bad CPU. Because the CPU itself is faulty, it cannot reliably test the IBs and marks the controllers on the IBs as failed.
e) SB4/P0 goes undetected during its own self-test (called LPOST).
f) It is highly unlikely that all of the IO controllers (IB6/P0, IB6/P1, IB8/P0 and IB8/P1) are bad.

Console logs :
--------------
{/N0/SB4/P0} ERROR: TEST=PCI IO Controller Functional Tests,SUBTEST=PCI IO Controller DMA loopback Tests ID=152.2
{/N0/SB4/P0} Component under test: /N0/IB6/P0 PCI IOC
{/N0/SB4/P0} Data Access Error from address 00000000.08000820.
AFSR = 00000002.00000094
{/N0/SB4/P0} Secondary AFAR 00000000.08000820, Secondary AFSR = 00000002.00000094
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 01 63 00000099.80000606 00000000.0001ca48 00000000.0001ca4c
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000044.80001504 00000000.0000f1e0 00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000606 00000000.0001ca48 00000000.0001ca4c
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 34 00000091.80001507 00000000.00014d80 00000000.00014d84
{/N0/SB4/P0} 02 32 00000044.80001504 00000000.0000f1e0 00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000606 00000000.0001ca48 00000000.0001ca4c
{/N0/SB4/P0} AFSR = 00000000.00000000
{/N0/SB4/P0} AFAR = 00000000.08000820
{/N0/SB4/P0} IMMU SFSR = 00000000.00000000
{/N0/SB4/P0} DMMU SFSR = 00000000.00700009
{/N0/SB4/P0} DMMU SFAR = 00000000.08000820
{/N0/SB4/P0} PState = 00000000.00000015
{/N0/SB4/P0} Dispatch Control =00000000.0000103f
{/N0/SB4/P0} Data Cache Unit Control =0000ce00.0000000e
{/N0/SB4/P0} Safari Config. = 0aaa0028.20200006
{/N0/SB4/P0} EState = 00000000.00000000
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} Running PCI IO Controller Basic Tests
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000044.80001503 000007ff.f0007cc0 000007ff.f0007cc4
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec 000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} (ME) Multiple Errors of the same type occurred
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 32 00000099.80001502 000007ff.f0006a58 000007ff.f0006a5c
{/N0/SB4/P0} 02 32 00000044.80001503 000007ff.f0007cc0 000007ff.f0007cc4
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec 000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/IB6/P0} Failed <--- !!
{/N0/IB6/P1} Failed <--- !!
Sep 10 11:05:24 he101 Domain-A.SC: Excluded unusable, unlicensed, failed or disabled board: /N0/IB6
Copying IO prom to Cpu dram ...................................
{/N0/SB4/P0} Running PCI IO Controller Basic Tests
{/N0/SB4/P0} Jumping to memory 00000000.00000020 [00000010]
{/N0/SB4/P0} System PCI IO post code running from memory
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:28
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} Subtest: PCI IO Controller Register Initialization for aid 0x1c
{/N0/SB4/P0} Running PCI IO Controller Functional Tests
{/N0/SB4/P0} Subtest: PCI IO Controller IOMMU TLB Compare Tests for aid 0x1c
{/N0/SB4/P0} Subtest: PCI IO Controller IOMMU TLB Flush Tests for aid 0x1c
{/N0/SB4/P0} Subtest: PCI IO Controller DMA loopback Tests for aid 0x1c
{/N0/SB4/P0} ERROR: TEST=PCI IO Controller Functional Tests,SUBTEST=PCI IO Controller DMA loopback Tests ID=152.2
{/N0/SB4/P0} Component under test: /N0/IB8/P0 PCI IOC
{/N0/SB4/P0} Data Access Error from address 00000000.08000820. AFSR = 00000002.00000094
{/N0/SB4/P0} Secondary AFAR 00000000.08000820, Secondary AFSR = 00000002.00000094
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 01 63 00000099.80000605 00000000.0001c8b4 00000000.0001c8b8
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000044.80001503 00000000.0000f1e0 00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000605 00000000.0001c8b4 00000000.0001c8b8
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 34 00000091.80001506 00000000.00014d80 00000000.00014d84
{/N0/SB4/P0} 02 32 00000044.80001503 00000000.0000f1e0 00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000605 00000000.0001c8b4 00000000.0001c8b8
{/N0/SB4/P0} AFSR = 00000000.00000000
{/N0/SB4/P0} AFAR = 00000000.08000820
{/N0/SB4/P0} IMMU SFSR = 00000000.00000000
{/N0/SB4/P0} DMMU SFSR = 00000000.00700009
{/N0/SB4/P0} DMMU SFAR = 00000000.08000820
{/N0/SB4/P0} PState = 00000000.00000015
{/N0/SB4/P0} Dispatch Control =00000000.00000000
{/N0/SB4/P0} Data Cache Unit Control =00000000.0000000c
{/N0/SB4/P0} Safari Config. = 0aaa0028.20200006
{/N0/SB4/P0} EState = 00000000.00000000
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} Running PCI IO Controller Basic Tests
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec 000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000099.80001502 000007ff.f0006a58 000007ff.f0006a5c
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec 000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 32 00000099.80001502 000007ff.f0006a58 000007ff.f0006a5c
{/N0/SB4/P0} 02 32 00000099.80001502 000007ff.f0006a58 000007ff.f0006a5c
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec 000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/IB8/P0} Failed <--- !!
{/N0/IB8/P1} Failed <--- !!
Sep 10 11:05:47 he101 Domain-A.SC: Excluded unusable, unlicensed, failed or disabled board: /N0/IB8
Sep 10 11:05:47 he101 Domain-A.SC: No usable Io board in domain. setkeyswitch operation did not complete

Relief/Workaround

Disable the System Board (SB) containing the CPU that runs (and fails) IOPOST, so that IOPOST moves to a different CPU. This can be done with the "disablecomponent" command from the system controller interface (SC-App).

Alternatively, disabling only that processor with the "disablecomponent" command is also a valid workaround.
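
The reasoning in notes a) through f) can be turned into a rough log check. The following is a minimal Python sketch, not a Sun-provided tool: the function name `suspect_cpus`, the regular expressions, and the threshold of two or more failed IBs are all assumptions made for illustration. It pairs each "Component under test" line with the CPU that printed it, collects the IBs reported as "Failed", and flags any CPU that was the tester for several failed boards.

```python
import re
from collections import defaultdict

# Assumed pattern for e.g. "{/N0/SB4/P0} Component under test: /N0/IB6/P0 PCI IOC"
TESTER_RE = re.compile(
    r"\{(/N\d+/SB\d+/P\d+)\}\s+Component under test:\s+(/N\d+/IB\d+)/P\d+"
)
# Assumed pattern for e.g. "{/N0/IB6/P0} Failed"
FAILED_RE = re.compile(r"\{(/N\d+/IB\d+)/P\d+\}\s+Failed")


def suspect_cpus(log_text, min_boards=2):
    """Map each testing CPU to the failed IBs it tested, keeping only
    CPUs associated with at least `min_boards` failed boards.

    The threshold is a heuristic assumption: one CPU failing several
    different IBs points at the CPU rather than at every board.
    """
    tested_by = defaultdict(set)
    for cpu, board in TESTER_RE.findall(log_text):
        tested_by[cpu].add(board)

    failed = set(FAILED_RE.findall(log_text))

    suspects = {}
    for cpu, boards in tested_by.items():
        hit = sorted(boards & failed)
        if len(hit) >= min_boards:
            suspects[cpu] = hit
    return suspects


# Lines taken from the console log in this document.
sample = """\
{/N0/SB4/P0} Component under test: /N0/IB6/P0 PCI IOC
{/N0/IB6/P0} Failed <--- !!
{/N0/IB6/P1} Failed <--- !!
{/N0/SB4/P0} Component under test: /N0/IB8/P0 PCI IOC
{/N0/IB8/P0} Failed <--- !!
{/N0/IB8/P1} Failed <--- !!
"""

print(suspect_cpus(sample))
# prints {'/N0/SB4/P0': ['/N0/IB6', '/N0/IB8']}
```

A hit from this check is only a hint that the testing CPU deserves attention before any IB is replaced; it does not prove the CPU is faulty.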
Product
Sun Fire V1280 Server
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server

IOPOST, IB6/P0, Failed, DMA, Functional, Controller

Previously Published As: 72039

Change History

Date: 2009-11-25
User Name: Josh Freeman
Action: Refreshed
Comment: No changes made - refreshed per ESG Content Team effort.

Date: 2004-02-10
User Name: 109197
Action: Update Canceled
Comment: *** Restored Published Content *** Issue under review for possible separate Infodoc
Version: 0

Date: 2003-12-12
User Name: 77740
Action: Rejected
Comment: Paul

Attachments
This solution has no attachment