Asset ID: 1-75-1009927.1
Update Date: 2011-02-25
Solution Type: Troubleshooting Sure Solution
Solution 1009927.1: Sun SPARC(R) Enterprise Mx000 (OPL) Servers: Configuration Errors
Related Items
- Sun SPARC Enterprise M9000-32 Server
- Sun SPARC Enterprise M5000 Server
- Sun SPARC Enterprise M9000-64 Server
- Sun SPARC Enterprise M4000 Server
- Sun SPARC Enterprise M8000 Server
Related Categories
- GCS>Sun Microsystems>Servers>OPL Servers
Previously Published As 213609
Applies to:
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
All Platforms
Purpose
This document is aimed at helping users identify possible issues with their platform when configuration errors are detected.
Last Review Date
August 10, 2010
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details
Minimal System Config:
- M4000
  - CPU in Slot 0
  - Memory in Slot 0
- M5000 - With a single IOU
  - IOU in Slot 0
  - CPU in Slot 0
  - Memory in Slot 0
- M5000 - With both IOUs (IOU1 is required to access internal disks 2 and 3)
  - CPU in Slots 0 and 2
  - Memory in Slots 0 and 4
- M8000/M9000
  - Two CPUs (must be in positions 0 and 1)
  - All Memory in Group A (16 DIMMs)
Examples of simple configuration mistakes:
Steps to Follow
No Memory or CPU in a Physical System Board
In order to use the IO in a Physical System Board there must also be CPU and Memory installed.
In this example the customer has an M5000 with two CPU and Memory boards and both IOUs installed. The customer reported that they were unable to see the PCI cards installed in IOU1 from either the ok prompt or Solaris.
XSCF> showstatus
No failures found in System Initialization.
XSCF> showhardconf -u
SPARC Enterprise M5000 COL2-FF2; Memory_Size:32 GB;
+-----------------------------------+------------+
| FRU | Quantity |
+-----------------------------------+------------+
| MBU_B | 1 |
| CPUM | 2 | << Two CPU Boards
| Freq:2.150 GHz; | ( 4) |
| MEMB | 2 | << Two Memory Boards
| MEM | 16 |
| Type:2B; Size:2 GB; | ( 16) | << 16 2Gig DIMMs
| DDC_A | 4 |
| DDC_B | 2 |
| IOU | 2 | << Two IO Boards
| DDC_A | 2 |
| DDC_B | 2 |
| DDCR | 2 |
| XSCFU | 1 |
| OPNL | 1 |
| PSU | 4 |
| FANBP_C | 1 |
| FAN_A | 4 |
+-----------------------------------+------------+
However, when looking in more detail at the showhardconf output we can see that the CPUs and Memory are in Slots 0 and 1, so IOU1 has no CPU or Memory behind it. To access all the IO, the CPU and Memory in Slot 1 must be moved to CPU Slot 2 and Memory Slot 4.
XSCF> showhardconf
SPARC Enterprise M5000 COL2-FF2;
+ Serial:BCF072503H; Operator_Panel_Switch:Service;
+ Power_Supply_System:Dual; SCF-ID:XSCF#0;
+ System_Power:On;
Domain#0 Domain_Status:Running;
MBU_B Status:Normal; Ver:0201h; Serial:BF07210VFK ;
+ FRU-Part-Number:CF00501-7670 02 /501-7670-02 ;
+ Memory_Size:32 GB;
CPUM#0-CHIP#0 Status:Normal; Ver:0201h; Serial:PP0647H909 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#0-CHIP#1 Status:Normal; Ver:0201h; Serial:PP0647H909 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#0 Status:Normal; Ver:0201h; Serial:PP071202C2 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#1 Status:Normal; Ver:0201h; Serial:PP071202C2 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF072311X7 ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-5d0015e2;
+ Type:2B; Size:2 GB;
MEM#0B Status:Normal;
.... removing the rest of the DIMM info
MEMB#1 Status:Normal; Ver:0101h; Serial:BF072311ML ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-5d0015da;
+ Type:2B; Size:2 GB;
MEM#0B Status:Normal;
.... removing the rest of the DIMM info
DDC_A#0 Status:Normal;
DDC_A#1 Status:Normal;
DDC_A#2 Status:Normal;
DDC_A#3 Status:Normal;
DDC_B#0 Status:Normal;
DDC_B#1 Status:Normal;
IOU#0 Status:Normal; Ver:0101h; Serial:BF072412BC ;
+ FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
PCI#1 Name_Property:SUNW,qlc;
PCI#2 Name_Property:SUNW,qlc;
IOU#1 Status:Normal; Ver:0101h; Serial:BF072518HF ;
+ FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
XSCFU Status:Normal,Active; Ver:0101h; Serial:BF07140FBU ;
+ FRU-Part-Number:CF00501-7672 02 /501-7672-02 ;
... cut the rest of the output
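The manual check above can be sketched in Python. This is only an illustrative sketch, not a Sun/Oracle tool: the function names are hypothetical, and the slot-to-PSB mapping (PSB0 = CPUM#0-1, MEMB#0-3, IOU#0; PSB1 = CPUM#2-3, MEMB#4-7, IOU#1) is an assumption inferred from the M5000 examples in this document.

```python
import re

# Assumed M5000 layout, inferred from the examples in this document:
# PSB0 holds CPUM#0-1 and MEMB#0-3 and serves IOU#0,
# PSB1 holds CPUM#2-3 and MEMB#4-7 and serves IOU#1.
def psb_of(kind, slot):
    """Return the PSB number (0 or 1) that a CPUM, MEMB or IOU slot belongs to."""
    if kind == "CPUM":
        return slot // 2
    if kind == "MEMB":
        return slot // 4
    return slot  # IOU#n is assumed to hang off PSB n


def populated_slots(showhardconf_text):
    """Collect the CPUM, MEMB and IOU slot numbers listed by showhardconf."""
    slots = {"CPUM": set(), "MEMB": set(), "IOU": set()}
    for kind, num in re.findall(r"\b(CPUM|MEMB|IOU)#(\d+)", showhardconf_text):
        slots[kind].add(int(num))
    return slots


def unusable_ious(showhardconf_text):
    """Return IOU numbers whose PSB lacks a CPU module or a memory board."""
    slots = populated_slots(showhardconf_text)
    cpu_psbs = {psb_of("CPUM", s) for s in slots["CPUM"]}
    mem_psbs = {psb_of("MEMB", s) for s in slots["MEMB"]}
    usable_psbs = cpu_psbs & mem_psbs
    return sorted(i for i in slots["IOU"] if psb_of("IOU", i) not in usable_psbs)


# Abbreviated version of the output above: CPU and Memory only in Slots 0 and 1.
sample = """
CPUM#0-CHIP#0 Status:Normal;
CPUM#1-CHIP#0 Status:Normal;
MEMB#0 Status:Normal;
MEMB#1 Status:Normal;
IOU#0 Status:Normal;
IOU#1 Status:Normal;
"""
print(unusable_ious(sample))  # [1]
```

With the CPU module in Slot 2 and the memory board in Slot 4 instead, the same function returns an empty list.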
A Physical System Board must have at least memory and CPU to be functional.
In this example an M5000 is fully populated with CPUs; however, both Memory Boards had been installed in PSB0 (Slots 0 and 1). As a result the CPUs in PSB1 are Deconfigured. To use all four CPU Boards, the two Memory Boards should have been installed in Slots 0 and 4, one in each PSB.
XSCF> showstatus
MBU_B Status:Normal;
* CPUM#2-CHIP#0 Status:Deconfigured;
* CPUM#2-CHIP#1 Status:Deconfigured;
* CPUM#3-CHIP#0 Status:Deconfigured;
* CPUM#3-CHIP#1 Status:Deconfigured;
XSCF> showhardconf
SPARC Enterprise M5000 COL2-FF2;
+ Serial:BCF0726048; Operator_Panel_Switch:Locked;
+ Power_Supply_System:Single; SCF-ID:XSCF#0;
+ System_Power:Off;
Domain#0 Domain_Status:Powered Off;
MBU_B Status:Normal; Ver:0201h; Serial:BF07140EPK ;
+ FRU-Part-Number:CF00501-7670 02 /501-7670-02 ;
+ Memory_Size:32 GB;
CPUM#0-CHIP#0 Status:Normal; Ver:0201h; Serial:PP072300VA ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#0-CHIP#1 Status:Normal; Ver:0201h; Serial:PP072300VA ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#0 Status:Normal; Ver:0201h; Serial:PP0705017M ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#1 Status:Normal; Ver:0201h; Serial:PP0705017M ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#2-CHIP#0 Status:Deconfigured; Ver:0201h; Serial:PP06533939 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#2-CHIP#1 Status:Deconfigured; Ver:0201h; Serial:PP06533939 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#3-CHIP#0 Status:Deconfigured; Ver:0201h; Serial:PP06533940 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#3-CHIP#1 Status:Deconfigured; Ver:0201h; Serial:PP06533940 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF072311NE ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-d31f0646;
+ Type:2B; Size:2 GB;
... delete the other seven DIMMs
MEMB#1 Status:Normal; Ver:0101h; Serial:BF072311NW ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-d31f0673;
+ Type:2B; Size:2 GB;
... removing the other seven DIMMs
DDC_A#0 Status:Normal;
DDC_A#1 Status:Normal;
DDC_A#2 Status:Normal;
DDC_A#3 Status:Normal;
DDC_B#0 Status:Normal;
DDC_B#1 Status:Normal;
IOU#0 Status:Normal; Ver:0101h; Serial:BF072412CR ;
+ FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
... Removing the rest of the output.
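When the showstatus output is long, the flagged components can be pulled out with a short script. The following is a minimal sketch with a hypothetical function name; it assumes the line layout shown in the transcripts above (`* NAME Status:Deconfigured;`) and treats the most recent line that is not Deconfigured as the parent FRU, which is a heuristic rather than documented XSCF behaviour.

```python
import re

STATUS_RE = re.compile(r"(\*\s*)?(\S+)\s+Status:([^;]+);")


def deconfigured_frus(showstatus_text):
    """List FRUs reported as Deconfigured, prefixed by the preceding parent FRU."""
    found = []
    parent = None
    for raw in showstatus_text.splitlines():
        match = STATUS_RE.match(raw.strip())
        if not match:
            continue  # prompt lines and other text carry no Status field
        _star, name, status = match.groups()
        if status.strip() == "Deconfigured":
            found.append(f"{parent}/{name}" if parent else name)
        else:
            parent = name  # remember the latest FRU that is not deconfigured
    return found


sample = """XSCF> showstatus
MBU_B Status:Normal;
* CPUM#2-CHIP#0 Status:Deconfigured;
* CPUM#2-CHIP#1 Status:Deconfigured;
"""
for fru in deconfigured_frus(sample):
    print(fru)
```

A list containing only CPUM entries from one PSB, with no matching error log, is a hint to check the memory placement rather than the CPU hardware.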
On the M4000 and M5000 the only supported configurations are 1, 2 or 4 Memory Boards in a PSB. In this M5000 example six Memory Boards have been spread across the two PSBs, three per PSB. As a result, all the memory DIMMs on the third Memory Board of each PSB (MEMB#2 and MEMB#6) are shown as deconfigured.
XSCF> showstatus
MBU_B Status:Normal;
MEMB#2 Status:Normal;
* MEM#0A Status:Deconfigured;
* MEM#0B Status:Deconfigured;
* MEM#1A Status:Deconfigured;
* MEM#1B Status:Deconfigured;
* MEM#2A Status:Deconfigured;
* MEM#2B Status:Deconfigured;
* MEM#3A Status:Deconfigured;
* MEM#3B Status:Deconfigured;
MEMB#6 Status:Normal;
* MEM#0A Status:Deconfigured;
* MEM#0B Status:Deconfigured;
* MEM#1A Status:Deconfigured;
* MEM#1B Status:Deconfigured;
* MEM#2A Status:Deconfigured;
* MEM#2B Status:Deconfigured;
* MEM#3A Status:Deconfigured;
* MEM#3B Status:Deconfigured;
XSCF> showhardconf -u
SPARC Enterprise M5000 M5000; Memory_Size:48 GB;
+-----------------------------------+------------+
| FRU | Quantity |
+-----------------------------------+------------+
| MBU_B | 1 |
| CPUM | 4 |
| Freq:2.150 GHz; | ( 8) |
| MEMB | 6 |
| MEM | 48 |
| Type:1A; Size:1 GB; | ( 48) |
| DDC_A | 4 |
| DDC_B | 2 |
| IOU | 2 |
| DDC_A | 2 |
| DDC_B | 2 |
| DDCR | 2 |
| XSCFU | 1 |
| OPNL | 1 |
| PSU | 4 |
| FANBP_C | 1 |
| FAN_A | 4 |
+-----------------------------------+------------+
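The memory board rule can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, the populated slot list in the example (0, 1, 2, 4, 5, 6) is inferred from the output above rather than shown in it, and four memory board slots per PSB is assumed for the M5000.

```python
from collections import Counter

# Supported number of Memory Boards per PSB on M4000/M5000, as stated above.
SUPPORTED_MEMB_COUNTS = {1, 2, 4}


def unsupported_memb_counts(memb_slots, boards_per_psb=4):
    """Return {PSB: board count} for every PSB with an unsupported count."""
    counts = Counter(slot // boards_per_psb for slot in memb_slots)
    return {psb: n for psb, n in sorted(counts.items())
            if n not in SUPPORTED_MEMB_COUNTS}


# Six boards, three in each PSB (assumed slot numbers).
print(unsupported_memb_counts([0, 1, 2, 4, 5, 6]))  # {0: 3, 1: 3}
# Minimal two-IOU configuration: one board per PSB.
print(unsupported_memb_counts([0, 4]))              # {}
```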
Configuration Error reported after using "setupfru"
When configuring a domain, the "configuration error was detected" message may appear when there is a hardware configuration issue with the machine.
XSCF> setupfru -x 1 sb 0
Operation has completed. However, a configuration error was detected.
Looking into the issue we see that there are no error reports, and the status is reported as normal.
XSCF> showstatus
No failures found in System Initialization.
XSCF> showlogs error
XSCF>
Using `showhardconf` we can see that in this case the issue is that there is no memory installed in the platform: Memory_Size is 0 GB and no MEM entries are listed under the MEMB.
XSCF> showhardconf -u
SPARC Enterprise M5000 COL2-FF2; Memory_Size:0 GB;
+-----------------------------------+------------+
| FRU | Quantity |
+-----------------------------------+------------+
| MBU_B | 1 |
| CPUM | 1 |
| Freq:2.150 GHz; | ( 2) |
| MEMB | 1 |
| DDC_A | 4 |
| DDC_B | 2 |
| IOU | 1 |
| DDC_A | 1 |
| DDC_B | 1 |
| DDCR | 1 |
| XSCFU | 1 |
| OPNL | 1 |
| PSU | 2 |
| FANBP_C | 1 |
| FAN_A | 4 |
+-----------------------------------+------------+
Checking `showlogs event` reports any configuration issues.
XSCF> showlogs event
May 17 00:16:10 PDT 2007 no CPU on XSB#00-0
May 17 00:16:10 PDT 2007 no MEM on XSB#00-0
May 17 00:16:10 PDT 2007 no CPU on XSB#01-0
May 17 00:16:10 PDT 2007 no MEM on XSB#01-0
May 17 00:32:09 PDT 2007 no MEM on XSB#00-0
May 17 00:33:52 PDT 2007 no MEM on XSB#00-0
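The "no CPU" and "no MEM" events can be summarised per XSB with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the message wording shown in the log above. Because the event log is cumulative, the result covers every event in the text passed in, not only the current state.

```python
import re

EVENT_RE = re.compile(r"no (CPU|MEM) on XSB#(\d{2})-(\d)")


def missing_resources(event_log_text):
    """Return {XSB id: set of missing resource types} from showlogs event text."""
    missing = {}
    for kind, board, part in EVENT_RE.findall(event_log_text):
        missing.setdefault(f"{board}-{part}", set()).add(kind)
    return missing


log = """May 17 00:16:10 PDT 2007 no CPU on XSB#00-0
May 17 00:16:10 PDT 2007 no MEM on XSB#00-0
May 17 00:16:10 PDT 2007 no CPU on XSB#01-0
May 17 00:16:10 PDT 2007 no MEM on XSB#01-0
May 17 00:32:09 PDT 2007 no MEM on XSB#00-0
May 17 00:33:52 PDT 2007 no MEM on XSB#00-0
"""
for xsb, kinds in sorted(missing_resources(log).items()):
    print(xsb, sorted(kinds))
```

To see only the most recent state, pass in just the events logged after the last hardware change; in the log above those are the two final "no MEM on XSB#00-0" entries.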
Configuration Error reported after using "setupfru"
When configuring a domain, the "configuration error was detected" message may appear when there is a hardware configuration issue with the machine.
XSCF> setupfru -x 4 sb 1
Operation has completed. However, a configuration error was detected.
Looking into the issue we see that there are no error reports, and the status is reported as normal.
XSCF> showstatus
No failures found in System Initialization.
XSCF> showlogs error
Using `showhardconf` we can see that, in this case, CMU#1 has two CPUs (slots 0 and 1) and Group A populated with 16 DIMMs. Configuring this CMU in quad-XSB mode leaves 8 DIMMs with no associated CPU, because CPU slots 2 and 3 are empty.
CMU#1 Status:Normal; Ver:0101h; Serial:PP074802GW ;
+ FRU-Part-Number:CA06620-D002 C1 /371-2214-03 ;
+ Memory_Size:16 GB;
CPUM#0-CHIP#0 Status:Normal; Ver:0801h; Serial:PP091400BD ;
+ FRU-Part-Number:CA06620-D044 B1 /375-3580-02 ;
+ Freq:2.520 GHz; Type:32;
+ Core:4; Strand:2;
CPUM#1-CHIP#0 Status:Normal; Ver:0801h; Serial:PP091302J4 ;
+ FRU-Part-Number:CA06620-D044 B1 /375-3580-02 ;
+ Freq:2.520 GHz; Type:32;
+ Core:4; Strand:2;
MEM#00A Status:Normal;
+ Code:ce0000000000000001M3 93T2950EZA-CE6 4145-45569c2b;
+ Type:1A; Size:1 GB;
[...]
MEM#33A Status:Normal;
+ Code:ce0000000000000001M3 93T2950EZA-CE6 4145-4754f6c3;
+ Type:1A; Size:1 GB;
Checking `showlogs event` reports any configuration issues.
Jun 19 22:10:41 KST 2009 SB configuration changed (quad-XSB mode)
Jun 19 22:10:44 KST 2009 no CPU on XSB#01-2
Jun 19 22:10:45 KST 2009 no CPU on XSB#01-3
`showboards` reports the affected quad-XSBs (01-2 and 01-3) as "Unmount" in the Test column:
XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
01-0 SP Available y n n Passed Normal n
01-1 SP Available y n n Passed Normal n
01-2 SP Unavailable y n n Unmount Normal n
01-3 SP Unavailable y n n Unmount Normal n
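The unmounted XSBs can be picked out of a longer `showboards` listing with a short script. This is a sketch only: the function name is hypothetical, and it assumes the column order shown above (XSB, R, DID(LSB), Assignment, Pwr, Conn, Conf, Test, Fault, COD), with the R column either blank or holding a single `*` marker.

```python
import re


def unmounted_xsbs(showboards_text):
    """Return the XSB ids whose Test column reads 'Unmount'."""
    result = []
    for line in showboards_text.splitlines():
        tokens = line.split()
        if not tokens or not re.fullmatch(r"\d{2}-\d", tokens[0]):
            continue  # skip header, separator and prompt lines
        if len(tokens) > 1 and tokens[1] == "*":
            tokens.pop(1)  # drop the optional marker in the R column
        # After the XSB id: DID, Assignment, Pwr, Conn, Conf, Test, Fault, COD
        if len(tokens) >= 9 and tokens[6] == "Unmount":
            result.append(tokens[0])
    return result


table = """XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
01-0 SP Available y n n Passed Normal n
01-1 SP Available y n n Passed Normal n
01-2 SP Unavailable y n n Unmount Normal n
01-3 SP Unavailable y n n Unmount Normal n
"""
print(unmounted_xsbs(table))  # ['01-2', '01-3']
```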
Previously Published As 89561