Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1495513.1
Update Date:2012-10-08
Keywords:

Solution Type  Problem Resolution Sure

Solution  1495513.1 :   Exadata: X4270M2 storage cell fails to boot and is seen as X4170M2  


Related Items
  • Sun Fire X4270 Server
  • Sun Fire X4270 M2 Server
  • Sun Fire X4170 Server
  • Sun Fire X4170 M2 Server
  • Sun Fire X4275 Server
  • Exadata Database Machine X2-2 Hardware
Related Categories
  • PLA-Support>Database Technology>Engineered Systems>Oracle Exadata>DB: Exadata_EST




Created from <SR 3-6040265201>

Applies to:

Exadata Database Machine X2-2 Hardware - Version All Versions to All Versions [Release All Releases]
Sun Fire X4170 M2 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4170 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4270 M2 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4270 Server - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms

The storage cell fails to boot and does not detect any disks. ILOM output identifies the X4270M2 (storage cell) as an X4170M2 (compute node).

Cause

Collect an ILOM snapshot and check the FRU (Field Replaceable Unit) output contained in the ipmi directory of the collection. The Product Name indicates X4170M2:

FRU Device Description : MB (LUN 0, ID 4)
Board Product         : ASSY,MOTHERBOARD,X4170/X4270,M2
Board Serial          : 0328MSL-1144BA2FU6
Board Part Number     : 511-1213-07 
Board Extra           : 02
Board Extra           : X4170/X4270_M2
Product Manufacturer  : ORACLE CORPORATION  
Product Name          : SUN FIRE X4170 M2 SERVER 
Product Part Number   : 000-0000-00
Product Serial        : 0000000000
Product Extra         : 080020FFFFFFFFFFFFFF002128E87EAC


In addition, none of the following devices is detected:


FRU Device Description : PS0 (LUN 0, ID 63)
 Device not present (Requested sensor, data, or record not found)

FRU Device Description : PS1 (LUN 0, ID 64)
 Device not present (Requested sensor, data, or record not found)

FRU Device Description : DBP (LUN 0, ID 210)
 Device not present (Requested sensor, data, or record not found)

FRU Device Description : PDB (LUN 0, ID 211)
 Device not present (Requested sensor, data, or record not found)

FRU Device Description : PADCRD (LUN 0, ID 222)
 Device not present (Requested sensor, data, or record not found)

FRU Device Description : FB (LUN 0, ID 212)
 Device not present (Requested sensor, data, or record not found)
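
The check above can be scripted when many snapshots need reviewing. The following is a minimal sketch, assuming the FRU listing from the snapshot is available as plain text in the format shown above. The function name summarize_fru and the shortened sample are hypothetical illustrations, not part of any Oracle tooling.

```python
import re

def summarize_fru(text):
    """Return (motherboard product name, list of FRU devices reported absent)."""
    product_name = None
    absent = []
    current = None  # FRU device whose record is currently being read
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"FRU Device Description\s*:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Product Name\s*:\s*(.+)", line)
        if m and current == "MB":
            product_name = m.group(1).strip()
        elif line.startswith("Device not present") and current:
            absent.append(current)
    return product_name, absent

# Shortened excerpt of the FRU output shown above (assumed input format).
sample = """\
FRU Device Description : MB (LUN 0, ID 4)
Product Name          : SUN FIRE X4170 M2 SERVER
FRU Device Description : PS0 (LUN 0, ID 63)
 Device not present (Requested sensor, data, or record not found)
FRU Device Description : DBP (LUN 0, ID 210)
 Device not present (Requested sensor, data, or record not found)
"""

name, absent = summarize_fru(sample)
print(name)    # product name recorded for the motherboard FRU
print(absent)  # FRU devices reported as not present
```

A storage cell that reports an X4170 M2 product name together with absent PS0, PS1, DBP, PDB, PADCRD and FB entries matches the symptom described here.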


This problem can be misleading, as it can also be caused by a failing PDB (Power Distribution Board), which will cause an X4170M2 to be reported as an X4270M2. See the Solution section to determine whether the fault lies with the PDB or the Disk Backplane.

Solution

 Connect to the Serial MGMT console and power cycle the system. Then check the displayed output.

 

Check the output as follows:

1. Look for the message: Error reading Reg 0x6 from MAX7133 @ 0x44, Port 4

2. Then check the I2C Probe Tests and look for the MAX7xxx device at address 0x44 on port 4. The only match is the Disk Backplane test.

 

 

U-Boot 1.1.4

Custom AST2100 U-Boot 3.0 (Dec 21 2010 - 17:10:15) r61398

DRAM:  119 MB
Flash bank 0 at 10000000 has 32MB in 256 sectors (chipSize 1<<25, ratio 1, bufSz 1024).
Flash: 32 MB
readonly: RO_K_SP=52 (LYNX_PLUS_AST2100)
readonly: RO_SP_PLATFORM=lynxplus
readonly: RO_IMAGE0_ADDR_HINT=0xa0000
readonly: RO_permenv_build=r61398 (Dec 21 2010)
VUART1 at port 0x03f8, SerIRQ[4] disabled
VUART2 at port 0x02f8, SerIRQ[3] disabled
Protecting U-Boot flash sectors; monitor_base=100a0000.
Error reading Reg 0x6 from MAX7133 @ 0x44, Port 4
Unable to select correct mux port
Unable to select correct mux port
Unable to select correct mux port
Unable to select correct mux port
Unable to select correct mux port
board_findGpioNum(): ERROR, 'BIOS_TOP_BLOCK_LOCK' does not match any pin.
board_findGpioNum(): ERROR, 'SP_PECI_ENABLE' does not match any pin.
H/W:   Lynxplus Service Processor; SOC: AST2100 Rev. 02 ('A3')
  PWC_SP_Broken_OD = 0;  ARM restart caused by: power-on
  The host is OFF(S5) (hostWantsPwr=0, powerGood=0,
        allowPwrOn=0|0, outOfReset=0, fatalError=0).
  Reset straps=0x8c819180, def. H-PLL=264 MHz, CPU/AHB=2:1, boot CS0# normal speed
  PCI w/VGA noVBIOS;  NOR 38ns/byte;  DRAM clock is M-PLL: 264 MHz (DDR2-528)
  DRAM: 128MB data - 8MB VGA, 32-bit noECC, 2 BA 10 CA, CL=4 BL=4 ap=1, 61440 us refr, DQSipv=0x2020202
Board Revision - 8d
Date: 2012-09-24 (Monday)    Time:  9:01:24
Reading FRUID...Valid CRC.
ethaddr=00:21:28:E7:11:94
eth1addr=00:21:28:E7:11:95
Net:   MAC1 PHY not ready faradaynic#0, faradaynic#1
Enter Diagnostics Mode ['q'uick/'n'ormal(default)/e'x'tended(manufacturing mode)] .....   0
Diagnostics Mode - NORMAL
<DIAGS> Memory Data Bus Test ... PASSED
<DIAGS> Memory Address Bus Test ... PASSED
I2C Probe Test - Motherboard
        Bus     Device                          Address Result
        ===     ============================    ======= ======
         2                 Sys FRUID (U3003)    0xA0    PASSED
         2                Power CPLD (U3301)    0x4E    PASSED
         2          CPU0 Fault LED's (U3001)    0x40    PASSED
         2          CPU1 Fault LED's (U3002)    0x42    PASSED
         2            PCA9555 (Misc) (U3005)    0x44    PASSED
         2                 DIMM IMAX (U3102)    0x12    PASSED
         6          Bank Panel Led's (U2701)    0xC6    PASSED
         6               DS1338(RTC) ( U803)    0xD0    PASSED
         6        Temp Sensor1(LM75) (U3011)    0x90    PASSED
         6        Temp Sensor2(LM75) (U3012)    0x92    PASSED
         6        Temp Sensor3(LM75) (U3010)    0x94    PASSED

I2C Probe Test - Chassis(2U HYDE24)
  PDB Board
        Bus     Device                          Address Result
        ===     ============================    ======= ======
         1               PCA9548 Mux (U0202)    0xE0    Start and Send Device Address can't get ACK back
I2C read: I/O error
FAILED
         1                 PDB FRUID (U0203)    0xAA    Start and Send Device Address can't get ACK back
I2C read: I/O error
FAILED
         1                   MAX7313 (U0201)    0x40    Start and Send Device Address can't get ACK back
I2C read: I/O error
FAILED
         1                   MAX7315 (U1001)    0x46    Start and Send Device Address can't get ACK back
I2C read: I/O error
FAILED

  Unified Fan Module
        BUS     Port    DEVICE                  Address Result
        ===     ====    ====================    ======= ======
        1        2      FT 0 FRUID   (U0203)    0xAC    FAILED
Unable to select correct mux port
Start and Send Device Address can't get ACK back
I2C read: I/O error

  24 Disk Backplane
        BUS     Port    DEVICE                  Address Result
        ===     ====    ====================    ======= ======
        1        4      BP MAX7313   (   U8)    0x44    FAILED
Unable to select correct mux port
Start and Send Device Address can't get ACK back
I2C read: I/O error

  LSI Daughter Card
        BUS     Port    DEVICE                  Address Result
        ===     ====    ====================    ======= ======
        1        4      EXP FRUID    (  U07)    0xA0    FAILED
Unable to select correct mux port
Start and Send Device Address can't get ACK back
I2C read: I/O error

  Connector Board
        BUS     Port    DEVICE                  Address Result
        ===     ====    ====================    ======= ======
        1        4      CONN FRUID    (  U05)   0xA2    FAILED
Unable to select correct mux port
Start and Send Device Address can't get ACK back
I2C read: I/O error

 

 

Therefore, the failing device is the Disk Backplane, which must be replaced.

 

This technique can also be used to identify other failing devices by matching the address and port in the error message against the device, address and port listed in the I2C Probe Tests.
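
The matching step can be sketched in a few lines of Python. This is an illustration only, assuming the console output has been captured to text in the format shown above. The function name match_failing_section and the regular expressions are hypothetical and have not been validated against other firmware revisions.

```python
import re

# "Error reading Reg 0x6 from MAX7133 @ 0x44, Port 4" -> address and port
ERROR_RE = re.compile(
    r"Error reading Reg \S+ from \S+ @ (0x[0-9A-Fa-f]+), Port (\d+)")
# Probe rows that carry a Port column: bus, port, device, address, result
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(.+?)\s+(0x[0-9A-Fa-f]+)\s+(\w+)\s*$")

def match_failing_section(log):
    """Return (section, device, result) for probe rows matching the error."""
    err = ERROR_RE.search(log)
    if not err:
        return []
    address, port = err.group(1).lower(), err.group(2)
    matches = []
    section = None
    prev = ""  # last non-empty line, used as the section title
    for line in log.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("BUS"):
            section = prev  # the title precedes each table header
        row = ROW_RE.match(line)
        if row and row.group(2) == port and row.group(4).lower() == address:
            device = " ".join(row.group(3).split())
            matches.append((section, device, row.group(5)))
        if stripped:
            prev = stripped
    return matches

# Shortened excerpt of the console output shown above.
sample = """\
Error reading Reg 0x6 from MAX7133 @ 0x44, Port 4
I2C Probe Test - Motherboard
        Bus     Device                          Address Result
        ===     ============================    ======= ======
         2            PCA9555 (Misc) (U3005)    0x44    PASSED
  24 Disk Backplane
        BUS     Port    DEVICE                  Address Result
        ===     ====    ====================    ======= ======
        1        4      BP MAX7313   (   U8)    0x44    FAILED
  LSI Daughter Card
        BUS     Port    DEVICE                  Address Result
        ===     ====    ====================    ======= ======
        1        4      EXP FRUID    (  U07)    0xA0    FAILED
"""

for section, device, result in match_failing_section(sample):
    print(section, "|", device, "|", result)
```

The motherboard device at 0x44 is ignored because its table has no Port column, so only the Disk Backplane row satisfies both the address and the port.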

This event may also be seen on X4170/X4170M2 servers. They correctly report their product type but exhibit the error at address 0x44.

This event may also be seen on the X4275, which is reported as an X4170 but also shows the failing address 0x44.


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.