Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1492035.1
Update Date:2012-10-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1492035.1 :   Failed T1000 / T2000 DIMM  


Related Items
  • Sun Fire T2000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: Tx000
  • .Old GCS Categories>Support>KM>Content>Documentation


When a T2000 running older firmware is set to maximum diagnostics (diag_level max), DIMMs are reported as failed when they are not (false positives).

In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-6210603671>

Applies to:

Sun Fire T2000 Server - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Example showsc output from the ALOM CMT system controller:

sc> showsc
Advanced Lights Out Manager CMT v1.7.10

parameter value
--------- -----
if_network true
if_connection telnet
if_emailalerts false
if_snmp false
netsc_dhcp false
netsc_ipaddr 0.0.0.0
netsc_ipnetmask 255.255.255.0
netsc_ipgateway 0.0.0.0
mgt_mailhost
mgt_mailalert
mgt_snmptraps none
mgt_traphost
sc_customerinfo
sc_escapechars #.
sc_powerondelay false
sc_powerstatememory false
sc_clipasswdecho true
sc_cliprompt sc
sc_clitimeout 0
sc_clieventlevel 2
sc_backupuserdata true
sc_diag_mode enabled
diag_trigger power-on-reset error-reset
diag_verbosity normal
diag_level min     <<< This is the variable that must be set to min, not max
diag_mode normal
sys_autorunonerror false
sys_autorestart reset
sys_eventlevel 2
ser_baudrate 9600
ser_parity none
ser_stopbits 1
ser_data 8
netsc_enetaddr 00:14:4f:1e:0e:94
sys_enetaddr 00:14:4f:1e:0e:8c
sc>
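
When the showsc output has been captured to a text file or a terminal log, the diag_level value can be picked out with a small script. The following is only a sketch, assuming the output has the two-column "name value" layout shown above; the function name parse_showsc and the shortened SAMPLE text are illustrative and not part of ALOM CMT.

```python
import re

# Assumption: every ALOM CMT variable name is lowercase words joined by
# underscores (if_network, diag_level, ...), as in the listing above.
PARAM_NAME = re.compile(r"^[a-z]+(?:_[a-z]+)+$")


def parse_showsc(text):
    """Parse captured 'showsc' output into a {name: value} dictionary."""
    settings = {}
    for line in text.splitlines():
        # Drop any trailing '<<<' annotation added by hand to the capture.
        line = line.split("<<<")[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # Banner, column header, separator and prompt lines do not match.
        if PARAM_NAME.match(name):
            settings[name] = value
    return settings


# Shortened copy of the output above, used only for illustration.
SAMPLE = """\
sc> showsc
Advanced Lights Out Manager CMT v1.7.10

parameter value
--------- -----
mgt_mailhost
diag_trigger power-on-reset error-reset
diag_verbosity normal
diag_level min
diag_mode normal
sc>
"""

settings = parse_showsc(SAMPLE)
print("diag_level =", settings["diag_level"])
```

Variables that have no value in the listing (for example mgt_mailhost) come back as empty strings, so a missing key means the variable was not present in the capture at all.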

Symptoms

The customer reports that the system is constantly detecting failed DIMMs.

Changes

ALOM CMT Variable Settings for Reset Scenario or Power Cycle

Variable        Default             Option

diag_mode       normal              service
diag_level      min                 max
diag_trigger    power-on-reset      error-reset

 

Cause

Root Cause

On the T1000/T2000, when POST encounters a single correctable error (CE), the associated DIMM is declared faulty, and half of the system's memory is deconfigured and unavailable to Solaris.

Since PSH (Predictive Self-Healing) is the primary means for detecting errors and diagnosing faults on the Niagara platforms, this policy is too aggressive.

Three ALOM CMT configuration variables, diag_mode, diag_level, and
diag_trigger, control whether the system runs firmware diagnostics in response
to system reset events. When diag_level is set to max, DIMMs will fail
during POST with the settings shown below.

diag_mode normal
diag_level max
diag_trigger power-on-reset

Solution

Check and ensure that diag_level is set to min.

With firmware release 6.3.0 or later, the default setting of diag_level is min.

diag_mode normal
diag_level min
diag_trigger power-on-reset
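
The check above can be sketched as a small helper that decides which commands to type at the sc> prompt. This is only an illustration: setsc and showsc are the ALOM CMT commands for changing and displaying a configuration variable, while the function name diag_level_fix and the dictionary input are hypothetical.

```python
def diag_level_fix(settings):
    """Return the sc> commands needed to bring diag_level back to min.

    'settings' is a {name: value} dictionary of ALOM CMT variables,
    for example one built by hand from 'showsc' output.
    """
    current = settings.get("diag_level", "").strip().lower()
    if current == "min":
        return []  # already at the recommended value, nothing to do
    return [
        "setsc diag_level min",  # change the variable
        "showsc diag_level",     # display it again to confirm the change
    ]


# Settings from the Cause section (failing case) and from the Solution.
failing = {"diag_mode": "normal", "diag_level": "max",
           "diag_trigger": "power-on-reset"}
fixed = {"diag_mode": "normal", "diag_level": "min",
         "diag_trigger": "power-on-reset"}

for cmd in diag_level_fix(failing):
    print("sc>", cmd)
```

The helper only reports the commands and does not connect to the system controller, so the change itself is still made by hand on the console.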

 

References

<NOTE:1001026.1> - Sun Fire T1000 and T2000 DIMMs with CEs (correctable errors) are being unnecessarily flagged by POST as faulty.

  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.