Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1344584.1
Update Date:2011-11-10
Keywords:

Solution Type  Technical Instruction Sure

Solution  1344584.1 :   How to Calculate the Correct Failing Memory Address from EDAC on AMD Based Systems.  


Related Items
  • Sun Blade 8000 System
Related Categories
  • PLA-Support>Sun Systems>x64>Server>SN-x64: MISC-SERVER
  • .Old GCS Categories>Sun Microsystems>Servers>x64 Servers


Sometimes it is necessary to obtain the correct failing memory address from EDAC output on Linux.

Created from <SR 3-4091122191>

Applies to:

Sun Blade 8000 System - Version: Not Applicable and later   [Release: N/A and later ]
Information in this document applies to any platform.

Goal

How to derive a failing address, for use with the "herd -e" command, from the following EDAC output on systems running Linux:

Jul 11 10:52:41 hostname kernel: EDAC MC3: CE page 0xc8c996, offset 0xa48, grain 8, syndrome 0x8088, row 2, channel 0, label "": k8_edac
Jul 11 10:52:41 hostname kernel: EDAC MC3: CE - no information available: k8_edac Error Overflow set
Jul 11 10:52:41 hostname kernel: EDAC k8 MC3: extended error code: ECC chipkill x4 error
Jul 11 10:52:42 hostname kernel: EDAC k8 MC3: general bus error: participating processor(local node response), time-out(no timeout)
memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Jul 11 10:52:42 hostname kernel: EDAC MC3: CE page 0x1062b94, offset 0x0, grain 8, syndrome 0xe0ed, row 2, channel 0, label "": k8_edac
Jul 11 10:52:42 hostname kernel: EDAC MC3: CE - no information available: k8_edac Error Overflow set
Jul 11 10:52:42 hostname kernel: EDAC k8 MC3: extended error code: ECC chipkill x4 error
Jul 11 10:52:43 hostname kernel: EDAC k8 MC3: general bus error: participating processor(local node response), time-out(no timeout)
memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Jul 11 10:52:43 hostname kernel: EDAC MC3: CE page 0x107dc95, offset 0x40, grain 8, syndrome 0x2021, row 2, channel 0, label "": k8_edac
Jul 11 10:52:43 hostname kernel: EDAC k8 MC3: extended error code: ECC chipkill x4 error





Solution


EDAC reports the address as a 4k page number plus an offset within that page.

To obtain the failing address for use with the "herd -e" command, concatenate the offset to the page number.

Note, however, that EDAC does not pad the page offset with leading zeroes, so this must be done before the two values are concatenated. The page offset must be a 3-hex-digit number before it is concatenated with the page number to form the failing address.
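
The rule above can be sketched in Python. This is an illustration, not part of the original instruction: because a 4k page spans 0x1000 bytes, padding the offset to three hex digits and concatenating it to the page number is the same as shifting the page number left by 12 bits and OR-ing in the offset. The function name failing_address is hypothetical.

```python
def failing_address(page: int, offset: int) -> str:
    """Combine an EDAC page number and page offset into one hex address.

    Hypothetical helper: shifting by 12 bits is equivalent to appending
    a zero-padded 3-hex-digit offset to the page number.
    """
    if not 0 <= offset < 0x1000:
        raise ValueError("offset must fit within a 4k page (0x000-0xfff)")
    return hex((page << 12) | offset)


# Values taken from the first log entry in this document.
print(failing_address(0xc8c996, 0xa48))  # 0xc8c996a48
```

The range check is there because an offset wider than three hex digits would silently corrupt the page bits.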

See examples below:
Jul 11 10:52:41 hostname kernel: EDAC MC3: CE page 0xc8c996, offset 0xa48

  Failing address = 0xc8c996 + 0xa48 = 0xc8c996a48

Jul 11 10:52:42 hostname kernel: EDAC MC3: CE page 0x1062b94, offset 0x0

  Failing address = 0x1062b94 + 0x000 = 0x1062b94000

Jul 11 10:52:43 hostname kernel: EDAC MC3: CE page 0x107dc95, offset 0x40

  Failing address = 0x107dc95 + 0x040 = 0x107dc95040

The resulting failing addresses can be used with the "herd -e" command to identify the failing DIMM pair.
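
When many entries must be processed, the same calculation can be applied directly to the log lines. The following is a minimal sketch only: the regular expression assumes the k8_edac message layout shown in this document, and the names EDAC_CE and addresses_from_log are hypothetical.

```python
import re

# Matches the "CE page 0x..., offset 0x..." part of a k8_edac log line.
# Assumption: the message layout is the one shown in this document.
EDAC_CE = re.compile(
    r"EDAC MC(\d+): CE page 0x([0-9a-fA-F]+), offset 0x([0-9a-fA-F]+)"
)


def addresses_from_log(lines):
    """Yield (memory controller number, failing address) per matching line."""
    for line in lines:
        match = EDAC_CE.search(line)
        if match:
            mc = int(match.group(1))
            page = int(match.group(2), 16)
            offset = int(match.group(3), 16)
            # Same as concatenating the page with a 3-hex-digit offset.
            yield mc, hex((page << 12) | offset)


sample = [
    'Jul 11 10:52:41 hostname kernel: EDAC MC3: CE page 0xc8c996, offset 0xa48, '
    'grain 8, syndrome 0x8088, row 2, channel 0, label "": k8_edac',
    "Jul 11 10:52:41 hostname kernel: EDAC MC3: CE - no information available: "
    "k8_edac Error Overflow set",
]

for mc, addr in addresses_from_log(sample):
    print(f"MC{mc}: {addr}")
```

Lines without a page and offset, such as the "no information available" entries, are skipped.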



Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.