Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction

Sure Solution 1019337.1: Introduction to the cache-line retirement feature for UltraSPARC-IV+ (USIV+) processors
PreviouslyPublishedAs: 238627

Description
This document explains the Level 2/Level 3 (L2/L3) cache-line retirement feature available for US-IV+ CPUs.

What is cache-line retirement?
Cache-line retirement is a Solaris enhancement that allows individual cache lines of a CPU to be disabled. Much like MPR (Memory Page Retirement), cache-line retirement retires a small piece of the L2/L3 cache without interrupting the operating system.

For the Solaris[TM] 10 Operating System, this technology uses the new mem_cache driver to disable a cache index and cache way. For the Solaris[TM] 9 Operating System, the Soft Error Rate Discriminator (SERD) engine determines when to retire a cache line.

A cache index/way is analogous to a page of main memory. The process uses the diagnosis engine, or FMA (Fault Management Architecture), to mark a cache index/way as "unavailable", which in more familiar terms means retiring that cache line.

Steps to Follow

Cache-line retirement details

What are the benefits of cache-line retirement?
Currently, when FMA detects a correctable error in the L2/L3 cache, it applies the error to the SERD engine and, when the engine fires, off-lines the processor. With cache-line retirement, only the affected cache index and way are disabled rather than the entire processor. Because only that index and way are disabled, the CPU continues to function normally, with no downtime or performance impact. In the rare case that the number of retired cache lines exceeds the CPU's threshold, the CPU is then off-lined.

Overall, cache-line retirement significantly improves the RAS (Reliability, Availability and Serviceability) features of USIV+ CPUs and their associated caches. Cache-line retirement provides the final resolution for Sun Alert <Document: 1000495.1>.

Availability
Solaris 9: Kernel patch 122300-28
Solaris 10: Kernel patch 137111-02

Examples
Example of when a cache line is retired in Solaris 9 (from /var/adm/messages).
This example was taken from a Sun Fire V490:

    May 8 03:11:45 testmachine SUNW,UltraSPARC-IV+: [ID 711633 kern.notice] NOTICE: L2_CACHE_DATA: cpu 6: Retired cache index 4199 way 1 due to event at bit 30
    May 8 03:11:45 testmachine No action required.

Example of when a CPU is off-lined due to too many cache-line retirements in Solaris 9 (from /var/adm/messages). This example was taken from a Sun Fire V490:

    May 11 14:40:18 testmachine SUNW,UltraSPARC-IV+: [ID 503843 kern.notice] NOTICE: L2_CACHE_TAG: cpu 0: Retiring CPU since we have already retired 3 ways at cache index 0x3e8
    May 11 14:40:18 testmachine Recommended-Action: Service action required
    May 11 14:40:18 testmachine SUNW,UltraSPARC-IV+: [ID 123177 kern.notice] NOTICE: [AFT1] CPU0 offlined
    May 11 14:40:18 testmachine SUNW,UltraSPARC-IV+: [ID 307609 kern.notice] NOTICE: [AFT1] CPU16 offlined due to events detected by another CPU on the same chip

Solaris 10 examples
The following SUNW-MSG-IDs and fault classes identify cache-line faults in fmdump output:

    http://sun.com/msg/SUN4U-8007-FQ
    http://sun.com/msg/SUN4U-8007-GC
    http://sun.com/msg/SUN4U-8007-HH
    http://sun.com/msg/SUN4U-8007-JD

    fault.cpu.ultraSPARC-IVplus.l2cachedata-line
    fault.cpu.ultraSPARC-IVplus.l3cachedata-line
    fault.cpu.ultraSPARC-IVplus.l2cachetag-line
    fault.cpu.ultraSPARC-IVplus.l3cachetag-line

Product
Sun Fire 6800 Server
Sun Fire 4800 Server
Sun Fire V1280 Server
Sun Fire E2900 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E20K Server
Sun Fire E25K Server
Sun Fire V490 Server
Sun Fire V890 Server
Sun Netra 1290 Server

Internal Comments
Internal section only
Facts about cache-line retirement
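The retirement decision described under "What are the benefits of cache-line retirement?" can be modelled as a small state machine: correctable errors feed a SERD engine, which fires when N events arrive within a time window T; a firing engine retires one cache index/way, and the CPU is off-lined once an index has no more ways that may be retired. The Python sketch below is a minimal illustration under stated assumptions, not the Solaris implementation. The SERD values (SERD_N, SERD_T), the use of one SERD engine per cache line, the limit of three retired ways per index (inferred from the Solaris 9 message "already retired 3 ways at cache index 0x3e8"), and the names SerdEngine and CacheLineRetirement are all hypothetical.

```python
from collections import defaultdict, deque

# HYPOTHETICAL tuning values: the real SERD N/T settings are not given in this document.
SERD_N = 3          # correctable errors needed to fire the engine
SERD_T = 3600.0     # sliding window, in seconds
# ASSUMPTION inferred from the Solaris 9 log example ("already retired 3 ways at
# cache index 0x3e8"): at most 3 ways may be retired at one cache index.
MAX_RETIRED_WAYS = 3


class SerdEngine:
    """Fires when n events arrive within any window of t seconds."""

    def __init__(self, n, t):
        self.n = n
        self.t = t
        self.events = deque()

    def record(self, timestamp):
        # Timestamps are assumed to be non-decreasing.
        self.events.append(timestamp)
        # Drop events that have aged out of the window; the newest event always stays.
        while timestamp - self.events[0] > self.t:
            self.events.popleft()
        if len(self.events) >= self.n:
            self.events.clear()  # hypothetical: reset the engine after it fires
            return True
        return False


class CacheLineRetirement:
    """Hypothetical model of the per-line retirement decision."""

    def __init__(self, n=SERD_N, t=SERD_T, max_ways=MAX_RETIRED_WAYS):
        # Assumption: one SERD engine per (cpu, index, way).
        self.engines = defaultdict(lambda: SerdEngine(n, t))
        self.retired = defaultdict(set)  # (cpu, index) -> set of retired ways
        self.offline = set()             # CPUs that have been off-lined
        self.max_ways = max_ways

    def correctable_error(self, cpu, index, way, timestamp):
        if cpu in self.offline:
            return "cpu offline"
        if way in self.retired[(cpu, index)]:
            return "already retired"
        if not self.engines[(cpu, index, way)].record(timestamp):
            return "recorded"
        ways = self.retired[(cpu, index)]
        if len(ways) >= self.max_ways:
            # No further ways may be retired at this index: off-line the whole CPU.
            self.offline.add(cpu)
            return "cpu offlined"
        ways.add(way)  # retire just this cache index/way
        return "line retired"


if __name__ == "__main__":
    model = CacheLineRetirement()
    for ts in (0.0, 10.0, 20.0):
        print(model.correctable_error(cpu=6, index=4199, way=1, timestamp=ts))
    # prints "recorded", "recorded", "line retired" (one per line)
```

In this sketch, three errors on cpu 6, index 4199, way 1 inside the window retire only that line, matching the shape of the first Solaris 9 notice above; errors spaced further apart than SERD_T never fire the engine. Keeping a separate engine per line means one noisy line cannot push healthy lines toward retirement, but that granularity is a modelling choice, not something this document specifies.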
Keywords: CPU, error, Level 2, l2, level 3, l3, cache, correctable event, disabled, offline, enhancement, usiv+

Attachments
This solution has no attachment.