Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000878.1
Update Date:2011-03-07
Keywords:

Solution Type  Sun Alert Sure

Solution 1000878.1: Multi-bit ECC Errors Can Disable the ECC Interrupt on Sun StorEdge 6920


Related Items
  • Sun Storage 6920 System
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Data Loss
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

Previously Published As
201159


Product
Sun StorageTek 6920 System

Bug Id
<SUNBUG: 6254606>

Date of Resolved Release
15-JUL-2005

Impact

On Sun StorEdge 6920 arrays, the firmware on the Storage Resource Card (SRC) may incorrectly disable the ECC interrupt handler after a specific sequence of multi-bit memory errors. Once the handler is disabled, later multi-bit errors no longer reset the affected processor, so incorrect data may be written to permanent storage.


Contributing Factors

This issue can occur on the following platform:

  • Sun StorEdge 6920 Arrays with System Processor (SP) version 2.0.4 and earlier

The system processor version can be determined from "/etc/release" with the following command (run on the SP):

    [sp0]# cat /etc/release
    Solaris 9 s9_58shwpl3 SPARC
    ...
              Sun StorEdge(tm) 6920
                 Version: 2.0.2
            Build Date: July  2, 2004
    ...
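
If the /etc/release output has been captured (for example, copied from the SP to a workstation), the affected/fixed decision can be automated. The following is a minimal sketch, assuming the "Version:" line has the format shown above; the function names are hypothetical, and Python is not assumed to be installed on the SP itself.

```python
import re

# Per this Sun Alert: SP 2.0.4 and earlier are affected; 2.0.5 and later are fixed.
FIRST_FIXED = (2, 0, 5)


def sp_version(release_text):
    """Return the 'Version:' value from /etc/release text as a tuple of ints, or None."""
    match = re.search(r"^\s*Version:\s*(\d+(?:\.\d+)*)", release_text, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(release_text):
    """True if the SP version is earlier than the first fixed release (2.0.5)."""
    version = sp_version(release_text)
    if version is None:
        raise ValueError("no 'Version:' line found in /etc/release text")
    # Tuple comparison orders versions component by component: (2, 0, 4) < (2, 0, 5).
    return version < FIRST_FIXED


# Excerpt of the /etc/release output shown above.
SAMPLE_RELEASE = """\
Solaris 9 s9_58shwpl3 SPARC
          Sun StorEdge(tm) 6920
             Version: 2.0.2
        Build Date: July  2, 2004
"""

print(sp_version(SAMPLE_RELEASE), is_affected(SAMPLE_RELEASE))  # prints: (2, 0, 2) True
```

Comparing integer tuples rather than raw strings avoids misordering versions such as 2.0.10 against 2.0.5.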

Symptoms

If StorADE reports the following error message, the SRC may not be working properly and writes to memory may fail:

    LOG_ALERT (POST: 0-0)  SRC2 slot:3, CPU:1, Sp Local Memory
    Detected Multi-Bit ECC Error: Disabling Interrupt BEWARE
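
To check saved StorADE log text for this symptom, a pattern match on the message is enough. The sketch below is built only from the message quoted above; the surrounding StorADE log layout, and the assumption that the message may be wrapped across two lines, are guesses, and the function name is hypothetical.

```python
import re

# Pattern built from the alert text quoted in this Sun Alert. It tolerates the
# message being wrapped across lines; the exact StorADE log layout is assumed.
ECC_INTERRUPT_ALERT = re.compile(
    r"(?P<card>SRC\d+)\s+slot:\s*(?P<slot>\d+),\s*CPU:\s*(?P<cpu>\d+),"
    r"[A-Za-z\s]+?"  # memory description, e.g. "Sp Local Memory Detected"
    r"Multi-Bit\s+ECC\s+Error:\s*Disabling\s+Interrupt",
    re.IGNORECASE,
)


def find_ecc_interrupt_alerts(log_text):
    """Return (card, slot, cpu) for every 'Disabling Interrupt' ECC alert in log_text."""
    return [
        (m.group("card"), int(m.group("slot")), int(m.group("cpu")))
        for m in ECC_INTERRUPT_ALERT.finditer(log_text)
    ]


SAMPLE_LOG = """\
LOG_ALERT (POST: 0-0)  SRC2 slot:3, CPU:1, Sp Local Memory
Detected Multi-Bit ECC Error: Disabling Interrupt BEWARE
"""

print(find_ecc_interrupt_alerts(SAMPLE_LOG))  # prints: [('SRC2', 3, 1)]
```

The gap between "CPU:n," and "Multi-Bit" is restricted to letters and whitespace so that a match cannot run across unrelated log entries.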

Workaround

There is no workaround. Please see the Resolution section.


Resolution

This issue is addressed on the following platform:

  • Sun StorEdge 6920 Arrays with System Processor (SP) version 2.0.5 and later

Notes:

  1. Schedule an immediate upgrade of Sun StorEdge 6920 systems to SP version 2.0.5 to avoid this issue.
  2. Use StorADE to upgrade all 6920 components to version 2.0.5. For details of this and other 6920 procedures, see the 2.0.5 Release Notes at http://www.sun.com/products-n-solutions/hardware/docs/pdf/817-5229-14.pdf.


Previously Published As
101790
Internal Comments


Normally, when a multi-bit ECC error is detected, the interrupt handler resets the storage processor that experienced the error.



In rare circumstances, a particular multi-bit ECC error causes the SE6920 to disable the interrupt handler. Because the handler then no longer resets the DSP when multi-bit errors are detected, subsequent reads or writes to defective memory may result in incorrect data being written to permanent storage.
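
The difference between the normal and the defective path can be illustrated with a toy model. This is a conceptual sketch only, not the SRC firmware: the class, its fields, and the "committed" list standing in for permanent storage are all hypothetical, and the specific error sequence that disables the interrupt is not modelled; the disabled state is simply set by hand.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEccHandler:
    """Conceptual model of the failure described above; NOT the SRC firmware."""

    interrupt_enabled: bool = True  # the defect effectively flips this to False
    resets: int = 0
    committed: list = field(default_factory=list)  # stands in for permanent storage

    def write(self, data, multi_bit_error=False):
        """Attempt to commit data; return True if it reached 'permanent storage'."""
        if multi_bit_error and self.interrupt_enabled:
            # Normal path: the ECC interrupt resets the processor, so the
            # possibly corrupted data is never committed.
            self.resets += 1
            return False
        # Clean data, or the interrupt was disabled: with the handler off, data
        # read from defective memory is committed without any error being raised.
        self.committed.append(data)
        return True


healthy = ToyEccHandler()
print(healthy.write(b"block", multi_bit_error=True), healthy.resets)  # prints: False 1

defective = ToyEccHandler(interrupt_enabled=False)  # state after the defect triggers
print(defective.write(b"bad block", multi_bit_error=True), defective.committed)  # prints: True [b'bad block']
```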


Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Escalation ID
1-8002404

Internal Sun Alert Kasp Legacy ID
101790

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss
Significant Change Date: 2005-07-15
Avoidance: Upgrade
Responsible Manager: [email protected]
Original Admin Info: [WF 18-Jul-2005, Dave M; correction to BugID]
[WF 15-Jul-2005, Dave M; sending for release]
[WF 14-Jul-2005 Dave M; all revisions final, send for review]
[WF 24-Jun-2005 Dave M; draft created]

Product_uuid
67794720-356d-11d7-8ef2-ce2ac2bc9136|Sun StorageTek 6920 System

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.