Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-77-1369835.1
Update Date: 2012-02-09
Keywords:

Solution Type  Sun Alert Sure

Solution  1369835.1 :   Solaris 10 SPARC Kernel Patch 137137-09 May Cause Erroneous PCIEX-8000-KP Reports During PCIE Correctable Events  


Related Items
  • Solaris SPARC Operating System
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M8000 Server
  • Sun Hardware - Generic
  • Oracle Solaris Express
  • Sun SPARC Enterprise M3000 Server
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M5000 Server

Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • .Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
  Description
  Likelihood of Occurrence
  Possible Symptoms
  Workaround or Resolution
  Patches
  Modification History
  References


Applies to:

Sun Microsystems > Operating Systems > Solaris Operating System
Oracle Solaris Express - Version: 2010.11 to 2010.11   [Release: 11.0 to 11.0]
Solaris SPARC Operating System - Version: 10 10/09 U8 to 10 10/09 U8   [Release: 10.0 to 10.0]
Sun SPARC Enterprise M3000 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise M4000 Server - Version: Not Applicable and later    [Release: N/A and later]
Information in this document applies to any platform.
_____________________



Date of Resolved Release: 21-Oct-2011
____________________________________

Description


An issue with the Fault Management Architecture (FMA) in Solaris 10 SPARC kernel patch 137137-09 and certain Solaris 11 Express builds may cause erroneous PCIEX-8000-KP reports during PCIE correctable events. These erroneous reports may result in unnecessary hardware replacements.

Likelihood of Occurrence


This issue can occur in the following releases:

SPARC Platform
  • Solaris 10 with patch 137137-09 and without patch 147705-01
  • Solaris 11 Express based upon builds snv_87 through snv_170
Note 1: All SPARC platforms with PCI-E I/O Expansion Slots are impacted by this issue.

Note 2: Solaris 8, Solaris 9, and Solaris on the x86 platform are not impacted by this issue.
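The Solaris 10 condition above (patch 137137-09 present, patch 147705-01 absent) can be checked against the installed patch list. The following is a minimal sketch, not an official tool: the function name patch_at_least is hypothetical, and it assumes that "showrev -p" prints one "Patch: <id>-<rev> ..." line per installed patch and that an awk supporting -v is available (on Solaris 10 this may mean nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: read `showrev -p` style output on stdin and succeed
# (exit status 0) if patch ID $1 is listed at revision $2 or later.
# Assumed input layout: "Patch: 147705-01 Obsoletes: ... Packages: ..."
patch_at_least() {
    awk -v id="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, p, "-")                 # p[1] = patch ID, p[2] = revision
            if (p[1] == id && p[2] + 0 >= min + 0) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example use on a live system (not executed here):
#   showrev -p | patch_at_least 147705 01 && echo "fix installed"
```

A Solaris 10 SPARC system where the check for 137137 at revision 09 succeeds and the check for 147705 at revision 01 fails would match the affected configuration listed above.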

Note 3: Solaris 11 Express distributions may include bug fixes beyond those in the build from which they were derived. The base build can be determined as follows:
    $ uname -v
    snv_151

If the output is of the format 151.x.x.x, then the build installed is snv_151.
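The build check described in Note 3 can be sketched as a pair of small shell functions. This is only an illustration under stated assumptions: the names base_build and build_affected are hypothetical, the input is assumed to be a "uname -v" string in either the snv_NNN form or the dotted NNN.x.x.x form, and the affected range snv_87 through snv_170 is taken from the list above.

```shell
# Hypothetical helper: print the base build number for a `uname -v` string.
# Handles "snv_151", a suffixed form such as "snv_151a", and "151.x.x.x".
# Prints an empty line if no leading build number can be found.
base_build() {
    b=${1#snv_}          # drop a leading "snv_" if present
    b=${b%%[!0-9]*}      # keep only the leading run of digits
    printf '%s\n' "$b"
}

# Succeeds (exit status 0) if the build lies in the range snv_87..snv_170.
build_affected() {
    b=$(base_build "$1")
    [ -n "$b" ] && [ "$b" -ge 87 ] && [ "$b" -le 170 ]
}

# Example use on a live system (not executed here):
#   if build_affected "$(uname -v)"; then echo "affected build"; fi
```

A Solaris 10 version string such as Generic_142909-17 yields no build number, so build_affected fails for it; the Solaris 10 case is decided by patch level instead.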


Possible Symptoms


When patch 137137-09 is installed, or a system is upgraded to a release that includes this patch or to an affected Solaris 11 Express build, FMA may report correctable errors not previously observed on the system. If Soft Error Rate Discrimination (SERD) thresholds are exceeded, the suspect devices may eventually be reported as faulty.

If the described issue occurs, the following message will be seen on the system console:
    SUNW-MSG-ID: PCIEX-8000-KP, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Tue Mar 29 21:03 PDT 2011
    PLATFORM: SUNW,SPARC-Enterprise , CSN: -, HOSTNAME: -
    SOURCE: eft, REV: 1.16
    EVENT-ID: af46a1fb-a712-617b-cab3-fc57b79a1dd9
    DESC: Too many recovered bus errors have been detected, which indicates a problem with the specified bus
      or with the specified transmitting device. This may degrade into an unrecoverable fault.
      Refer to http://sun.com/msg/PCIEX-8000-KP for more information.
    AUTO-RESPONSE: One or more device instances may be disabled
    IMPACT: Loss of services provided by the device instances associated with this fault
    REQ-ACTION: If a plug-in card is involved check for badly-seated cards or bent pins. Otherwise schedule a repair procedure
      to replace the affected device. Use fmadm(1M) faulty to identify the device or contact Oracle for support.

    # fmadm faulty
    --------------- ------------------------------------ -------------- ---------
    TIME            EVENT-ID                             MSG-ID         SEVERITY
    --------------- ------------------------------------ -------------- ---------
    Mar 29 21:22:03 af46a1fb-a712-617b-cab3-fc57b79a1dd9 PCIEX-8000-KP  Major

    Host        : xyz1
    Platform    : SUNW,SPARC-Enterprise    Chassis_id : xyz2400L

    Fault class : fault.io.pciex.device-interr-corr max 15%
                  fault.io.pciex.bus-linkerr-corr max 8%
    Affects     : dev:////pci@12,600000/network@0,3
                  dev:////pci@12,600000/network@0
                  dev:////pci@12,600000/network@0,1
                  dev:////pci@12,600000/network@0,2
                  dev:////pci@12,600000
                      faulted but still in service

    FRU         : "iou#1-pci#3" (hc://:product-id=SUNW,SPARC-Enterprise:chassis-id=xyz2400L:server-id=xyz1305/chassis=0/ioboard=1/hostbridge=1/pciexrc=0/pciexbus=2/pciexdev=0) max 15%
                  "iou#1-pci#3" (hc:///component=iou#1-pci#3) 8%
                      faulty

    Description : Too many recovered bus errors have been detected, which indicates
                  a problem with the specified bus or with the specified transmitting device. This may degrade into an unrecoverable fault.
                  Refer to http://sun.com/msg/PCIEX-8000-KP for more information.

    Response    : One or more device instances may be disabled.

    Impact      : Loss of services provided by the device instances associated with this fault

    Action      : If a plug-in card is involved check for badly-seated cards or
                  bent pins. Otherwise schedule a repair procedure to replace the
                  affected device. Use fmadm(1M) faulty to identify the device or contact Oracle for support.

Then execute the fmstat(1M) command to determine if a SERD threshold has been exceeded.

Note: The output seen when encountering this issue will vary depending upon the patch level and affected SERD threshold as follows:

For patch 142909-17 or patch 147440-01:

    # fmstat -s -m eft
    NAME                                  >N  T   CNT  DELTA  STAT
    serd.io.device.nonfatal_bdllp@...     >6  2h  3    ...
    serd.io.pciex.corrlink-bus_bdllp@...  >6  2h  3    ...

or

    serd.io.device.nonfatal_btlp@...      >6  2h  3    ...
    serd.io.pciex.corrlink-bus_btlp@...   >6  2h  3    ...

or

    serd.io.device.nonfatal_re@...        >6  2h  3    ...
    serd.io.pciex.corrlink-bus_re@...     >6  2h  3    ...

For patch 141444-09:


    # fmstat -s -m eft
    NAME                                  >N  T   CNT  DELTA  STAT
    serd.io.device.nonfatal_corr@...      >6  2h  3    ...
    serd.io.pciex.corrlink-bus@...        >6  2h  3    ...

For patch 137137-09 or patch 139555-08:

    # fmstat -s -m eft
    NAME                                  >N  T   CNT  DELTA  STAT
    serd.io.pciex.corrlink@...            >6  2h  3    ...
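Because the SERD engine names differ by patch level, it can help to filter the fmstat output for all of the variants listed above in one pass. The sketch below is an illustration only: the function name pciex_corr_serd is hypothetical, the engine name is assumed to be the first whitespace-separated column, and the name patterns are limited to those shown in this document. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
# Hypothetical helper: read `fmstat -s -m eft` output on stdin and print the
# names of SERD engines matching the variants listed in this document:
#   serd.io.device.nonfatal_{bdllp,btlp,re,corr}@...
#   serd.io.pciex.corrlink@..., serd.io.pciex.corrlink-bus@...,
#   serd.io.pciex.corrlink-bus_{bdllp,btlp,re}@...
pciex_corr_serd() {
    awk '$1 ~ /^serd\.io\.(device\.nonfatal_(bdllp|btlp|re|corr)|pciex\.corrlink(-bus(_(bdllp|btlp|re))?)?)@/ { print $1 }'
}

# Example use on a live system (not executed here):
#   fmstat -s -m eft | pciex_corr_serd
```

The filter only lists matching engine names; whether a threshold has been exceeded still has to be read from the remaining fmstat columns.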

Workaround or Resolution


There is no workaround for this issue.

This issue is resolved in the following releases:

SPARC Platform
  • Solaris 10 with patch 147705-01 or later
  • Solaris 11 Express based upon builds snv_171 or later
Note: After installing the Solaris 10 patch, clear any existing PCIEX-8000-KP faults using the fmadm(1M) command:
    # fmadm repaired <EVENT-ID>
where <EVENT-ID> is taken from the "fmadm faulty" output shown in the Possible Symptoms section above.
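When several PCIEX-8000-KP faults are outstanding, the event IDs can be pulled out of the "fmadm faulty" summary lines rather than copied by hand. The following is a hedged sketch: the function name pciex_kp_event_ids is hypothetical, and it assumes the summary line layout shown in the symptoms section, where the event ID is the field immediately before the MSG-ID. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
# Hypothetical helper: read `fmadm faulty` output on stdin and print the
# EVENT-ID of every summary line whose MSG-ID is exactly PCIEX-8000-KP.
pciex_kp_event_ids() {
    awk '{
        for (i = 2; i <= NF; i++)
            # The event ID is the hyphenated hex field just before the MSG-ID.
            if ($i == "PCIEX-8000-KP" && $(i-1) ~ /^[0-9a-f]+(-[0-9a-f]+)+$/)
                print $(i-1)
    }'
}

# Dry run on a live system (prints the commands instead of running them):
#   fmadm faulty | pciex_kp_event_ids | while read id; do
#       echo fmadm repaired "$id"
#   done
```

Printing the commands first leaves room to confirm that each listed fault really is one of the erroneous reports before anything is marked as repaired.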

Patches

Patch 147705-01

Modification History

21-Oct-2011: Date of Resolved Release
03-Nov-2011: Updated product field to include version for Hot Topics
09-Feb-2012: Updated to include specific Product attribution

Internal Notes:

This regression was caused by the putback for CR 6510830.
That change was first included in patch 137137-02, but the only revision of this patch available to customers is 137137-09.

In Solaris 11 Express 2010.11 this issue is resolved in SRU 12. The SRU installed on a customer system may be determined by running the following command:
    # pkg info entire | grep Summary
    Summary: entire incorporation including Support Repository
    Update (Oracle Solaris 11 Express 2010.11 SRU 11). (....)
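The SRU comparison can also be scripted. The sketch below is an assumption-laden illustration: the function names sru_number and sru_has_fix are hypothetical, the input is assumed to be the Summary text of "pkg info entire" in the form shown above (with the SRU number following the word "SRU"), and the threshold of 12 is taken from the statement that the issue is resolved in SRU 12.

```shell
# Hypothetical helper: read the Summary text from `pkg info entire` on stdin
# and print the first number that follows the word "SRU", if any.
sru_number() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "SRU") {
                n = $(i + 1)
                sub(/[^0-9].*$/, "", n)    # "11)." becomes "11"
                if (n != "") { print n; exit }
            }
    }'
}

# Succeeds (exit status 0) if the SRU read from stdin is 12 or later.
sru_has_fix() {
    n=$(sru_number)
    [ -n "$n" ] && [ "$n" -ge 12 ]
}

# Example use on a live system (not executed here):
#   pkg info entire | sru_has_fix && echo "SRU includes the fix"
```

If no SRU number appears in the input, sru_has_fix fails, which treats a system with no SRU applied as not having the fix.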

Internal Eng Business Unit Group: Systems RPE
Internal Escalation ID: 3-3313205201 3-3842072191
Internal Resolution Patches: 147705-01

References

Patch 147705-01
Bug 7051331

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.