Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Troubleshooting Sure
Solution 1321278.1: Sun Enterprise[TM] 10000: Troubleshooting Domain Panics

Applies to:
Sun Enterprise 10000 Server - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.

Purpose
This document provides troubleshooting information for various panics commonly seen on E10000 domains.

Last Review Date
May 11, 2011

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Cannot allocate IOMMU TSB arrays

Symptom: Boot device: /sbus@40,0/SUNW,qfe@0,8c00000 File and args:

Resolution: The system is trying to boot Solaris 7, but Solaris 2.6 is specified in the domain_config file. Correct the domain_config file.

Fast Data Access MMU Miss

Symptom: KERNEL dropped into OBP due to following trap at trap level = 1

Resolution: There are several likely possibilities:
1. The inetboot file in /tftpboot on the SSP is incorrect.
2. 400 MHz processors with 8 MB cache are involved and the boot image does not have the latest kernel patch.
3. dr-max-mem is set too large or incorrectly on Solaris 2.5.1.
4. A hardware problem. Run hpost -l32 or hpost -l64 on the domain. bringup -D on can also be run.

lock_set_spl: 70222069 lock held and only one CPU

Symptom: Rebooting with command: boot net -v

Resolution: The CPUs in the domain being booted have an 8 MB cache, which the patch level of Solaris being booted cannot handle. Use the OBP command limit-ecache-size.

munged memory list

Symptom: Boot device: /sbus@64,0/SUNW,hme@0,8c00000 File and args: - install

Resolution: The system is trying to boot Solaris 2.6, but Solaris 2.5.1 is specified in the domain_config file. Correct the domain_config file.

Async data error at tl1

Symptom: System panics with Async data error at tl1.

Resolution: This generally indicates an E-cache parity error on a CPU. The SPARC Architecture Manual states: "An asynchronous data error occurred on a data access. Examples: an ECC error occurred while writing data from a cache store buffer to memory, or an ECC error occurred on an MMU hardware table walk." The panic string reports the failing CPU. Replace the reported CPU.

Ecache SRAM Data Parity Error
Ecache Writeback Data Parity Error
UE Error: Ecache Copyout on CPUyy

Symptom: System panics with one of the following:
Ecache SRAM Data Parity Error
Ecache Writeback Data Parity Error
UE Error: Ecache Copyout on CPUyy

Resolution: These are E-cache parity error panics caused by a CPU. Click here for details on which CPU needs replacement.
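
When sorting through saved console logs, it can help to pick out these three messages automatically. The following is a minimal sketch, assuming the panic text contains the message strings exactly as listed above and that the "yy" in "CPUyy" is a decimal CPU number. The function names and sample strings are hypothetical, and real panic output may be formatted differently.

```python
import re
from typing import Optional

# The three E-cache panic messages listed in this section.
ECACHE_PANICS = (
    "Ecache SRAM Data Parity Error",
    "Ecache Writeback Data Parity Error",
    "UE Error: Ecache Copyout on CPU",
)

# Assumption: "CPUyy" means the literal text "CPU" followed by decimal digits.
COPYOUT_CPU = re.compile(r"Ecache Copyout on CPU\s*(\d+)")


def match_ecache_panic(text: str) -> Optional[str]:
    """Return the first listed E-cache panic message found in text, or None."""
    for message in ECACHE_PANICS:
        if message in text:
            return message
    return None


def copyout_cpu(text: str) -> Optional[int]:
    """Return the CPU number from a Copyout message, or None if not present."""
    found = COPYOUT_CPU.search(text)
    return int(found.group(1)) if found else None


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "panic: UE Error: Ecache Copyout on CPU12"
    print(match_ecache_panic(sample), copyout_cpu(sample))
```

Only the Copyout message carries a CPU number in the form shown here, so the sketch returns None for the other two messages rather than guessing.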

kstat_q_exit: qlen == 0

Symptom: System panics with kstat_q_exit: qlen == 0.

Resolution: Check whether an EMC disk is attached to the domain. It is possible for Solaris to overflow EMC's queues. EMC has a restriction on tag queue depth and suggests reducing the default sd throttle. To reduce the throttle, add set sd:sd_max_throttle=20 to /etc/system.
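
To confirm whether a domain already carries the throttle setting, the contents of /etc/system can be scanned for it. The sketch below is illustrative only: the function name is hypothetical, it assumes lines beginning with "*" are comments, and it assumes that if the variable is set more than once the last entry is the one that matters.

```python
import re
from typing import Optional

# Matches an entry such as: set sd:sd_max_throttle=20
SETTING = re.compile(r"^set\s+sd:sd_max_throttle\s*=\s*(\S+)")


def sd_max_throttle(etc_system_text: str) -> Optional[int]:
    """Return the sd_max_throttle value found in the text, or None if unset."""
    value = None
    for line in etc_system_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and "*" comment lines.
        if not stripped or stripped.startswith("*"):
            continue
        found = SETTING.match(stripped)
        if found:
            # Base 0 accepts decimal ("20") and hex ("0x14") notation.
            value = int(found.group(1), 0)
    return value


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "* reduce sd throttle for EMC storage\nset sd:sd_max_throttle=20\n"
    print(sd_max_throttle(sample))
```

On a live domain the text would come from reading /etc/system. A result of None means no entry was found and the default throttle is in effect.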