Document Audience: | INTERNAL |
Document ID: | I0841-1 |
Title: | Sun Fire (3800/4800/4810/6800) Servers with very large storage configurations or large driver .conf files may encounter panics or hangs during bootup |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2002-06-20 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0841-1
Synopsis: Sun Fire (3800/4800/4810/6800) Servers with very large storage configurations or large driver .conf files may encounter panics or hangs during bootup
Create Date: Jun/20/02
Keywords:
Sun Fire (3800/4800/4810/6800) Servers with very large storage configurations or large driver .conf files may encounter panics or hangs during bootup
SunAlert: Yes
Top FIN/FCO Report: Yes
Products Reference: Sun Fire 3800/4800/4810/6800
Product Category: Server / Service
Product Affected:
Systems Affected:
-----------------
Mkt_ID   Platform   Model   Description     Serial Number
------   --------   -----   -------------   -------------
  -      S8           -     Sun Fire 3800         -
  -      S12          -     Sun Fire 4800         -
  -      S12i         -     Sun Fire 4810         -
  -      S24          -     Sun Fire 6800         -
X-Options Affected:
-------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- - - - -
Parts Affected:
Part Number Description Model
----------- ----------- -----
- - -
References:
BugId: 4660795 - OBP virtual-memory translation buffer for Solaris
                 can truncate the list
ESC:   535993 - 6800 panic on boot with Hitachi 9960 disk.
       535960 - F/G+/ system panic due to SD.conf file.
Sun Alert: 44348
Issue Description:
Sun Fire 3800/4800/4810/6800 servers may become unbootable due to the
inability to manage large numbers of virtual memory translations in the
OBP. When this occurs, the system may hang or panic while booting,
making the system unusable.
This issue can occur on any Sun Fire system with firmware 5.12.6 or
lower whose configuration requires a large number of translation table
entries (TTEs) early in the boot process. Such systems generally fall
into one of three groups: systems attached to large Storage Area
Network (SAN) units with a large number of LUNs in their storage
configurations, systems with a very large .conf file (sd.conf or
ssd.conf) for the driver used at boot time, or, in some cases, systems
with a moderately sized configuration and kernel memory auditing
enabled.
Typically, the bug is encountered when the driver that controls the
boot device also controls a large number of other devices in the
storage configuration, as happens with large SAN units that present a
large number of LUNs.
The most common symptom is a "BAD TRAP: type=31" panic message, because
the underlying cause of the panic is the use of an untranslatable
address. The rest of the panic message varies depending on which
subsystem was executing when the bad pointer was referenced.
To determine the system firmware version:
From Solaris on one of the platform's domains:
/usr/platform/sun4u/sbin/prtdiag -v | grep OBP
Example output (showing a vulnerable system):
OBP 5.12.5 09/26/01 15:46
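The version check can also be scripted. The following is a minimal sketch, assuming the OBP line has the format shown in the example output above. The function names and the FIXED_VERSION constant are illustrative and are not part of any Sun tool; the 5.13.0 threshold is the fix release named in this FIN.

```python
import re

# First firmware release stated in this FIN to contain the fix.
FIXED_VERSION = (5, 13, 0)


def parse_obp_version(line):
    """Return the OBP version found in `line` as a tuple of ints, or None."""
    match = re.search(r"OBP\s+(\d+)\.(\d+)\.(\d+)", line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_vulnerable(line):
    """Return True if `line` reports an OBP version older than FIXED_VERSION."""
    version = parse_obp_version(line)
    if version is None:
        raise ValueError("no OBP version found in: %r" % line)
    # Tuples compare element by element, so (5, 12, 5) < (5, 13, 0).
    return version < FIXED_VERSION


if __name__ == "__main__":
    # Line taken from the example prtdiag output in this FIN.
    print(is_vulnerable("OBP 5.12.5 09/26/01 15:46"))  # prints True
```

Comparing integer tuples rather than strings avoids treating a release such as 5.9.x as newer than 5.13.0.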
Or from the platform System Controller:
showboards -p proms
Example output (showing a vulnerable system):
Component Device Type Version Date Time
--------- ------ ---- ------- ---- ----
SSC0 ScApp 5.12.5 09/26/2001 15:51
SSC0 Info 5.12.5 09/26/2001 15:51
/N0/IB6 SBBC 0 iPOST 5.12.5 09/26/2001 15:47
/N0/IB6 SBBC 0 Info 5.12.5 09/26/2001 15:48
/N0/SB0 SBBC 0 POST 5.12.5 09/26/2001 15:47
/N0/SB0 SBBC 0 OBP 5.12.5 09/26/2001 15:47
/N0/SB0 SBBC 0 Info 5.12.5 09/26/2001 15:47
/N0/SB0 SBBC 1 POST 5.12.5 09/26/2001 15:47
/N0/SB0 SBBC 1 OBP 5.12.5 09/26/2001 15:47
/N0/SB0 SBBC 1 Info 5.12.5 09/26/2001 15:47
/N0/IB8 SBBC 0 iPOST 5.12.5 09/26/2001 15:47
/N0/IB8 SBBC 0 Info 5.12.5 09/26/2001 15:48
/N0/SB2 SBBC 0 POST 5.12.5 09/26/2001 15:47
/N0/SB2 SBBC 0 OBP 5.12.5 09/26/2001 15:47
/N0/SB2 SBBC 0 Info 5.12.5 09/26/2001 15:47
/N0/SB2 SBBC 1 POST 5.12.5 09/26/2001 15:47
/N0/SB2 SBBC 1 OBP 5.12.5 09/26/2001 15:47
/N0/SB2 SBBC 1 Info 5.12.5 09/26/2001 15:47
/N0/SB4 SBBC 0 POST 5.12.5 09/26/2001 15:47
/N0/SB4 SBBC 0 OBP 5.12.5 09/26/2001 15:47
/N0/SB4 SBBC 0 Info 5.12.5 09/26/2001 15:47
/N0/SB4 SBBC 1 POST 5.12.5 09/26/2001 15:47
/N0/SB4 SBBC 1 OBP 5.12.5 09/26/2001 15:47
/N0/SB4 SBBC 1 Info 5.12.5 09/26/2001 15:47
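When many boards are listed, the OBP rows of captured showboards output can be filtered with a short script. This is only a sketch: it assumes the column layout shown in the example output above, where the last four fields of each row are Type, Version, Date, and Time. The function names and the FIXED_VERSION constant are hypothetical helpers, not part of the System Controller software.

```python
# First firmware release stated in this FIN to contain the fix.
FIXED_VERSION = (5, 13, 0)


def parse_prom_rows(text):
    """Return (component, device, type, version) tuples from showboards output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the column header, and the dashed ruler line.
        if len(fields) < 5 or fields[0] == "Component":
            continue
        if set(line.strip()) <= {"-", " "}:
            continue
        # Parse from the right: the Device column may be empty (SSC0 rows)
        # or contain two tokens ("SBBC 0").
        prom_type, version = fields[-4], fields[-3]
        device = " ".join(fields[1:-4])
        rows.append((fields[0], device, prom_type, version))
    return rows


def outdated_obp_boards(text):
    """Return (component, device, version) for OBP rows older than FIXED_VERSION."""
    outdated = []
    for component, device, prom_type, version in parse_prom_rows(text):
        if prom_type != "OBP":
            continue
        if tuple(int(part) for part in version.split(".")) < FIXED_VERSION:
            outdated.append((component, device, version))
    return outdated
```

Passing the captured output as a string to outdated_obp_boards() returns one tuple per OBP image that still needs the firmware update.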
An affected system may hang or panic at boot time. If the system panics,
a typical stack trace will look like:
die(31,10407710,31002089000,0,3,c4488003) + 4
[savfp=0x10406c11,savpc=0x1002b584]
trap(31002088000,1,6,0,10407710,0) + 8dc
[savfp=0x10406d51,savpc=0x10019ee8]
+ 640
prom_rtt(48,1044da08,10,2000,2000,c8)
[savfp=0x10406fb1,savpc=0x1002f654]
page_freelist_coalesce(1044f224,10052ad0,0,1043c328,0,1043c328) + c
[savfp=0x10407071,savpc=0x100295d4]
startup_vm(0,10450de8,0,2000,2000,0) + 1cc
[savfp=0x10407171,savpc=0x1002820c]
startup(7d,edd00028,40,183e000,2000,ffffffffffffffff) + 2c
[savfp=0x10407221,savpc=0x100a8578]
main(1041d800,2000,10407ec0,10408030,fff2,10050df0) + 4
[savfp=0x104072f1,savpc=0x10006fa0]
_start(10006e38,1044ecd0,1044ecd0,1044ecd0,1049d8f8
Note that this stack trace is representative, and a specific failure
may not result in the exact same stack trace, depending on how and
where the dropped TTE is missed.
The OBP uses a statically sized memory buffer to pass a list of Memory
Management Unit (MMU) translation table entries to Solaris during the
boot process. If a driver required during the boot process has a
sufficiently large .conf file, OBP may run out of space in the static
buffer, and will then silently drop any remaining entries in the list.
Solaris will then panic and/or hang during boot when it attempts to
reference a virtual address that, from Solaris' point of view, has no
TTE available.
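The failure mode described above can be illustrated with a small model. This is not OBP code: the buffer capacity, the entry format, and the addresses are arbitrary values chosen for illustration, and the function names are hypothetical. It shows only how a fixed-capacity copy that silently drops the tail of a list leads to a failed lookup later.

```python
def export_translations(entries, capacity):
    """Copy at most `capacity` entries into a fixed-size buffer.

    Entries beyond the capacity are dropped without any error,
    which models the silent truncation described in this FIN.
    """
    return list(entries[:capacity])


def lookup(buffer, vaddr):
    """Return the physical address mapped to `vaddr`, or raise LookupError."""
    for entry_vaddr, paddr in buffer:
        if entry_vaddr == vaddr:
            return paddr
    raise LookupError("no translation for virtual address %#x" % vaddr)


if __name__ == "__main__":
    # Six made-up (virtual, physical) pairs, but room for only four.
    entries = [(0x2000 * i, 0x800000 + 0x2000 * i) for i in range(6)]
    buffer = export_translations(entries, capacity=4)

    print(hex(lookup(buffer, 0x6000)))  # fourth entry was kept
    try:
        lookup(buffer, 0xA000)          # sixth entry was dropped
    except LookupError as err:
        print("lookup failed:", err)
```

In the model the consumer gets an explicit error; on an affected system the equivalent condition surfaces as the panic or hang described in this FIN.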
The problem will be fixed in firmware release 5.13.0 and later. Until
that firmware release is available, follow the workaround provided
below.
Implementation:
---
| | MANDATORY (Fully Proactive)
---
---
| X | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the
above-mentioned problem.
Use this workaround until firmware version 5.13.0 becomes available:
1. Examine the customer's driver .conf files and determine whether they
can be trimmed to reduce the memory requirements of the driver.
The sd.conf and ssd.conf files are typically the most useful to
examine. Trim a file by removing any unneeded entries from it.
2. If it is not possible to reduce the size of the .conf file due to
the customer's system configuration, reconfigure the system to boot
from a different type of device (for instance, if the system is
booting from an ssd device, reconfigure the system to boot from an
sd device).
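For step 1, a short script can help size a .conf file and list candidate entries for removal. The sketch below assumes the usual driver .conf layout, in which each entry ends with a semicolon, comments start with "#", and targets and LUNs appear as decimal target=N and lun=N properties. The function names and the `needed` set are hypothetical; which entries are actually unneeded must be decided from the customer's configuration.

```python
import re


def parse_conf_entries(text):
    """Split driver .conf text into entries, one per terminating ';'.

    Simplification: everything after '#' on a line is treated as a comment,
    even if the '#' appears inside a quoted string.
    """
    uncommented = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    entries = []
    for raw in uncommented.split(";"):
        entry = " ".join(raw.split())  # collapse multi-line entries
        if entry:
            entries.append(entry)
    return entries


def unneeded_entries(entries, needed):
    """Return entries whose (target, lun) pair is not in the `needed` set.

    Entries without decimal target= and lun= properties are never reported,
    so the result errs on the side of keeping entries.
    """
    result = []
    for entry in entries:
        target = re.search(r"\btarget=(\d+)\b", entry)
        lun = re.search(r"\blun=(\d+)\b", entry)
        if target is None or lun is None:
            continue
        if (int(target.group(1)), int(lun.group(1))) not in needed:
            result.append(entry)
    return result
```

The length of the list returned by parse_conf_entries() gives a quick measure of file size before and after trimming. Review the output of unneeded_entries() by hand before editing the customer's file.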
Comments:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------