Date of Workaround Release: 09-Dec-2009
Date of Resolved Release: 29-Oct-2010
_________________________________
Description
A cooling issue has been found to occur on certain systems where, over time, increased temperature can contribute to system panic or reset, causing loss of availability to applications or to the system as a whole.
Occurrence
This issue can occur on the following platforms:
SPARC Platform
- Sun Fire V1280/E2900
- Netra 1280/1290
without System Controller Application (ScApp) 5.20.14 patch ID 114527-15 and proper system maintenance.
Note: Proper system maintenance here means cleaning or servicing the air filters. Filters that have not been properly cleaned or serviced over the life of the product contribute to increased system temperature and to the likelihood of this condition.
Symptoms
A typical error scenario may include one or more of the following error messages in the System Controller's (SC) log files ('showlogs -v' or from the console):
Path broken between CBH and SDC:SB#
Device voltage problem: /N0/SB#
Attempt to power up /N0/SB# failed
/N0/SB#, sensor status, outside acceptable limits
(where # is the board number).
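The four messages listed above lend themselves to an automated scan of saved SC log output. The following is a minimal sketch, assuming the 'showlogs -v' output has already been captured to a text string; the function name and the way the log is captured are illustrative assumptions, not part of any Sun tool.

```python
import re

# Message patterns copied from the list above; '#' (the board number) becomes \d+.
SIGNATURES = [
    re.compile(r"Path broken between CBH and SDC:SB(\d+)"),
    re.compile(r"Device voltage problem: /N0/SB(\d+)"),
    re.compile(r"Attempt to power up /N0/SB(\d+) failed"),
    re.compile(r"/N0/SB(\d+), sensor status, outside acceptable limits"),
]


def find_signatures(log_text):
    """Return (board_number, line) pairs for every log line matching a signature."""
    hits = []
    for line in log_text.splitlines():
        for pattern in SIGNATURES:
            match = pattern.search(line)
            if match:
                hits.append((int(match.group(1)), line.strip()))
                break  # one hit per line is enough
    return hits
```

An empty result does not rule the issue out; it only means none of these four messages appeared in the text that was scanned.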
Another signature for this error is a stream of messages like:
CpuSafariGroup.flushConsoleBuffer: No Board Power: SB0.sbbc0.sram.0
along with a low voltage error on the 3.3V supply on the affected board:
lom> showenvironment
Slot    Device     Sensor       Value  Units     Age     Status
------- ---------- ------------ ------ --------- ------- ------
...snip...
/N0/SB0 Board 0    1.5 VDC 0    1.50   Volts DC  60 sec  OK
/N0/SB0 Board 0    3.3 VDC 0    0.47   Volts DC  60 sec  *** ERROR LOW ***
The 3.3V sensor showing less than 1 volt is definitive for this issue.
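The "less than 1 volt" test can be applied mechanically to captured 'showenvironment' output. The sketch below assumes each board row follows the layout of the two sample rows shown above (slot, "Board n", sensor name, value, "Volts DC"); the function name is hypothetical, and rows in any other layout are simply ignored.

```python
import re

# Layout assumed from the sample rows above:
#   <slot> Board <n> 3.3 VDC <index> <value> Volts DC <age> <status>
ROW = re.compile(
    r"^\s*(/N0/SB\d+)\s+Board\s+\d+\s+3\.3\s+VDC\s+\d+\s+(\d+(?:\.\d+)?)\s+Volts\s+DC"
)

THRESHOLD_VOLTS = 1.0  # "less than 1 volt", per the text above


def boards_with_low_3v3(showenv_text):
    """Return {slot: volts} for every board whose 3.3 VDC sensor reads below 1 V."""
    low = {}
    for line in showenv_text.splitlines():
        match = ROW.match(line)
        if match and float(match.group(2)) < THRESHOLD_VOLTS:
            low[match.group(1)] = float(match.group(2))
    return low
```

Applied to the sample output above, this reports /N0/SB0 at 0.47 V and skips the 1.5 VDC row.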
Workaround
The following additional cooling actions may increase the life of the components over time, but will not eliminate the potential for component failure, as additional cooling may not have enough time to significantly alter the life expectancy of those components. Perform the following actions:
STEP 1: From the lom prompt on the System Controller, run the following commands on each system to collect baseline temperature readings. Archive and retain this information for your records.
lom> showhostname
hostname: system_name
lom> showdate
Mon Aug 31 22:14:55 CEST 2009
lom> showenv -ltuvw
System Controller Board
Slot Device Sensor Min LoWarn Value HiWarn Max Units Age Status
...
<OUTPUT TRUNCATED, BUT ALL COMPONENTS WILL BE LISTED>.
STEP 2: Improve ambient air temperature levels as much as possible for all Sun Fire E2900/V1280 and Netra 1280/1290 systems by performing the following steps, if applicable:
- Position additional vented floor tiles or perform other ventilation changes to reduce ambient air temperature.
- Move the system to a cooler environment, for example by relocating it or by changing its rack mount position.
- Reduce ambient air temperature level via increased cooling if the environment can support this.
- Verify that every empty board slot has the proper filler panel installed to ensure correct chassis airflow.
STEP 3: Clean, remove or replace filters:
For Sun Fire V1280 and E2900 systems:
- Remove the left input air filter to increase the chassis airflow. Note: filters that have not been cleaned previously may stick to the door.
- If 1950MHz CPUs are installed, the left air filter should already be removed.
- Page 1 of the Sun Fire E2900/V1280 and Netra 1280 Systems Filter Installation Guide provides details on removing the filter.
- Install System Controller Application (ScApp) Firmware 5.20.17 patch 114527-18 to increase chassis airflow through an increase in fan speed. After the firmware is installed, reboot the SC and keyswitch the domains as recommended in the firmware installation instructions (which effectively reboots the system).
For Netra 1280/1290 systems:
- Clean or replace the left input air filter. Note: filters that have not been cleaned previously may stick to the door.
- Air Filter kits are Sun order number X6806A-Z.
- Inspect filters every 3-6 months and clean or replace them as necessary, as per the Periodic Maintenance instructions in the Service Manual for the Netra 1280 and for the Netra 1290 (section C-1).
- The Sun Fire E2900/V1280 and Netra 1280 Systems Filter Installation Guide provides details on replacing the filter.
- Install System Controller Application (ScApp) Firmware 5.20.17 patch 114527-18 to increase chassis airflow through an increase in fan speed. After the firmware is installed, reboot the SC and keyswitch the domains as recommended in the firmware installation instructions (which effectively reboots the system).
STEP 4: As close as possible to 24 hours later, repeat STEP 1 to obtain post-change temperature readings. If the system is significantly busier or less busy during the second reading than during the first, the temperature differences may reflect load rather than the cooling changes. For the most accurate measurement of improvement, take this reading when the system state and data center environment are nearly identical to the baseline (i.e., at the same time of day). Archive and retain this post-change information for future comparison.
Filter removal (Sun Fire E2900 and V1280) and replacement (Netra 1280 and 1290) should reduce average board temperatures, although results for individual servers may vary. Archive and retain the "showenvironment" data so that it can serve as a benchmark of the impact of this procedure if it needs to be referred to in the future.
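The comparison between the STEP 1 baseline and the STEP 4 readings can be sketched as follows. This is an illustration only: it assumes the temperatures have already been extracted by hand or by script into a mapping of sensor name to degrees C. Parsing the 'showenv -ltuvw' output itself is left out because that output is truncated in this document, and the function names are hypothetical.

```python
def compare_readings(baseline, post_change):
    """Return (sensor, before, after, delta) rows for sensors present in both readings.

    Both arguments map a sensor name to a temperature in degrees C.
    """
    rows = []
    for sensor in sorted(baseline.keys() & post_change.keys()):
        before = baseline[sensor]
        after = post_change[sensor]
        rows.append((sensor, before, after, round(after - before, 1)))
    return rows


def average_delta(rows):
    """Mean temperature change across all compared sensors (negative means cooler)."""
    if not rows:
        return 0.0
    return round(sum(row[3] for row in rows) / len(rows), 1)
```

Sensors that appear in only one of the two readings are skipped, so a board that was removed or added between readings does not distort the average.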
This issue is addressed in the following release:
- System Controller Application (ScApp) 5.20.14 patch ID 114527-15 or later (and proper system maintenance as described above)
Customers are advised to upgrade to the latest ScApp version when possible.
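When auditing several systems, the "114527-15 or later" requirement can be checked with a small helper. This sketch assumes the installed patch ID has already been obtained as a string in the base-revision form used above and that a higher revision number supersedes a lower one; the function names are hypothetical.

```python
REQUIRED_PATCH = "114527-15"  # minimum revision named above


def parse_patch_id(patch_id):
    """Split a patch ID such as '114527-15' into (base, revision) integers."""
    base, revision = patch_id.strip().split("-")
    return int(base), int(revision)


def meets_minimum(installed, required=REQUIRED_PATCH):
    """True when 'installed' is the same patch at the required revision or later."""
    installed_base, installed_rev = parse_patch_id(installed)
    required_base, required_rev = parse_patch_id(required)
    return installed_base == required_base and installed_rev >= required_rev
```

A passing check covers only the firmware half of the fix; the filter maintenance described above is still required.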
Patches
<SUNPATCH:114527-15>
History
Modification History
29-Oct-2010: No further Engineering activity, issue is Resolved
15-Jun-2011: Maintenance Update - no major changes to content
12-Oct-2011: Maintenance Update - no major changes to content
09-Dec-2011: Maintenance Update - no major changes to content
20-Jan-2012: Corrected ScApp patch/version to 5.20.14
16-May-2012: Update Symptoms section for additional info
Support Personnel: See Field Action Bulletin(FAB) 270169 regarding system
specific information for Sun Fire E2900/V1280 Systems and Netra
1280/1290 Systems in <Document:1021064.1>
See also (Internal only) <Document:1354590.1>
For additional questions regarding this issue, email the
[email protected] email alias. This is an internal only
support alias.
For questions regarding this document, email:
[email protected] and copy the
Internal Contributor/submitter and Resp. Engineer
Internal Contributor/submitter
[email protected], [email protected]
Internal Eng Responsible Engineer
[email protected], [email protected]
Internal Services Knowledge Engineer
[email protected]
Internal Eng Business Unit Group
Systems Group - Enterprise Systems, Systems Group - Netra Systems and @Networking
Internal Escalation ID: 1-554249604
Internal Resolution Patches: 114527-18
References
@ <BUG:6864515> - JAVA REQ CNT - LOGIN SCREEN TO BE CONSISTENT WITH THE LOGIN FOR .NET CLIENT
<NOTE:1021064.1> - Preventing potential system outages through proactive cooling improvements on Sun Fire Entry Level servers.
Attachments
This solution has no attachment