Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1003853.1
Update Date:2010-07-20
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003853.1 :   Sun Fire[TM] 12/15/20/25K server found with DC Breaker in the 'off' position  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
205408


Symptoms
During an onsite inspection, or through a customer call, it is noted that a DC (Direct Current) breaker on a Sun Fire[TM] 12K, 15K, 20K or 25K server is unexpectedly in the 'off' position.



Resolution
BACKGROUND:
Older Sun Fire[TM] 12K and 15K servers had exposed breakers that could be accidentally operated. It was not uncommon in the early days to find a server with a breaker that was powered off for no apparent reason. This was due to the ease with which a cable could drop down and flick off a breaker.

This has bred an unhealthy belief that a breaker found in the 'off' position can simply be flipped back on.

The newer 15K, 20K and 25K servers have a shroud that protects the breakers from accidental operation. It is still possible, however, for a technician or customer who is not completely focused on the task to switch one of these breakers off without realizing it.

The DC (Direct Current) breakers on the high-end servers are designed to protect the platform from short circuits within the platform.

They will turn off under one of a number of different conditions.
- Power draw on the circuit is too great, indicating a problem, such as a short, inside the platform. (This is of course their primary purpose)
- An operator manually switches the breaker. (Never recommended whilst a platform is operational)
- Someone accidentally flicks the breaker with a cable or hand or some other article.
- The last time the system was serviced, the technician or customer did not turn it back on.

Each of these possibilities MUST be considered when a DC breaker is found in the OFF position.

If the breaker has operated due to a short in the frame, simply turning the breaker back on could actually cause a COMPLETE platform outage. Don't do this, or we could end up with a REALLY unhappy customer.

One more important fact to note is that the DC breakers are NOT instrumented; i.e., the SC does not know directly whether a DC breaker has flipped off. As a result, there are NO messages in the SC's logs relating to the breaker itself, although there WILL be messages relating to the SC's inability to communicate with the components located in the slot to which that DC breaker supplies power.

In a platform where the entire system is not fully utilized, there is a good chance that this issue may go unnoticed for days, weeks, or even months.

RESOLUTION:

To remedy this situation, without unexpected interruption to the platform, the following action plan should be followed.

- Request a PLATFORM OUTAGE (i.e., all domains down).
This plan will require at LEAST 3 hours.
- Bring down all domains. Keyswitch off all domains. Power off all boards.
- Halt the SCs. Power off the entire system, including operating the AC breakers on the power supplies.
- Turn ON the DC breaker that had been found in the 'off' position.
NOTE: We can do this with confidence. If there is a short within the frame, the breaker will simply flip off when we re-apply AC power.
- Apply power to the AC power supplies.
- Observe the DC breaker in question. (Do not bring any domains up yet.)

- If the breaker remains ON, it is most likely that there is no real problem and that the breaker was operated accidentally, or deliberately by the customer or field service representative.
- Power on the components to which the DC breaker supplies power.
If there are no problems, bring up the domain that uses these components.

- If it flips OFF, we could well have a short within the frame; however, the possibility exists that this is being caused by a component within the system.
- Halt both SCs.
- Power off the AC breakers again (power off the entire platform).
- Remove the components to which this breaker supplies power.
- CLOSELY inspect the power connector for damage.
- Use a flashlight to inspect the power centreplane and the expander for damage, and attempt to locate any components that have exploded or burnt.
If damage is evident at the connector level, it is likely that both the damaged component *AND* the board into which it plugs (i.e., the Power Centreplane) will need to be replaced. If this is the case, a much longer outage (up to 12 hours) will be required.
If there is damage ONLY at the board level, and not on the connector into which it plugs, replace the damaged component and restart this action plan at the point where we power off the entire system, including AC power.

- If after component replacement, the breaker again opens, we have a short within the frame, power chassis or power centreplane. Continue below at INTERNAL SHORT

INTERNAL SHORT
- In the unlikely scenario that we have an internal short, we will need a much longer outage, and will need to spend some considerable time determining how to best locate and resolve the problem. In this case, it would be recommended that we involve the SAM, NSSE or RSSE, in addition to PTS. Escalate to PTS to engage someone with appropriate DC troubleshooting skills and work on an action plan.

If an internal short is evident:
- re-assemble the system
- leave the suspect breaker in the OFF position
- Place brightly colored tape over the breaker instructing that the breaker NOT be operated.
- Advise PTS that it appears that we have an internal short and that we need to work through the next phase.
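
The branching in the action plan above can be summarized as a small decision function. This is only an illustrative sketch: the class, field and function names are hypothetical and do not correspond to any Sun tool or SC command. The final branch (breaker trips but no damage is visible) is not covered by this document, so the answer returned there is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BreakerObservation:
    """What was seen after re-applying AC power with the DC breaker set to ON.

    All field names are hypothetical labels for facts the action plan asks for.
    """
    breaker_stays_on: bool
    connector_damaged: bool = False           # damage on the connector the component plugs into
    component_damaged: bool = False           # damage on the removed component itself
    component_already_replaced: bool = False  # a component was replaced on an earlier pass


def next_step(obs: BreakerObservation) -> str:
    """Return the next action suggested by the plan for a given observation."""
    if obs.breaker_stays_on:
        return "Power on the affected components, then bring up the domain that uses them."
    if obs.component_already_replaced:
        # Breaker opened again after replacement: frame, power chassis or centreplane.
        return "INTERNAL SHORT: leave breaker OFF, tape over it, escalate to PTS."
    if obs.connector_damaged:
        return "Replace the component AND the board it plugs into (outage up to 12 hours)."
    if obs.component_damaged:
        return "Replace the component, then restart the plan from the full power-off."
    # Assumption: the document does not describe this case.
    return "No visible damage (not covered by this document): escalate to PTS."


print(next_step(BreakerObservation(breaker_stays_on=False, component_damaged=True)))
```

The check for an earlier replacement comes before the damage checks because the plan treats a second trip after replacement as an internal short regardless of what else is visible.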



Relief/Workaround

As the high end servers are architected with a great deal of redundancy, the remainder of the platform will operate without issue with this fault present.

As such, no workaround is required.



Additional Information
To familiarize yourself with the location of the DC breakers, please take a look at the Sun Fire[TM] 15K - Service Views in the Sun[TM] Systems Handbook, available at:

http://sunsolve2.central.sun.com/handbook_internal/Systems/SunFire15K/component.front_open.html
The breakers are small, black and white, and located at the lower left part of the chassis above the AC power supplies PS1, PS2, PS4 and PS5.



Product
Sun Fire E25K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire 12K Server

Internal Comments
For examples of what led to the creation of this doc, see PTS Escalations:


1-2680096 (Case 10567118 -- Internal Short)
1-8666013 (Case 64562580 -- No issue, likely accidental operation)
12k, 15k, 20k, 25k, Sun Fire, high end, breaker, DC, DC-DC, expander
Previously Published As
81918

Product_uuid
d842dd03-059b-11d8-84cb-080020a9ed93|Sun Fire E25K Server
1404a2d3-059a-11d8-84cb-080020a9ed93|Sun Fire E20K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server
077fd4c5-df8f-4320-ad69-7d01603a674d|Sun Fire 12K Server

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.