Asset ID: 1-71-1019147.1
Update Date: 2012-07-30
Solution Type: Technical Instruction Sure Solution
1019147.1: Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000 Servers: Fan/fantray temperature and failure behavior
Related Items:
- Sun SPARC Enterprise M9000-64 Server
- Sun SPARC Enterprise M9000-32 Server
- Sun SPARC Enterprise M5000 Server
- Sun SPARC Enterprise M3000 Server
- Sun SPARC Enterprise M4000 Server
- Sun SPARC Enterprise M8000 Server
Previously Published As: 235941
In this Document
Goal
Solution
M3000
M4000 & M5000
M8000 & M9000
Applies to:
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M5000 Server
All Platforms
Goal
This document describes fan and fan tray redundancy, the behavior of the systems when a fan or fan tray fails, and how fan speed is controlled depending on the inlet temperature.
Solution
This section describes the environment and failure behavior.
The showenvironment command is used to display various temperatures and fan speeds.
XSCF> showenvironment
XSCF> showenvironment temp
XSCF> showenvironment Fan
The showaltitude command is used to display the altitude setting.
XSCF> showaltitude
Note: In the following tables, threshold temperatures are the temperatures at which an action is taken. In this context the sign ">" means "greater than or equal to" and the sign "<" means "less than or equal to".
M3000
M3000 systems have a single inlet temperature sensor on the Operator Panel (OPNL).
M3000 systems have an exhaust temperature sensor on the Motherboard (MBU). The CPU chip has its own temperature sensor.
Fans on M3000 systems have 10 speeds: level 1 to level 10.
Fan failure behavior:
- If a FAN_A fails and the other FAN_A in the same cooling group is operational, the speed of that other fan is raised to full speed (level 10) and the speed of all other fans on the platform is raised to high speed (level 9).
- If the failed fan is a fan in a PSU and all the other fans in the PSUs of the cooling group are operational, the other fan of this PSU is raised to full speed and the speed of all other fans on the platform is raised to high speed.
- If the second fan in a cooling group becomes non-operational, XSCF shuts down the domain and powers off the platform.
M3000 cooling groups
| | CG#1 | CG#2 |
| Fans | FAN_A#0, FAN_A#1 | Fan in PSU#0, Fan in PSU#1 |
| Hardware | MBU, DIMMs, PCI slots | DVD, HDDs, PSUs |
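The fan-failure rules above can be expressed as a small decision function. This is a hypothetical sketch, not XSCF firmware: the function name `react_to_fan_failures`, the `COOLING_GROUPS` dictionary and the "PSU#n fan" naming are assumptions for illustration, while the level numbers (10 = full, 9 = high) come from the M3000 text.

```python
# Hypothetical model of the M3000 fan-failure reaction; not XSCF code.
FULL_SPEED = 10  # level 10
HIGH_SPEED = 9   # level 9

# Cooling groups from the M3000 table (PSU fans named "PSU#n fan" by assumption).
COOLING_GROUPS = {
    "CG#1": ["FAN_A#0", "FAN_A#1"],
    "CG#2": ["PSU#0 fan", "PSU#1 fan"],
}


def react_to_fan_failures(failed):
    """Return ("run" | "raise" | "shutdown", {fan: level}) for a set of failed fans."""
    if not failed:
        return "run", {}
    # A cooling group with no working fan left: shut down the domain, power off.
    for fans in COOLING_GROUPS.values():
        if all(fan in failed for fan in fans):
            return "shutdown", {}
    targets = {}
    for fans in COOLING_GROUPS.values():
        group_degraded = any(fan in failed for fan in fans)
        for fan in fans:
            if fan in failed:
                continue
            # The surviving fan of a degraded group runs at full speed;
            # every other working fan on the platform runs at high speed.
            targets[fan] = FULL_SPEED if group_degraded else HIGH_SPEED
    return "raise", targets
```

Passing `{"FAN_A#0"}` returns `"raise"` with FAN_A#1 at level 10 and both PSU fans at level 9, while a set containing both fans of one cooling group returns `"shutdown"`.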
M3000 typical fan speed
| Fan speed | Standby | level1 | level2 | level3 | level4 * | level5 | level6 | level7 | level8 | level9 | level10 |
| RPM FAN_A# | 0 | 3500 | 3550 | 3600 | 3700 | 3800 | 4100 | 4400 | 5000 | 5800 | 6600 |
| RPM Fan PSU | 3400 | 6000 | 6000 | 6000 | 6000 | 6200 | 6350 | 6550 | 6700 | 6900 | 12000 |
* Level 4 is the initial value after power on.
M3000 minimum fan speed, below which a fan is declared failed
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| RPM FAN_A# | N/A | 2130 | 2160 | 2190 | 2220 | 2290 | 2460 | 2670 | 3000 | 3500 | 3950 |
| RPM Fan PSU | N/A | 4800 | 4800 | 4800 | 4800 | 4960 | 5120 | 5280 | 5440 | 5600 | 9440 |
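The minimum-speed table can be read as a lookup keyed by fan type and commanded level. The sketch below (hypothetical names `MIN_RPM` and `fan_declared_failed`) compares a single RPM reading against that minimum; real firmware may filter or average readings before declaring a failure, which this sketch does not model.

```python
# Minimum RPM per speed level for the M3000, copied from the table above.
# Index 0 is Standby (N/A); indices 1..10 are level1..level10.
MIN_RPM = {
    "FAN_A": [None, 2130, 2160, 2190, 2220, 2290, 2460, 2670, 3000, 3500, 3950],
    "PSU": [None, 4800, 4800, 4800, 4800, 4960, 5120, 5280, 5440, 5600, 9440],
}


def fan_declared_failed(fan_type, level, measured_rpm):
    """True if measured_rpm is below the minimum for the commanded level."""
    minimum = MIN_RPM[fan_type][level]
    if minimum is None:
        # No minimum is defined in Standby, so no failure is declared here.
        return False
    return measured_rpm < minimum
```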
M3000 fan speed relation to inlet temperature (°C), 500 m or below
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| Speed class | none | low | low | low | low | middle | middle | high | high | high | full |
| Threshold temp. | Domain power off | < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 < 29 | > 30 < 31 | > 32 < 33 | > 34 | * |
Inlet overtemperature set: > 38; inlet overtemperature reset: < 35
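One way to read the "> x < y" cells is as a hysteresis band: "> x" is the inlet temperature at which the fans step up into that level, and "< y" is the temperature at which they step back down into that level from the level above. Under that assumption (an interpretation, not something this document confirms), the 500 m table can be sketched as follows. `RISE`, `FALL` and `next_level` are hypothetical names, and level 10 is deliberately excluded because full speed is only used after a fan failure.

```python
# Thresholds (°C) for the M3000 at 500 m or below, assuming "> x" = step up
# into the level and "< y" = step back down into the level from above.
RISE = {n: 20 + 2 * (n - 2) for n in range(2, 10)}  # level2: 20 ... level9: 34
FALL = {n: 19 + 2 * (n - 1) for n in range(1, 9)}   # level1: 19 ... level8: 33


def next_level(current, inlet_c):
    """Return the new fan level (1..9) for the measured inlet temperature."""
    level = current
    # Step up while the next level's rising threshold is reached.
    while level + 1 in RISE and inlet_c >= RISE[level + 1]:
        level += 1
    # Step down while the temperature is at or below the lower level's threshold.
    while level - 1 in FALL and inlet_c <= FALL[level - 1]:
        level -= 1
    return level
```

For example, at level 7 an inlet temperature of 29.5 °C keeps the fans at level 7, while 29 °C drops them to level 6; the 1 °C gap is what prevents the fans from oscillating between two levels.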
M3000 fan speed relation to inlet temperature (°C), 501 m to 1000 m
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| Speed class | none | low | low | low | low | middle | middle | high | high | high | full |
| Threshold temp. | Domain power off | < 17 | > 18 < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 < 29 | > 30 < 31 | > 32 | * |
Inlet overtemperature set: > 36; inlet overtemperature reset: < 33
M3000 fan speed relation to inlet temperature (°C), 1001 m to 1500 m
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| Speed class | none | low | low | low | low | middle | middle | high | high | high | full |
| Threshold temp. | Domain power off | < 15 | > 16 < 17 | > 18 < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 < 29 | > 30 | * |
Inlet overtemperature set: > 34; inlet overtemperature reset: < 31
M3000 fan speed relation to inlet temperature (°C), 1501 m to 3000 m
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| Speed class | none | low | low | low | low | middle | middle | high | high | high | full |
| Threshold temp. | Domain power off | < 13 | > 14 < 15 | > 16 < 17 | > 18 < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 | * |
Inlet overtemperature set: > 32; inlet overtemperature reset: < 29
* Full speed is never set based on temperature; it is only used when a fan in the cooling group has failed.
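Each M3000 altitude band lowers the fan thresholds and the inlet overtemperature set/reset points by a further 2 °C relative to the 500 m table. The sketch below derives the thresholds from that pattern; `m3000_altitude_offset` and `m3000_thresholds` are hypothetical helpers, and the fixed 2 °C step is an assumption inferred from the tables rather than a documented rule.

```python
def m3000_altitude_offset(altitude_m):
    """Degrees (°C) subtracted from the 500 m thresholds at a given altitude."""
    for upper_m, offset in ((500, 0), (1000, 2), (1500, 4), (3000, 6)):
        if altitude_m <= upper_m:
            return offset
    raise ValueError("the tables only cover altitudes up to 3000 m")


def m3000_thresholds(altitude_m):
    """Return (rise, fall, (inlet_ot_set, inlet_ot_reset)) for an altitude."""
    off = m3000_altitude_offset(altitude_m)
    rise = {n: 20 + 2 * (n - 2) - off for n in range(2, 10)}  # "> x" per level
    fall = {n: 19 + 2 * (n - 1) - off for n in range(1, 9)}   # "< y" per level
    inlet_ot = (38 - off, 35 - off)
    return rise, fall, inlet_ot
```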
M3000 overtemperature behaviour (°C)
| Overtemperature status | Overtemperature | Overtemperature warning | Overtemperature fail |
| Condition: inlet temperature | see tables above | - | - |
| Condition: CPU temperature | set: > 77, reset: < 64 | set: > 82 | set: > 102 |
| Condition: MBU temperature | set: > 64, reset: < 48 | set: > 69 | set: > 79 |
| Action | Set all fans to high speed | Shut down domain, then power off platform | Emergency power off of platform |
| ereport | chassis.env.temp.ot@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otf@<location> |
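The overtemperature rows combine a set/reset pair for the first stage with single set points for warning and fail. A hypothetical classifier for the M3000 CPU and MBU sensors could look like this; `M3000_LIMITS`, `m3000_ot_state` and the `ot_active` flag ("the overtemperature condition is currently set") are illustrative names, and the function returns the ereport class from the table without the `@<location>` suffix.

```python
# (ot_set, ot_reset, otw_set, otf_set) in °C from the M3000 table above;
# ">" and "<" are read as >= and <= per the note at the top of this section.
M3000_LIMITS = {
    "CPU": (77, 64, 82, 102),
    "MBU": (64, 48, 69, 79),
}


def m3000_ot_state(sensor, temp_c, ot_active=False):
    """Return the ereport class raised for a reading, or None when normal."""
    ot_set, ot_reset, otw_set, otf_set = M3000_LIMITS[sensor]
    if temp_c >= otf_set:
        return "chassis.env.temp.otf"  # emergency power off of platform
    if temp_c >= otw_set:
        return "chassis.env.temp.otw"  # shut down domain, then power off
    if temp_c >= ot_set or (ot_active and temp_c > ot_reset):
        return "chassis.env.temp.ot"   # set all fans to high speed
    return None
```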
M4000 & M5000
M4000 & M5000 systems have a single inlet temperature sensor on the Operator Panel (OPNL) FRU.
There is an exhaust temperature sensor on each IOU.
Each CPU has a temperature sensor.
Fan failure behavior:
- If a FAN_A fails and the other FAN_A in the same cooling group is operational, the speed of that other fan is raised to full speed and the speed of all other fans on the platform is raised to high speed.
- If the failed fan is a fan in a PSU and all the other fans in the PSUs of the cooling group are operational, the other fan of this PSU is raised to full speed and the speed of all other fans on the platform is raised to high speed.
- If the second fan in a cooling group becomes non-operational, XSCF sends a shutdown request to all domains in the system and powers off the system.
- FAN_B fans exist only on the M4000 platform.
M4000 cooling groups
| | CG#1 | CG#2 | CG#3 |
| Fans | FAN_A#0, FAN_A#1 | Fan in PSU#0, Fan in PSU#1 | FAN_B#0, FAN_B#1 |
| Hardware | MBU, CPUM#0, CPUM#1, MEB#0..4 | PSUs, IOU | XSCFU, HDD, DVDU, TAPEU |
M5000 cooling groups
| | CG#1 | CG#2 | CG#3 | CG#4 |
| Fans | FAN_A#0, FAN_A#1 | FAN_A#2, FAN_A#3 | Fan in PSU#0, Fan in PSU#1 | Fan in PSU#2, Fan in PSU#3 |
| Hardware | 1/2 MBU, CPUM#0, CPUM#1, MEB#0..3, HDD#0..3 | 1/2 MBU, CPUM#2, CPUM#3, MEB#4..7, TAPEU, DVDU | PSU#0, PSU#1, IOU0, XSCFU | PSU#2, PSU#3, IOU1 |
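When a fan fails it can help to know which hardware shares its cooling group and has therefore lost cooling redundancy. The M5000 table maps naturally onto a dictionary; `M5000_COOLING_GROUPS` and `cooling_group_of` are hypothetical names used only for this sketch, and the PSU fans are labelled "PSU#n fan" by assumption.

```python
# M5000 cooling groups from the table above ("1/2 MBU" = half of the MBU).
M5000_COOLING_GROUPS = {
    "CG#1": (["FAN_A#0", "FAN_A#1"],
             ["1/2 MBU", "CPUM#0", "CPUM#1", "MEB#0..3", "HDD#0..3"]),
    "CG#2": (["FAN_A#2", "FAN_A#3"],
             ["1/2 MBU", "CPUM#2", "CPUM#3", "MEB#4..7", "TAPEU", "DVDU"]),
    "CG#3": (["PSU#0 fan", "PSU#1 fan"], ["PSU#0", "PSU#1", "IOU0", "XSCFU"]),
    "CG#4": (["PSU#2 fan", "PSU#3 fan"], ["PSU#2", "PSU#3", "IOU1"]),
}


def cooling_group_of(fan):
    """Return (group name, hardware cooled) for the group containing fan."""
    for group, (fans, hardware) in M5000_COOLING_GROUPS.items():
        if fan in fans:
            return group, hardware
    raise KeyError(f"{fan} is not an M5000 fan")
```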
M4000/M5000 typical fan speed
| Fan speed | Standby | low * | middle | high | full |
| RPM Fan PSU | 2400 | 3600 | 4200 | 5200 | 8400 |
| RPM FAN_A# | 0 | 3200 | 4200 | 5300 | 5900 |
| RPM FAN_B# | 5000 | 10000 | 10000 | 10000 | 12000 |
* Low is the initial value after power on.
M4000/M5000 minimum fan speed, below which a fan is declared failed
| Fan speed | Standby | low | middle | high | full |
| RPM Fan PSU | 1520 | 2680 | 3160 | 4200 | 6400 |
| RPM FAN_A# | N/A | 2560 | 3280 | 4080 | 4880 |
| RPM FAN_B# | 3000 | 7520 | 7520 | 7520 | 9440 |
M4000/M5000 fan speed relation to inlet temperature (°C), 500 m or below (with air filters installed, lower all threshold temperatures by 3 °C)
| Fan speed | Standby | low | middle | high | full |
| Threshold temp. | Domains powered off | < 23 | > 25 < 28 | > 30 | * |
Inlet overtemperature set: > 38; inlet overtemperature reset: < 35
M4000/M5000 fan speed relation to inlet temperature (°C), 501 m to 1000 m (with air filters installed, lower all threshold temperatures by 3 °C)
| Fan speed | Standby | low | middle | high | full |
| Threshold temp. | Domains powered off | < 21 | > 23 < 26 | > 28 | * |
Inlet overtemperature set: > 36; inlet overtemperature reset: < 33
M4000/M5000 fan speed relation to inlet temperature (°C), 1001 m to 1500 m (with air filters installed, lower all threshold temperatures by 3 °C)
| Fan speed | Standby | low | middle | high | full |
| Threshold temp. | Domains powered off | < 19 | > 21 < 24 | > 26 | * |
Inlet overtemperature set: > 34; inlet overtemperature reset: < 31
M4000/M5000 fan speed relation to inlet temperature (°C), 1501 m to 3000 m (with air filters installed, lower all threshold temperatures by 3 °C)
| Fan speed | Standby | low | middle | high | full |
| Threshold temp. | Domains powered off | < 17 | > 19 < 22 | > 24 | * |
Inlet overtemperature set: > 32; inlet overtemperature reset: < 29
* Full speed is never set based on temperature; it is only used when a fan in the cooling group has failed.
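The four M4000/M5000 tables follow a regular pattern: each altitude band lowers the thresholds by 2 °C, and installed air filters lower the fan-speed thresholds by another 3 °C. The helper below is a hypothetical sketch of that arithmetic (the name `m4000_m5000_thresholds` and the dictionary keys are illustrative). Whether the 3 °C filter correction also applies to the inlet overtemperature set/reset values is not stated explicitly, so this sketch assumes it does not.

```python
def m4000_m5000_thresholds(altitude_m, air_filters=False):
    """Fan-speed and inlet overtemperature thresholds (°C) for M4000/M5000."""
    offset = None
    for upper_m, step in ((500, 0), (1000, 2), (1500, 4), (3000, 6)):
        if altitude_m <= upper_m:
            offset = step
            break
    if offset is None:
        raise ValueError("the tables only cover altitudes up to 3000 m")
    fan_offset = offset + (3 if air_filters else 0)
    return {
        "low_to_middle": 25 - fan_offset,   # "> 25" in the middle column
        "middle_to_low": 23 - fan_offset,   # "< 23" in the low column
        "middle_to_high": 30 - fan_offset,  # "> 30" in the high column
        "high_to_middle": 28 - fan_offset,  # "< 28" in the middle column
        # Assumption: the air-filter correction is not applied to these two.
        "inlet_ot_set": 38 - offset,
        "inlet_ot_reset": 35 - offset,
    }
```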
M4000/M5000 overtemperature behaviour (°C)
| Overtemperature status | Overtemperature | Overtemperature warning | Overtemperature fail |
| Condition: inlet temperature | see tables above | - | - |
| Condition: CPU temperature | set: > 79, reset: < 71 | set: > 82 | set: > 104 |
| Condition: IOU temperature | set: > 60, reset: < 49 | set: > 65 | set: > 75 |
| Action | Set all fans to high speed | Shut down all domains, then power off platform | Emergency power off of platform |
| ereport | chassis.env.temp.ot@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otf@<location> |
M8000 & M9000
M8000 & M9000 systems have a single inlet temperature sensor on the Sensor (SNSU) FRU.
M8000 systems have exhaust temperature sensors for each CMU.
M9000 systems have exhaust temperature sensors for each CMU and each XBU.
Each CPU has a temperature sensor.
M8000 cooling groups
| | CG#1 | CG#2 |
| Fan trays | FAN_A#0, FAN_A#1, FAN_A#2, FAN_A#3, FAN_B#0, FAN_B#1 | FAN_B#2, FAN_B#3, FAN_B#4, FAN_B#5, FAN_B#6, FAN_B#7 |
| Hardware | CMU#0, CMU#1, CMU#2, CMU#3, DDC_A#0, DDC_A#1, XSCFU | IOU#0, IOU#1, IOU#2, IOU#3 |
M9000 cooling groups
| | CG#1 | CG#2 | CG#3 |
| Fan trays, base cabinet | FAN_A#0, FAN_A#1, FAN_A#2, FAN_A#3 | FAN_A#4, FAN_A#5, FAN_A#6, FAN_A#7, FAN_A#8, FAN_A#9 | FAN_A#10, FAN_A#11, FAN_A#12, FAN_A#13, FAN_A#14, FAN_A#15 |
| Hardware, base cabinet | XBUs, CLCKUs, XSCFUs, IOU#0, IOU#2, IOU#4, IOU#6 | CMU#0, CMU#1, CMU#2, CMU#3, IOU#1, IOU#3 | CMU#4, CMU#5, CMU#6, CMU#7, IOU#5, IOU#7 |
| Fan trays, expansion cabinet | FAN_A#20, FAN_A#21, FAN_A#22, FAN_A#23 | FAN_A#24, FAN_A#25, FAN_A#26, FAN_A#27, FAN_A#28, FAN_A#29 | FAN_A#30, FAN_A#31, FAN_A#32, FAN_A#33, FAN_A#34, FAN_A#35 |
| Hardware, expansion cabinet | XBUs, CLCKUs, XSCFUs, IOU#8, IOU#10, IOU#12, IOU#14 | CMU#8, CMU#9, CMU#10, CMU#11, IOU#9, IOU#11 | CMU#12, CMU#13, CMU#14, CMU#15, IOU#13, IOU#15 |
Fan failure behavior:
M8000 and M9000:
- The fan trays contain 2 fans (FAN_B) or 3 fans (FAN_A). The fans within a fan tray are N+1 redundant, so after the first fan failure the fan tray should be replaced as soon as possible to avoid a domain or platform outage. The fan trays themselves are not redundant.
- The speed of the fans in the PSUs is not controlled by XSCF; it is controlled by a microcontroller inside the PSU. If a fan in a PSU fails, the PSU is powered off and deconfigured.
- If there are not enough operational PSUs left to power the platform, the platform is powered down.
M8000 specific:
- A failure in cooling group #1 or cooling group #2 affects the entire platform.
- If a second fan in a fan tray becomes non-operational, or if the fan tray itself fails, XSCF does not allow the platform to be powered on. If the failure occurs while the platform is powered on, the speed of all fans in the platform is raised to high; domains are not shut down and only warning messages are issued. In this situation, the system relies on the overtemperature behavior described later in this document.
M9000 specific:
- A failure in cooling group #1 affects the entire platform. A failure in cooling group #2 or cooling group #3 affects only the FRUs in that cooling group, and therefore only the domains cooled by it.
- If a second fan in a fan tray becomes non-operational, or if the fan tray itself fails, XSCF does not allow the platform or the affected FRUs to be powered on. If the failure occurs while the platform is powered on, the speed of all fans in the platform is raised to high; domains are not shut down and only warning messages are issued. In this situation, the system relies on the overtemperature behavior described later in this document.
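The M8000/M9000 fan tray rules above reduce to a few cases per tray. This hypothetical helper (`fan_tray_response`, `FANS_PER_TRAY`) summarizes them for one tray; it does not model PSU fans, which are handled by the PSU itself, nor the M9000 distinction between platform-wide and FRU-specific power-on blocking.

```python
# Fans per tray as stated above: FAN_A trays hold 3 fans, FAN_B trays hold 2.
FANS_PER_TRAY = {"FAN_A": 3, "FAN_B": 2}


def fan_tray_response(tray_type, failed_fans, tray_failed, platform_on):
    """Summarize the M8000/M9000 reaction to the state of one fan tray."""
    if not 0 <= failed_fans <= FANS_PER_TRAY[tray_type]:
        raise ValueError("impossible number of failed fans")
    if failed_fans == 0 and not tray_failed:
        return "normal"
    if failed_fans == 1 and not tray_failed:
        # N+1 redundancy still holds: keep running, replace the tray soon.
        return "replace tray as soon as possible"
    if platform_on:
        # Domains keep running; the overtemperature logic takes over from here.
        return "all fans to high speed, warnings only"
    return "power-on not permitted"
```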
M8000/M9000 typical fan speed
| Fan speed | Normal | High |
| RPM FAN_A#, FAN_B# | 3700 | 5500 |
M8000/M9000 minimum fan speed, below which a fan is declared failed
| Fan speed | Normal | High |
| RPM FAN_A#, FAN_B# | 3100 | 4200 |
M8000/M9000 fan speed relation to inlet temperature (°C), 1500 m or below
| Fan speed | Normal | High |
| Threshold temp. | < 24 | > 27 |
Inlet overtemperature set: > 36; inlet overtemperature reset: < 32
M8000/M9000 fan speed relation to inlet temperature (°C), 1501 m to 2000 m
| Fan speed | Normal | High |
| Threshold temp. | < 22 | > 25 |
Inlet overtemperature set: > 34; inlet overtemperature reset: < 30
M8000/M9000 fan speed relation to inlet temperature (°C), 2001 m to 2500 m
| Fan speed | Normal | High |
| Threshold temp. | < 20 | > 23 |
Inlet overtemperature set: > 32; inlet overtemperature reset: < 28
M8000/M9000 fan speed relation to inlet temperature (°C), 2501 m to 3000 m
| Fan speed | Normal | High |
| Threshold temp. | < 18 | > 21 |
Inlet overtemperature set: > 30; inlet overtemperature reset: < 26
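The M8000/M9000 tables describe only two fan speeds separated by a hysteresis gap (below 1500 m: high speed at ≥ 27 °C, back to normal at ≤ 24 °C), and each higher altitude band lowers every value by 2 °C. The sketch below reads "> x" as the switch-up point and "< y" as the switch-back point, which is an interpretation; the function names are hypothetical.

```python
def m8000_m9000_offset(altitude_m):
    """Degrees (°C) subtracted from the 1500 m values at a given altitude."""
    for upper_m, offset in ((1500, 0), (2000, 2), (2500, 4), (3000, 6)):
        if altitude_m <= upper_m:
            return offset
    raise ValueError("the tables only cover altitudes up to 3000 m")


def m8000_m9000_fan_speed(current, inlet_c, altitude_m):
    """Return "normal" or "high" given the current speed and inlet temperature."""
    off = m8000_m9000_offset(altitude_m)
    if current == "normal" and inlet_c >= 27 - off:
        return "high"
    if current == "high" and inlet_c <= 24 - off:
        return "normal"
    return current  # inside the hysteresis gap: keep the current speed


def m8000_m9000_inlet_ot(altitude_m):
    """Return (set, reset) inlet overtemperature thresholds in °C."""
    off = m8000_m9000_offset(altitude_m)
    return 36 - off, 32 - off
```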
M8000/M9000 overtemperature behaviour (°C)
| Overtemperature status | Overtemperature | Overtemperature warning (cooling group) | Overtemperature warning (platform) | Overtemperature fail (cooling group) | Overtemperature fail (platform) |
| Condition: inlet temperature | see tables above | - | - | - | - |
| Condition: CPU temperature | set: > 79, reset: < 71 | set: > 82 | - | set: > 104 | - |
| Condition: CMU temperature | set: > 61, reset: < 49 | set: > 66 | - | set: > 76 | - |
| Condition: XBU temperature | set: > 61, reset: < 49 | - | set: > 66 | - | set: > 76 |
| Action | Set all fans to high speed | Shut down domains in the CG, then power off all hardware in the CG | Shut down all domains, then power off platform | Emergency power off of all hardware in the CG | Emergency power off of platform |
| ereport | chassis.env.temp.ot@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otf@<location> | chassis.env.temp.otf@<location> |
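What distinguishes the M8000/M9000 overtemperature table is the scope of the action: CPU and CMU warnings and failures act on the cooling group, whereas XBU warnings and failures act on the whole platform. A hypothetical sketch that returns both the ereport class and the action for a single reading; `LIMITS`, `ACTIONS`, `m8000_m9000_action` and the `ot_active` flag ("the overtemperature condition is currently set") are illustrative names.

```python
# (ot_set, ot_reset, otw_set, otf_set, scope) in °C from the table above.
LIMITS = {
    "CPU": (79, 71, 82, 104, "cooling group"),
    "CMU": (61, 49, 66, 76, "cooling group"),
    "XBU": (61, 49, 66, 76, "platform"),
}

# (warning action, fail action) per scope, paraphrasing the Action row.
ACTIONS = {
    "cooling group": ("shut down domains in the CG, then power off its hardware",
                      "emergency power off of all hardware in the CG"),
    "platform": ("shut down all domains, then power off the platform",
                 "emergency power off of the platform"),
}


def m8000_m9000_action(sensor, temp_c, ot_active=False):
    """Return (ereport class or None, action) for one sensor reading."""
    ot_set, ot_reset, otw_set, otf_set, scope = LIMITS[sensor]
    warn_action, fail_action = ACTIONS[scope]
    if temp_c >= otf_set:
        return "chassis.env.temp.otf", fail_action
    if temp_c >= otw_set:
        return "chassis.env.temp.otw", warn_action
    if temp_c >= ot_set or (ot_active and temp_c > ot_reset):
        return "chassis.env.temp.ot", "set all fans to high speed"
    return None, "no action"
```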
Cooling requirements:
| Model | Rated power (W) | Cooling requirement (BTU/h) | Cooling requirement (kJ/h) | Airflow (m³/min) |
| M3000 | 470 | 1603 | 1692 | 1.75 |
| M4000 | 2350 | 8018 | 8046 | 7 |
| M5000 | 4590 | 16036 | 16524 | 14 |
| M8000 | 10500 | 35857 | 37800 | 94 |
| M9000-32 | 21300 | 72740 | 76680 | 102 |
| M9000-64 | 42600 | 145479 | 153360 | 205 |
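The cooling requirement columns are unit conversions of the rated power: 1 W is about 3.412 BTU/h and exactly 3.6 kJ/h. The kJ/h column equals 3.6 × W for every row except M4000 (3.6 × 2350 = 8460, not 8046), and the M5000 BTU/h value corresponds to about 4700 W rather than 4590 W; the remaining BTU/h values agree with the 3.412 factor to within about 0.1 %. A minimal conversion sketch (the `heat_load` name is hypothetical):

```python
BTU_PER_H_PER_W = 3.412  # 1 W ≈ 3.412 BTU/h
KJ_PER_H_PER_W = 3.6     # 1 W = 1 J/s = 3600 J/h = 3.6 kJ/h


def heat_load(rated_power_w):
    """Return (BTU/h, kJ/h) of heat to remove for a given electrical load."""
    return rated_power_w * BTU_PER_H_PER_W, rated_power_w * KJ_PER_H_PER_W
```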
Required environment:
| Model | Temperature, non-operating range (°C) | Temperature, operating range (°C) | Temperature, operating best range (°C) | Relative humidity, non-operating range (%) | Relative humidity, operating range (%) | Relative humidity, operating best range (%) |
| M3000 | 0 to 50; -20 to 60 (packed) | 0-500 m: 5-35; 501-1000 m: 5-33; 1001-1500 m: 5-31; 1501-3000 m: 5-29 | 21-23 | 0-93 | 20-80 | 45-50 |
| M4000/M5000 | 0 to 50; -20 to 60 (packed) | 0-500 m: 5-35; 501-1000 m: 5-33; 1001-1500 m: 5-31; 1501-3000 m: 5-29 | 21-23 | 0-93 | 20-80 | 45-50 |
| M8000/M9000 | 0 to 50 | 0-1500 m: 5-32; 1501-2000 m: 5-30; 2001-2500 m: 5-28; 2501-3000 m: 5-26 | 22-26 | 8-80 | 20-80 | 40-50 |
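The altitude-dependent operating ranges can be checked with a simple lookup. `OPERATING_RANGES` and `inlet_within_operating_range` are hypothetical names; the ranges are copied from the table above, and the check covers only the operating temperature range, not humidity or the recommended best range.

```python
# (upper altitude bound in m, min °C, max °C) for the operating range.
OPERATING_RANGES = {
    "M3000/M4000/M5000": ((500, 5, 35), (1000, 5, 33), (1500, 5, 31), (3000, 5, 29)),
    "M8000/M9000": ((1500, 5, 32), (2000, 5, 30), (2500, 5, 28), (3000, 5, 26)),
}


def inlet_within_operating_range(family, altitude_m, inlet_c):
    """True if inlet_c lies inside the operating range for that altitude."""
    for upper_m, low_c, high_c in OPERATING_RANGES[family]:
        if altitude_m <= upper_m:
            return low_c <= inlet_c <= high_c
    raise ValueError("no operating range listed above 3000 m")
```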
Internal section
This document describes the state as of XCP 1090.
Additional troubleshooting information:
A. In case of:
- multiple fan failures within a short time frame
- all fans of the same type rotating at exactly the same speed
consider a fan controller issue before replacing all the affected fan tray(s).
B. The FF platforms (M5000 and M4000) have different fan backplanes:
- M5000 has one fan backplane (for the 172 mm fans, FAN_A type), which includes the fan controller.
- M4000 has two fan backplanes: one for the 172 mm fans (FAN_A type) and one for the 60 mm fans (FAN_B type), which includes the fan controller.
Keep this in mind when dealing with multiple or repeated fan failures (e.g. repeated FAN_A faults on M5000 may be due to a failed controller, which is not included on the FAN_A backplane). The showhardconf output for both platforms is shown below:
M5000
FANBP_C Status:Normal; Ver:0501h; Serial:NN110224CD;
+ FRU-Part-Number:CF00541-3099 01 /541-3099-01 ;
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FAN_A#2 Status:Normal;
FAN_A#3 Status:Normal;
M4000
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FANBP_B Status:Normal; Ver:0201h; Serial:BF0844MV6G ;
+ FRU-Part-Number:CF00541-0909 04 /541-0909-04 ;
FAN_B#0 Status:Normal;
FAN_B#1 Status:Normal;
Fan CRs:
- 6875469 - On OPL M3000 (Ikkaku), fan speed is set to level 4 upon domain power-up, regardless of conditions.
- 6870490 - On M4000/M5000/M8000/M9000, fan alarm condition changes that occur while XSCF is down are ignored when XSCF boots.
Keywords:
OPL, Mx000, thermal temp, fan, fantray, tray, redundancy, M4000, M5000, M8000, M9000, M9000+, speed