Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1019147.1
Update Date:2012-07-30
Keywords:

Solution Type  Technical Instruction Sure

Solution  1019147.1 :   Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000 Servers: Fan/fantray temperature and failure behavior  


Related Items
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M5000 Server
  • Sun SPARC Enterprise M3000 Server
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M8000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000
  • .Old GCS Categories>Sun Microsystems>Servers>OPL Servers

PreviouslyPublishedAs
235941
Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000 Servers: Fan/fantray temperature and failure behavior

In this Document
  Goal
  Solution
     M3000
     M4000 & M5000
     M8000 & M9000


Applies to:

Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M5000 Server
All Platforms
Sun SPARC Enterprise(R) Mx000 Servers: Fan/fantray temperature and failure behavior

Goal

This document describes fan and fan tray redundancy, how the systems behave when a fan or fan tray fails, and how fan speed is controlled according to inlet temperature.

Solution

This section describes the environment and failure behavior.

The showenvironment command is used to display various temperatures and fan speeds.
XSCF> showenvironment
XSCF> showenvironment temp
XSCF> showenvironment Fan
The showaltitude command is used to display the altitude setting.
XSCF> showaltitude

Note: In the following tables, a threshold temperature is a temperature at which an action is taken. In this context, the sign ">" means greater than or equal to, and the sign "<" means less than or equal to.

M3000

M3000 systems have a single inlet temperature sensor on the Operator Panel (OPNL).
M3000 systems have an exhaust temperature sensor on the Motherboard (MBU). The CPU chip has a temperature sensor.
Fans on M3000 systems have 10 speeds: level 1 to level 10.

Fan failure behavior
  • If a FAN_A fails and the other FAN_A in the cooling group is operational, the speed of that other fan is raised to full speed (level 10) and the speed of all other fans on the platform is raised to high speed (level 9).
  • If a fan in a PSU fails and all the other fans in the PSUs of the cooling group are operational, the other fan of this PSU is raised to full speed and all other fans on the platform are raised to high speed.
  • If the second fan in a cooling group also becomes non-operational, XSCF shuts down the domain and powers off the platform.

M3000 Cooling Groups

            CG#1          CG#2
Fans        FAN_A#0       Fan in PSU#0
            FAN_A#1       Fan in PSU#1
Hardware    MBU           DVD
            DIMMs         HDDs
            PCI slots     PSUs
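
The failure rules and cooling groups above can be sketched as a small decision function. This is an illustrative Python sketch, not XSCF firmware logic. The function name `react_to_failures`, the fan labels and the return format are assumptions made here. Each entry in the cooling group table is treated as a single fan, and the handling of cases the rules do not spell out (for example one failed fan in each group) is a guess.

```python
FULL, HIGH = 10, 9  # "full speed (level 10)" and "high speed (level 9)"

# Cooling groups from the M3000 table; the labels are illustrative only.
COOLING_GROUPS = {
    "CG#1": ("FAN_A#0", "FAN_A#1"),
    "CG#2": ("PSU#0 fan", "PSU#1 fan"),
}


def react_to_failures(failed):
    """Return ("power_off", {}) or ("run", {fan: forced_level})."""
    failed = set(failed)
    if not failed:
        return "run", {}  # no failure: speed follows inlet temperature
    levels = {}
    for fans in COOLING_GROUPS.values():
        working = [f for f in fans if f not in failed]
        if not working:
            # Second fan of a cooling group lost: shut down and power off.
            return "power_off", {}
        degraded = len(working) < len(fans)
        for fan in working:
            # Surviving partner runs at full speed, all other fans at high.
            levels[fan] = FULL if degraded else HIGH
    return "run", levels
```

For example, `react_to_failures(["FAN_A#0"])` forces FAN_A#1 to level 10 and both PSU fans to level 9, while a second failure in the same group returns `"power_off"`.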


M3000

Typical fan speed

Fan speed    RPM FAN_A#    RPM Fan PSU
Standby               0           3400
level1             3500           6000
level2             3550           6000
level3             3600           6000
level4 *           3700           6000
level5             3800           6200
level6             4100           6350
level7             4400           6550
level8             5000           6700
level9             5800           6900
level10            6600          12000

* Level 4 is the initial value after power on


M3000

Minimum fan speed below which a fan is declared failed.

Fan speed    RPM FAN_A#    RPM Fan PSU
Standby             N/A            N/A
level1             2130           4800
level2             2160           4800
level3             2190           4800
level4             2220           4800
level5             2290           4960
level6             2460           5120
level7             2670           5280
level8             3000           5440
level9             3500           5600
level10            3950           9440
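
A monitoring script could apply the minimum-speed table as follows. This is a minimal sketch: the function name `is_failed` and the data layout are assumptions made for illustration, and how XSCF actually samples and debounces RPM readings is not described in this document.

```python
LEVELS = ["standby"] + [f"level{n}" for n in range(1, 11)]

# Minimum RPM per level for the M3000, transcribed from the table above.
# None means no minimum is defined (N/A in standby).
MIN_RPM = {
    "FAN_A": [None, 2130, 2160, 2190, 2220, 2290, 2460, 2670, 3000, 3500, 3950],
    "PSU": [None, 4800, 4800, 4800, 4800, 4960, 5120, 5280, 5440, 5600, 9440],
}


def is_failed(fan_type, level, rpm):
    """Return True if rpm is below the minimum for this fan type and level."""
    limit = MIN_RPM[fan_type][LEVELS.index(level)]
    if limit is None:
        return False  # no failure threshold applies in standby
    return rpm < limit
```

With these values, a FAN_A running at its typical 3700 RPM in level4 passes, while the same fan at 2200 RPM would be flagged.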


M3000

Fan speed relation to inlet temperature (°C) 500 m or below

Fan speed    Speed class    Threshold temp.
Standby      none           Domain power off
level1       low            < 19
level2       low            > 20, < 21
level3       low            > 22, < 23
level4       low            > 24, < 25
level5       middle         > 26, < 27
level6       middle         > 28, < 29
level7       high           > 30, < 31
level8       high           > 32, < 33
level9       high           > 34
level10      full           *

Inlet overtemperature set: > 38
Inlet overtemperature reset: < 35
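
One possible reading of the table for 500 m or below is a hysteresis scheme: the ">" value of a level is the inlet temperature at which the fans step up to that level, and the "<" value is the temperature at which they step back down to it. The document does not state this explicitly, so the Python sketch below is an interpretation rather than the XSCF algorithm, and the name `next_level` is an assumption.

```python
# Thresholds (°C) transcribed from the "500 m or below" table.
# RISE[n]: at or above this inlet temperature, step up to level n.
RISE = {2: 20, 3: 22, 4: 24, 5: 26, 6: 28, 7: 30, 8: 32, 9: 34}
# FALL[n]: at or below this inlet temperature, step down to level n.
FALL = {1: 19, 2: 21, 3: 23, 4: 25, 5: 27, 6: 29, 7: 31, 8: 33}


def next_level(current, inlet_c):
    """Return the fan level (1..9) after evaluating one inlet reading."""
    level = current
    # Step up while the rising threshold of the next level is reached.
    while level < 9 and inlet_c >= RISE[level + 1]:
        level += 1
    if level != current:
        return level
    # Otherwise step down while the falling threshold below is reached.
    while level > 1 and inlet_c <= FALL[level - 1]:
        level -= 1
    return level  # level 10 is never selected by temperature
```

Starting from level 4 (the power-on value), a reading of 26 °C moves the fans to level 5, and they return to level 4 only once the inlet drops to 25 °C or below.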


M3000

Fan speed relation to inlet temperature (°C) 501 m to 1000 m

Fan speed    Speed class    Threshold temp.
Standby      none           Domain power off
level1       low            < 17
level2       low            > 18, < 16
level3       low            > 20, < 21
level4       low            > 22, < 23
level5       middle         > 24, < 25
level6       middle         > 26, < 27
level7       high           > 28, < 29
level8       high           > 30, < 31
level9       high           > 32
level10      full           *

Inlet overtemperature set: > 36
Inlet overtemperature reset: < 33


M3000

Fan speed relation to inlet temperature (°C) 1001 m to 1500 m

Fan speed    Speed class    Threshold temp.
Standby      none           Domain power off
level1       low            < 15
level2       low            > 16, < 14
level3       low            > 18, < 16
level4       low            > 20, < 21
level5       middle         > 22, < 23
level6       middle         > 24, < 25
level7       high           > 26, < 27
level8       high           > 28, < 29
level9       high           > 30
level10      full           *

Inlet overtemperature set: > 34
Inlet overtemperature reset: < 31


M3000

Fan speed relation to inlet temperature (°C) 1501 m to 3000 m

Fan speed    Speed class    Threshold temp.
Standby      none           Domain power off
level1       low            < 13
level2       low            > 14, < 12
level3       low            > 16, < 14
level4       low            > 18, < 16
level5       middle         > 20, < 21
level6       middle         > 22, < 23
level7       high           > 24, < 25
level8       high           > 26, < 27
level9       high           > 28
level10      full           *

Inlet overtemperature set: > 32
Inlet overtemperature reset: < 29


* Full speed is never set based on temperature; it is used only when a fan in the cooling group fails.
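
The inlet overtemperature set and reset values of the four M3000 altitude tables can be combined into one lookup with a latch. The sketch below is illustrative only: the names `inlet_ot_limits` and `update_overtemp` are assumptions, the altitude is expected to be the value configured on the XSCF (see showaltitude), and ">" and "<" are applied as "greater than or equal to" and "less than or equal to" as defined in the note near the start of this document.

```python
# (upper altitude bound in m, set °C, reset °C) from the four M3000 tables.
M3000_INLET_OT = [
    (500, 38, 35),
    (1000, 36, 33),
    (1500, 34, 31),
    (3000, 32, 29),
]


def inlet_ot_limits(altitude_m):
    """Return (set, reset) inlet overtemperature thresholds for an altitude."""
    for max_alt, set_c, reset_c in M3000_INLET_OT:
        if altitude_m <= max_alt:
            return set_c, reset_c
    raise ValueError("altitudes above 3000 m are not covered by the tables")


def update_overtemp(active, inlet_c, altitude_m):
    """Latch: set at or above the set value, clear at or below the reset value."""
    set_c, reset_c = inlet_ot_limits(altitude_m)
    if not active and inlet_c >= set_c:
        return True
    if active and inlet_c <= reset_c:
        return False
    return active
```

At 100 m, for example, the condition is raised at 38 °C and stays raised until the inlet temperature falls to 35 °C or below.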

M3000

Overtemperature behaviour (°C)

Overtemperature status       Overtemperature                  Overtemperature warning                   Overtemperature fail
condition inlet temperature  see tables above                 -                                         -
condition CPU temperature    set: > 77, reset: < 64           set: > 82                                 set: > 102
condition MBU temperature    set: > 64, reset: < 48           set: > 69                                 set: > 79
Action                       Set all fans to high speed       Shutdown domain then power off platform   Emergency power off platform
ereport                      chassis.env.temp.ot@<location>   chassis.env.temp.otw@<location>           chassis.env.temp.otf@<location>
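
As a worked illustration of the M3000 overtemperature table, the sketch below maps a CPU or MBU reading to the status, action and ereport class listed there. It is a stateless simplification: only the "set" thresholds are evaluated and the reset values are ignored. The function name `classify` and the tuple layout are assumptions.

```python
# "set" thresholds in °C: (overtemperature, warning, fail), from the table.
M3000_SET = {"CPU": (77, 82, 102), "MBU": (64, 69, 79)}

# (status, action, ereport class), most severe first.
STATES = (
    ("overtemperature fail", "emergency power off platform",
     "chassis.env.temp.otf"),
    ("overtemperature warning", "shutdown domain then power off platform",
     "chassis.env.temp.otw"),
    ("overtemperature", "set all fans to high speed",
     "chassis.env.temp.ot"),
)


def classify(sensor, temp_c):
    """Return (status, action, ereport class) for one sensor reading."""
    ot, otw, otf = M3000_SET[sensor]
    for limit, state in zip((otf, otw, ot), STATES):
        if temp_c >= limit:  # ">" in the table means greater than or equal
            return state
    return ("normal", "none", None)
```

A CPU reading of 82 °C therefore yields the warning status with ereport class `chassis.env.temp.otw`.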


M4000 & M5000

M4000 & M5000 systems have a single inlet temperature sensor on the Operator Panel (OPNL) FRU.
There is an exhaust temperature sensor on each IOU.
Each CPU has a temperature sensor.

Fan failure behavior
  • If a FAN_A fails and the other FAN_A in the cooling group is operational, the speed of that other fan is raised to full speed and the speed of all other fans on the platform is raised to high speed.
  • If a fan in a PSU fails and all the other fans in the PSUs of the cooling group are operational, the other fan of this PSU is raised to full speed and all other fans on the platform are raised to high speed.
  • If the second fan in a cooling group also becomes non-operational, XSCF sends a shutdown request to all domains in the system and powers off the system.
  • FAN_B fans are specific to the M4000 platform.

M4000 Cooling Groups

            CG#1          CG#2            CG#3
Fans        FAN_A#0       Fan in PSU#0    FAN_B#0
            FAN_A#1       Fan in PSU#1    FAN_B#1
Hardware    MBU           PSUs            XSCFU
            CPUM#0        IOU             HDD
            CPUM#1                        DVDU
            MEB#0..4                      TAPEU


M5000 Cooling Groups

            CG#1          CG#2          CG#3            CG#4
Fans        FAN_A#0       FAN_A#2       Fan in PSU#0    Fan in PSU#2
            FAN_A#1       FAN_A#3       Fan in PSU#1    Fan in PSU#3
Hardware    1/2 MBU       1/2 MBU       PSU#0           PSU#2
            CPUM#0        CPUM#2        PSU#1           PSU#3
            CPUM#1        CPUM#3        IOU0            IOU1
            MEB#0..3      MEB#4..7      XSCFU
            HDD#0..3      TAPEU
                          DVDU


M4000
M5000

Typical fan speed

Fan speed      Standby    low *    middle     high     full
RPM Fan PSU       2400     3600      4200     5200     8400
RPM FAN_A#           0     3200      4200     5300     5900
RPM FAN_B#        5000    10000     10000    10000    12000

* low is the initial value after power on

M4000
M5000

Minimum fan speed below which a fan is declared failed.

Fan speed      Standby      low    middle     high     full
RPM Fan PSU       1520     2680      3160     4200     6400
RPM FAN_A#         N/A     2560      3280     4080     4880
RPM FAN_B#        3000     7520      7520     7520     9440


 

M4000
M5000

Fan speed relation to inlet temperature (°C) 500 m or below
(with air filters installed, lower all threshold temperatures by 3 °C)

Fan speed          Standby                low     middle        high    full
threshold temp.    Domains powered off    < 23    > 25, < 28    > 30    *

Inlet overtemperature set: > 38
Inlet overtemperature reset: < 35


M4000
M5000

Fan speed relation to inlet temperature (°C) 501 m to 1000 m
(with air filters installed, lower all threshold temperatures by 3 °C)

Fan speed          Standby                low     middle        high    full
threshold temp.    Domains powered off    < 21    > 23, < 26    > 28    *

Inlet overtemperature set: > 36
Inlet overtemperature reset: < 33


M4000
M5000

Fan speed relation to inlet temperature (°C) 1001 m to 1500 m
(with air filters installed, lower all threshold temperatures by 3 °C)

Fan speed          Standby                low     middle        high    full
threshold temp.    Domains powered off    < 19    > 21, < 24    > 26    *

Inlet overtemperature set: > 34
Inlet overtemperature reset: < 31


M4000
M5000

Fan speed relation to inlet temperature (°C) 1501 m to 3000 m
(with air filters installed, lower all threshold temperatures by 3 °C)

Fan speed          Standby                low     middle        high    full
threshold temp.    Domains powered off    < 17    > 19, < 22    > 24    *

Inlet overtemperature set: > 32
Inlet overtemperature reset: < 29


* Full speed is never set based on temperature; it is used only when a fan in the cooling group fails.
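
The four M4000/M5000 altitude tables and the 3 °C air filter adjustment can be expressed as one lookup. The sketch below is a hedged illustration: `Thresholds`, `fan_thresholds` and the field names are assumptions, and it covers only the low, middle and high fan speed thresholds. Whether the adjustment also applies to the inlet overtemperature set and reset values is not stated unambiguously, so those are left out.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    low_at_or_below: int      # "<" value in the low column
    middle_at_or_above: int   # ">" value in the middle column
    middle_at_or_below: int   # "<" value in the middle column
    high_at_or_above: int     # ">" value in the high column


# (upper altitude bound in m, thresholds in °C without air filters)
BANDS = [
    (500, Thresholds(23, 25, 28, 30)),
    (1000, Thresholds(21, 23, 26, 28)),
    (1500, Thresholds(19, 21, 24, 26)),
    (3000, Thresholds(17, 19, 22, 24)),
]
AIR_FILTER_OFFSET_C = 3


def fan_thresholds(altitude_m, air_filters=False):
    """Return the fan speed thresholds for an altitude, filters optional."""
    for max_alt, limits in BANDS:
        if altitude_m <= max_alt:
            if air_filters:
                return Thresholds(*(v - AIR_FILTER_OFFSET_C for v in limits))
            return limits
    raise ValueError("altitudes above 3000 m are not covered by the tables")
```

For a system at 300 m with air filters installed, `fan_thresholds(300, air_filters=True)` gives 20, 22, 25 and 27 °C in place of 23, 25, 28 and 30 °C.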

M4000
M5000

Overtemperature behaviour (°C)

Overtemperature status       Overtemperature                  Overtemperature warning                        Overtemperature fail
condition inlet temperature  see tables above                 -                                              -
condition CPU temperature    set: > 79, reset: < 71           set: > 82                                      set: > 104
condition IOU temperature    set: > 60, reset: < 49           set: > 65                                      set: > 75
Action                       Set all fans to high speed       Shutdown all domains then power off platform   Emergency power off platform
ereport                      chassis.env.temp.ot@<location>   chassis.env.temp.otw@<location>                chassis.env.temp.otf@<location>


M8000 & M9000

M8000 & M9000 systems have a single inlet temperature sensor on the Sensor (SNSU) FRU.
M8000 systems have exhaust temperature sensors for each CMU.
M9000 systems have exhaust temperature sensors for each CMU and each XBU.
Each CPU has a temperature sensor.

M8000
Cooling Groups

             CG#1       CG#2
Fan Trays    FAN_A#0    FAN_B#2
             FAN_A#1    FAN_B#3
             FAN_A#2    FAN_B#4
             FAN_A#3    FAN_B#5
             FAN_B#0    FAN_B#6
             FAN_B#1    FAN_B#7
Hardware     CMU#0      IOU#0
             CMU#1      IOU#1
             CMU#2      IOU#2
             CMU#3      IOU#3
             DDC_A#0
             DDC_A#1
             XSCFU


M9000
Cooling Groups

                                 CG#1        CG#2        CG#3
Fan Trays (base cabinet)         FAN_A#0     FAN_A#4     FAN_A#10
                                 FAN_A#1     FAN_A#5     FAN_A#11
                                 FAN_A#2     FAN_A#6     FAN_A#12
                                 FAN_A#3     FAN_A#7     FAN_A#13
                                             FAN_A#8     FAN_A#14
                                             FAN_A#9     FAN_A#15
Hardware (base cabinet)          XBUs        CMU#0       CMU#4
                                 CLCKUs      CMU#1       CMU#5
                                 XSCFUs      CMU#2       CMU#6
                                 IOU#0       CMU#3       CMU#7
                                 IOU#2       IOU#1       IOU#5
                                 IOU#4       IOU#3       IOU#7
                                 IOU#6
Fan Trays (expansion cabinet)    FAN_A#20    FAN_A#24    FAN_A#30
                                 FAN_A#21    FAN_A#25    FAN_A#31
                                 FAN_A#22    FAN_A#26    FAN_A#32
                                 FAN_A#23    FAN_A#27    FAN_A#33
                                             FAN_A#28    FAN_A#34
                                             FAN_A#29    FAN_A#35
Hardware (expansion cabinet)     XBUs        CMU#8       CMU#12
                                 CLCKUs      CMU#9       CMU#13
                                 XSCFUs      CMU#10      CMU#14
                                 IOU#8       CMU#11      CMU#15
                                 IOU#10      IOU#9       IOU#13
                                 IOU#12      IOU#11      IOU#15
                                 IOU#14


Fan failure behavior:

M8000 and M9000

  • The fan trays have 2 fans (FAN_B) or 3 fans (FAN_A). The fans within a fan tray are N+1 redundant, so after the first fan failure the fan tray should be replaced as soon as possible to avoid any domain or platform outage. The fan trays themselves are not redundant.
  • The speed of the fans in the PSUs is not controlled by XSCF; it is controlled by a microcontroller internal to the PSU. If a fan in a PSU fails, the PSU is powered off and deconfigured.
  • If there are insufficient operational PSUs to power the platform, the platform is powered down.
M8000 specific
  • A failure in cooling group #1 or cooling group #2 affects the entire platform.
  • If a second fan in a fan tray becomes non-operational, or if the fan tray itself fails, XSCF does not permit the platform to be powered up. If the failure happens while the platform is powered up, the speed of all fans in the platform is raised to high; domains are not shut down and only warning messages are issued. In this situation, the system relies on the overtemperature behavior described later in this document.
M9000 specific
  • A failure in cooling group #1 affects the entire platform. A failure in cooling group #2 or cooling group #3 affects only the FRUs of that cooling group, and hence the domains cooled by it.
  • If a second fan in a fan tray becomes non-operational, or if the fan tray itself fails, XSCF does not permit the platform or the specific FRUs to be powered up. If the failure happens while the platform is powered up, the speed of all fans in the platform is raised to high; domains are not shut down and only warning messages are issued. In this situation, the system relies on the overtemperature behavior described later in this document.
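
The M8000 and M9000 rules above can be summarised in two small helpers. This Python sketch is illustrative only: the function names and the returned strings are assumptions, and a failure of the whole fan tray is modelled the same way as two or more failed fans in that tray.

```python
def failure_scope(model, cooling_group):
    """Return what a cooling group failure affects."""
    groups = {"M8000": (1, 2), "M9000": (1, 2, 3)}
    if cooling_group not in groups[model]:
        raise ValueError(f"{model} has no cooling group #{cooling_group}")
    if model == "M9000" and cooling_group in (2, 3):
        return "cooling group"  # only the FRUs cooled by that group
    return "platform"


def tray_status(failed_fans, powered_on):
    """Summarise the documented reaction to failed fans in one fan tray."""
    if failed_fans == 0:
        return "normal"
    if failed_fans == 1:
        return "redundancy lost: replace the fan tray as soon as possible"
    if powered_on:
        return "all fans raised to high speed, warning messages only"
    return "power-up not permitted"
```

For instance, `failure_scope("M9000", 3)` returns `"cooling group"`, while any cooling group failure on an M8000 has platform scope.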

M8000
M9000

Typical fan speed

Fan speed              Normal    High
RPM FAN_A#, FAN_B#       3700    5500


M8000
M9000

Minimum fan speed below which a fan is declared failed.

Fan speed              Normal    High
RPM FAN_A#, FAN_B#       3100    4200

 

M8000
M9000

Fan speed relation to inlet temperature (°C) 1500 m or below

Fan speed          Normal    High
threshold temp.    < 24      > 27

Inlet overtemperature set: > 36
Inlet overtemperature reset: < 32



M8000
M9000

Fan speed relation to inlet temperature (°C) 1501 m to 2000 m

Fan speed          Normal    High
threshold temp.    < 22      > 25

Inlet overtemperature set: > 34
Inlet overtemperature reset: < 30



M8000
M9000

Fan speed relation to inlet temperature (°C) 2001 m to 2500 m

Fan speed          Normal    High
threshold temp.    < 20      > 23

Inlet overtemperature set: > 32
Inlet overtemperature reset: < 28


M8000
M9000

Fan speed relation to inlet temperature (°C) 2501 m to 3000 m

Fan speed          Normal    High
threshold temp.    < 18      > 21

Inlet overtemperature set: > 30
Inlet overtemperature reset: < 26
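
With only two speeds, the M8000/M9000 tables describe a simple two-state control with a dead band between the Normal and High thresholds. The sketch below shows one way to read them. The name `next_speed` is an assumption, and keeping the current speed inside the dead band is an interpretation, not something the document states.

```python
# (upper altitude bound in m, Normal at or below °C, High at or above °C)
M8000_M9000_BANDS = [
    (1500, 24, 27),
    (2000, 22, 25),
    (2500, 20, 23),
    (3000, 18, 21),
]


def next_speed(current, inlet_c, altitude_m):
    """Return "Normal" or "High" after evaluating one inlet reading."""
    for max_alt, normal_max, high_min in M8000_M9000_BANDS:
        if altitude_m <= max_alt:
            if inlet_c >= high_min:
                return "High"
            if inlet_c <= normal_max:
                return "Normal"
            return current  # inside the dead band: keep the current speed
    raise ValueError("altitudes above 3000 m are not covered by the tables")
```

At 1000 m the fans switch to High at 27 °C and return to Normal at 24 °C or below; at 1800 m the same switch happens at 25 °C and 22 °C.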


M8000
M9000

Overtemperature behaviour (°C)

Overtemperature status       Overtemperature                  Overtemperature warning                                    Overtemperature warning                        Overtemperature fail                     Overtemperature fail
condition inlet temperature  see tables above                 -                                                          -                                              -                                        -
condition CPU temperature    set: > 79, reset: < 71           set: > 82                                                  -                                              set: > 104                               -
condition CMU temperature    set: > 61, reset: < 49           set: > 66                                                  -                                              set: > 76                                -
condition XBU temperature    set: > 61, reset: < 49           -                                                          set: > 66                                      -                                        set: > 76
Action                       Set all fans to high speed       Shutdown domains in CG then power off all hardware in CG   Shutdown all domains then power off platform   Emergency power off all hardware in CG   Emergency power off platform
ereport                      chassis.env.temp.ot@<location>   chassis.env.temp.otw@<location>                            chassis.env.temp.otw@<location>                chassis.env.temp.otf@<location>          chassis.env.temp.otf@<location>


Cooling requirements:

            Rated Power    Cooling Requirements    Flow
            W              BTU/h       kJ/h        m3/min
M3000       470            1603        1692        1.75
M4000       2350           8018        8460        7
M5000       4590           16036       16524       14
M8000       10500          35857       37800       94
M9000-32    21300          72740       76680       102
M9000-64    42600          145479      153360      205
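
The cooling requirement columns are unit conversions of the rated power: 1 W dissipated continuously equals 3.6 kJ/h. The sketch below shows the arithmetic. The BTU/h conversion uses the International Table BTU as an assumption; the document does not say which BTU definition or rounding was used, so computed BTU/h values can differ from the table.

```python
JOULES_PER_BTU = 1055.05585262  # International Table BTU (assumed here)


def watts_to_kj_per_hour(watts):
    """Convert continuous power in W to heat load in kJ/h (exact: x 3.6)."""
    return watts * 3600 / 1000


def watts_to_btu_per_hour(watts):
    """Convert continuous power in W to heat load in BTU/h (approximate)."""
    return watts * 3600 / JOULES_PER_BTU
```

For the M3000, `watts_to_kj_per_hour(470)` gives 1692 kJ/h, and `watts_to_btu_per_hour(470)` gives roughly 1604 BTU/h.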

Required environment:

               Temperature (°C)                                               Relative Humidity (%)
               Non-Op.               Operating                                Non-Op.    Operating
               Range                 Range                  Best Range        Range      Range      Best Range
M3000          0 to 50               0-500 m: 5-35          21-23             0-93       20-80      45-50
               -20 to 60 (packed)    501-1000 m: 5-33
                                     1001-1500 m: 5-31
                                     1501-3000 m: 5-29
M4000/M5000    0 to 50               0-500 m: 5-35          21-23             0-93       20-80      45-50
               -20 to 60 (packed)    501-1000 m: 5-33
                                     1001-1500 m: 5-31
                                     1501-3000 m: 5-29
M8000/M9000    0 to 50               0-1500 m: 5-32         22-26             8-80       20-80      40-50
                                     1501-2000 m: 5-30
                                     2001-2500 m: 5-28
                                     2501-3000 m: 5-26




Internal section

This document describes the state as of XCP 1090

Additional troubleshooting information:

A. In case of:
  • multiple fan failures occurring within a short timeframe
  • all fans of the same type rotating at exactly the same speed
consider a fan controller issue before replacing all of the affected fan trays.

B. FF platforms (M5000 and M4000) have different fan backplanes:
  • M5000 has one Fan Backplane (for 172mm Fans, FAN_A type) that includes the fan controller
  • M4000 has two Fan Backplanes: one for 172mm Fans, one for 60mm Fans (FAN_B type) that includes the fan controller

Keep this in mind when dealing with multiple or repeated fan failures (for example, a repeated FAN_A fault on M5000 may be due to a failed controller that is not included on the FAN_A backplane). Check the showhardconf output below for both platforms:

M5000

FANBP_C Status:Normal; Ver:0501h; Serial:NN110224CD;
+ FRU-Part-Number:CF00541-3099 01 /541-3099-01 ;
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FAN_A#2 Status:Normal;
FAN_A#3 Status:Normal;

M4000

FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FANBP_B Status:Normal; Ver:0201h; Serial:BF0844MV6G ;
+ FRU-Part-Number:CF00541-0909 04 /541-0909-04 ;
FAN_B#0 Status:Normal;
FAN_B#1 Status:Normal;
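
The second symptom in item A (all fans of the same type rotating at exactly the same speed) is easy to check once the RPM readings have been collected. The sketch below assumes the readings are already available as a dictionary keyed by fan name; the function name `types_with_identical_speed` and that input format are hypothetical, and parsing of showenvironment output is deliberately not attempted.

```python
def types_with_identical_speed(rpm_by_fan):
    """Return fan types (e.g. "FAN_A") whose fans all report the same RPM."""
    by_type = {}
    for name, rpm in rpm_by_fan.items():
        fan_type = name.split("#")[0]  # "FAN_A#0" -> "FAN_A"
        by_type.setdefault(fan_type, []).append(rpm)
    # Only types with at least two fans can show the symptom.
    return sorted(
        fan_type
        for fan_type, rpms in by_type.items()
        if len(rpms) > 1 and len(set(rpms)) == 1
    )
```

A non-empty result is only a hint to look at the fan controller before replacing fan trays; fans in good condition normally show slightly different readings.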


FAN CRs:
  • 6875469 - On OPL M3000 Ikkaku, fan speed is set to level 4 upon domain power up, regardless of conditions.
  • 6870490 - On M4000/M5000/M8000/M9000, fan alarm condition changes while XSCF is down are ignored on XSCF boot
References:
  • OPL FF & DC Environment preso (Andre Beusch): Environment TOI
  • OPL DC Hardware FAQ (Dan Nygren)
  • Troubleshooting a Noisy FAN on OPL Servers: Doc 1339901.1
  • Tracking page for cases where customers complain about fans being too noisy in M4000 / M5000 systems. Noisy Fan Tracking Page.
Keywords:
OPL, Mx000, thermal temp, fan, fantray, tray, redundancy, M4000, M5000, M8000, M9000, M9000+ speed


  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.