InfoDoc ID: 47086
Synopsis: AUTOMATIC SYSTEM RECOVERY (ASR) - Sun Fire [TM] V880
Date: 16 Jan 2003

Status: Issued

Description

AUTOMATIC SYSTEM RECOVERY (ASR) - Sun Fire V880 (OBP version 4.5.x, 4.6.x and Above)

The Automatic System Recovery (ASR) feature of the Sun Fire V880 lets the system resume operation after a non-fatal hardware error. When ASR is enabled, the firmware diagnostics automatically detect failed hardware components. The OpenBoot firmware then deconfigures the failed components and restores system operation, provided the system can operate without them. ASR allows the system to reboot automatically, without operator intervention.

How to activate ASR:

Set the following configuration variables at the OBP ok prompt. The diagnostic trigger variables differ between OBP versions:


AUTOMATIC SYSTEM RECOVERY (ASR) for OBP Versions 4.5.x and Below

ok setenv auto-boot? true

ok setenv auto-boot-on-error? true

ok setenv diag-switch? true

ok setenv diag-level max

ok setenv diag-device disk

ok setenv diag-trigger soft-reset

Valid settings for the diag-trigger variable:

power-reset (default): Runs firmware diagnostics only on power-on resets, including RSC-initiated power-on resets.

error-reset: Runs firmware diagnostics only on power-on resets and resets triggered by hardware errors, including operating system panics and watchdog reset events. This DOES NOT include software resets.

soft-reset (recommended): Runs firmware diagnostics on all reset events (including software resets).

none: Disables the automatic triggering of firmware diagnostics by any reset event. You can still invoke firmware diagnostics manually by turning the front panel keyswitch to the Diagnostics position prior to powering on the system.

ok reset-all
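The diag-trigger settings above can be summarized as a lookup from each value to the reset events that start firmware diagnostics. The following Python sketch is only a model of the behavior described in this document, not OBP code; the reset category names (POWER_ON, ERROR, SOFTWARE) are hypothetical labels chosen for illustration.

```python
from enum import Enum


class Reset(Enum):
    # Hypothetical reset categories taken from the descriptions above;
    # these names are not OBP identifiers.
    POWER_ON = "power-on"   # includes RSC-initiated power-on resets
    ERROR = "error"         # hardware errors, OS panics, watchdog resets
    SOFTWARE = "software"   # software-initiated resets


# diag-trigger value -> reset events that start firmware diagnostics (OBP 4.5.x)
DIAG_TRIGGER_45 = {
    "power-reset": {Reset.POWER_ON},
    "error-reset": {Reset.POWER_ON, Reset.ERROR},
    "soft-reset": {Reset.POWER_ON, Reset.ERROR, Reset.SOFTWARE},
    "none": set(),  # diagnostics only via the keyswitch Diagnostics position
}


def runs_diagnostics(diag_trigger: str, event: Reset) -> bool:
    """Return True if diagnostics run automatically for this reset event."""
    return event in DIAG_TRIGGER_45[diag_trigger]


if __name__ == "__main__":
    # Print which reset events each setting covers.
    for setting in DIAG_TRIGGER_45:
        covered = [e.value for e in Reset if runs_diagnostics(setting, e)]
        print(f"{setting:12} {covered}")
```

Only soft-reset covers every event category, which is why it is the recommended setting for ASR.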


AUTOMATIC SYSTEM RECOVERY (ASR) for OBP Versions 4.6.x and Above

ok setenv auto-boot? true

ok setenv auto-boot-on-error? true

ok setenv diag-switch? true

ok setenv diag-level max

ok setenv diag-device disk

ok setenv obdiag-trigger all-resets

ok setenv post-trigger all-resets

Valid settings for the obdiag-trigger and post-trigger variables:

power-on-reset (default): Runs firmware diagnostics only on power-on resets, including RSC-initiated power-on resets.

error-reset: Runs firmware diagnostics only on power-on resets and resets triggered by hardware errors, including operating system panics and watchdog reset events. This DOES NOT include software resets.

user-reset: Runs firmware diagnostics on user-initiated reset events.

all-resets (recommended): Runs firmware diagnostics on all reset events, including software resets.

none: Disables the automatic triggering of firmware diagnostics by any reset event. You can still invoke firmware diagnostics manually by turning the front panel keyswitch to the Diagnostics position prior to powering on the system.

Note: the post-trigger and obdiag-trigger variables have no effect unless diag-switch? is set to true.

Note: Setting diag-level to max makes the system run more in-depth diagnostic tests, which gives ASR the best chance of detecting defective hardware and removing it from the configuration. ASR is less effective if diag-level is not set to max.

ok reset-all
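For OBP 4.6.x and above, POST and OpenBoot Diagnostics have separate triggers, and both are gated by diag-switch? as stated in the note above. The Python sketch below models that behavior under stated assumptions: the reset category names are hypothetical, each variable is assumed to hold a single value as in the examples in this document, and a user-initiated reset and a software reset are treated as distinct categories because the text does not say how they relate.

```python
from __future__ import annotations

from enum import Enum


class Reset(Enum):
    # Hypothetical reset categories; not OBP identifiers.
    POWER_ON = "power-on"   # includes RSC-initiated power-on resets
    ERROR = "error"         # hardware errors, OS panics, watchdog resets
    USER = "user"           # user-initiated resets
    SOFTWARE = "software"   # other software resets


# Trigger value -> reset events it covers (shared by post-trigger and obdiag-trigger)
TRIGGER_46 = {
    "power-on-reset": {Reset.POWER_ON},
    "error-reset": {Reset.POWER_ON, Reset.ERROR},
    "user-reset": {Reset.USER},
    "all-resets": set(Reset),  # every reset event
    "none": set(),
}


def diagnostics_to_run(diag_switch: bool, post_trigger: str,
                       obdiag_trigger: str, event: Reset) -> list[str]:
    """Return the diagnostic stages that would run for this reset event."""
    if not diag_switch:
        # The triggers have no effect unless diag-switch? is true.
        return []
    stages = []
    if event in TRIGGER_46[post_trigger]:
        stages.append("POST")
    if event in TRIGGER_46[obdiag_trigger]:
        stages.append("OpenBoot Diagnostics")
    return stages


if __name__ == "__main__":
    # Recommended ASR settings from this document.
    for event in Reset:
        print(event.value, diagnostics_to_run(True, "all-resets", "all-resets", event))
```

With the recommended settings (diag-switch? true, both triggers set to all-resets), both diagnostic stages run on every reset event.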


ASR behavior falls into three categories, depending on what the diagnostics detect:

1. No Errors

If no errors are detected by POST or OpenBoot Diagnostics, the system attempts to boot if auto-boot? is true.

2. Non-Fatal Errors

If only non-fatal errors are detected by POST and/or OpenBoot Diagnostics (errors that will not necessarily prevent the system from booting Solaris), the system attempts to boot if both auto-boot? and auto-boot-on-error? are true. Non-fatal errors include the following:

3. Fatal Errors

If a fatal error is detected by POST and/or OpenBoot Diagnostics, the system does not attempt to boot, regardless of the settings of auto-boot? and auto-boot-on-error?, and may not reach the ok prompt. Fatal non-recoverable errors include the following:
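The three categories reduce to a small decision rule. The Python sketch below restates that rule; the DiagResult names are hypothetical labels for the categories, and the function models only whether the firmware attempts a boot, not what happens afterward.

```python
from enum import Enum


class DiagResult(Enum):
    # Hypothetical names for the three ASR categories described above.
    NO_ERRORS = 1
    NON_FATAL = 2
    FATAL = 3


def attempts_boot(result: DiagResult, auto_boot: bool, auto_boot_on_error: bool) -> bool:
    """Return True if the firmware attempts to boot after diagnostics."""
    if result is DiagResult.FATAL:
        # Fatal errors prevent booting regardless of either variable.
        return False
    if result is DiagResult.NON_FATAL:
        # Non-fatal errors require both variables to be true.
        return auto_boot and auto_boot_on_error
    # No errors: only auto-boot? matters.
    return auto_boot


if __name__ == "__main__":
    # With the ASR settings in this document, both variables are true.
    for result in DiagResult:
        print(result.name, attempts_boot(result, True, True))
```

This is why the procedure sets both auto-boot? and auto-boot-on-error? to true: without the second variable, a system with a deconfigured non-fatal component would stop at the ok prompt instead of booting.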

To view a list of components that can be manually enabled or disabled through ASR, type the following at the ok prompt:

ok .asr

To manually enable or disable a component:

ok asr-enable <dev-id>

ok asr-disable <dev-id>

Where <dev-id> is an absolute device path, a device alias, or a device label.

Valid device labels include:

cpu7-bank3 cpu7-bank2 cpu7-bank1 cpu7-bank0

cpu6-bank3 cpu6-bank2 cpu6-bank1 cpu6-bank0

cpu5-bank3 cpu5-bank2 cpu5-bank1 cpu5-bank0

cpu4-bank3 cpu4-bank2 cpu4-bank1 cpu4-bank0

cpu3-bank3 cpu3-bank2 cpu3-bank1 cpu3-bank0

cpu2-bank3 cpu2-bank2 cpu2-bank1 cpu2-bank0

cpu1-bank3 cpu1-bank2 cpu1-bank1 cpu1-bank0

cpu0-bank3 cpu0-bank2 cpu0-bank1 cpu0-bank0

pci-slot8 pci-slot7 pci-slot6 pci-slot5 pci-slot4

pci-slot3 pci-slot2 pci-slot1 pci-slot0

gptwo-slotd gptwo-slotc gptwo-slotb gptwo-slota

ob-gem ob-fcal ob-scsi

hba9 hba8

cpu7 cpu6 cpu5 cpu4 cpu3 cpu2 cpu1 cpu0
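The device labels follow a regular pattern: four memory banks per CPU (cpu0 through cpu7), PCI slots 0 through 8, GPTwo slots a through d, the on-board devices, hba8 and hba9, and the CPUs themselves. The Python sketch below regenerates the same 58 labels listed above and classifies a <dev-id> argument by form. The classification rule is an assumption for illustration: it treats any string starting with "/" as an absolute device path and any non-label string as a device alias, without checking that the path or alias actually exists on the system.

```python
from __future__ import annotations


def valid_device_labels() -> set[str]:
    """Rebuild the Sun Fire V880 ASR device labels listed above."""
    labels = set()
    for cpu in range(8):
        labels.add(f"cpu{cpu}")
        for bank in range(4):
            labels.add(f"cpu{cpu}-bank{bank}")
    labels.update(f"pci-slot{n}" for n in range(9))
    labels.update(f"gptwo-slot{c}" for c in "abcd")
    labels.update({"ob-gem", "ob-fcal", "ob-scsi", "hba8", "hba9"})
    return labels


def classify_dev_id(dev_id: str, labels: set[str]) -> str:
    """Hypothetical helper: guess which form of <dev-id> was given."""
    if dev_id.startswith("/"):
        return "absolute device path"
    if dev_id in labels:
        return "device label"
    # Assumption: anything else is a device alias (not verified here).
    return "device alias"


if __name__ == "__main__":
    labels = valid_device_labels()
    print(len(labels), "labels")
    print(classify_dev_id("cpu3-bank2", labels))
```

Generating the labels from the pattern makes it easy to spot a mistyped label, such as pci-slot9 or cpu8, before passing it to asr-disable.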

INTERNAL SUMMARY:

For more information, see:

http://cpre-amer.east/vsp/wgs/products/daktari/asr_index.html

SUBMITTER: Alicia Brown
APPLIES TO: AFO Vertical Team Docs/Hardware
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.