Analyzing Sun StorEdge[TM] 6920 Component Failure Alarms, Notifications, and LEDs

Asset ID:	1-75-1012833.1
Update Date:	2010-08-26
Keywords:

Solution Type Troubleshooting Sure

Solution 1012833.1 : Analyzing Sun StorEdge[TM] 6920 Component Failure Alarms, Notifications, and LEDs

Related Items


Sun Storage 6920 System

Related Categories


GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays

PreviouslyPublishedAs
217614

Description
This document addresses the identification of failed or failing components in the array via various symptoms provided.

Symptoms:

StorADE Alarms(issued via email, or observed in the User Interface)
Amber LEDs
Loss of Access(Outage, can't find data, etc)
Appearance of filesystem corruption
Application discovered corruption
Application failed/services stopped
Application can't read data
Bad/Slow performance
Data Host Messages
Array Event Log Messages

Please validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Steps to Follow

A. Verify presence of fault(Amber) LEDs on the array

If you are remote and cannot review the array physically, skip this step and go to Step B.

1. Use the table below to locate and review component fault LEDs.

Component	Component Location	LED location	Notes
6020 Tray Fault LED	NA	Drive side of array, far right side of tray(Above Wrench Symbol)	None
6020 Drive	Drive Slot	Middle LED under drive slot	Wrench Symbol itself lights up
6020 RAID Controller	Below Fans, Middle card	Next to Fibre Channel connection(left of Wrench Symbol)	None
6020 Loop Cards	Below fans on far left or far right of tray	Above Cable Connections(left of Wrench Symbol)	None
6020 Power Supply Unit	Housed with Fans on the trays	Middle of FRU(below Wrench Symbol)	None
DSP Global Fault LED	DSP Housing	Drive side of array, top of DSP(Left of Wrench Symbol)	None
DSP Fan	DSP Housing	Drive side of array, Middle left on DSP(Below the word FAN)	Fan FRUs are accessed via the cabled side. LED is on drive side
DSP SRC	Drive Side of DSP	Left of word Attention	SRCs are located in slots 1-4
DSP SFC	Drive Side of DSP	Left of word Attention	SFCs are located in slots 5-6
DSP SIO Combo	Cabled Side of DSP	Left side of card, Left of Wrench Symbol	SIO Combo Cards are located in slots 1-4, and have a network port
DSP SIO	Cabled Side of DSP	No FAULT LED	SIOs are located in slots 1-4
DSP MIC	Cabled Side of DSP	Left Side of card, left of the word Attention	MICs are located in slots 5-6. Redund LED is amber, and will normally blink for the slot that is in STANDBY state
SP(v100)	Top of Rack	Cabled side of Rack	Right side of SP
SP(v210)	Top of Rack	Cabled side of Rack	Middle of SP, to the right of wrench symbol
SPA Tray	Top of Rack, Below SP	NO FAULT LEDs	None
10/100 Hub	Top of Rack, Below SPA	NO FAULT LEDs	None

2. Note any fault LEDs that are lit, and continue to Step B.

B. Verify the presence of an alarm on the array.

Log into the browser interface for your array: https://<my_array_address>:6789
Click on the Storage Automated Diagnostic Environment link.
Click on the Alarms tab. If there are no alarms and no fault LED was noted in Step A, skip to Step D. If there are no alarms but a fault LED was noted in Step A, skip to Step E.
Continue to step C for each alarm listed.
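
The branching in Step B can be summarized as a small decision function. The Python sketch below is illustrative only: the function name next_step and its parameters are assumptions made for this example and are not part of the 6920 software. It treats the "no alarms, but a fault LED was noted" case as taking precedence over the plain "no alarms" case.

```python
def next_step(alarm_count: int, fault_led_noted: bool) -> str:
    """Return the letter of the step to follow after checking the Alarms tab.

    alarm_count: number of alarms listed on the Alarms tab (Step B).
    fault_led_noted: True if an amber fault LED was noted in Step A.
    Both names are hypothetical and exist only for this sketch.
    """
    if alarm_count < 0:
        raise ValueError("alarm_count cannot be negative")
    if alarm_count > 0:
        return "C"  # alarms present: check the details of each alarm
    if fault_led_noted:
        return "E"  # fault LED without an alarm: collect data
    return "D"      # no alarms and no fault LED: review the Event Log


if __name__ == "__main__":
    print(next_step(alarm_count=0, fault_led_noted=True))
```

When alarms are present, Step C is repeated once for each alarm listed.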

C. Check the details of the alarm

NOTE: Some alarms are generated to indicate that a component needs to be replaced. For alarms that link to a Service Adviser action (see step 3) suggesting part replacement, it is recommended that you contact Sun Services before taking action.

Click on the Details link in the alarms table.
If the Event Code field is listed in the following table, complete the instructions associated with it.
If the Event Code field is NOT listed below, follow the Recommended Actions provided by the [ Click here to access the Service Adviser ] link. This may require a service call to get a replacement part.
If the Recommended Actions were followed and did not alleviate your symptoms, go to Step D.

EventCode	Instructions
30.12.31	Please collect alarm details, and go to Step E.
30.20.451	Snapshots must be deleted to recover. Please review The 6920 Best Practices Guide
30.20.452	Please follow alarm recommendations, and review The 6920 Best Practices Guide
30.20.457	Please follow alarm recommendations, and review The 6920 Best Practices Guide
30.34.21	Please collect alarm details and Solution Extract per alarm Service Action, and go to Step E.
30.36.16	Please collect alarm details, and go to Step E.
38.5.48	Please collect alarm details, and go to Step E.
38.20.13	Please collect alarm details, and go to Step E.
38.20.74	Please collect alarm details and Solution Extract per alarm Service Action, and go to Step E.
38.20.204	Contact Sun Support for a disk drive replacement
38.34.71	Please collect alarm details, and go to Step E.
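
When triaging many alarms, the Step C table can be treated as a lookup keyed by Event Code, with the Service Adviser link as the fallback for unlisted codes. The sketch below is a minimal illustration under that assumption: the names STEP_C_INSTRUCTIONS, SERVICE_ADVISER_FALLBACK, and instruction_for are hypothetical, and the instruction strings are shortened forms of the table entries above.

```python
# Event Codes from the Step C table, mapped to shortened instructions.
STEP_C_INSTRUCTIONS = {
    "30.12.31": "Collect alarm details, and go to Step E.",
    "30.20.451": "Snapshots must be deleted to recover. Review the 6920 Best Practices Guide.",
    "30.20.452": "Follow alarm recommendations, and review the 6920 Best Practices Guide.",
    "30.20.457": "Follow alarm recommendations, and review the 6920 Best Practices Guide.",
    "30.34.21": "Collect alarm details and Solution Extract, and go to Step E.",
    "30.36.16": "Collect alarm details, and go to Step E.",
    "38.5.48": "Collect alarm details, and go to Step E.",
    "38.20.13": "Collect alarm details, and go to Step E.",
    "38.20.74": "Collect alarm details and Solution Extract, and go to Step E.",
    "38.20.204": "Contact Sun Support for a disk drive replacement.",
    "38.34.71": "Collect alarm details, and go to Step E.",
}

# Used for any Event Code that is not in the table.
SERVICE_ADVISER_FALLBACK = (
    "Follow the Recommended Actions provided by the Service Adviser link."
)


def instruction_for(event_code: str) -> str:
    """Return the Step C instruction for an Event Code (hypothetical helper)."""
    return STEP_C_INSTRUCTIONS.get(event_code.strip(), SERVICE_ADVISER_FALLBACK)


if __name__ == "__main__":
    for code in ("38.20.204", "12.34.56"):
        print(code, "->", instruction_for(code))
```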

D. Verify whether there are Events being logged

Log into the browser interface for your array: https://<my_array_address>:6789
Click on the Storage Automated Diagnostic Environment link.
Click on the Administration Tab.
Click on the Event Log Tab.
Verify whether there are any Events logged at or around the time of any other symptoms you may be having.
For each log entry identified, click on the "Details" link.
Check the Event Code in the Event Details page, against the table below. Follow the directions associated with the Event Code. If the Event Code is NOT listed in the table below, go to step E.

EventCode	Instructions
38.13.354	A Diagnostic test has been run and failed on your 6020 tray. Please go to Step E.

E. Additional Data Collection

If after following the above troubleshooting steps, you have not resolved your potential hardware issue, please collect:

Detailed description of hardware problem(host messages, array messages, etc.)
Any Amber LED status.
Array Solution Extract. Refer to <Document: 1003756.1> How to collect an extractor from a Sun StorEdge[TM] 6920 (2.x and 3.x)
Whether there is a fault LED present without an alarm.
Solaris Explorer Collection. Refer to <Document: 1006990.1> Sun[TM] Explorer Implementation Best Practice
Microsoft Windows[R] Data Collection. Refer to <Document: 1006608.1> Microsoft Windows[R] operating system: How to obtain troubleshooting information for storage issues
Any other data you deem pertinent.

and contact Sun Support
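
Before contacting support, it can help to confirm that nothing in the Step E list has been overlooked. The following sketch is one possible way to track this; the item labels, the split between common and host-specific items, and the function name missing_items are all assumptions made for illustration.

```python
from typing import Iterable, List

# Items from the Step E list that apply regardless of host type (assumed split).
COMMON_ITEMS = [
    "problem description",
    "amber LED status",
    "array Solution Extract",
    "fault LED without alarm",
]

# Host data depends on the operating system of the data host.
HOST_ITEMS = {
    "solaris": "Solaris Explorer collection",
    "windows": "Microsoft Windows data collection",
}


def missing_items(collected: Iterable[str], host_os: str) -> List[str]:
    """Return the Step E items not yet collected, in checklist order."""
    wanted = list(COMMON_ITEMS)
    host_item = HOST_ITEMS.get(host_os.lower())
    if host_item is not None:
        wanted.append(host_item)
    have = set(collected)
    return [item for item in wanted if item not in have]


if __name__ == "__main__":
    print(missing_items(["problem description", "amber LED status"], "solaris"))
```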

Product
Sun StorageTek 6920 System
Sun StorageTek 6920 Maintenance Update 2
Sun StorageTek 6920 Maintenance Update 1

Internal Comments

This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the “Document Feedback” alias(es) listed below:

[email protected]

Instructions for Sun Support

F. Review LEDs on system

Are the LEDs solid amber? If so, identify the components using the table in Step A, then go to Step G. Most amber LEDs will have an alarm associated with them.

Note: A USB Flash with a solid red LED is not a fault. Refer to <Document: 1018698.1>: Sun StorEdge[tm] 6320/6920 Service Processor USB flash disk LED is red

G. Review the Alarms logged by the system

Recommend component replacement based on the alarms logged. If multiple FRUs have failed, based on the total number of alarms, skip to Step #.

If the Alarm is listed in the table below, follow the recommended actions.

If there are no alarms, continue to Step H.

Alarm Code	Action
30.12.31	Follow the Recommended Actions in the Alarm Details. If components cannot be pinged from the SP, reference < Solution: 209097 > : Sun StorEdge[TM] 6920 system: Unable to connect to DSP from SP via telnet
30.20.222	If the MIC is not in a FAULT or OFFLINE state, reference <Document: 1018149.1> : Verify MIC (Management Interface Card) Status on DSP-1000 (Sun StorEdge[TM] 6920) to verify which MIC is Master or Slave, and go to step H.
38.5.48	This requires a change to the "enable_volslice" parameter in the array called out by the alarm. This can only be done by logging into the array.

H. Review Event Codes

If the Event Code is listed in the table below, follow the recommended actions. Otherwise complete the recommended actions described in the Event.

If there are no Events in the array, continue to Step I

Event Code	Action
	There are no Event Codes that require special attention beyond normal health check provided by Step H

I. Review 6920 Health

Review the customer-provided extractor for overall health status. A good guideline is to review the contents of <Document: 1005447.1> Sun StorEdge[TM] 6920 System Health Checklist. If no faults are found, continue to Step J.

J. Review of Data Host System Health

Host system health should be reviewed from a 6920 Centric perspective. In basic terms, this means that the host data collection should be "mined" for the following information:

    * SCSI errors
    * Fibre Channel errors
    * Path status information (luxadm display for Solaris, sstm for Windows/Other)
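
Mining a host log for the first two items in the list above can be sketched as a simple keyword scan. The example below is a rough illustration, not a supported tool: the function name mine_host_log and the keyword patterns are assumptions, and the sample log lines are invented. Real driver messages vary by operating system and HBA, so the patterns would need to be adjusted to what your hosts actually log.

```python
import re
from typing import Dict, Iterable, List

# Assumed keyword patterns; tune these to the messages your host drivers log.
PATTERNS = {
    "scsi": re.compile(r"\bscsi\b", re.IGNORECASE),
    "fibre_channel": re.compile(r"\bfibre channel\b|\bfc\b", re.IGNORECASE),
}


def mine_host_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the category of keyword they contain."""
    hits: Dict[str, List[str]] = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "Jan  1 00:00:01 host1 example: SCSI transport failed\n",
        "Jan  1 00:00:02 host1 example: link down on FC port\n",
        "Jan  1 00:00:03 host1 example: login ok\n",
    ]
    for category, matched in mine_host_log(sample).items():
        print(category, len(matched))
```

Path status information still has to be gathered with the host tools named in the list; this sketch only filters text that has already been collected.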

Host timestamps should be adjusted to the time on the SP of the 6920. Host problems found in this step that correlate with information in the messages.dsp file could indicate a fault in one or more of the following major components:

FC Switch

FC SFP

FC Host Bus Adapter

FC Cables between all segments
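
Adjusting host timestamps to the SP clock and then looking for nearby array events can be sketched as follows. This is a minimal illustration under stated assumptions: the function names to_sp_time and correlate are hypothetical, the clock offset must be measured by you (for example by comparing the date on the host and on the SP), and the five-minute window is an arbitrary choice for the example, not a documented threshold.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def to_sp_time(host_time: datetime, sp_minus_host: timedelta) -> datetime:
    """Shift a host timestamp onto the Service Processor clock."""
    return host_time + sp_minus_host


def correlate(
    host_events: Sequence[datetime],
    sp_events: Sequence[datetime],
    sp_minus_host: timedelta,
    window: timedelta = timedelta(minutes=5),  # assumed tolerance
) -> List[Tuple[datetime, datetime]]:
    """Pair each host event with SP events within `window` of its adjusted time."""
    pairs = []
    for host_time in host_events:
        adjusted = to_sp_time(host_time, sp_minus_host)
        for sp_time in sp_events:
            if abs(sp_time - adjusted) <= window:
                pairs.append((host_time, sp_time))
    return pairs


if __name__ == "__main__":
    host = [datetime(2007, 9, 11, 10, 0, 0)]
    sp = [datetime(2007, 9, 11, 10, 12, 0), datetime(2007, 9, 11, 11, 0, 0)]
    # The SP clock is assumed to run 10 minutes ahead of the host here.
    print(correlate(host, sp, timedelta(minutes=10)))
```

Host events that pair with array events are the ones worth checking against the components listed above.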

Refer to Troubleshooting Document:

<Document: 1009557.1> : Troubleshooting Fibre Channel Devices from the OS

If no host hardware problems are identified, continue to Step K.

K. Escalation

Escalate to the next level of support providing the following data in a central location:

Detailed description of hardware problem(host messages, array messages, etc.)

Any Amber LED status.

Array Solution Extract. Refer to <Document: 1003756.1> How to collect an extractor from a Sun StorEdge[TM] 6920 (2.x and 3.x)

Whether there is a fault LED present without an alarm.

Solaris Explorer Collection. Refer to <Document: 1006990.1> Sun[TM] Explorer Implementation Best Practice

Microsoft Windows[R] Data Collection. Refer to <Document: 1006608.1> Microsoft Windows[R] operating system: How to obtain troubleshooting information for storage issues

Results of reviewing the extractor from Step I.

The Knowledge Work Queue for this article is KNO-STO-MIDRANGE_DISK.

6920, system1, unity, LED, Alarm, Event, Health, normalized, Audited
Previously Published As
89103

Change History
Date: 2007-12-10
User Name: 7058
Action: Approved
Comment: Fixed Tmark in title. KE audit.
Version: 6
Date: 2007-12-10
User Name: 7058
Action: Update Started
Comment: Title missing Tmark
Version: 0
Date: 2007-09-11
User Name: 7058
Action: Approved
Comment: Fixed 2 instances of a link that was pointing to 81801 when actually it should have been pointing to 81805.
Minor punctuation fixes.
Spell ck OK.
Other dependent docs are now in final review and will be published soon.
OK to publish.
Version: 5

Attachments

This solution has no attachment