Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1012833.1
Update Date:2010-08-26
Keywords:

Solution Type  Troubleshooting Sure

Solution  1012833.1 :   Analyzing Sun StorEdge[TM] 6920 Component Failure Alarms, Notifications, and LEDs  


Related Items
  • Sun Storage 6920 System
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays

PreviouslyPublishedAs
217614


Description
This document describes how to identify failed or failing components in the array from the symptoms listed below.


Symptoms:

  • StorADE Alarms (issued via email, or observed in the User Interface)
  • Amber LEDs
  • Loss of Access (outage, can't find data, etc.)
  • Appearance of filesystem corruption
  • Application discovered corruption
  • Application failed/services stopped
  • Application can't read data
  • Bad/Slow performance
  • Data Host Messages
  • Array Event Log Messages

Please validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.



Steps to Follow
Analyzing Sun StorEdge[TM] 6920 Component Failure Alarms, Notifications, and LEDs

A. Verify presence of fault (amber) LEDs on the array

If you are remote and cannot physically inspect the array, skip this step and go to Step B.

1. Use the table below to locate and review component fault LEDs.

Component | Component Location | LED Location | Notes
6020 Tray Fault LED | NA | Drive side of array, far right side of tray (above Wrench Symbol) | None
6020 Drive | Drive Slot | Middle LED under drive slot | Wrench Symbol itself lights up
6020 RAID Controller | Below Fans, Middle card | Next to Fibre Channel connection (left of Wrench Symbol) | None
6020 Loop Cards | Below fans on far left or far right of tray | Above Cable Connections (left of Wrench Symbol) | None
6020 Power Supply Unit | Housed with Fans on the trays | Middle of FRU (below Wrench Symbol) | None
DSP Global Fault LED | DSP Housing | Drive side of array, top of DSP (left of Wrench Symbol) | None
DSP Fan | DSP Housing | Drive side of array, middle left on DSP (below the word FAN) | Fan FRUs are accessed via the cabled side. LED is on drive side
DSP SRC | Drive Side of DSP | Left of word Attention | SRCs are located in slots 1-4
DSP SFC | Drive Side of DSP | Left of word Attention | SFCs are located in slots 5-6
DSP SIO Combo | Cabled Side of DSP | Left side of card, left of Wrench Symbol | SIO Combo Cards are located in slots 1-4, and have a network port
DSP SIO | Cabled Side of DSP | No FAULT LED | SIOs are located in slots 1-4
DSP MIC | Cabled Side of DSP | Left side of card, left of the word Attention | MICs are located in slots 5-6. Redund LED is amber, and will normally blink for the slot that is in STANDBY state
SP(v100) | Top of Rack | Cabled side of Rack | Right side of SP
SP(v210) | Top of Rack | Cabled side of Rack | Middle of SP, to the right of wrench symbol
SPA Tray | Top of Rack, Below SP | NO FAULT LEDs | None
10/100 Hub | Top of Rack, Below SPA | NO FAULT LEDs | None

2. Note any fault LEDs that are lit, and continue to Step B.

B. Verify the presence of an alarm on the array.

  1. Log into the browser interface for your array: https://<my_array_address>:6789
  2. Click on the Storage Automated Diagnostic Environment link.
  3. Click on the Alarms tab. If there are no alarms and a fault LED was noted in Step A, skip to Step E. If there are no alarms and no fault LED was noted, skip to Step D.
  4. Continue to step C for each alarm listed.
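
The branching in Step B can be summarized as a small decision function. This is only an illustrative sketch: the function and parameter names are hypothetical, and it assumes that a fault LED noted in Step A takes precedence over the plain "no alarms" case.

```python
def next_step_after_alarm_check(alarm_count: int, fault_led_lit: bool) -> str:
    """Return the letter of the step to follow after reviewing the Alarms tab.

    alarm_count   -- number of alarms shown on the Alarms tab (Step B)
    fault_led_lit -- True if any fault LED was noted in Step A
    """
    if alarm_count > 0:
        return "C"  # check the details of each alarm
    if fault_led_lit:
        return "E"  # fault LED without an alarm: collect data
    return "D"      # no alarm and no LED: review the Event Log


print(next_step_after_alarm_check(0, True))
```

A remote user who skipped Step A would pass `fault_led_lit=False`, which leads to Step D when no alarms are present.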

C. Check the details of the alarm

NOTE: Some alarms are generated to indicate that a component needs to be replaced. For alarms that link to a Service Adviser action (see step 3) suggesting part replacement, it is recommended that you contact Sun Services before taking action.

  1. Click on the Details link in the alarms table.
  2. If the Event Code field is listed in the following table, complete the instructions associated with it.
  3. If the Event Code field is NOT listed below, follow the Recommended Actions provided by the [ Click here to access the Service Adviser ] link. This may require a service call to get a replacement part.
  4. If the Recommended Actions were followed and did not alleviate your symptoms, go to Step D.
Event Code | Instructions
30.12.31 | Please collect alarm details, and go to Step E.
30.20.451 | Snapshots must be deleted to recover. Please review The 6920 Best Practices Guide.
30.20.452 | Please follow alarm recommendations, and review The 6920 Best Practices Guide.
30.20.457 | Please follow alarm recommendations, and review The 6920 Best Practices Guide.
30.34.21 | Please collect alarm details and Solution Extract per alarm Service Action, and go to Step E.
30.36.16 | Please collect alarm details, and go to Step E.
38.5.48 | Please collect alarm details, and go to Step E.
38.20.13 | Please collect alarm details, and go to Step E.
38.20.74 | Please collect alarm details and Solution Extract per alarm Service Action, and go to Step E.
38.20.204 | Contact Sun Support for a disk drive replacement.
38.34.71 | Please collect alarm details, and go to Step E.
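
For anyone triaging many alarms at once, the table above can be expressed as a simple lookup. This is a minimal sketch, not a Sun-provided tool: the dictionary and function names are hypothetical, and the instruction strings are copied from the table. Codes that are not listed fall back to the Service Adviser guidance from step 3.

```python
COLLECT = "Collect alarm details, and go to Step E."
COLLECT_WITH_EXTRACT = ("Collect alarm details and Solution Extract per alarm "
                        "Service Action, and go to Step E.")
FOLLOW_AND_REVIEW = ("Follow alarm recommendations, and review "
                     "The 6920 Best Practices Guide.")
FOLLOW_SERVICE_ADVISER = ("Follow the Recommended Actions provided by the "
                          "Service Adviser link.")

# Event codes and instructions taken from the Step C table.
ALARM_INSTRUCTIONS = {
    "30.12.31": COLLECT,
    "30.20.451": ("Snapshots must be deleted to recover. "
                  "Review The 6920 Best Practices Guide."),
    "30.20.452": FOLLOW_AND_REVIEW,
    "30.20.457": FOLLOW_AND_REVIEW,
    "30.34.21": COLLECT_WITH_EXTRACT,
    "30.36.16": COLLECT,
    "38.5.48": COLLECT,
    "38.20.13": COLLECT,
    "38.20.74": COLLECT_WITH_EXTRACT,
    "38.20.204": "Contact Sun Support for a disk drive replacement.",
    "38.34.71": COLLECT,
}


def instructions_for(event_code: str) -> str:
    """Look up the Step C instruction for an alarm's Event Code."""
    return ALARM_INSTRUCTIONS.get(event_code.strip(), FOLLOW_SERVICE_ADVISER)


print(instructions_for("38.20.204"))
```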

D. Verify whether there are Events being logged

  1. Log into the browser interface for your array: https://<my_array_address>:6789
  2. Click on the Storage Automated Diagnostic Environment link.
  3. Click on the Administration Tab.
  4. Click on the Event Log Tab.
  5. Verify whether there are any Events logged at or around the time of any other symptoms you may be having.
  6. For each log entry identified, click on the "Details" link.
  7. Check the Event Code in the Event Details page, against the table below. Follow the directions associated with the Event Code. If the Event Code is NOT listed in the table below, go to step E.
Event Code | Instructions
38.13.354 | A Diagnostic test has been run and failed on your 6020 tray. Please go to Step E.
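
Step 5 asks for events logged "at or around" the time of other symptoms. If the Event Log entries have been copied out by hand, that filter could be sketched as follows. The record layout (timestamp, event code) and the 30-minute window are assumptions made for illustration; they are not defined by the 6920 software.

```python
from datetime import datetime, timedelta


def events_near(events, symptom_time, window=timedelta(minutes=30)):
    """Return the events whose timestamp is within `window` of the symptom.

    events       -- list of (datetime, event_code) tuples (hypothetical layout)
    symptom_time -- datetime at which the symptom was observed
    """
    return [e for e in events if abs(e[0] - symptom_time) <= window]


# Hypothetical sample entries, for illustration only.
sample_events = [
    (datetime(2010, 8, 26, 9, 50), "38.13.354"),
    (datetime(2010, 8, 26, 14, 0), "30.12.31"),
]
for when, code in events_near(sample_events, datetime(2010, 8, 26, 10, 0)):
    print(when.isoformat(), code)
```

Widening the window is reasonable when the time of the symptom is only approximately known.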

E. Additional Data Collection

If after following the above troubleshooting steps, you have not resolved your potential hardware issue, please collect:

  • Detailed description of hardware problem(host messages, array messages, etc.)
  • Any Amber LED status.
  • Array Solution Extract. Refer to <Document: 1003756.1>  How to collect an extractor from a Sun StorEdge[TM] 6920 (2.x and 3.x)
  • Whether there is a fault LED present without an alarm.
  • Solaris Explorer Collection. Refer to <Document: 1006990.1>  Sun[TM] Explorer Implementation Best Practice
  • Microsoft Windows[R] Data Collection. Refer to <Document: 1006608.1>  Microsoft Windows[R] operating system: How to obtain troubleshooting information for storage issues
  • Any other data you deem pertinent.

and contact Sun Support.



Product
Sun StorageTek 6920 System
Sun StorageTek 6920 Maintenance Update 2
Sun StorageTek 6920 Maintenance Update 1

Internal Comments
Analyzing Sun StorEdge[TM] 6920 Component Failure Alarms, Notifications, and LEDs

This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the “Document Feedback” alias(es) listed below:

[email protected]



Instructions for Sun Support



F. Review LEDs on system


Are the LEDs solid amber? If so, identify the components based on the table in Step A, then go to Step G. Most amber LEDs will have an alarm associated with them.



Note: A USB Flash with a solid RED LED is not a fault. Refer to <Document: 1018698.1> : Sun StorEdge[tm] 6320/6920 Service Processor USB flash disk LED is red



G. Review the Alarms logged by the system



  1. Recommend component replacement based on the alarms logged. If multiple FRUs have failed based on the total number of alarms, skip to Step #.

  2. If the Alarm is listed in the table below, follow the recommended actions.

  3. If there are no alarms, continue to Step H.

Alarm Code | Action
30.12.31 | Follow the Recommended Actions in the Alarm Details. If components cannot be pinged from the SP, reference < Solution: 209097 > : Sun StorEdge[TM] 6920 system: Unable to connect to DSP from SP via telnet
30.20.222 | If the MIC is not in a FAULT or OFFLINE state, reference <Document: 1018149.1> : Verify MIC (Management Interface Card) Status on DSP-1000 (Sun StorEdge[TM] 6920) to verify which MIC is Master or Slave, and go to Step H.
38.5.48 | This requires a change to the "enable_volslice" parameter on the array named in the alarm. This can only be done by logging into the array.


H. Review Event Codes



  1. If the Event Code is listed in the table below, follow the recommended actions. Otherwise complete the recommended actions described in the Event.

  2. If there are no Events in the array, continue to Step I.

Event Code Action

There are no Event Codes that require special attention beyond normal health check provided by Step H


I. Review 6920 Health


Review the customer-provided extractor for overall health status. A good guideline is to review the contents of the following document:

  •   <Document: 1005447.1> Sun StorEdge[TM] 6920 System Health Checklist.

If no faults are found, continue to Step J.



J. Review of Data Host System Health


Host system health should be reviewed from a 6920 Centric perspective. In basic terms, this means that the host data collection should be "mined" for the following information:


  • SCSI errors
  • Fibre Channel errors
  • Path status information (luxadm display for Solaris, sstm for Windows/Other)

Timestamps in the host data should be adjusted to the time on the SP of the 6920. Host problems found in this step that correlate with information in the messages.dsp file could indicate a fault in one or more of the following major components:



  • FC Switch

  • FC SFP

  • FC Host Bus Adapter

  • FC Cables between all segments
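
The time adjustment and correlation described in this step can be sketched as follows. This is a rough illustration under stated assumptions: events are assumed to have already been extracted into (timestamp, message) tuples, the clock offset is assumed to be the SP clock minus the host clock read at the same moment, and the 5-minute window is arbitrary. The function name and sample messages are hypothetical and do not reflect the actual messages.dsp format.

```python
from datetime import datetime, timedelta


def correlate_host_to_dsp(host_events, dsp_events, sp_minus_host,
                          window=timedelta(minutes=5)):
    """Pair host events with DSP events logged within `window` on the SP clock.

    host_events, dsp_events -- lists of (datetime, message) tuples
    sp_minus_host           -- SP clock reading minus host clock reading
    """
    matches = []
    for host_time, host_msg in host_events:
        adjusted = host_time + sp_minus_host  # host time expressed on the SP clock
        for dsp_time, dsp_msg in dsp_events:
            if abs(dsp_time - adjusted) <= window:
                matches.append((adjusted, host_msg, dsp_msg))
    return matches


# Hypothetical sample data: the host clock runs 7 minutes behind the SP.
host = [(datetime(2010, 8, 26, 10, 0, 0), "host SCSI error (placeholder)")]
dsp = [(datetime(2010, 8, 26, 10, 7, 30), "DSP event (placeholder)")]
for row in correlate_host_to_dsp(host, dsp, timedelta(minutes=7)):
    print(row)
```

Without the offset, the two sample events would fall outside the window and the correlation would be missed, which is why the adjustment to SP time matters.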


Refer to Troubleshooting Document:



<Document: 1009557.1> : Troubleshooting Fibre Channel Devices from the OS


If no host hardware problems are identified, continue to Step K.



K. Escalation


Escalate to the next level of support providing the following data in a central location:



  • Detailed description of hardware problem(host messages, array messages, etc.)

  • Any Amber LED status.

  • Array Solution Extract. Refer to <Document: 1003756.1>  How to collect an extractor from a Sun StorEdge[TM] 6920 (2.x and 3.x)

  • Whether there is a fault LED present without an alarm.

  • Solaris Explorer Collection. Refer to <Document: 1006990.1> Sun[TM] Explorer Implementation Best Practice

  • Microsoft Windows[R] Data Collection. Refer to <Document: 1006608.1>  Microsoft Windows[R] operating system: How to obtain troubleshooting information for storage issues

  • Results of reviewing the extractor from Step I.

The Knowledge Work Queue for this article is KNO-STO-MIDRANGE_DISK.

6920, system1, unity, LED, Alarm, Event, Health, normalized, Audited
Previously Published As
89103

Change History
Date: 2007-12-10
User Name: 7058
Action: Approved
Comment: Fixed Tmark in title. KE audit.
Version: 6
Date: 2007-12-10
User Name: 7058
Action: Update Started
Comment: Title missing Tmark
Version: 0
Date: 2007-09-11
User Name: 7058
Action: Approved
Comment: Fixed 2 instances of a link that was pointing to 81801 when actually it should have been pointing to 81805.
Minor punctuation fixes.
Spell ck OK.
Other dependent docs are now in final review and will be published soon.
OK to publish.
Version: 5


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.