Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1005508.1
Update Date:2011-05-27
Keywords:

Solution Type  Troubleshooting Sure

Solution  1005508.1 :   Analyzing Internal LSI RAID Disk Failures  


Related Items
  • Sun Fire X4600 M2 Server
  • Sun Fire X4200 M2 Server
  • Sun Ultra 20 Workstation
  • Sun Fire X4100 M2 Server
  • Sun Ultra 40 M2 Workstation
  • Sun Fire X4640 Server
  • Sun Netra X4200 Server
  • Sun Fire X4270 Server
  • Sun Netra X4450 Server
  • Sun Ultra 27 Workstation
  • Sun Fire X4140 Server
  • Sun Fire X4100 Server
  • Sun Fire X2100 M2 Server
  • Sun Ultra 40 Workstation
  • Sun Fire X4600 Server
  • Sun Fire V20z Server
  • Sun Fire X4200 Server
  • Sun Fire X4240 Server
  • Sun Fire X4150 Server
  • Sun Fire X4170 Server
  • Sun Ultra 20 M2 Workstation
  • Sun Fire X4275 Server
  • Sun Fire X4450 Server
  • Sun Netra X4200 M2 Server
  • Sun Netra X4270 Server
  • Sun Fire V40z Server
  • Sun Fire X2200 M2 Server
  • Sun Ultra 24 Workstation
  • Sun Netra X4250 Server
  • Sun Fire X4540 Server
  • Sun Fire X4440 Server
  • Sun Fire X4250 Server
Related Categories
  • GCS>Sun Microsystems>Servers>x64 Servers

Previously Published As
207637


Applies to:

Sun Fire V40z Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Fire X4100 M2 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Fire X4100 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Fire X2100 M2 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Fire X4170 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
All Platforms

Purpose

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Sun x86 Systems

Purpose/Scope:

This document attempts to address failures of internal LSI MPT RAID disks under the Solaris, Red Hat, SuSE/Novell and Windows operating systems.

The LSI storage controller may be embedded into the platform or provided as an optional PCI card.

The 6 Gigabit SAS RAID, Sun StorageTek Intel/Adaptec RAID, and embedded Intel and NVIDIA RAID controllers are not discussed in this document.

Symptoms:

  • Disk service LED illuminated
  • Disk errors in system messages files
  • Disk errors on console
  • Disk SMART errors during the boot process


Last Review Date

March 1, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow:

Please validate that each troubleshooting step below is true for your environment. The steps provide instructions, or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Step 1. Verify a supported platform disk and part number:

The following link references a support document that assists in the identification of a disk part number. In addition, the document provides the public web location of the Oracle SunSolve Handbook to confirm the disk in question is a supported disk for your platform:

1010055.1 Identifying Sun Supported Platform Disks

Disks that are not listed in a platform's documentation are deemed unsupported. They have not been tested, so they have unknown properties and may produce unknown errors.
Even if an unsupported disk appears to work correctly, it is recommended that you always use supported disks for contracted platforms.

Step 2. Verify disk is or is not a member of a RAID array:

The following links reference support documents that assist in identifying whether your Solaris, Linux or Windows operating environment is installed as part of a RAID array. The Windows instructions are given inline:

Solaris:

1017961.1 How to Identify if a Solaris[TM] Operating Environment is Installed on a Hardware RAID Controller

Linux:

1013003.1 How to Identify if a Linux Operating Environment is Installed on a Hardware RAID Controller

Windows:

Click on the following:

Right Click on "My Computer" and select "Properties".
Select the "Hardware" tab from the window that appears.
Click on "Device Manager".
Click on "Disk Drives". Installed disk(s) are listed.

If the listed drive(s) are displayed with the disk name LSI, then your platform's drives are under the control of an LSI RAID device.
If, however, the listed drive(s) display the name(s) Fujitsu, Hitachi or Seagate, then your platform is not under the control of a RAID device and is configured as JBOD (Just a Bunch Of Disks).

Troubleshooting steps differ for platforms that are installed under the control of a RAID management device. This is because disks under RAID control are hidden from the operating environment and are referenced as a pseudo or meta-device.
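
The Device Manager check above can also be expressed as a small classification rule. The following Python sketch is illustrative only: the function name classify_disk and the sample device names are assumptions and do not come from any Sun or LSI tool. It treats a name containing the word LSI as a RAID volume and a name containing Fujitsu, Hitachi or Seagate as a directly attached (JBOD) disk.

```python
import re

# Names taken from the Device Manager check described above. A real system
# may report other vendors, so anything else is returned as "unknown".
RAID_MARKER = "LSI"
JBOD_VENDORS = {"FUJITSU", "HITACHI", "SEAGATE"}


def classify_disk(device_name):
    """Return "raid", "jbod" or "unknown" for one Device Manager disk name."""
    # Split into upper-case words so that "LSI" must match as a whole word.
    words = set(re.findall(r"[A-Z0-9]+", device_name.upper()))
    if RAID_MARKER in words:
        return "raid"
    if words & JBOD_VENDORS:
        return "jbod"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for name in ("LSI Logical Volume SCSI Disk Device", "SEAGATE ST973401LSUN72G"):
        print(name, "->", classify_disk(name))
```

A result of "unknown" means the name matched neither rule and the configuration should be confirmed by hand, for example in BIOS.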

Step 3. Verify RAID status:

If the fault is persistent, we must now identify which disk has failed.
The following link references a support document that assists in identifying the current RAID status of a configured array.
This document details checking of RAID status in BIOS and within the Solaris operating environment:

1013107.1 How to Identify BIOS and Solaris Hardware RAID Status

Linux and Windows platforms will need to check RAID status from within BIOS using the above document, unless they install additional monitoring software from the distributed Tools and Drivers CD-ROM.
The package name is MSM-IR (also known as MegaRAID Storage Manager).

If this package is unavailable to the user via CD-ROM, then it can be downloaded from Oracle Support.
Click on the appropriate platform, then select "Software Downloads" from the Downloads link on the right menu bar.
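
How the RAID status decides between Step 3 and Step 4 can be sketched as follows. This is a minimal illustration, assuming the status has already been read from BIOS, Solaris or MSM-IR and is available as a text string. The function name next_check and the returned labels are hypothetical. Only the "Degraded" status is taken from this document; "Optimal" in the example is an assumed healthy value.

```python
def next_check(fault_persistent, raid_status):
    """Choose the next troubleshooting action from the observed symptoms.

    fault_persistent: True if the fault is present all the time.
    raid_status: status text reported for the array (assumed input format).
    """
    degraded = raid_status.strip().lower() == "degraded"
    if fault_persistent or degraded:
        # Step 3: a disk has failed; identify which one.
        return "identify-failed-disk"
    # Step 4: no persistent fault and no degraded array; look for
    # intermittent errors in the operating system logs.
    return "check-intermittent-errors"


if __name__ == "__main__":
    print(next_check(False, "Optimal"))
    print(next_check(False, "Degraded"))
```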

Step 4. Verify the disk is online, has not been going offline, and has no physical hardware problem:

If the disk fault is not persistent and the RAID controller does not report a "Degraded" RAID, we must now attempt to identify intermittent failures.
The following links reference support documents that assist in identifying the online/offline status of directly attached platform disks. These documents also discuss the location of your operating system error logs and the format in which disk errors should appear:

Solaris:

1005530.1 How to Check for Solaris[TM] x64 Disk Errors and Online/Offline Status

Linux:

1002936.1 How to Check for Linux Platform Disk Errors and Online/Offline Status

Windows:

1011590.1 How to check for Windows platform disk errors and online/offline status

Disks that are not directly attached to the platform (for example, those installed in an external storage array), are not discussed in this document.
Storage array disks may behave differently when connected behind an external controller, which changes the error syntax and the tools used for collection and configuration.
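
Once the relevant log file has been located, a first pass for intermittent errors can be automated. The Python sketch below counts log lines that contain disk-related keywords. The keywords and the sample log lines are placeholders invented for illustration; the actual error strings depend on the operating system and driver and should be taken from the documents listed above.

```python
from collections import Counter

# Placeholder keywords only (assumption): replace them with the error
# strings given in the operating system document for your platform.
ASSUMED_KEYWORDS = ("offline", "i/o error", "smart")


def count_disk_events(lines, keywords=ASSUMED_KEYWORDS):
    """Count how many log lines contain each keyword, ignoring case."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; real log formats differ.
    sample = [
        "disk0: device went offline",
        "disk0: I/O error on read",
        "network link up",
    ]
    print(count_disk_events(sample))
```

Repeated matches for the same disk over time suggest an intermittent fault that is worth reporting even when the array is not degraded.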

Step 5. Verify disk firmware revision and known applicable issues:

The following link references a support document that assists in identifying the disk model number and firmware revision, so that you can check for known issues and, if applicable, patch updates:

1008396.1 How to Identify Optical and Hard Disk Firmware Revisions for Checking of Known Issues

Patches and firmware updates are often available for disks under multiple operating systems.
Checking for known issues and updates can reduce downtime.

Step 6. Run information gathering programs and raise an Oracle service request:

The following links reference support documents that assist in the gathering of information from your Solaris, Red Hat, Novell/SuSE and Windows platforms using their own information gathering tools.

Solaris:

1018748.1 How to Run Sun[TM] Explorer and Forward the Data to a Sun Engineer

Novell/SuSE Enterprise Linux:

1010057.1 How to gather information on SuSE Linux Enterprise Systems

Red Hat Enterprise Linux:

1010058.1 How to Gather Information on Red Hat Enterprise Linux Systems

Windows msinfo32:

Click on Start and select Run.
Type "msinfo32" in the text box that appears.
Select the File menu and then select Export.
Provide a file name and send this file to Sun.

This is necessary if the resolution steps above did not resolve your issue and Sun needs to be engaged to continue diagnosis for you. Information gathering programs gather operating system parameters and configuration information from your platform.
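
The msinfo32 export can also be started from a command line instead of the menus. The Python sketch below only builds the command and does not run it. It assumes the /nfo and /report switches of msinfo32, which should be confirmed on the Windows version in use. The function name and the output path are made-up examples.

```python
def msinfo32_export_command(output_path, as_text=False):
    """Build the msinfo32 command line for an unattended export.

    Assumption: msinfo32 accepts "/nfo <file>" for a System Information
    file and "/report <file>" for a plain-text report.
    """
    switch = "/report" if as_text else "/nfo"
    return ["msinfo32", switch, output_path]


if __name__ == "__main__":
    # Hypothetical output location.
    print(msinfo32_export_command(r"C:\temp\system.nfo"))
```

On the Windows platform itself, the returned list can be passed to subprocess.run to perform the export, and the resulting file sent in with the service request.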

At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required. For additional support contact Oracle Support.

Previously Published As 91626


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.