Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1020006.1 : Steps to ensure there are no disk failures in LDOM environment
Previously Published As: 250806

Description

This document explains how to check that there are no disk failures in an LDOM environment.

Steps to Follow

Disks must be kept in good condition to support LDOM and other normal operations. The information below describes how disk failures occur and which Solaris[TM] commands are available to detect them.

Broadly speaking, a hard disk can fail in four ways that lead to a potential loss of data:

1. Firmware Corruption / Damage to the firmware zone
2. Electronic Failure
3. Mechanical Failure
4. Logical Corruption

Combinations of these four types of failure are also possible.

1. Firmware Corruption / Damage to the firmware zone

Explanation: Hard disk firmware is the software code that controls, and is embedded in, the physical hard drive hardware. If the firmware of a hard disk becomes corrupted or unreadable, the computer is often unable to interact correctly with the hard disk. Frequently the data on the disk is fully recoverable once the drive has been repaired and reprogrammed.

Firmware failures - How to diagnose: Common Symptoms
* The hard disk spins up when powered on, but is recognised incorrectly, or not at all, by the computer
* The hard disk spins up and is recognised correctly by the computer, but the system then hangs during the boot process

2. Electronic Failure

Explanation: Electronic failure usually relates to problems on the controller board of the hard disk itself. The computer may suffer a power spike or electrical surge that knocks out the controller board on the hard disk, making the disk undetectable to the system firmware (the BIOS, or OpenBoot PROM on SPARC systems).

Electrical failures - How to diagnose: Common Symptom
* The hard disk does not spin up when the drive is powered on; it appears dead and is not recognised by the computer

3. Mechanical Failure

Mechanical hard disk failures are those which develop on components internal to the hard disk itself.
Often, as soon as an internal component becomes faulty, the data on the hard disk becomes inaccessible.

Mechanical failures - How to diagnose: Common Symptoms
* When powered on, the hard drive immediately begins to make a regular ticking or clicking sound

4. Logical Errors

Often both the easiest and the most difficult problems to deal with, logical errors range from simple things, such as an invalid entry in a file allocation table, to severe problems, such as the corruption and loss of the file system on a severely fragmented drive. Logical errors differ from the electrical and mechanical problems above in that there is usually nothing 'physically' wrong with the disk, only with the information on it.

First, use the format command and the cfgadm -al command to see the disk status. For example:

format

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@3/sd@0,0
       1. c0t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@3/sd@1,0
       2. c0t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@3/sd@2,0
       3. c0t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@3/sd@3,0

Here is the 'cfgadm' display for controller c0:

cfgadm -al

Ap_Id            Type       Receptacle  Occupant    Condition
c0               scsi-bus   connected   configured  unknown
c0::dsk/c0t0d0   disk       connected   configured  unknown
c0::dsk/c0t1d0   disk       connected   configured  unknown
c0::dsk/c0t2d0   disk       connected   configured  unknown
c0::dsk/c0t3d0   disk       connected   configured  unknown

Use the iostat -En command, which also shows the disk status, including per-device error counters:

c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0501 Serial No: 0706S08GSV
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c1t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: MATSHITA Product: CD-RW CW-8124 Revision: DZ13 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0

c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0501 Serial No: 0706S08GST
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 071111MCJT
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c0t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST973401LSUN72G Revision: 0556 Serial No: 071111MCDV
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Use the prtdiag -v command to show the status of the disks as well as of the machine.

Solaris[TM] 10 11/06, also known as Solaris[TM] 10 Update 3, has the following feature for reporting the disk status of the machine:

A new Fault Management Architecture-based diagnosis engine (DE) is provided on the Sun machine. This DE monitors the disk drives for predictive failures by using the SMART technology in the disk drive's own firmware. When a disk failure is imminent, the LED next to the disk is illuminated and a Fault Management Architecture fault is generated. This fault alerts the administrator to take specific action to ensure system availability and full performance.

The T5140 and T5240 machines provide the following feature:

Disk mirroring (RAID 1) is a technique that uses data redundancy (two complete copies of all data, stored on two separate disks) to protect against loss of data due to disk failure. One logical volume is duplicated on two separate disks.

Product
Sun Fire T2000 Server
Sun Fire T1000 Server
Sun Netra T2000 Server
Sun Netra T5220 Server
Sun Netra T5440 Server
Netra T5220 AC
Sun SPARC Enterprise T5220 Server
Sun SPARC Enterprise T5240 Server
Sun Blade T6300 Server Module
Sun Blade T6320 Server Module
Sun SPARC Enterprise T5120 Server
Sun SPARC Enterprise T5140 Server
Sun Blade T6340 Server Module
Sun SPARC Enterprise T2000 Server
Sun SPARC Enterprise T5440 Server
Sun SPARC Enterprise T1000 Server

Internal Comments
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:

Solaris OS Domain Feedback Alias : [email protected]

Normalized Attachments
This solution has no attachment
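
The iostat -En check described in the steps above lends itself to scripting. The following is a minimal sketch, not part of the original solution, of how the per-device error counters could be extracted from iostat -En text and how devices with any non-zero counter could be listed. The function names parse_iostat_en and nonzero_counters are hypothetical, and SAMPLE is simply two records copied from the example output in this document. On a live system the text would instead come from running iostat -En. Which counters actually warrant action is an operational decision that this sketch does not make.

```python
import re

# Error counters printed per device by `iostat -En`, as seen in the
# example output in this document.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors", "Media Error",
    "Device Not Ready", "No Device", "Recoverable", "Illegal Request",
    "Predictive Failure Analysis",
)

# Each device record starts with the device name followed by "Soft Errors:".
RECORD_START = re.compile(r"(\S+)\s+Soft Errors:")


def parse_iostat_en(text):
    """Return {device: {counter: value}} parsed from `iostat -En` text."""
    starts = list(RECORD_START.finditer(text))
    devices = {}
    for i, match in enumerate(starts):
        # A record runs from its device name up to the next device name.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        record = text[match.start():end]
        counts = {}
        for name in COUNTERS:
            found = re.search(re.escape(name) + r":\s*(\d+)", record)
            if found:
                counts[name] = int(found.group(1))
        devices[match.group(1)] = counts
    return devices


def nonzero_counters(text):
    """Return only the devices that report at least one non-zero counter."""
    report = {}
    for device, counts in parse_iostat_en(text).items():
        bad = {name: value for name, value in counts.items() if value != 0}
        if bad:
            report[device] = bad
    return report


# Two records copied from the example output in this document.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0501 Serial No: 0706S08GSV
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: MATSHITA Product: CD-RW CW-8124 Revision: DZ13 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
"""

if __name__ == "__main__":
    for device, bad in nonzero_counters(SAMPLE).items():
        print(device, bad)
```

With the sample records, only c1t0d0 (the CD-RW drive) is reported, because it is the only device with non-zero counters (two soft errors and two illegal requests). The hard disk c0t0d0 reports zero for every counter and is therefore omitted.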