Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1479736.1
Update Date: 2012-09-07

Solution Type: Technical Instruction

Solution  1479736.1 :   How to replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure)  


Related Items
  • Exadata Database Machine X2-2 Half Rack
  • Exadata Database Machine X2-2 Hardware
  • Exadata Database Machine X2-2 Full Rack
  • Exadata Database Machine X2-8
  • Exadata Database Machine X2-2 Qtr Rack
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
  • .Old GCS Categories>ST>Server>Engineered Systems>Exadata>Hardware


Canned Action Plan procedure to replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure). This covers Exadata disk alerts HALRT-02007 and HALRT-02008.

Applies to:

Exadata Database Machine X2-2 Qtr Rack - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine V2 - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine X2-8 - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine X2-2 Hardware - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine X2-2 Full Rack - Version Not Applicable to Not Applicable [Release N/A]
Oracle Solaris on x86-64 (64-bit)
Information in this document applies to any platform.

Goal

Identify and replace a failed hard disk drive in an Exadata Compute (Database) node for hard or predictive failures.

Fix

DISPATCH INSTRUCTIONS:

The customer may choose to do the replacement themselves. In this case, the disk should be sent out using a parts-only dispatch.

The following information will be required prior to dispatch of a replacement:

  • Type of Exadata (V2 or X2-2 or X2-8)

  • Type of database node (V2=x4170 / X2-2 = x4170m2 / X2-8 = x4800 or x4800m2)

  • Name/location of database node

  • Slot number of failed drive

  • Image Version (output of "/opt/oracle.cellos/imageinfo -all")
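
The items above can be collected in one pass on the affected node. A minimal sketch; the imageinfo path is the standard Exadata location (an assumption for any other host), and the failed drive's slot number must still be read from the MegaCli output or the drive LEDs:

```shell
# Sketch: gather the dispatch details from the affected DB node.
# The imageinfo path is the standard Exadata location and will not
# exist on other hosts, so the script degrades gracefully.
IMAGEINFO=/opt/oracle.cellos/imageinfo

collect_dispatch_info() {
    echo "Hostname: $(hostname)"
    echo "OS type:  $(uname -s)"
    if [ -x "$IMAGEINFO" ]; then
        "$IMAGEINFO" -all
    else
        echo "Image version: imageinfo not available on this host"
    fi
}

collect_dispatch_info
```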

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Linux MegaRAID familiarity

TIME ESTIMATE: 60 minutes

Total time may depend on disk re-sync time.

TASK COMPLEXITY: 0

CRU-optional; default is FRU with Task Complexity: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
PROBLEM OVERVIEW: A hard disk in an Exadata V2/X2-2/X2-8 compute (database (DB)) node needs replacing

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Hard disks in Exadata DB nodes are configured into RAID volumes and are hot swappable, provided the failed hard disk has been offlined by the LSI MegaRAID controller that manages the volume (and by ZFS, if running Solaris). The volume contains redundancy and should remain online, though in a degraded state.

The failed hard disk may be marked either "critical" (hard) or "predictive failure".

For a critical hard failure, the LED for the failed hard disk should have the "OK to Remove" blue LED illuminated/flashing and have the "Service Action Required" amber LED illuminated/flashing. This may trigger alarm HALRT-02007 - refer to Note 1113034.1.

For a predictive failure, the LED for the failed hard disk should have the "Service Action Required" amber LED illuminated/flashing. On certain image revisions, predictive failures may not yet be removed from the volume and may not have a fault LED on. This may trigger alarm HALRT-02008 - refer to Note 1113014.1.

The normal DB node volume arrangement depends on the OS installed and the current active image version. Use “/opt/oracle.cellos/imageinfo” to determine the current active image version, and “uname -s” to determine the OS type. The volumes expected are as follows:

V2/X2-2 Linux only, if dual-boot Solaris image partition has been reclaimed or was not present:

  • 3-disk RAID 5 with 1 global hotspare on images 11.2.3.1.1 and earlier 

X2-2 Linux and Solaris dual-boot, if other OS image partitions have not been reclaimed:

  • 2-disk RAID 1 for Linux on images 11.2.2.3.2 and later.
  • 2 single-disk RAID0 as 1 mirrored zpool for Solaris on images 11.2.2.3.2 and later. 

X2-2 Solaris only, if dual-boot Linux image partition has been reclaimed:

  • 4 single-disk RAID0 volumes configured into 2 mirrored zpool's (rpool and data), on images 11.2.2.3.2 and later

X2-8 Linux only:

  • 7-disk RAID 5 with 1 global hotspare on images 11.2.3.1.1 and earlier, if dual-boot Solaris image partition has been reclaimed

X2-8 Linux and Solaris dual-boot, if other OS image partitions have not been reclaimed:

  • 3-disk RAID 5 with 1 global hotspare for Linux on images 11.2.3.1.0 and earlier
  • 4 single-disk RAID0 volumes configured into 2 mirrored zpool's (rpool and data) on images 11.2.3.1.0 and earlier
  • Solaris support on X2-8 was discontinued as of image 11.2.3.1.1
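
As a quick reference for the Linux-only X2-2 case, the image-version cutoff from the list above can be encoded in a small helper. A sketch only; version matching is simplified and illustrative, so always confirm against imageinfo output:

```shell
# Sketch: map an active image version to the expected X2-2 Linux-only
# volume layout from the list above. The case patterns are illustrative
# and do not cover every version string.
layout_for_image() {
    case "$1" in
        # Images 11.2.3.1.1 and earlier: 3-disk RAID 5 plus 1 global hotspare
        11.2.2.*|11.2.3.0.*|11.2.3.1.0|11.2.3.1.1)
            echo "3-disk RAID 5 with 1 global hotspare" ;;
        *)
            echo "layout for image $1: check against the list above" ;;
    esac
}

layout_for_image 11.2.3.1.1
```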

 

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

1. Backup the volume and be familiar with the restore from bare metal procedure before replacing the disk. See Note 1084360.1 for details.

If the DB node was running 11.2.2.1.1 or 11.2.2.2.x images and was in write-through caching mode at some stage (the default is write-back), there is a possibility that the Linux file system is corrupt due to a disk controller firmware bug. When this is encountered, the file system may have been operating normally but will go read-only when the corrupted blocks are rebuilt across to the hotspare disk. This may be unavoidable, as the copy back from the hotspare to the replacement disk occurs automatically. A bare metal restore is required to correct it.

2. Identify the disk using the amber fault and blue OK-to-Remove LED states. The DB node server within the rack can usually be determined from the hostname and the default Exadata server numbering scheme, which counts up from 1 starting at the lowest DB node in the rack. The server's white Locate LED may be flashing as well.

If still unsure on the slot location, use the following commands to identify the faulted disk:

a. Obtain the enclosure ID for the MegaRAID card:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | grep ID
Device ID : 252
#

Solaris:

# /opt/MegaRAID/MegaCli -encinfo -a0 | grep ID
Device ID : 252
#

b. Identify the physical disk slot that is failed:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Unconfigured(bad)
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Rebuild

  Solaris:

# /opt/MegaRAID/MegaCli -pdlist -a0 | egrep -i "slot|firmware"

"Unconfigured(bad)" is the expected state for the faulted disk. In this example, it is located in physical slot 0, and it can be seen that the Hotspare in slot 3 has started rebuilding the volume.

If all disks show as Online or Hotspare, then the disk may be in a predictive failure state but not yet offline. The failed disk can be identified using this additional information:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Predictive Failure Count: 12
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Hotspare, Spun down

Solaris:

# /opt/MegaRAID/MegaCli -pdlist -a0 | egrep -i "slot|predictive|firmware"

In this example, the disk in slot 1 has reported itself as predictive failed several times but is still online. This disk should be considered the bad one. For more details refer to Note 1452325.1.
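
A small filter can pull the suspect slot(s) out of the pdlist output directly. A sketch that reads the pdlist text on stdin so it can be tested offline; on the node, pipe the live MegaCli command into it:

```shell
# Sketch: print slot numbers whose Predictive Failure Count is non-zero.
# On the node, use it as:
#   /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | find_predictive_slots
find_predictive_slots() {
    awk '/^Slot Number:/                     { slot = $3 }
         /^Predictive Failure Count:/ && $4 > 0 { print slot }'
}

# Offline check against a captured fragment of the example output:
printf 'Slot Number: 0\nPredictive Failure Count: 0\nSlot Number: 1\nPredictive Failure Count: 12\n' | find_predictive_slots
```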

c. Use the locate function which turns the "Service Action Required" amber LED on flashing:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[E#:S#] -a0

Solaris:

# /opt/MegaRAID/MegaCli -PdLocate -start -physdrv[E#:S#] -a0

where E# is the enclosure ID number identified in step a, and S# is the slot number of the disk identified in step b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[252:0] -a0

3. Before hot-swap removing the failed disk, verify that the RAID state is Optimal or Rebuilding (if there is a hotspare), or Degraded (if there is not), with the good disk(s) online.

If the failed disk was the global hotspare, then this step should be skipped.

Linux (RAID5 Example):

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 3
Firmware state: Rebuild
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
#

 Linux (RAID1 Example):

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Unconfigured(bad)
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
#
Solaris:
The volume type on Solaris is RAID0, so the failure may cause the virtual drive to no longer be visible. In that case, check that the expected number of good drives is present and online (3 of the 4 in X2-2, or 6 of the 8 in X2-8; the hotspare does not show in this command), and verify the zpool status is DEGRADED with one side of the mirror online:
# /opt/MegaRAID/MegaCli -LdPdInfo -a0 | egrep -i "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
# zpool status
 pool: rpool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
     Sufficient replicas exist for the pool to continue functioning in a
     degraded state.
action: Online the device using 'zpool online' or replace the device with
     'zpool replace'.
  scan: resilvered 9.87G in 0h1m with 0 errors on Tue Jul 10 16:35:50 2012
config:

     NAME          STATE     READ WRITE CKSUM
     rpool         DEGRADED     0     0     0
       mirror-0    DEGRADED     0     0     0
         c3t1d0s0  ONLINE       0     0     0
         c3t2d0s0  REMOVED      0     0     0

errors: No known data errors
#

4. On the drive you plan to remove, push the storage drive release button to open the latch.

5. Grasp the latch and pull the drive out of the drive slot (Caution: The latch is not an ejector. Do not bend it too far to the right. Doing so can damage the latch. Also, whenever you remove a storage drive, you should replace it with another storage drive or a filler panel, otherwise the server might overheat due to improper airflow.)

6. Wait three minutes for the system to acknowledge the disk has been removed.

7. Slide the new drive into the drive slot until it is fully seated.

8. Close the latch to lock the drive in place.

9. Verify the "OK/Activity" green LED begins to flicker as the system recognizes the new drive. The other two LEDs for the drive should no longer be illuminated. The server's Locate LED and the disk's locate blinking should automatically turn off.

If it does not, it can be manually turned off for the device using:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[E#:S#] -a0

Solaris:

# /opt/MegaRAID/MegaCli -PdLocate -stop -physdrv[E#:S#] -a0

where E# is the enclosure ID number identified in step 2a, and S# is the slot number of the disk identified in step 2b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[252:0] -a0

 

OBTAIN CUSTOMER ACCEPTANCE
 WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

1. Verify the disk is brought online into a volume by LSI MegaRAID. Until the disk is added into a volume, the OS will not be able to use the disk.

If the OS is Linux, depending on the volume arrangement and image version, the disk may automatically become the new hotspare, or it may stay in an Unconfigured(good) state until the hotspare rebuild has completed. If it stays Unconfigured, the data will be copied back from the hotspare to the new disk after the rebuild completes. If it is a RAID1, the disk should automatically come into the volume and start rebuilding.

If the OS is Solaris, the disk belongs to a single-disk RAID0 volume; it may not come into a volume automatically, and it will remain in the Unconfigured(good) state until a volume is recreated on it.

Use the following to verify the physical disk is in one of these expected states – Hotspare, Unconfigured(good), Copyback, or Online:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced. In the example above, the command and output would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:0] -a0

Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Device Id: 10
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 136.727 GB [0x11174b81 Sectors]
Non Coerced Size: 136.227 GB [0x11074b81 Sectors]
Coerced Size: 136.218 GB [0x11070000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5000cca00a1b817d
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: HITACHI H103014SCSUN146GA1600934FH3Y8E
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified 
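
The expected-state check can be scripted so the firmware state is classified automatically. A sketch, with the state string passed in explicitly so it can be exercised without a controller present:

```shell
# Sketch: classify a MegaCli "Firmware state" string against the states
# expected after replacement (Hotspare, Unconfigured(good), Copyback,
# Online). On the node, extract the live state with e.g.:
#   MegaCli64 -PdInfo -physdrv[252:0] -a0 | awk -F': ' '/^Firmware state/{print $2}'
classify_state() {
    case "$1" in
        Hotspare*|"Unconfigured(good)"*|Copyback*|Online*)
            echo "OK: $1" ;;
        *)
            echo "UNEXPECTED: $1" ;;
    esac
}

# The state from the example output above:
classify_state "Unconfigured(good), Spun Up"
```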

2. Verify the replacement disk has been added to the expected RAID volume.

If the OS is running Linux and the failed disk was originally the global hotspare, then the replacement should have become the hotspare automatically, identified in step 1, and this step should be skipped. If that did not occur automatically, then the new disk can be assigned as the hotspare with the following command:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdHsp -set -EnclAffinity -PhysDrv[E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced.

If the OS is running Linux and the failed disk was part of a RAID volume, use the following MegaRAID command to verify the status of the RAID:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"

If it has already completed the copyback when checked, then it may already be in “Online” state. If it is in rebuilding or copyback state, you can use the following to verify progress to completion:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Rebuild state. This is typically the original Hotspare disk slot.

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [252:3] -a0

Rebuild Progress on Device at Enclosure 252, Slot 3 Completed 9% in 3 Minutes.

Exit Code: 0x00
#

or

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Copyback state. This is typically the replaced disk slot.

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [252:0] -a0

Copyback Progress on Device at Enclosure 252, Slot 0 Completed 79% in 29 Minutes.

Exit Code: 0x00
#

If the OS is running Solaris, the RAID0 MegaRAID volume may need to be recreated if this was not done automatically. In this example, the rpool mirror disk in slot 3 had failed:

# /opt/MegaRAID/MegaCli -cfgldadd -r0[252:3] wb nora direct nocachedbadbbu -strpsz1024 -a0

Adapter 0: Created VD 2

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

#

Use format to partition the disk with a full-disk Solaris label, single cylinder boot block on slice 8, and the rest of the disk as root partition on slice 0.

# format -e
Searching for disks...done

c3t0d0: configured with capacity of 275.53GB

AVAILABLE DISK SELECTIONS:
   0. c3t0d0 <LSI-MR9261-8i-2.12 cyl 281 alt 2 hd 255 sec 8064>
      /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
   1. c3t1d0 <LSI-MR9261-8i-2.12 cyl 36348 alt 2 hd 255 sec 63> ai-disk
      /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@1,0
   2. c3t2d0 <LSI-MR9261-8i-2.12 cyl 36348 alt 2 hd 255 sec 63>
      /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@2,0
Specify disk (enter its number): 2
selecting c3t2d0 [disk formatted]
No Solaris fdisk partition found.

FORMAT MENU:
   disk - select a disk
   type - select (define) a disk type
   partition - select (define) a partition table
   current - describe the current disk
   format - format and analyze the disk
   fdisk - run the fdisk program
   repair - repair a defective sector
   label - write label to the disk
   analyze - surface analysis
   defect - defect list management
   backup - search for backup labels
   verify - read and display labels
   save - save new disk/partition definitions
   inquiry - show disk ID
   volname - set 8-character volume name
   !<cmd> - execute <cmd>, then return
   quit
format> fdisk
No fdisk table exists. The default partition for the disk is:

 a 100% "SOLARIS System" partition

Type "y" to accept the default partition, otherwise type "n" to edit the
 partition table.
Y

format> ver
Warning: Primary label on disk appears to be different from
current label.

Warning: Check the current partitioning and 'label' the disk or use the
    'backup' command.

Primary label contents:

Volume name = < >
ascii name = <DEFAULT cyl 36348 alt 2 hd 255 sec 63>
pcyl = 36350
ncyl = 36348
acyl = 2
bcyl = 0
nhead = 255
nsect = 63
Part       Tag   Flag    Cylinders       Size           Blocks
  0       root    wm     1 - 36347     278.43GB  (36347/0/0) 583914555
  1 unassigned    wu     0                 0     (0/0/0)             0
  2     backup    wu     0 - 36349     278.46GB  (36350/0/0) 583962750
  3 unassigned    wu     0                 0     (0/0/0)             0
  4 unassigned    wu     0                 0     (0/0/0)             0
  5 unassigned    wu     0                 0     (0/0/0)             0
  6 unassigned    wu     0                 0     (0/0/0)             0
  7 unassigned    wu     0                 0     (0/0/0)             0
  8       boot    wu     0 -     0       7.84MB  (1/0/0)         16065
  9 unassigned    wu     0                 0     (0/0/0)             0

format> label
Ready to label disk, continue? y

format> q
#

Re-attach the new disk to the zpool. Use -f option if this is a mounted root pool:

# zpool attach -f rpool c3t1d0s0 c3t2d0s0
Make sure to wait until resilver is done before rebooting.
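
The wait can be automated with a simple poll of zpool status. A sketch, assuming the rpool name from the example and an arbitrary 60-second poll interval:

```shell
# Sketch: block until no resilver is in progress on the pool, so a
# reboot is safe. Pool name and poll interval are assumptions; adjust
# for the actual pool (e.g. data).
POOL=rpool

wait_for_resilver() {
    while zpool status "$POOL" 2>/dev/null | grep -q 'resilver in progress'; do
        echo "resilver still running on $POOL ..."
        sleep 60
    done
    echo "no resilver in progress on $POOL"
}

wait_for_resilver
```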

 If this was one of the 2 boot disks in the root pool, then re-enable booting:

# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t2d0s0
stage2 written to partition 0, 282 sectors starting at 50 (abs 16115)

Verify status of the zpool rebuilding:

# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.  
  scan: resilver in progress since Tue Jul 17 17:25:18 2012
    32.9G scanned out of 35.9G at 128M/s, 0h0m to go
    32.9G resilvered, 91.74% done
config:

     NAME         STATE   READ WRITE CKSUM
     rpool        ONLINE     0     0     0
       mirror-0   ONLINE     0     0     0
        c3t1d0s0  ONLINE     0     0     0
        c3t2d0s0  ONLINE     0     0     0  (resilvering)

errors: No known data errors
#

PARTS NOTE:

Refer to the Exadata Database Machine Owner's Guide Appendix C for part information.

How to identify which Exadata disk FRU part number to order, based on image, vendor, and mixed disk support status - Note 1416303.1
Oracle Exadata V2 - Full Components List (https://support.us.oracle.com/handbook_internal/Systems/Exadata_V2/component.disks.html)
Oracle Exadata X2-2 - Full Components List (https://support.us.oracle.com/handbook_internal/Systems/Exadata_X2_2/component.disks.html)
Oracle Exadata X2-8 - Full Components List (https://support.us.oracle.com/handbook_internal/Systems/Exadata_X2_8/component.disks.html)


REFERENCE INFORMATION:

Internal Only References:
  - INTERNAL Exadata Database Machine Hardware Current Product Issues (Doc ID 1360343.1)
  - INTERNAL Exadata Database Machine Hardware Troubleshooting (Doc ID 1360360.1)

References

@<NOTE:1360343.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues
@<NOTE:1360360.1> - INTERNAL Exadata Database Machine Hardware Troubleshooting
<NOTE:1416303.1> - How to identify which Exadata disk FRU part number to order, based on image, vendor and mixed disk support status
<NOTE:1113034.1> - HALRT-02007: Database node hard disk failure
<NOTE:1113014.1> - HALRT-02008: Database node hard disk predictive failure
<NOTE:1084360.1> - Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment
<NOTE:1071220.1> - Oracle Sun Database Machine V2 Diagnosability and Troubleshooting Best Practices
<NOTE:1452325.1> - Determining when Disks should be replaced on Oracle Exadata Database Machine
<NOTE:1274324.1> - Oracle Sun Database Machine X2-2/X2-8 Diagnosability and Troubleshooting Best Practices

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.