Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1479736.1
Update Date: 2012-09-07

Solution Type: Technical Instruction

Solution  1479736.1 :   How to replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure)  


Related Items
  • Exadata Database Machine X2-2 Half Rack
  • Exadata Database Machine X2-2 Hardware
  • Exadata Database Machine X2-2 Full Rack
  • Exadata Database Machine X2-8
  • Exadata Database Machine X2-2 Qtr Rack
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
  • .Old GCS Categories>ST>Server>Engineered Systems>Exadata>Hardware


Canned Action Plan procedure to replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure). This covers Exadata disk alerts HALRT-02007 and HALRT-02008.

Applies to:

Exadata Database Machine X2-2 Qtr Rack - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine V2 - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine X2-8 - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine X2-2 Hardware - Version Not Applicable to Not Applicable [Release N/A]
Exadata Database Machine X2-2 Full Rack - Version Not Applicable to Not Applicable [Release N/A]
Oracle Solaris on x86-64 (64-bit)
Information in this document applies to any platform.

Goal

Identify and replace a failed hard disk drive in an Exadata Compute (Database) node for hard or predictive failures.

Fix

DISPATCH INSTRUCTIONS:

The customer may choose to do the replacement themselves. In this case, the disk should be sent out using a parts-only dispatch.

The following information will be required prior to dispatch of a replacement:

  • Type of Exadata (V2 or X2-2 or X2-8)

  • Type of database node (V2=x4170 / X2-2 = x4170m2 / X2-8 = x4800 or x4800m2)

  • Name/location of database node

  • Slot number of failed drive

  • Image Version (output of "/opt/oracle.cellos/imageinfo -all")
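
The items above can be collected in one pass on the affected node. A minimal sketch; the imageinfo path is the standard Exadata location (an assumption for any other host), and the failed drive's slot number must still be read from the MegaCli output or the drive LEDs:

```shell
# Sketch: gather the dispatch details from the affected DB node.
# The imageinfo path is the standard Exadata location and will not
# exist on other hosts, so the script degrades gracefully.
IMAGEINFO=/opt/oracle.cellos/imageinfo

collect_dispatch_info() {
    echo "Hostname: $(hostname)"
    echo "OS type:  $(uname -s)"
    if [ -x "$IMAGEINFO" ]; then
        "$IMAGEINFO" -all
    else
        echo "Image version: imageinfo not available on this host"
    fi
}

collect_dispatch_info
```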

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Linux MegaRAID familiarity

TIME ESTIMATE: 60 minutes

Total time may depend on disk re-sync time.

TASK COMPLEXITY: 0

CRU-optional; default is FRU with Task Complexity: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
PROBLEM OVERVIEW: A hard disk in an Exadata V2/X2-2/X2-8 compute (database (DB)) node needs replacing

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Hard disks in Exadata DB nodes are configured into RAID volumes and are hot swappable, provided the failed hard disk has been offlined by the LSI MegaRAID controller that manages the volume (and by ZFS, if running Solaris). The volume contains redundancy and should remain online, though in a degraded state.

The failed hard disk may be marked either "critical" (hard) or "predictive failure".

For a critical hard failure, the LED for the failed hard disk should have the "OK to Remove" blue LED illuminated/flashing and have the "Service Action Required" amber LED illuminated/flashing. This may trigger alarm HALRT-02007 - refer to Note 1113034.1.

For a predictive failure, the LED for the failed hard disk should have the "Service Action Required" amber LED illuminated/flashing. On certain image revisions, predictive failures may not yet be removed from the volume and may not have a fault LED on. This may trigger alarm HALRT-02008 - refer to Note 1113014.1.

The normal DB node volume arrangement depends on the OS installed and the current active image version. Use “/opt/oracle.cellos/imageinfo” to determine the current active image version, and “uname -s” to determine the OS type. The volumes expected are as follows:

V2/X2-2 Linux only, if dual-boot Solaris image partition has been reclaimed or was not present:

  • 3-disk RAID 5 with 1 global hotspare on images 11.2.3.1.1 and earlier 

X2-2 Linux and Solaris dual-boot, if other OS image partitions have not been reclaimed:

  • 2-disk RAID 1 for Linux on images 11.2.2.3.2 and later.
  • 2 single-disk RAID0 as 1 mirrored zpool for Solaris on images 11.2.2.3.2 and later. 

X2-2 Solaris only, if dual-boot Linux image partition has been reclaimed:

  • 4 single-disk RAID0 volumes configured into 2 mirrored zpool's (rpool and data), on images 11.2.2.3.2 and later

X2-8 Linux only:

  • 7-disk RAID 5 with 1 global hotspare on images 11.2.3.1.1 and earlier, if dual-boot Solaris image partition has been reclaimed

X2-8 Linux and Solaris dual-boot, if other OS image partitions have not been reclaimed:

  • 3-disk RAID 5 with 1 global hotspare for Linux on images 11.2.3.1.0 and earlier
  • 4 single-disk RAID0 volumes configured into 2 mirrored zpool's (rpool and data) on images 11.2.3.1.0 and earlier
  • Solaris support on X2-8 was discontinued as of image 11.2.3.1.1
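
As a quick reference for the Linux-only X2-2 case, the image-version cutoff from the list above can be encoded in a small helper. A sketch only; version matching is simplified and illustrative, so always confirm against imageinfo output:

```shell
# Sketch: map an active image version to the expected X2-2 Linux-only
# volume layout from the list above. The case patterns are illustrative
# and do not cover every version string.
layout_for_image() {
    case "$1" in
        # Images 11.2.3.1.1 and earlier: 3-disk RAID 5 plus 1 global hotspare
        11.2.2.*|11.2.3.0.*|11.2.3.1.0|11.2.3.1.1)
            echo "3-disk RAID 5 with 1 global hotspare" ;;
        *)
            echo "layout for image $1: check against the list above" ;;
    esac
}

layout_for_image 11.2.3.1.1
```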

 

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

1. Backup the volume and be familiar with the restore from bare metal procedure before replacing the disk. See Note 1084360.1 for details.

If the DB node was running 11.2.2.1.1 or 11.2.2.2.x images and was in write-through caching mode at some stage (the default is write-back), there is a possibility that the Linux file system is corrupt due to a disk controller firmware bug. When this is encountered, the file system may have been operating normally but will go read-only when the corrupted blocks are rebuilt across to the hotspare disk. This may be unavoidable, as the copy back from the hotspare to the replacement disk occurs automatically. A bare metal restore is required to correct it.

2. Identify the disk using the amber fault and blue OK-to-Remove LED states. The DB node server within the rack can usually be determined from the hostname and the default Exadata server numbering scheme, which counts up from 1 starting at the lowest DB node in the rack. The server's white Locate LED may be flashing as well.

If still unsure on the slot location, use the following commands to identify the faulted disk:

a. Obtain the enclosure ID for the MegaRAID card:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | grep ID
Device ID : 252
#

Solaris:

# /opt/MegaRAID/MegaCli -encinfo -a0 | grep ID
Device ID : 252
#

b. Identify the physical disk slot that is failed:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Unconfigured(bad)
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Rebuild

  Solaris:

# /opt/MegaRAID/MegaCli -pdlist -a0 | egrep -i "slot|firmware"

"Unconfigured(bad)" is the expected state for the faulted disk. In this example, it is located in physical slot 0, and it can be seen that the Hotspare in slot 3 has started rebuilding the volume.

If all disks show as Online or Hotspare, then the disk may be in a predictive failure state but not yet offline. The failed disk can be identified using this additional information:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Predictive Failure Count: 12
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Hotspare, Spun down

Solaris:

# /opt/MegaRAID/MegaCli -pdlist -a0 | egrep -i "slot|predictive|firmware"

In this example, the disk in slot 1 has reported itself as predictive failed several times but is still online. This disk should be considered the bad one. For more details refer to Note 1452325.1.
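
A small filter can pull the suspect slot(s) out of the pdlist output directly. A sketch that reads the pdlist text on stdin so it can be tested offline; on the node, pipe the live MegaCli command into it:

```shell
# Sketch: print slot numbers whose Predictive Failure Count is non-zero.
# On the node, use it as:
#   /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | find_predictive_slots
find_predictive_slots() {
    awk '/^Slot Number:/                     { slot = $3 }
         /^Predictive Failure Count:/ && $4 > 0 { print slot }'
}

# Offline check against a captured fragment of the example output:
printf 'Slot Number: 0\nPredictive Failure Count: 0\nSlot Number: 1\nPredictive Failure Count: 12\n' | find_predictive_slots
```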

c. Use the locate function which turns the "Service Action Required" amber LED on flashing:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[E#:S#] -a0

Solaris:

# /opt/MegaRAID/MegaCli -PdLocate -start -physdrv[E#:S#] -a0

where E# is the enclosure ID number identified in step a, and S# is the slot number of the disk identified in step b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[252:0] -a0

3. Before hot-swap removing the failed disk, verify that the RAID state is Optimal or Rebuilding (if there is a hotspare), or Degraded (if there is not), with the good disk(s) online.

If the failed disk was the global hotspare, then this step should be skipped.

Linux (RAID5 Example):

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 3
Firmware state: Rebuild
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
#

 Linux (RAID1 Example):

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Unconfigured(bad)
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
#
Solaris:
The volume type on Solaris is RAID0, so the failure may cause the virtual drive to no longer be visible. In that case, check that the expected number of good drives is present and online (3 of the 4 in X2-2, or 6 of the 8 in X2-8; the hotspare does not show in this command), and verify the zpool status is DEGRADED with one side of the mirror online:
# /opt/MegaRAID/MegaCli -LdPdInfo -a0 | egrep -i "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
# zpool status
 pool: rpool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
     Sufficient replicas exist for the pool to continue functioning in a
     degraded state.
action: Online the device using 'zpool online' or replace the device with
     'zpool replace'.
  scan: resilvered 9.87G in 0h1m with 0 errors on Tue Jul 10 16:35:50 2012
config:

     NAME          STATE     READ WRITE CKSUM
     rpool         DEGRADED     0     0     0
       mirror-0    DEGRADED     0     0     0
         c3t1d0s0  ONLINE       0     0     0
         c3t2d0s0  REMOVED      0     0     0

errors: No known data errors
#

4. On the drive you plan to remove, push the storage drive release button to open the latch.

5. Grasp the latch and pull the drive out of the drive slot (Caution: The latch is not an ejector. Do not bend it too far to the right. Doing so can damage the latch. Also, whenever you remove a storage drive, you should replace it with another storage drive or a filler panel, otherwise the server might overheat due to improper airflow.)

6. Wait three minutes for the system to acknowledge the disk has been removed.

7. Slide the new drive into the drive slot until it is fully seated.

8. Close the latch to lock the drive in place.

9. Verify the "OK/Activity" green LED begins to flicker as the system recognizes the new drive. The other two LEDs for the drive should no longer be illuminated. The server's Locate LED and the disk's locate blinking should automatically turn off.

If it does not, it can be manually turned off for the device using:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[E#:S#] -a0

Solaris:

# /opt/MegaRAID/MegaCli -PdLocate -stop -physdrv[E#:S#] -a0

where E# is the enclosure ID number identified in step 2a, and S# is the slot number of the disk identified in step 2b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[252:0] -a0

 

OBTAIN CUSTOMER ACCEPTANCE
 WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

1. Verify the disk is brought online into a volume by LSI MegaRAID. Until the disk is added into a volume, the OS will not be able to use the disk.

If the OS is Linux, depending on the volume arrangement and image version, the disk may automatically become the new hotspare, or it may stay in an Unconfigured(good) state until the hotspare rebuild has completed. If it stays Unconfigured, the data will be copied back from the hotspare to the new disk after the rebuild completes. If it is a RAID1, the disk should automatically come into the volume and start rebuilding.

If the OS is Solaris, the disk belongs to a single-disk RAID0 volume; it may not come into a volume automatically, and it will remain in the Unconfigured(good) state until a volume is recreated on it.

Use the following to verify the physical disk is in one of these expected states – Hotspare, Unconfigured(good), Copyback, or Online:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced. In the example above, the command and output would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:0] -a0

Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Device Id: 10
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 136.727 GB [0x11174b81 Sectors]
Non Coerced Size: 136.227 GB [0x11074b81 Sectors]
Coerced Size: 136.218 GB [0x11070000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5000cca00a1b817d
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: HITACHI H103014SCSUN146GA1600934FH3Y8E
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified 
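
The expected-state check can be scripted so the firmware state is classified automatically. A sketch, with the state string passed in explicitly so it can be exercised without a controller present:

```shell
# Sketch: classify a MegaCli "Firmware state" string against the states
# expected after replacement (Hotspare, Unconfigured(good), Copyback,
# Online). On the node, extract the live state with e.g.:
#   MegaCli64 -PdInfo -physdrv[252:0] -a0 | awk -F': ' '/^Firmware state/{print $2}'
classify_state() {
    case "$1" in
        Hotspare*|"Unconfigured(good)"*|Copyback*|Online*)
            echo "OK: $1" ;;
        *)
            echo "UNEXPECTED: $1" ;;
    esac
}

# The state from the example output above:
classify_state "Unconfigured(good), Spun Up"
```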

2. Verify the replacement disk has been added to the expected RAID volume.

If the OS is running Linux and the failed disk was originally the global hotspare, then the replacement should have become the hotspare automatically, identified in step 1, and this step should be skipped. If that did not occur automatically, then the new disk can be assigned as the hotspare with the following command:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdHsp -set -EnclAffinity -PhysDrv[E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced.

If the OS is running Linux and the failed disk was part of a RAID volume, use the following MegaRAID command to verify the status of the RAID:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"

If it has already completed the copyback when checked, then it may already be in “Online” state. If it is in rebuilding or copyback state, you can use the following to verify progress to completion:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Rebuild state. This is typically the original Hotspare disk slot.

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [252:3] -a0

Rebuild Progress on Device at Enclosure 252, Slot 3 Completed 9% in 3 Minutes.

Exit Code: 0x00
#

or

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [E#:Slot#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Copyback state. This is typically the replaced disk slot.

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [252:0] -a0

Copyback Progress on Device at Enclosure 252, Slot 0 Completed 79% in 29 Minutes.

Exit Code: 0x00
#

If the OS is running Solaris, the RAID0 MegaRAID volume may need to be recreated if this was not done automatically. In this example, the rpool mirror disk in slot 3 had failed:

# /opt/MegaRAID/MegaCli -cfgldadd -r0[252:3] wb nora direct nocachedbadbbu -strpsz1024 -a0

Adapter 0: Created VD 2

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

#

Use format to partition the disk with a full-disk Solaris label, single cylinder boot block on slice 8, and the rest of the disk as root partition on slice 0.

# format -e
Searching for disks...done

c3t0d0: configured with capacity of 275.53GB

AVAILABLE DISK SELECTIONS:
   0. c3t0d0 <LSI-MR9261-8i-2.12 cyl 281 alt 2 hd 255 sec 8064>
      /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
   1. c3t1d0 <LSI-MR9261-8i-2.12 cyl 36348 alt 2 hd 255 sec 63> ai-disk
      /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@1,0
   2. c3t2d0 <LSI-MR9261-8i-2.12 cyl 36348 alt 2 hd 255 sec 63>
      /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@2,0
Specify disk (enter its number): 2
selecting c3t2d0 [disk formatted]
No Solaris fdisk partition found.

FORMAT MENU:
   disk - select a disk
   type - select (define) a disk type
   partition - select (define) a partition table
   current - describe the current disk
   format - format and analyze the disk
   fdisk - run the fdisk program
   repair - repair a defective sector
   label - write label to the disk
   analyze - surface analysis
   defect - defect list management
   backup - search for backup labels
   verify - read and display labels
   save - save new disk/partition definitions
   inquiry - show disk ID
   volname - set 8-character volume name
   !<cmd> - execute <cmd>, then return
   quit
format> fdisk
No fdisk table exists. The default partition for the disk is:

 a 100% "SOLARIS System" partition

Type "y" to accept the default partition, otherwise type "n" to edit the
 partition table.
Y

format> ver
Warning: Primary label on disk appears to be different from
current label.

Warning: Check the current partitioning and 'label' the disk or use the
    'backup' command.

Primary label contents:

Volume name = < >
ascii name = <DEFAULT cyl 36348 alt 2 hd 255 sec 63>
pcyl = 36350
ncyl = 36348
acyl = 2
bcyl = 0
nhead = 255
nsect = 63
Part       Tag   Flag    Cylinders       Size           Blocks
  0       root    wm     1 - 36347     278.43GB  (36347/0/0) 583914555
  1 unassigned    wu     0                 0     (0/0/0)             0
  2     backup    wu     0 - 36349     278.46GB  (36350/0/0) 583962750
  3 unassigned    wu     0                 0     (0/0/0)             0
  4 unassigned    wu     0                 0     (0/0/0)             0
  5 unassigned    wu     0                 0     (0/0/0)             0
  6 unassigned    wu     0                 0     (0/0/0)             0
  7 unassigned    wu     0                 0     (0/0/0)             0
  8       boot    wu     0 -     0       7.84MB  (1/0/0)         16065
  9 unassigned    wu     0                 0     (0/0/0)             0

format> label
Ready to label disk, continue? y

format> q
#

Re-attach the new disk to the zpool. Use -f option if this is a mounted root pool:

# zpool attach -f rpool c3t1d0s0 c3t2d0s0
Make sure to wait until resilver is done before rebooting.
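
The wait can be automated with a simple poll of zpool status. A sketch, assuming the rpool name from the example and an arbitrary 60-second poll interval:

```shell
# Sketch: block until no resilver is in progress on the pool, so a
# reboot is safe. Pool name and poll interval are assumptions; adjust
# for the actual pool (e.g. data).
POOL=rpool

wait_for_resilver() {
    while zpool status "$POOL" 2>/dev/null | grep -q 'resilver in progress'; do
        echo "resilver still running on $POOL ..."
        sleep 60
    done
    echo "no resilver in progress on $POOL"
}

wait_for_resilver
```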

 If this was one of the 2 boot disks in the root pool, then re-enable booting:

# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t2d0s0
stage2 written to partition 0, 282 sectors starting at 50 (abs 16115)

Verify status of the zpool rebuilding:

# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.  
  scan: resilver in progress since Tue Jul 17 17:25:18 2012
    32.9G scanned out of 35.9G at 128M/s, 0h0m to go
    32.9G resilvered, 91.74% done
config:

     NAME         STATE   READ WRITE CKSUM
     rpool        ONLINE     0     0     0
       mirror-0   ONLINE     0     0     0
        c3t1d0s0  ONLINE     0     0     0
        c3t2d0s0  ONLINE     0     0     0  (resilvering)

errors: No known data errors
#

PARTS NOTE:

Refer to the Exadata Database Machine Owner's Guide Appendix C for part information.

How to identify which Exadata disk FRU part number to order, based on image, vendor, and mixed disk support status - Note 1416303.1
Oracle Exadata V2 - Full Components List (https://support.us.oracle.com/handbook_internal/Systems/Exadata_V2/component.disks.html)
Oracle Exadata X2-2 - Full Components List (https://support.us.oracle.com/handbook_internal/Systems/Exadata_X2_2/component.disks.html)
Oracle Exadata X2-8 - Full Components List (https://support.us.oracle.com/handbook_internal/Systems/Exadata_X2_8/component.disks.html)


REFERENCE INFORMATION:

Internal Only References:
  - INTERNAL Exadata Database Machine Hardware Current Product Issues (Doc ID 1360343.1)
  - INTERNAL Exadata Database Machine Hardware Troubleshooting (Doc ID 1360360.1)

References

@<NOTE:1360343.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues
@<NOTE:1360360.1> - INTERNAL Exadata Database Machine Hardware Troubleshooting
<NOTE:1416303.1> - How to identify which Exadata disk FRU part number to order, based on image, vendor and mixed disk support status
<NOTE:1113034.1> - HALRT-02007: Database node hard disk failure
<NOTE:1113014.1> - HALRT-02008: Database node hard disk predictive failure
<NOTE:1084360.1> - Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment
<NOTE:1071220.1> - Oracle Sun Database Machine V2 Diagnosability and Troubleshooting Best Practices
<NOTE:1452325.1> - Determining when Disks should be replaced on Oracle Exadata Database Machine
<NOTE:1274324.1> - Oracle Sun Database Machine X2-2/X2-8 Diagnosability and Troubleshooting Best Practices

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.