Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction
Solution 1010753.1: Sun Fire[TM] Servers (V280R, V480, V490, V880, V890): How to Replace Disks for Systems With Internal FC-AL Drives Under Solaris[TM] Volume Manager Control.
Previously Published As: 214845
Applies to:
Sun Fire V890 Server
Sun Fire 280R Server
Sun Fire V480 Server
Sun Fire V490 Server
Solaris SPARC Operating System - Version 8.0 and later [Release: 8.0 and later]
All Platforms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Oracle Solaris Entrylevel Servers.

Goal

Beginning with the Solaris[TM] 9 Operating System (OS), Solaris[TM] Volume Manager (SVM) software uses a feature called Device-ID (or DevID), which identifies each disk not only by its c#t#d# name, but also by a unique ID generated from the disk's World Wide Name (WWN) or serial number. The SVM software relies on the Solaris OS to supply it with each disk's correct DevID. When a disk fails and is replaced, a specific procedure is required to make sure that the Solaris OS is updated with the new DevID.

Solution

Steps to Follow

To replace a disk, use the luxadm(1M) command to remove it and insert the new disk. This procedure updates the Solaris OS device framework so that the new disk's DevID is inserted and the old disk's DevID is removed.
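Before starting either procedure below, it can be helpful to record exactly what SVM currently has on the failing disk. The following minimal sketch uses the same commands that appear in the procedures; the disk name c1t1d0 and the output file path are purely illustrative, so substitute your own c#t#d# name and a writable location:

   # metastat -p | grep c1t1d0                (lists metadevices built on the disk, such as submirrors or RAID-5 columns)
   # metadb -i | grep c1t1d0                  (lists state database replicas on the disk; note how many there are)
   # prtvtoc /dev/rdsk/c1t1d0s2 > /var/tmp/c1t1d0.vtoc     (saves the partition table for later restoration)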
PROCEDURE FOR REPLACING MIRRORED DISKS

The following set of commands should work in all cases. Follow the exact sequence to ensure a smooth operation. To replace a disk that is controlled by SVM and is part of a mirror, perform the following steps. A consolidated example using hypothetical device names appears after step 12.

1. Run "metadetach" to detach all the submirrors on the failing disk from their respective mirrors:

   # metadetach -f <mirror> <submirror>

   NOTE: The "-f" option is not required if the metadevice is in an "okay" state.

2. Run "metaclear" to remove the detached submirrors:

   # metaclear <submirror>

   You can verify that there are no metadevices left on the disk by running:

   # metastat -p | grep c#t#d#

3. If there are any replicas on this disk, note the number of replicas and remove them:

   # metadb -i          (note the number of replicas on this disk)
   # metadb -d c#t#d#s#

   Verify that there are no replicas left on the disk by running:

   # metadb | grep c#t#d#

4. If there are any open filesystems on this disk that are not under SVM control, or non-mirrored metadevices, unmount them.

5. Run "format" or "prtvtoc/fmthard" to save the disk partition table information:

   # prtvtoc /dev/rdsk/c#t#d#s2 > file

6. Run the 'luxadm' command to remove the failed disk:

   # luxadm remove_device -F /dev/rdsk/c#t#d#s2

   At the prompt, physically remove the disk and continue. The picld daemon notifies the system that the disk has been removed.

7. Initiate the devfsadm cleanup subroutines by entering the following command:

   # /usr/sbin/devfsadm -C -c disk

   The default devfsadm operation is to attempt to load every driver in the system and attach these drivers to all possible device instances. The devfsadm command then creates device special files in the /devices directory and logical links in /dev. With the "-c disk" option, devfsadm only updates disk device files. This saves time and is important on systems that have tape devices attached; rebuilding those tape devices could cause undesirable results on non-Sun hardware. The -C option cleans up the /dev directory and removes any lingering logical links to the device link names. This should remove all the device paths for this particular disk, which can be verified with:

   # ls -ld /dev/dsk/cxtxd*

   This should return no devices.

8. It is now safe to physically replace the disk. Insert a new disk and configure it. Create the necessary entries in the Solaris OS device tree with one of the following commands:

   # devfsadm
   or
   # /usr/sbin/luxadm insert_device <enclosure_name>,sx     (where sx is the slot number)
   or
   # /usr/sbin/luxadm insert_device                         (if the enclosure name is not known)

   Note: In many cases, luxadm insert_device does not require the enclosure name and slot number.

   Use the following to find the slot number:

   # luxadm display <enclosure_name>

   To find the enclosure name:

   # luxadm probe

   Run "ls -ld /dev/dsk/c1t1d*" to verify that the new device paths have been created.

   CAUTION: After inserting a new disk and running devfsadm (or luxadm), the old ssd instance number changes to a new ssd instance number. This change is expected, so ignore it. For example, when the error occurs on the following disk, whose ssd instance is ssd3:

   WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
   Error for Command: read(10)    Error Level: Retryable
   Requested Block: 15392944      Error Block: 15392958

   After inserting a new disk, the ssd instance changes to ssd10 as shown below. This is not a cause for concern, as it is expected.

   picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
   qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
   scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
   genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
   scsi: [ID 365881 kern.info]

9. Run "format" or "prtvtoc/fmthard" to put the desired partition table on the new disk:

   # fmthard -s file /dev/rdsk/c#t#d#s2     ('file' is the prtvtoc output saved in step 5)

10. Use "metainit" and "metattach" to re-create the submirrors and attach them to their mirrors to start the resync:

   # metainit <submirror> 1 1 c#t#d#s#
   # metattach <mirror> <submirror>

   (The metainit arguments shown re-create a simple one-slice submirror; use the layout that matched the original submirror.)

11. If necessary, re-create the same number of replicas that existed previously, using the -c option of the metadb(1M) command:

   # metadb -a -c# c#t#d#s#

12. Be sure to correct the EEPROM entry for the boot-device (only if one of the root disks has been replaced).
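The following consolidated example shows the sequence above end to end. It assumes a purely hypothetical configuration: mirror d10 built from submirror d11 (a single slice, c1t1d0s0, on the failing disk) and a second submirror on a healthy disk, with two state database replicas on c1t1d0s7. None of these names comes from this document; substitute the metadevice and c#t#d# names from your own configuration.

   # metadetach -f d10 d11
   # metaclear d11
   # metastat -p | grep c1t1d0                (should return nothing)
   # metadb -d c1t1d0s7
   # metadb | grep c1t1d0                     (should return nothing)
   # prtvtoc /dev/rdsk/c1t1d0s2 > /var/tmp/c1t1d0.vtoc
   # luxadm remove_device -F /dev/rdsk/c1t1d0s2
     (physically remove the disk when prompted)
   # /usr/sbin/devfsadm -C -c disk
   # ls -ld /dev/dsk/c1t1d*                   (should return no devices)
     (physically insert the replacement disk)
   # devfsadm
   # ls -ld /dev/dsk/c1t1d*                   (new device paths should now exist)
   # fmthard -s /var/tmp/c1t1d0.vtoc /dev/rdsk/c1t1d0s2
   # metainit d11 1 1 c1t1d0s0
   # metattach d10 d11                        (starts the resync)
   # metadb -a -c 2 c1t1d0s7

If the replaced disk was a boot disk, also check the EEPROM boot-device entry as described in step 12.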
PROCEDURE FOR REPLACING A DISK IN A RAID-5 VOLUME

Note: If a disk is used in BOTH a mirror and a RAID-5 volume, do not use the following procedure; instead, follow the instructions for the MIRRORED devices (above). This is because the RAID-5 array, once healed, is treated as a single disk for mirroring purposes.

To replace an SVM-controlled disk that is part of a RAID-5 metadevice, perform the following steps. A consolidated example using hypothetical device names appears after step 10.

1. If there are any open filesystems on this disk that are not under SVM control, or non-mirrored metadevices, unmount them.

2. If there are any replicas on this disk, remove them:

   # metadb -d c#t#d#s#

   Verify that there are no replicas left on the disk by running:

   # metadb | grep c#t#d#

3. Run "format" or "prtvtoc/fmthard" to save the disk partition table information:

   # prtvtoc /dev/rdsk/c#t#d#s2 > file

4. Run the 'luxadm' command to remove the failed disk:

   # luxadm remove_device -F /dev/rdsk/c#t#d#s2

   At the prompt, physically remove the disk and continue. The picld daemon notifies the system that the disk has been removed.

5. Initiate the devfsadm cleanup subroutines by entering the following command:

   # /usr/sbin/devfsadm -C -c disk

   The default devfsadm operation is to attempt to load every driver in the system and attach these drivers to all possible device instances. The devfsadm command then creates device special files in the /devices directory and logical links in /dev. With the "-c disk" option, devfsadm only updates disk device files. This saves time and is important on systems that have tape devices attached; rebuilding those tape devices could cause undesirable results on non-Sun hardware. The -C option cleans up the /dev directory and removes any lingering logical links to the device link names. This should remove all the device paths for this particular disk, which can be verified with:

   # ls -ld /dev/dsk/cxtxd*

   This should return no devices.

6. It is now safe to physically replace the disk. Insert a new disk and configure it. Create the necessary entries in the Solaris OS device tree with one of the following commands:

   # devfsadm
   or
   # /usr/sbin/luxadm insert_device <enclosure_name>,sx     (where sx is the slot number)
   or
   # /usr/sbin/luxadm insert_device                         (if the enclosure name is not known)

   Note: In many cases, luxadm insert_device does not require the enclosure name and slot number.

   Use the following to find the slot number:

   # luxadm display <enclosure_name>

   To find the enclosure name:

   # luxadm probe

   Run "ls -ld /dev/dsk/c1t1d*" to verify that the new device paths have been created.

   CAUTION: After inserting a new disk and running devfsadm (or luxadm), the old ssd instance number changes to a new ssd instance number (for example, from ssd3 to ssd10, as shown in the console output in the mirrored-disk procedure above). This change is expected, so ignore it.

7. Run 'format' or 'prtvtoc/fmthard' to put the desired partition table on the new disk:

   # fmthard -s file /dev/rdsk/c#t#d#s2     ('file' is the prtvtoc output saved in step 3)

8. Run 'metadevadm' on the disk to update the new DevID:

   # metadevadm -u c#t#d#

   Note: Due to BugID 4808079, a disk can show up as "unavailable" in the metastat output.

9. If necessary, re-create any replicas on the new disk:

   # metadb -a c#t#d#s#

10. Run 'metareplace' to enable and resync the new disk:

   # metareplace -e <RAID-5 metadevice> c#t#d#s#
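The following consolidated example shows the RAID-5 sequence end to end. It assumes a purely hypothetical configuration: RAID-5 metadevice d30 with one of its columns on the failing disk as c1t2d0s0, and one state database replica on c1t2d0s7. None of these names comes from this document; substitute the metadevice and c#t#d# names from your own configuration.

   # metadb -d c1t2d0s7
   # metadb | grep c1t2d0                     (should return nothing)
   # prtvtoc /dev/rdsk/c1t2d0s2 > /var/tmp/c1t2d0.vtoc
   # luxadm remove_device -F /dev/rdsk/c1t2d0s2
     (physically remove the disk when prompted)
   # /usr/sbin/devfsadm -C -c disk
   # ls -ld /dev/dsk/c1t2d*                   (should return no devices)
     (physically insert the replacement disk)
   # devfsadm
   # ls -ld /dev/dsk/c1t2d*                   (new device paths should now exist)
   # fmthard -s /var/tmp/c1t2d0.vtoc /dev/rdsk/c1t2d0s2
   # metadevadm -u c1t2d0
   # metadb -a c1t2d0s7
   # metareplace -e d30 c1t2d0s0              (starts the resync of the new column)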
Reference:
The following Infodocs were referenced:
<Document 1006196.1> Synopsis: Solaris[TM] Volume Manager software: Replacing SCSI Disks (Solaris[TM] 9 Operating System and above)

Date: 2010-12-21