Asset ID: 1-71-1004100.1
Update Date: 2012-07-30
Keywords:
Solution Type: Technical Instruction (Sure Solution)
Solution 1004100.1: Oracle Solaris Cluster 3.x: Rolling firmware update on SCSI JBOD disk with Solaris Volume Manager and root disk
Related Items
- Sun Storage D1000 Array
- Sun Storage D2 Array
- Solaris Cluster
- Solstice DiskSuite Software
Related Categories
- PLA-Support>Sun Systems>DISK>Sun Cluster>SN-DK: Cluster
Previously Published As: 205707
Applies to:
Solaris Cluster - Version 3.0 and later
Sun Storage D2 Array - Version Not Applicable and later
Sun Storage D1000 Array - Version Not Applicable and later
Solstice DiskSuite Software - Version 2.0.1 and later
All Platforms
Goal
This Technical Instruction explains how to perform a rolling disk firmware update with minimal downtime in Oracle Solaris Cluster using Solaris Volume Manager. The disk can be a local disk (root/mirror) on a cluster node or an external shared SCSI JBOD disk.
Fix
There are two different approaches to the firmware update: one for shared SCSI JBOD disks (A below) and one for cluster node disks (root/mirror, B below).
The following procedure assumes that:
1) All metadevices (local and in disksets) are in the "Okay" state, all submirrors are attached, and no metadevice resync is in progress.
2) Your /etc/lvm/md.tab is current - compare the output of "metastat -p" with the entries in /etc/lvm/md.tab. If entries are missing or out of date, run:
# metastat -p >> /etc/lvm/md.tab
and for the disksets:
# metastat -s <setname> -p >> /etc/lvm/md.tab
Note: If you run an "explorer", all of this information is saved in its output file.
3) Both cluster nodes are members and this will not change during the procedure.
4) The cluster 'did' namespace is current with no mismatches.
5) All cluster 'did' IDs match the physical disks' IDs. Check /var/adm/messages for warnings similar to:
'device id for '/dev/rdsk/c2t8d0' does not match physical disk's id. The drive may have been replaced'
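A quick way to search for these warnings (a simple example; adjust the path if your messages files have already been rotated):
[root]# grep "does not match physical disk" /var/adm/messages*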
If this is the case then first identify the cluster DID device which does not match:
[root]# scdidadm -L
1 msun0001:/dev/rdsk/c1t0d0 /dev/did/rdsk/d1
2 msun0001:/dev/rdsk/c1t1d0 /dev/did/rdsk/d2
3 msun0002:/dev/rdsk/c3t8d0 /dev/did/rdsk/d3
3 msun0001:/dev/rdsk/c3t8d0 /dev/did/rdsk/d3
4 msun0001:/dev/rdsk/c3t9d0 /dev/did/rdsk/d4
4 msun0002:/dev/rdsk/c3t9d0 /dev/did/rdsk/d4
5 msun0001:/dev/rdsk/c2t8d0 /dev/did/rdsk/d5 <<<<<< wrong ID
5 msun0002:/dev/rdsk/c2t8d0 /dev/did/rdsk/d5 <<<<<<
6 msun0001:/dev/rdsk/c2t9d0 /dev/did/rdsk/d6
6 msun0002:/dev/rdsk/c2t9d0 /dev/did/rdsk/d6
7 msun0002:/dev/rdsk/c1t0d0 /dev/did/rdsk/d7
8 msun0002:/dev/rdsk/c1t1d0 /dev/did/rdsk/d8
Now run the following commands on the identified DID device to update the cluster configuration (these commands are safe to run on a production cluster):
Check the current ID
[root]# scdidadm -o asciidiskid -l d5
IBM 8RM838
Update DID
[root]# scdidadm -R d5
Check that the ID is correctly updated
[root]# scdidadm -o asciidiskid -l d5
SEAGATE 3JA97LEV00007503
The ID should match the label on the front of the physical disk. You can use "iostat -En" to check the real serial numbers (and revisions) of all disks, and "scdidadm -o asciidiskid -l <dN>" on each DID device, for cross-checking.
......
c2t8d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607L SUN36G Revision: 0507 Serial No: 00007503
Size: 18.11GB <18110967808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
......
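For example, a small loop like the following lists the ASCII disk ID of every DID device for cross-checking (a sketch only; the DID numbers 1-8 come from the scdidadm -L output above and must be adapted to your configuration):
[root]# for d in 1 2 3 4 5 6 7 8; do echo "d$d:"; scdidadm -o asciidiskid -l d$d; done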
If no metadevice is offline or in maintenance state and all DID IDs match the physical IDs, then continue with A) and/or B).
A) How to do a firmware update on SCSI JBOD cluster shared disk
Since a disk spins down and performs a hard reset during a firmware update, you cannot do this on a disk that is in use - you would lose a mirror half! The second problem is that the "download" routine also checks whether SVM has the drive in use. To overcome both problems you need to:
First, offline the disk for the duration of the update, so that you do not lose the mirror half and the subsequent resync is quite quick. Then run the "download" routine from the node which is currently NOT the owner of the diskset containing the disk to be updated. In other words, if node 1 has the diskset imported, then run the firmware "download" from node 2.
root@msun0002 # scstat -D
....
Device Group Primary Secondary
------------ ------- ---------
Device group servers: nfs-set msun0001 msun0002
....
root@msun0002 # metastat -s nfs-set
Proxy command to: msun0001
nfs-set/d300: Mirror
Submirror 0: nfs-set/d301
State: Okay
Submirror 1: nfs-set/d302
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 142239915 blocks
nfs-set/d301: Submirror of nfs-set/d300
State: Okay
Size: 142239915 blocks
Stripe 0: (interlace: 128 blocks)
Device Start Block Dbase State Hot Spare
d3s0 0 No Okay
d4s0 0 No Okay
nfs-set/d302: Submirror of nfs-set/d300
State: Okay
Size: 142239915 blocks
Stripe 0: (interlace: 128 blocks)
Device Start Block Dbase State Hot Spare
d5s0 0 No Okay
d6s0 0 No Okay
Offline the submirror that contains the disk to be updated (here c2t8d0 is DID d5, which belongs to submirror d302):
root@msun0002 # metaoffline -s nfs-set d300 d302
Change directory to the firmware patch directory:
root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download
Firmware Download Utility, V4.2
************************** WARNING **************************
NO OTHER ACTIVITY IS ALLOWED DURING FIRMWARE UPGRADE!!!
No other programs including any volume manager (e.g. Veritas,
SDS, or Vold) should be running. Other host systems sharing
any I/O bus with this host must either be offline or
disconnected. Any interruption (e.g. power loss) during
upgrade can result in damage to devices being upgraded.
Any disk to be upgraded should first have its data backed up.
***************************************************************
Searching for devices...
rmt/0: Mode Sense for default pages failed!
DISK DEVICES
Device Rev Product
c1t0d0: 0507 ST336607L -- SUN36G
c1t1d0: 1804 MAN3367M -- SUN36G
c2t8d0: 0507 ST336607L -- SUN36G <<<<<<<<<<<<<<<
c2t9d0: 0507 ST336607L -- SUN36G
c3t8d0: S96H DDYST3695 -- SUN36G
c3t9d0: 0507 ST336607L -- SUN36G
Total Devices: 6
Enter command: p c2t8d0 <<<<<<<<<<<<<<
NOTE: select ONLY the one disk to update!!!
NOTICE: Cannot access kernel, kvm_open did not succeed!
Upgrading devices...
c2t8d0: Successful download
Enter command: inq      <<< check that the new firmware is in place
DISK DEVICES
Device Rev Product S/N
........
c2t8d0: 0707 ST336607L -- SUN36G <<<<<<<<<<<<<<<
........
Enter command: q
Now online the disk again and watch the resync progress:
root@msun0002 # metaonline -s nfs-set d300 d302
Proxy command to: msun0001
root@msun0002 # metastat -s nfs-set | grep %
Proxy command to: msun0001
32 % done
root@msun0002 #
Repeat with other disks if necessary.
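For example, if c3t9d0 (also revision 0507 in the device list above, and part of submirror d301 in this layout) needs the same update, the corresponding steps would be (a sketch based on this example configuration):
root@msun0002 # metaoffline -s nfs-set d300 d301
root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download      (select only c3t9d0 with: p c3t9d0)
root@msun0002 # metaonline -s nfs-set d300 d301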
B) How to do a firmware update on cluster node disk
If you have to update a local disk, first switch all resource groups hosted on this node over to the other node (msun0001 in this example; repeat the command for each resource group):
root@msun0002 # scswitch -z -g <resourcegroup> -h msun0001
Then reboot this node into non-cluster mode:
root@msun0002 # init 0
ok boot -xs
Once booted, you will have to delete the metadb replicas on the disk to be updated, and detach and clear its submirrors:
root@msun0002 # metadb
flags first blk block count
a m p luo 16 4096 /dev/dsk/c1t0d0s7
a p luo 4112 4096 /dev/dsk/c1t0d0s7
a p luo 8208 4096 /dev/dsk/c1t0d0s7
a p luo 16 4096 /dev/dsk/c1t1d0s7
a p luo 4112 4096 /dev/dsk/c1t1d0s7
a p luo 8208 4096 /dev/dsk/c1t1d0s7
root@msun0002 # metadb -d /dev/dsk/c1t0d0s7
root@msun0002 # metadb
flags first blk block count
a p luo 16 4096 /dev/dsk/c1t1d0s7
a p luo 4112 4096 /dev/dsk/c1t1d0s7
a p luo 8208 4096 /dev/dsk/c1t1d0s7
Save your root disk configuration and keep a copy of the original md.tab file before you continue:
root@msun0002 # cp /etc/lvm/md.tab /etc/lvm/md.tab.orig
root@msun0002 # metastat -p > /etc/lvm/md.tab
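Optionally, and not required by this procedure, you can also keep a copy of the disk's partition table as an extra precaution (the device name below is this example's root disk; adjust as needed):
root@msun0002 # prtvtoc /dev/rdsk/c1t0d0s2 > /var/tmp/c1t0d0s2.vtoc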
root@msun0002 # metastat -p
d200 -m d201 d202 1
d201 1 1 c1t0d0s0
d202 1 1 c1t1d0s0
d210 -m d211 d212 1
d211 1 1 c1t0d0s1
d212 1 1 c1t1d0s1
d230 -m d231 d232 1
d231 1 1 c1t0d0s3
d232 1 1 c1t1d0s3
d240 -m d241 d242 1
d241 1 1 c1t0d0s4
d242 1 1 c1t1d0s4
d250 -m d251 d252 1
d251 1 1 c1t0d0s5
d252 1 1 c1t1d0s5
d260 -m d261 d262 1
d261 1 1 c1t0d0s6
d262 1 1 c1t1d0s6
root@msun0002 # metadetach d200 d201
......
Repeat this with all other submirrors
.....
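For the example layout above, the remaining detach commands would be (a sketch using this example's metadevice names):
root@msun0002 # metadetach d210 d211
root@msun0002 # metadetach d230 d231
root@msun0002 # metadetach d240 d241
root@msun0002 # metadetach d250 d251
root@msun0002 # metadetach d260 d261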
The metastat should now look something like this:
root@msun0002 # metastat -p
d200 -m d202 1
d202 1 1 c1t1d0s0
d210 -m d212 1
d212 1 1 c1t1d0s1
d230 -m d232 1
d232 1 1 c1t1d0s3
d240 -m d242 1
d242 1 1 c1t1d0s4
d250 -m d252 1
d252 1 1 c1t1d0s5
d260 -m d262 1
d261 1 1 c1t0d0s6
d262 1 1 c1t1d0s6
d201 1 1 c1t0d0s0
d211 1 1 c1t0d0s1
d231 1 1 c1t0d0s3
d241 1 1 c1t0d0s4
d251 1 1 c1t0d0s5
Now save this configuration again; the reason will become clear further down (metainit -a will use these entries to recreate the cleared submirrors).
root@msun0002 # cp /etc/lvm/md.tab /etc/lvm/md.tab.bothmirrors
root@msun0002 # metastat -p > /etc/lvm/md.tab
Now clear all the detached submirrors:
root@msun0002 # metaclear d201
....
repeat for all submirrors
....
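For the example layout above, the remaining clear commands would be (a sketch using this example's metadevice names):
root@msun0002 # metaclear d211
root@msun0002 # metaclear d231
root@msun0002 # metaclear d241
root@msun0002 # metaclear d251
root@msun0002 # metaclear d261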
Now you can update the firmware, but only on this disk (follow the procedure in the patch README, or proceed as in A above).
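For instance, with the same firmware patch as in A (116369-11 in this example; substitute your own patch directory), the session for the local disk c1t0d0 would look roughly like this (a sketch, output abbreviated):
root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download
...
Enter command: p c1t0d0
...
Enter command: q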
When finished, just run:
root@msun0002 # metainit -a
With the -a option, metainit uses the entries in /etc/lvm/md.tab and recreates all the missing submirrors. There will be a lot of messages reporting that some metadevices already exist; that is OK, ignore them.
root@msun0002 # metattach d200 d201
....
repeat for all submirrors
....
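Continuing the example, the remaining attach commands would be (a sketch using this example's metadevice names):
root@msun0002 # metattach d210 d211
root@msun0002 # metattach d230 d231
root@msun0002 # metattach d240 d241
root@msun0002 # metattach d250 d251
root@msun0002 # metattach d260 d261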
Repeat with the other local disk (if necessary). Then reboot the node back into the cluster and repeat on the other node (if necessary).
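Booting normally (without the -x option) brings the node back into the cluster. Afterwards, resource groups can be switched back if desired, using the same scswitch form as above (a sketch):
root@msun0002 # init 6
...
root@msun0002 # scswitch -z -g <resourcegroup> -h msun0002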