Asset ID: 1-71-1004100.1
Update Date: 2012-07-30
Keywords:
Solution Type: Technical Instruction (Sure Solution)
Solution 1004100.1: Oracle Solaris Cluster 3.x: Rolling firmware update on SCSI JBOD disk with Solaris Volume Manager and root disk
Related Items
- Sun Storage D1000 Array
- Sun Storage D2 Array
- Solaris Cluster
- Solstice DiskSuite Software
Related Categories
- PLA-Support>Sun Systems>DISK>Sun Cluster>SN-DK: Cluster
Previously Published As: 205707
Applies to:
Solaris Cluster - Version 3.0 and later
Sun Storage D2 Array - Version Not Applicable and later
Sun Storage D1000 Array - Version Not Applicable and later
Solstice DiskSuite Software - Version 2.0.1 and later
All Platforms
Goal
This Technical Instruction explains how to perform a rolling disk firmware update with minimal downtime in Oracle Solaris Cluster using Solaris Volume Manager. The disk can be a local disk (root/mirror) on a cluster node or an external shared SCSI JBOD disk.
Fix
There are two different approaches to the firmware update: one for shared SCSI JBOD disks (A below) and one for cluster node disks (root/mirror, B below).
The following procedure assumes that:
1) All metadevices (local and in disksets) are in the "Okay" state, all submirrors are attached, and no metadevice resync is in progress.
2) Your /etc/lvm/md.tab is current - compare the output of "metastat -p" with the entries in /etc/lvm/md.tab. If entries are missing or out of date, run:
# metastat -p >> /etc/lvm/md.tab
and for the disksets:
# metastat -s <setname> -p >> /etc/lvm/md.tab
Note: If you run an "explorer", all of this information is saved in its output file.
3) Both cluster nodes are members and this will not change during the procedure.
4) The cluster 'did' namespace is current with no mismatches.
5) All cluster 'did' IDs match the physical disks' IDs. Check /var/adm/messages for warnings similar to:
'device id for '/dev/rdsk/c2t8d0' does not match physical disk's id. The drive may have been replaced'
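A quick way to search for these warnings (a simple example; adjust the path if your messages files have already been rotated):
[root]# grep "does not match physical disk" /var/adm/messages*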
If this is the case then first identify the cluster DID device which does not match:
[root]# scdidadm -L
1 msun0001:/dev/rdsk/c1t0d0 /dev/did/rdsk/d1
2 msun0001:/dev/rdsk/c1t1d0 /dev/did/rdsk/d2
3 msun0002:/dev/rdsk/c3t8d0 /dev/did/rdsk/d3
3 msun0001:/dev/rdsk/c3t8d0 /dev/did/rdsk/d3
4 msun0001:/dev/rdsk/c3t9d0 /dev/did/rdsk/d4
4 msun0002:/dev/rdsk/c3t9d0 /dev/did/rdsk/d4
5 msun0001:/dev/rdsk/c2t8d0 /dev/did/rdsk/d5 <<<<<< wrong ID
5 msun0002:/dev/rdsk/c2t8d0 /dev/did/rdsk/d5 <<<<<<
6 msun0001:/dev/rdsk/c2t9d0 /dev/did/rdsk/d6
6 msun0002:/dev/rdsk/c2t9d0 /dev/did/rdsk/d6
7 msun0002:/dev/rdsk/c1t0d0 /dev/did/rdsk/d7
8 msun0002:/dev/rdsk/c1t1d0 /dev/did/rdsk/d8
Now run the following commands on the identified DID device to update the cluster configuration (these commands are safe to run on a production cluster):
Check the current ID
[root]# scdidadm -o asciidiskid -l d5
IBM 8RM838
Update DID
[root]# scdidadm -R d5
Check that the ID is correctly updated
[root]# scdidadm -o asciidiskid -l d5
SEAGATE 3JA97LEV00007503
The ID should match the label on the front of the physical disk. You can use "iostat -En" to check the real serial numbers (and revisions) of all disks, and "scdidadm -o asciidiskid -l <dN>" on each DID device, for cross-checking.
......
c2t8d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607L SUN36G Revision: 0507 Serial No: 00007503
Size: 18.11GB <18110967808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
......
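For example, a small loop like the following lists the ASCII disk ID of every DID device for cross-checking (a sketch only; the DID numbers 1-8 come from the scdidadm -L output above and must be adapted to your configuration):
[root]# for d in 1 2 3 4 5 6 7 8; do echo "d$d:"; scdidadm -o asciidiskid -l d$d; done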
If no metadevice is offline or in maintenance state and all DID IDs match the physical IDs, then continue with A) and/or B).
A) How to do a firmware update on SCSI JBOD cluster shared disk
Since a disk spins down and performs a hard reset during a firmware update, you cannot do this on a disk that is in use - you would lose a mirror half! The second problem is that the "download" routine also checks whether SVM has the drive in use. To overcome both problems you need to:
First, offline the disk for the duration of the update, so that you do not lose the mirror half and the subsequent resync is quite quick. Then run the "download" routine from the node which is currently NOT the owner of the diskset containing the disk to be updated. In other words, if node 1 has the diskset imported, then run the firmware "download" from node 2.
root@msun0002 # scstat -D
....
Device Group Primary Secondary
------------ ------- ---------
Device group servers: nfs-set msun0001 msun0002
....
root@msun0002 # metastat -s nfs-set
Proxy command to: msun0001
nfs-set/d300: Mirror
Submirror 0: nfs-set/d301
State: Okay
Submirror 1: nfs-set/d302
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 142239915 blocks
nfs-set/d301: Submirror of nfs-set/d300
State: Okay
Size: 142239915 blocks
Stripe 0: (interlace: 128 blocks)
Device Start Block Dbase State Hot Spare
d3s0 0 No Okay
d4s0 0 No Okay
nfs-set/d302: Submirror of nfs-set/d300
State: Okay
Size: 142239915 blocks
Stripe 0: (interlace: 128 blocks)
Device Start Block Dbase State Hot Spare
d5s0 0 No Okay
d6s0 0 No Okay
Offline the submirror that contains the disk to be updated (here c2t8d0 is DID d5, which belongs to submirror d302):
root@msun0002 # metaoffline -s nfs-set d300 d302
Change directory to the firmware patch directory:
root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download
Firmware Download Utility, V4.2
************************** WARNING **************************
NO OTHER ACTIVITY IS ALLOWED DURING FIRMWARE UPGRADE!!!
No other programs including any volume manager (e.g. Veritas,
SDS, or Vold) should be running. Other host systems sharing
any I/O bus with this host must either be offline or
disconnected. Any interruption (e.g. power loss) during
upgrade can result in damage to devices being upgraded.
Any disk to be upgraded should first have its data backed up.
***************************************************************
Searching for devices...
rmt/0: Mode Sense for default pages failed!
DISK DEVICES
Device Rev Product
c1t0d0: 0507 ST336607L -- SUN36G
c1t1d0: 1804 MAN3367M -- SUN36G
c2t8d0: 0507 ST336607L -- SUN36G <<<<<<<<<<<<<<<
c2t9d0: 0507 ST336607L -- SUN36G
c3t8d0: S96H DDYST3695 -- SUN36G
c3t9d0: 0507 ST336607L -- SUN36G
Total Devices: 6
Enter command: p c2t8d0 <<<<<<<<<<<<<<
NOTE: select ONLY the one disk to update!!!
NOTICE: Cannot access kernel, kvm_open did not succeed!
Upgrading devices...
c2t8d0: Successful download
Enter command: inq      <<< check that the new firmware is in place
DISK DEVICES
Device Rev Product S/N
........
c2t8d0: 0707 ST336607L -- SUN36G <<<<<<<<<<<<<<<
........
Enter command: q
Now online the disk again and watch the resync progress:
root@msun0002 # metaonline -s nfs-set d300 d302
Proxy command to: msun0001
root@msun0002 # metastat -s nfs-set | grep %
Proxy command to: msun0001
32 % done
root@msun0002 #
Repeat with other disks if necessary.
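For example, if c3t9d0 (also revision 0507 in the device list above, and part of submirror d301 in this layout) needs the same update, the corresponding steps would be (a sketch based on this example configuration):
root@msun0002 # metaoffline -s nfs-set d300 d301
root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download      (select only c3t9d0 with: p c3t9d0)
root@msun0002 # metaonline -s nfs-set d300 d301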
B) How to do a firmware update on cluster node disk
If you have to update a local disk, first switch all resource groups hosted on this node over to the other node (msun0001 in this example; repeat the command for each resource group):
root@msun0002 # scswitch -z -g <resourcegroup> -h msun0001
Then reboot this node into non-cluster mode:
root@msun0002 # init 0
ok boot -xs
Once booted, you will have to delete the metadb replicas on the disk to be updated, and detach and clear its submirrors:
root@msun0002 # metadb
flags first blk block count
a m p luo 16 4096 /dev/dsk/c1t0d0s7
a p luo 4112 4096 /dev/dsk/c1t0d0s7
a p luo 8208 4096 /dev/dsk/c1t0d0s7
a p luo 16 4096 /dev/dsk/c1t1d0s7
a p luo 4112 4096 /dev/dsk/c1t1d0s7
a p luo 8208 4096 /dev/dsk/c1t1d0s7
root@msun0002 # metadb -d /dev/dsk/c1t0d0s7
root@msun0002 # metadb
flags first blk block count
a p luo 16 4096 /dev/dsk/c1t1d0s7
a p luo 4112 4096 /dev/dsk/c1t1d0s7
a p luo 8208 4096 /dev/dsk/c1t1d0s7
Save your root disk configuration and keep a copy of the original md.tab file before you continue:
root@msun0002 # cp /etc/lvm/md.tab /etc/lvm/md.tab.orig
root@msun0002 # metastat -p > /etc/lvm/md.tab
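Optionally, and not required by this procedure, you can also keep a copy of the disk's partition table as an extra precaution (the device name below is this example's root disk; adjust as needed):
root@msun0002 # prtvtoc /dev/rdsk/c1t0d0s2 > /var/tmp/c1t0d0s2.vtoc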
root@msun0002 # metastat -p
d200 -m d201 d202 1
d201 1 1 c1t0d0s0
d202 1 1 c1t1d0s0
d210 -m d211 d212 1
d211 1 1 c1t0d0s1
d212 1 1 c1t1d0s1
d230 -m d231 d232 1
d231 1 1 c1t0d0s3
d232 1 1 c1t1d0s3
d240 -m d241 d242 1
d241 1 1 c1t0d0s4
d242 1 1 c1t1d0s4
d250 -m d251 d252 1
d251 1 1 c1t0d0s5
d252 1 1 c1t1d0s5
d260 -m d261 d262 1
d261 1 1 c1t0d0s6
d262 1 1 c1t1d0s6
root@msun0002 # metadetach d200 d201
......
Repeat this with all other submirrors
.....
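For the example layout above, the remaining detach commands would be (a sketch using this example's metadevice names):
root@msun0002 # metadetach d210 d211
root@msun0002 # metadetach d230 d231
root@msun0002 # metadetach d240 d241
root@msun0002 # metadetach d250 d251
root@msun0002 # metadetach d260 d261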
The metastat should now look something like this:
root@msun0002 # metastat -p
d200 -m d202 1
d202 1 1 c1t1d0s0
d210 -m d212 1
d212 1 1 c1t1d0s1
d230 -m d232 1
d232 1 1 c1t1d0s3
d240 -m d242 1
d242 1 1 c1t1d0s4
d250 -m d252 1
d252 1 1 c1t1d0s5
d260 -m d262 1
d261 1 1 c1t0d0s6
d262 1 1 c1t1d0s6
d201 1 1 c1t0d0s0
d211 1 1 c1t0d0s1
d231 1 1 c1t0d0s3
d241 1 1 c1t0d0s4
d251 1 1 c1t0d0s5
Now save this configuration again; the reason will become clear further down (metainit -a will use these entries to recreate the cleared submirrors).
root@msun0002 # cp /etc/lvm/md.tab /etc/lvm/md.tab.bothmirrors
root@msun0002 # metastat -p > /etc/lvm/md.tab
Now clear all the detached submirrors:
root@msun0002 # metaclear d201
....
repeat for all submirrors
....
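For the example layout above, the remaining clear commands would be (a sketch using this example's metadevice names):
root@msun0002 # metaclear d211
root@msun0002 # metaclear d231
root@msun0002 # metaclear d241
root@msun0002 # metaclear d251
root@msun0002 # metaclear d261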
Now you can update the firmware, but only on this disk (follow the procedure in the patch README, or proceed as in A above).
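For instance, with the same firmware patch as in A (116369-11 in this example; substitute your own patch directory), the session for the local disk c1t0d0 would look roughly like this (a sketch, output abbreviated):
root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download
...
Enter command: p c1t0d0
...
Enter command: q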
When finished, just run:
root@msun0002 # metainit -a
With the -a option, metainit uses the entries in /etc/lvm/md.tab and recreates all the missing submirrors. There will be a lot of messages reporting that some metadevices already exist; that is OK, ignore them.
root@msun0002 # metattach d200 d201
....
repeat for all submirrors
....
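Continuing the example, the remaining attach commands would be (a sketch using this example's metadevice names):
root@msun0002 # metattach d210 d211
root@msun0002 # metattach d230 d231
root@msun0002 # metattach d240 d241
root@msun0002 # metattach d250 d251
root@msun0002 # metattach d260 d261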
Repeat with the other local disk (if necessary). Then reboot the node back into the cluster and repeat on the other node (if necessary).
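Booting normally (without the -x option) brings the node back into the cluster. Afterwards, resource groups can be switched back if desired, using the same scswitch form as above (a sketch):
root@msun0002 # init 6
...
root@msun0002 # scswitch -z -g <resourcegroup> -h msun0002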