Asset ID: 1-72-1479681.1
Update Date: 2012-10-02
Keywords:
Solution Type: Problem Resolution Sure Solution
1479681.1: Error During Exadata 11.2.3.1.1 Upgrade - Unable to backup file into /boot/cellboot.backup.11.2.2.3.5.110815.tar
Related Items
- Oracle Exadata Storage Server Software
- Exadata Database Machine X2-2 Qtr Rack
- Exadata Database Machine X2-2 Full Rack
- Exadata Database Machine X2-8
- Exadata Database Machine X2-2 Half Rack
- Exadata Database Machine X2-2 Hardware
Related Categories
- PLA-Support>Database Technology>Engineered Systems>Oracle Exadata>DB: Exadata_EST
Created from <SR 3-5996964501>
Applies to:
Oracle Exadata Storage Server Software - Version 11.2.2.3.5 to 11.2.3.1.1 [Release 11.2]
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Information in this document applies to any platform.
Symptoms
Patching of one of the Exadata cells failed with: [ERROR] Unable to backup file into /boot/cellboot.backup.11.2.2.3.5.110815.tar
dm13cel09: [ERROR] Unable to backup file into /boot/cellboot.backup.11.2.2.3.5.110815.tar
dm13cel09: _EXIT_ERROR_Cell dm13cel09 10.5.35.206 2012-07-28 06:51:27: Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
dm13cel09:
dm13cel09: [INFO] Patchmgr was launched from dm13db01.cbp.dhs.gov_10.5.35.196_tmp_patch_11.2.3.1.1.120607.
dm13cel09: Cell dm13cel09 10.5.35.206
dm13cel09: _EXIT_ERROR_Cell dm13cel09 10.5.35.206 2012-07-28 06:51:27: Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
FAILED for following cells
dm13cel09: dm13cel09 10.5.35.206 2012-07-28 06:51:27: Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
2012-07-28 06:51:28 4 of 5 :FAILED: Details in files <cell_name>.log, /tmp/patch_11.2.3.1.1.120607/patchmgr.stdout, /tmp/patch_11.2.3.1.1.120607/patchmgr.stderr.
2012-07-28 06:51:28 4 of 5 :FAILED: DONE: Wait for cells to reboot and come online.
[ERROR] This patchmgr run failed. Please run cleanup before retrying.
================PatchMgr run ended Sat Jul 28 06:51:28 EDT 2012 ===========
Current imageinfo
# imageinfo
Kernel version: 2.6.18-194.3.1.0.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Cell version: CELL-01514: Connect Error. Verify that Management Server is listening at the specified HTTP port: 8888.
Cell rpm version: cell-11.2.2.3.5_LINUX.X64_110815-1
Active image version: 11.2.2.3.5.110815
Active image activated: 2011-10-15 21:09:02 -0400
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8
In partition rollback: Impossible
Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.3.5.110815
Inactive image version: 11.2.3.1.1.120607
[WARNING] File not found /opt/oracle.cellos/patch/history/image.id.11.2.3.1.1.120607
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7
Boot area has rollback archive for the version: undefined
Rollback to the inactive partitions: Impossible
Changes
Upgrading the cell image from 11.2.2.3.5 to 11.2.3.1.1.
Cause
The filesystem journal for /dev/md4 was missing.
tune2fs -l /dev/md4 output from a good cell and the bad cell:
bad cell
=======
Filesystem features: filetype sparse_super
...
Filesystem created: Sat Feb 5 21:36:44 2011
Last mount time: Sat Oct 15 21:04:22 2011
Last write time: Sun Apr 22 01:22:41 2012
Mount count: 0
...
(no journal entries: the has_journal feature and the Journal inode line are absent)
good cell
========
Filesystem features: has_journal ext_attr filetype needs_recovery sparse_super
...
Filesystem created: Sat Feb 5 21:37:06 2011
Last mount time: Sat Jul 28 01:00:27 2012
Last write time: Sat Jul 28 01:00:27 2012
Mount count: 35
...
Journal inode: 8
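A quick way to spot the difference is to filter the tune2fs output for journal-related entries. A minimal check, assuming /dev/md4 is the /boot device as on this cell:
# Run as root on the cell; a healthy ext3 filesystem shows has_journal and a Journal inode
tune2fs -l /dev/md4 | grep -Ei 'has_journal|journal inode'
On the bad cell this returns nothing; on the good cell it returns the two lines shown above.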
mdadm --detail /dev/md$x output (substitute x with 1, 2, 5, 6, 7, 8, 11; a loop to collect all of them is sketched after the output below)
/dev/md2:
Version : 0.90
Creation Time : Sat Feb 5 21:35:55 2011
Raid Level : raid1
Array Size : 2096384 (2047.59 MiB 2146.70 MB)
Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Sun Jul 22 04:24:36 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 44f506c7:5d4d1b71:a9f3cadd:69dc557b
Events : 0.116
Number Major Minor RaidDevice State
0 8 9 0 active sync /dev/sda9
1 8 25 1 active sync /dev/sdb9
/dev/md5:
Version : 0.90
Creation Time : Sat Feb 5 21:36:04 2011
Raid Level : raid1
Array Size : 10482304 (10.00 GiB 10.73 GB)
Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Sat Jul 28 17:00:15 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : a67a2ad1:05c76851:623dfe7f:ff0c322d
Events : 0.106
Number Major Minor RaidDevice State
0 8 5 0 active sync /dev/sda5
1 8 21 1 active sync /dev/sdb5
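To collect the same details for every software RAID device on the cell in one pass, a simple loop over the device numbers from the note above can be used (a sketch; run as root on the cell):
# Print mdadm details for each md device listed in the note above
for x in 1 2 5 6 7 8 11; do
    echo "=== /dev/md$x ==="
    mdadm --detail /dev/md$x
done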
/dev/md4, which holds the /boot partition, was missing (not mounted on the cell).
Note: The cell was still able to boot because /boot was being found on the USB recovery drive.
Solution
Isolate the faulty cell by manually dropping all grid disks that belong to it (a sketch follows).
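A minimal sketch of the isolation step, assuming the standard Exadata naming where grid disk names include the cell name; the disk group and disk names below are illustrative and must be replaced with the actual names reported for dm13cel09:
# On the faulty cell: list its grid disks and their ASM status
cellcli -e "list griddisk attributes name, asmmodestatus"
# From an ASM instance on a database node: drop the cell's disks from each disk group
# (repeat for every grid disk belonging to dm13cel09)
SQL> ALTER DISKGROUP DATA DROP DISK DATA_CD_00_DM13CEL09;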
Fix the journal using the commands below.
df -h output before the fix (note that /boot is not mounted)
====
dm13cel09: Filesystem Size Used Avail Use% Mounted on
dm13cel09: /dev/md6 9.9G 4.9G 4.5G 52% /
dm13cel09: tmpfs 12G 0 12G 0% /dev/shm
dm13cel09: /dev/md8 2.0G 645M 1.3G 34% /opt/oracle
dm13cel09: /dev/md11 2.3G 182M 2.0G 9% /var/log/oracle
tune2fs -j /dev/md4
mount -a
df -h must show /boot mounted
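Before rebooting, the journal and the mount can be re-checked (a quick verification sketch):
# has_journal and a Journal inode entry should now be present
tune2fs -l /dev/md4 | grep -Ei 'has_journal|journal inode'
# /boot should now be mounted from /dev/md4
df -h /boot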
Reboot the faulty cell.
Then attempt to patch the faulty cell:
cd /tmp/patch_11.2.3.1.1.120607
echo "dm13cel09" > dm13cel09
./patchmgr -cells dm13cel09 -cleanup
./patchmgr -cells dm13cel09 -patch_check_prereq
./patchmgr -cells dm13cel09 -patch
./patchmgr -cells dm13cel09 -cleanup
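Once the patch completes, the result can be verified on the cell before re-adding the disks (a quick check; the expected active version is 11.2.3.1.1.120607):
# On the patched cell: confirm the new image is active and the status is success
imageinfo | grep -E 'Active image version|Active image status'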
After patching, add the grid disks back manually (a sketch follows).
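A sketch of re-adding the disks, assuming the same illustrative names as in the drop step. On Exadata the ASM disk path takes the form 'o/<cell storage IP>/<grid disk name>'; replace the placeholder IP and names with the actual values for dm13cel09:
-- From an ASM instance: add the cell's grid disks back to each disk group
-- (repeat for every grid disk on dm13cel09)
SQL> ALTER DISKGROUP DATA ADD DISK 'o/<cell-ip>/DATA_CD_00_DM13CEL09';
-- Monitor the rebalance until it completes
SQL> SELECT * FROM V$ASM_OPERATION;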
Attachments
This solution has no attachment