Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1479681.1
Update Date:2012-10-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1479681.1 :   Error During Exadata 11.2.3.1.1 Upgrade Unable to backup file into /boot/cellboot.backup.11.2.2.3.5.110815.tar  


Related Items
  • Oracle Exadata Storage Server Software
  • Exadata Database Machine X2-2 Qtr Rack
  • Exadata Database Machine X2-2 Full Rack
  • Exadata Database Machine X2-8
  • Exadata Database Machine X2-2 Half Rack
  • Exadata Database Machine X2-2 Hardware
Related Categories
  • PLA-Support>Database Technology>Engineered Systems>Oracle Exadata>DB: Exadata_EST

Created from <SR 3-5996964501>

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.2.3.5 to 11.2.3.1.1 [Release 11.2]
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Information in this document applies to any platform.

Symptoms

Patching of one Exadata cell failed with: [ERROR] Unable to backup file into /boot/cellboot.backup.11.2.2.3.5.110815.tar

dm13cel09: [ERROR] Unable to backup file into /boot/cellboot.backup.11.2.2.3.5.110815.tar
dm13cel09: _EXIT_ERROR_Cell dm13cel09 10.5.35.206 2012-07-28 06:51:27: Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
dm13cel09:
dm13cel09: [INFO] Patchmgr was launched from dm13db01.cbp.dhs.gov_10.5.35.196_tmp_patch_11.2.3.1.1.120607.
dm13cel09: Cell dm13cel09 10.5.35.206
dm13cel09: _EXIT_ERROR_Cell dm13cel09 10.5.35.206 2012-07-28 06:51:27: Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.

FAILED for following cells
dm13cel09:  dm13cel09 10.5.35.206 2012-07-28 06:51:27: Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
2012-07-28 06:51:28 4 of 5 :FAILED: Details in files <cell_name>.log, /tmp/patch_11.2.3.1.1.120607/patchmgr.stdout, /tmp/patch_11.2.3.1.1.120607/patchmgr.stderr.
2012-07-28 06:51:28 4 of 5 :FAILED: DONE: Wait for cells to reboot and come online.
[ERROR] This patchmgr run failed. Please run cleanup before retrying.
================PatchMgr run ended Sat Jul 28 06:51:28 EDT 2012 ===========


Current imageinfo


# imageinfo

Kernel version: 2.6.18-194.3.1.0.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Cell version: CELL-01514: Connect Error. Verify that Management Server is listening at the specified HTTP port: 8888.
Cell rpm version: cell-11.2.2.3.5_LINUX.X64_110815-1

Active image version: 11.2.2.3.5.110815
Active image activated: 2011-10-15 21:09:02 -0400
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.3.5.110815

Inactive image version: 11.2.3.1.1.120607
[WARNING] File not found /opt/oracle.cellos/patch/history/image.id.11.2.3.1.1.120607
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

Boot area has rollback archive for the version: undefined
Rollback to the inactive partitions: Impossible

Changes

 Upgrading cell image from 11.2.2.3.5 to 11.2.3.1.1

Cause

The journal for /dev/md4 was missing.

Output of tune2fs -l /dev/md4 from the bad cell and from a good cell:

bad cell
=======
Filesystem features:      filetype sparse_super
...
Filesystem created:       Sat Feb  5 21:36:44 2011
Last mount time:          Sat Oct 15 21:04:22 2011
Last write time:          Sun Apr 22 01:22:41 2012
Mount count:              0
...
(no "Journal inode" line: the filesystem has no journal)

good cell
========
Filesystem features:      has_journal ext_attr filetype needs_recovery sparse_super
...
Filesystem created:       Sat Feb  5 21:37:06 2011
Last mount time:          Sat Jul 28 01:00:27 2012
Last write time:          Sat Jul 28 01:00:27 2012
Mount count:              35
...
Journal inode:            8
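
The difference between the two outputs comes down to whether has_journal appears on the "Filesystem features" line. A minimal sketch of that check follows; the function name has_journal is hypothetical and not part of any Exadata tooling, and it only inspects text passed on stdin.

```shell
#!/bin/sh
# has_journal (hypothetical helper): read `tune2fs -l` output on stdin and
# succeed only if the "Filesystem features" line lists has_journal.
has_journal() {
    # -i tolerates a lowercase "filesystem features:" label;
    # -w matches has_journal only as a whole word.
    grep -i '^filesystem features:' | grep -qw 'has_journal'
}
```

Possible usage on a cell, assuming root access: tune2fs -l /dev/md4 | has_journal || echo "/dev/md4 has no journal"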


mdadm --detail /dev/md$x (note: substitute 1, 2, 5, 6, 7, 8 and 11 for x)


/dev/md2:
        Version : 0.90
  Creation Time : Sat Feb  5 21:35:55 2011
     Raid Level : raid1
     Array Size : 2096384 (2047.59 MiB 2146.70 MB)
  Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Jul 22 04:24:36 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 44f506c7:5d4d1b71:a9f3cadd:69dc557b
         Events : 0.116

    Number   Major   Minor   RaidDevice State
       0       8        9        0      active sync   /dev/sda9
       1       8       25        1      active sync   /dev/sdb9
/dev/md5:
        Version : 0.90
  Creation Time : Sat Feb  5 21:36:04 2011
     Raid Level : raid1
     Array Size : 10482304 (10.00 GiB 10.73 GB)
  Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 5
    Persistence : Superblock is persistent

    Update Time : Sat Jul 28 17:00:15 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : a67a2ad1:05c76851:623dfe7f:ff0c322d
         Events : 0.106

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5
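
When the command is run for several arrays, the State line of each one is the part worth scanning first. The sketch below condenses concatenated mdadm --detail output into one line per array; md_states is a hypothetical helper name, and the parsing assumes the output layout shown above.

```shell
#!/bin/sh
# md_states (hypothetical helper): read concatenated `mdadm --detail` output
# on stdin and print one "<device> <state>" line per array.
md_states() {
    awk '
        # Header line such as "/dev/md2:" starts a new array section.
        /^\/dev\/md[0-9]+:/ { dev = $1; sub(/:$/, "", dev) }
        # Array-level state line, e.g. "          State : clean".
        # The member table header ("Number Major ... State") does not match.
        /^ *State :/        { sub(/^ *State : */, ""); print dev, $0 }
    '
}
```

Possible usage: for x in 1 2 5 6 7 8 11; do mdadm --detail /dev/md$x; done | md_states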


/dev/md4, which was the /boot partition, was missing.


Note: The cell was still able to boot because /boot was found on the USB recovery drive.

Solution

Isolate the faulty cell by manually dropping all grid disks that belong to it.

Recreate the journal using the commands below. Before the fix, df -h shows that /boot is not mounted:

df -h
=====
dm13cel09: Filesystem            Size  Used Avail Use% Mounted on
dm13cel09: /dev/md6              9.9G  4.9G  4.5G  52% /
dm13cel09: tmpfs                  12G     0   12G   0% /dev/shm
dm13cel09: /dev/md8              2.0G  645M  1.3G  34% /opt/oracle
dm13cel09: /dev/md11             2.3G  182M  2.0G   9% /var/log/oracle

tune2fs -j /dev/md4

mount -a

df -h must now show /boot mounted.

Reboot the faulty cell.

Then attempt to patch the faulty cell again:
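
Rather than reading the df -h output by eye, the mount can be confirmed from the mount table. This is a minimal sketch; is_mounted is a hypothetical helper name, and it assumes /proc/mounts-style input where the mount point is the second field.

```shell
#!/bin/sh
# is_mounted (hypothetical helper): succeed if the mount point given as $1
# appears as the second field of /proc/mounts-style text read from stdin.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }'
}
```

Possible usage: is_mounted /boot < /proc/mounts || echo "/boot is still not mounted"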

cd /tmp/patch_11.2.3.1.1.120607
echo "dm13cel09" > dm13cel09
./patchmgr -cells dm13cel09 -cleanup
./patchmgr -cells dm13cel09 -patch_check_prereq
./patchmgr -cells dm13cel09 -patch
./patchmgr -cells dm13cel09 -cleanup
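
The sequence above is only safe to continue while each step succeeds. The sketch below runs the same four steps and stops at the first failure. The function name patch_cell and the PATCHMGR variable are hypothetical; PATCHMGR exists only so the sequence can be dry-run with a stand-in command. The prerequisite-check flag is written as -patch_check_prereq, which is an assumption to confirm against the usage text of the patchmgr shipped with the patch.

```shell
#!/bin/sh
# patch_cell (hypothetical helper): run cleanup, prerequisite check, patch and
# final cleanup for the cell-list file given as $1, stopping at first failure.
# PATCHMGR (hypothetical) overrides the patchmgr path, e.g. for a dry run.
patch_cell() {
    cells=$1
    pm=${PATCHMGR:-./patchmgr}
    for step in -cleanup -patch_check_prereq -patch -cleanup; do
        # $pm is deliberately unquoted so a stand-in command name can be used.
        $pm -cells "$cells" "$step" || {
            echo "patchmgr step $step failed for $cells" >&2
            return 1
        }
    done
}
```

Possible usage from /tmp/patch_11.2.3.1.1.120607, after creating the cell-list file: patch_cell dm13cel09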

After patching, add the grid disks back manually.


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.