Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1420126.1
Update Date:2012-05-10
Keywords:

Solution Type  Problem Resolution Sure

Solution 1420126.1: ODA (Oracle Database Appliance) showing different disks disappearing randomly after a reboot


Related Items
  • Oracle Database Appliance
Related Categories
  • PLA-Support>Sun Systems>x64>Engineered Systems HW>SN-x64: ORA-DATA-APP




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-5334656327>

Applies to:

Oracle Database Appliance - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

As a result of missing disks, the ASM disk groups and Grid Infrastructure do not come up.

  • On the ODA, ASM fails to identify disks during startup.
  • Different disks randomly show as failed, predictive failure or missing.
  • The problem can prevent an entire node from starting up.

 

Example

The problematic node was rebooted several times and came back up with different disks missing each time:

DATA dg - missing disks
---------------
/dev/mapper/HDD_E1_S19_993871319p1
/dev/mapper/HDD_E1_S11_1196820151p1
/dev/mapper/HDD_E0_S13_1196881379p1
/dev/mapper/HDD_E0_S04_1196963151p1


RECO dg - missing disks
---------------------
/dev/mapper/HDD_E1_S19_993871319p2
/dev/mapper/HDD_E1_S11_1196820151p2
/dev/mapper/HDD_E0_S13_1196881379p2
/dev/mapper/HDD_E0_S04_1196963151p2
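The DATA and RECO lists above name the same four multipath devices: partition p1 belongs to DATA and p2 to RECO. A minimal sketch for reducing such a list to the underlying physical disks follows. It assumes the HDD_<enclosure>_<slot>_<id>pN naming seen in this example; the function name missing_physical_disks is hypothetical and is not part of any ODA tool.

```shell
# Hypothetical helper: read missing ASM disk paths (one per line) on stdin and
# print each underlying multipath device once, by stripping a trailing
# partition suffix such as "p1" or "p2".
# Assumes the /dev/mapper/HDD_<enclosure>_<slot>_<id>pN naming shown above.
missing_physical_disks() {
  # Remove the partition suffix, then de-duplicate in a fixed (C) collation.
  sed -E 's/p[0-9]+$//' | LC_ALL=C sort -u
}
```

Feeding the eight DATA and RECO paths above into missing_physical_disks prints four devices, one each for slots E0_S04, E0_S13, E1_S11 and E1_S19.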

After another reboot of the node, 6 disks are missing (a different set):

ASMCMD> mount all
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "21" is missing from group number "3"
ORA-15042: ASM disk "20" is missing from group number "3"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "17" is missing from group number "2"
ORA-15042: ASM disk "16" is missing from group number "2"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "17" is missing from group number "1"
ORA-15042: ASM disk "16" is missing from group number "1" (DBD ERROR: OCIStmtExecute)
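Each ORA-15042 line names one ASM disk number and one disk group number. When the list is long, it can help to reduce it to one line per missing disk. The sketch below assumes the message text matches the lines shown above exactly; summarise_ora15042 is a hypothetical name. The group numbers can then be matched to disk group names through the GROUP_NUMBER and NAME columns of V$ASM_DISKGROUP.

```shell
# Hypothetical helper: read ASMCMD/SQL*Plus output on stdin and print one
# "group <G>: disk <N>" line per ORA-15042 message, sorted numerically
# by group number and then by disk number.
summarise_ora15042() {
  sed -nE 's/.*ORA-15042: ASM disk "([0-9]+)" is missing from group number "([0-9]+)".*/group \2: disk \1/p' |
    sort -t' ' -k2,2n -k4,4n
}
```

For the output above, this prints disks 16 and 17 for groups 1 and 2, and disks 20 and 21 for group 3.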

[grid@svp-oda1 ~]$ ls -l /dev/mapper/HDD* |wc -l
57

[root@svp-oda2 ~]# ls -l /dev/mapper/HDD* |wc -l
51


Node 1
------------
57 disks

Node 2
----------
51 disks
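Because the /dev/mapper/HDD* count differs between the nodes (57 on node 1, 51 on node 2), comparing the actual device names shows which entries are missing on one node. A minimal sketch follows. It assumes both listings have been saved to files (for example with ls -1 /dev/mapper/HDD* > /tmp/node1_hdd.txt on node 1, and the equivalent on node 2) and copied to one host; the file names and the function name missing_on_second_node are hypothetical. It needs bash for process substitution.

```shell
# Hypothetical helper (bash): print device names listed in file $1 but not in
# file $2. Each file holds one path per line, e.g. the output of
# `ls -1 /dev/mapper/HDD*` captured on one node.
missing_on_second_node() {
  local first=$1 second=$2
  # comm needs both inputs sorted with the same collation, so force the C locale.
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$first") <(LC_ALL=C sort -u "$second")
}
```

Running missing_on_second_node /tmp/node1_hdd.txt /tmp/node2_hdd.txt lists the entries that node 2 lacks; swap the arguments to check the other direction.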


Commands for determining missing disks:

# oakcli show disk
NAME PATH TYPE STATE STATE_DETAILS

pd_00 /dev/sdam HDD ONLINE Good
pd_01 /dev/sdaw HDD ONLINE Good
pd_02 /dev/sdaa HDD ONLINE Good
pd_03 /dev/sdak HDD ONLINE Good
pd_04 /dev/sdan HDD ONLINE Good
pd_05 /dev/sdax HDD ONLINE Good
pd_06 /dev/sdab HDD ONLINE Good
pd_07 /dev/sdal HDD ONLINE Good
pd_08 /dev/sdao HDD ONLINE Good
pd_09 /dev/sdau HDD ONLINE Good
pd_10 /dev/sdac HDD ONLINE Good
pd_11 /dev/sdai HDD ONLINE Good
pd_12 /dev/sdap HDD ONLINE Good
pd_13 /dev/sdav HDD ONLINE Good
pd_14 /dev/sdad HDD ONLINE Good
pd_15 /dev/sdaj HDD ONLINE Good
pd_16 /dev/sdaq HDD ONLINE Good
pd_17 /dev/sdas HDD ONLINE Good
pd_18 /dev/sdae HDD ONLINE Good
pd_19 /dev/sdag HDD ONLINE Good
pd_20 /dev/sdar SSD ONLINE Good
pd_21 /dev/sdat SSD ONLINE Good
pd_22 /dev/sdaf SSD ONLINE Good
pd_23 /dev/sdah SSD ONLINE Good
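In the listing above, all 24 physical disks (pd_00 to pd_23) are ONLINE and Good. On an affected node, the symptoms include disks showing as failed, predictive failure or missing. The sketch below scans saved oakcli show disk output for rows that are not ONLINE/Good and for pd_NN names absent from the list. It assumes the five-column layout shown above and an expected count of 24 disks; check_oakcli_disks is a hypothetical name.

```shell
# Hypothetical helper: read `oakcli show disk` output on stdin and report
#  - each pd_NN row whose STATE is not ONLINE or whose STATE_DETAILS
#    (fifth column) is not Good
#  - each name from pd_00 to pd_<expected-1> that does not appear at all
# Assumption: 24 physical disks by default, as in the listing above.
check_oakcli_disks() {
  local expected=${1:-24}
  awk -v expected="$expected" '
    $1 ~ /^pd_[0-9]+$/ {
      seen[$1] = 1
      if ($4 != "ONLINE" || $5 != "Good") print "NOT OK: " $0
    }
    END {
      for (i = 0; i < expected; i++) {
        name = sprintf("pd_%02d", i)
        if (!(name in seen)) print "ABSENT: " name
      }
    }'
}
```

Usage: oakcli show disk | check_oakcli_disks prints nothing when all 24 disks are listed as ONLINE and Good.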


# oakcli show diskgroup data
ASM_DISK PATH DISK STATE STATE_DETAILS

data_00 /dev/mapper/HDD_E0_S00_975071251p1 pd_00 ONLINE Good
data_01 /dev/mapper/HDD_E0_S01_973074223p1 pd_01 ONLINE Good
data_02 /dev/mapper/HDD_E1_S02_975283211p1 pd_02 ONLINE Good
data_03 /dev/mapper/HDD_E1_S03_975067947p1 pd_03 ONLINE Good
data_04 /dev/mapper/HDD_E0_S04_975277007p1 pd_04 ONLINE Good
data_05 /dev/mapper/HDD_E0_S05_975080611p1 pd_05 ONLINE Good
data_06 /dev/mapper/HDD_E1_S06_975276063p1 pd_06 ONLINE Good
data_07 /dev/mapper/HDD_E1_S07_975284323p1 pd_07 ONLINE Good
data_08 /dev/mapper/HDD_E0_S08_970712075p1 pd_08 ONLINE Good
data_09 /dev/mapper/HDD_E0_S09_975061523p1 pd_09 ONLINE Good
data_10 /dev/mapper/HDD_E1_S10_975282083p1 pd_10 ONLINE Good
data_11 /dev/mapper/HDD_E1_S11_975281571p1 pd_11 ONLINE Good
data_12 /dev/mapper/HDD_E0_S12_975274931p1 pd_12 ONLINE Good
data_13 /dev/mapper/HDD_E0_S13_977596619p1 pd_13 ONLINE Good
data_14 /dev/mapper/HDD_E1_S14_975053527p1 pd_14 ONLINE Good
data_15 /dev/mapper/HDD_E1_S15_975284719p1 pd_15 ONLINE Good
data_16 /dev/mapper/HDD_E0_S16_975268647p1 pd_16 ONLINE Good
data_17 /dev/mapper/HDD_E0_S17_975283679p1 pd_17 ONLINE Good
data_18 /dev/mapper/HDD_E1_S18_975281159p1 pd_18 ONLINE Good
data_19 /dev/mapper/HDD_E1_S19_975279427p1 pd_19 ONLINE Good
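oakcli show diskgroup reports the device path behind each ASM disk. Checking that each path actually exists on the node being examined is a quick way to spot the missing /dev/mapper entries described above, even if the reported state still reads ONLINE. A minimal sketch, assuming the column layout shown above; check_diskgroup_paths is a hypothetical name.

```shell
# Hypothetical helper: read `oakcli show diskgroup <name>` output on stdin and
# report ASM disks whose device path does not exist on this node, or whose
# STATE/STATE_DETAILS is not ONLINE/Good.
check_diskgroup_paths() {
  local asm_disk path pd state details
  while read -r asm_disk path pd state details; do
    # Skip the header and blank lines; data rows have a /dev path in column 2.
    case $path in /dev/*) ;; *) continue ;; esac
    if [ ! -e "$path" ]; then
      echo "NO DEVICE: $asm_disk $path ($pd)"
    elif [ "$state" != "ONLINE" ] || [ "$details" != "Good" ]; then
      echo "NOT OK: $asm_disk $path $state $details"
    fi
  done
}
```

Usage: oakcli show diskgroup data | check_diskgroup_paths, repeated for the other disk groups.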




Changes

This problem can occur after:

  • the loss of one ASM disk
  • replacing a disk
  • a reboot of one server

 

Cause


This problem has been identified as a bug:

<Bug: 13728921> - PHYSICAL DISKS DISAPPEAR AFTER REBOOTING NODE 2
which was closed as a duplicate of:

<Bug: 13618428> - AFTER LOSING ONE ASM DISK, MULTIPLE DISKS BECAME UNRESPONSIVE

 
CR 7132662 - P1 erie/firmware Cluster outage resulted from a single HDD failure - X4370M2 with Erie

Solution



Resolution

1) Apply the ODA 2.1.0.3.0 Patch Bundle (Patch 13622348).
2) Then apply the ODA 2.1.0.3.1 emergency patch (Patch 13817532), a single patch applied on top of 2.1.0.3.0.



See Urgent Mandatory OAK Patch 2.1.0.3.1 [Document 1438089.1]
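Before and after patching, it can be useful to confirm that each node is at or above the 2.1.0.3.1 level. A minimal sketch of a dotted-version comparison follows. It assumes you pass in the installed version string yourself (for example as reported by oakcli show version, whose exact output format is not shown in this note). version_at_least is a hypothetical name, and it relies on sort -V (version sort).

```shell
# Hypothetical helper: succeed if dotted version $1 is at least $2
# (default 2.1.0.3.1, the emergency patch level named above).
# Uses version sort (sort -V), so 2.1.0.3.10 correctly ranks above 2.1.0.3.1.
version_at_least() {
  local installed=$1 required=${2:-2.1.0.3.1}
  # The lower of the two versions sorts first; if that is the required
  # version, the installed version is equal to or newer than it.
  [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]
}
```

Usage: version_at_least 2.1.0.3.0 || echo "patch required" prints the message, because 2.1.0.3.0 is below 2.1.0.3.1.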

Workaround
----------------
1) Power cycle both servers.

 

 

References

@ <BUG:13618428> - AFTER LOSING ONE ASM DISK, MULTIPLE DISKS BECAME UNRESPONSIVE
<BUG:13728921> - PHYSICAL DISKS DISAPPEAR AFTER REBOOTING NODE 2
<NOTE:1438089.1> - ALERT - Urgent Mandatory OAK Patch 2.1.0.3.1 for ODA - (Oracle Database Appliance)

  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.