Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1360789.1
Update Date:2012-05-14
Keywords:

Solution Type  Problem Resolution Sure

Solution 1360789.1: Storage Cell Disk Controller Writeback Cache Policy Set to WriteThrough Mode


Related Items
  • Exadata Database Machine X2-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>x64>Engineered Systems HW>SN-x64: EXADATA




Created from <SR 3-4556368491>

Applies to:

Exadata Database Machine X2-2 Hardware - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

Exacheck reports:

Storage cell da01cel06 reporting failure: storage server disk controller not using writeback cache.
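A minimal Python sketch of how the writeback check could be reproduced by hand. It assumes that the per-virtual-drive output of MegaCli64 -LDInfo -LAll -aALL contains "Virtual Drive:" and "Current Cache Policy:" lines laid out as in the sample below. The sample text and the helper names (current_write_policies, not_writeback) are hypothetical, and this is not how Exacheck itself implements the check.

```python
import re

# Hypothetical excerpt of MegaCli64 -LDInfo -LAll -aALL output; the exact
# layout can differ between MegaCli versions, so treat it as illustrative.
SAMPLE_LDINFO = """\
Virtual Drive: 0 (Target Id: 0)
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Virtual Drive: 1 (Target Id: 1)
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
"""

VD_RE = re.compile(r"^Virtual Drive:\s*(\d+)")
POLICY_RE = re.compile(r"^Current Cache Policy:\s*(\w+)")  # first word = write policy


def current_write_policies(text):
    """Map each virtual drive number to its *current* write policy."""
    policies = {}
    vd = None
    for line in text.splitlines():
        line = line.strip()
        m = VD_RE.match(line)
        if m:
            vd = int(m.group(1))
            continue
        m = POLICY_RE.match(line)  # "Default Cache Policy" lines do not match
        if m and vd is not None:
            policies[vd] = m.group(1)
    return policies


def not_writeback(text):
    """Return the virtual drives whose current write policy is not WriteBack."""
    return sorted(vd for vd, p in current_write_policies(text).items() if p != "WriteBack")


if __name__ == "__main__":
    print(not_writeback(SAMPLE_LDINFO))  # [0]
```

Comparing the current policy rather than the default policy matters here: the default stays WriteBack while the controller is temporarily running in WriteThrough.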

Changes

Ran Exacheck on the cluster.

Cause

Synopsis:

A hard disk on the same storage cell server (da01cel06) has failed.

The cell alert.log shows repeated I/O errors on /dev/sdk (cell disk CD_10_da01cel06), after which the cell disk is dropped:

IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=3148695552 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=3843227648 (in sectors) sz=4096 bytes] (errno: Input/output error [5])
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=3074523312 (in sectors) sz=4096 bytes] (errno: Input/output error [5])
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=15679504 (in sectors) sz=8192 bytes] (errno: Input/output error [5])
Mon Aug 29 08:01:47 2011
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=65536 (in sectors) sz=4096 bytes] (errno: Input/output error [5])
Mon Aug 29 08:01:47 2011
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=24453120 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=24455168 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=24457216 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])
IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=24459264 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])
Mon Aug 29 08:01:52 2011
Drop celldisk CD_10_da01cel06 (options: force, from memory only, no-erase) - begin
Drop celldisk CD_10_da01cel06 - end
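Because the IO Error entries follow a regular pattern, the failing device can be summarized with a short Python sketch. It assumes the alert.log line format shown above and 512-byte sectors when converting offsets to bytes; the function name summarize_io_errors is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: offsets are reported in 512-byte sectors.
SECTOR_BYTES = 512

# Matches the alert.log "IO Error" format shown above.
IO_ERR_RE = re.compile(
    r"IO Error on dev=(?P<dev>\S+) cdisk=(?P<cdisk>\S+) "
    r"\[op=(?P<op>\w+) offset=(?P<offset>\d+) \(in sectors\) "
    r"sz=(?P<size>\d+) bytes\]"
)


def summarize_io_errors(lines):
    """Group IO errors by (device, cell disk) as (op, byte offset, size) tuples."""
    summary = defaultdict(list)
    for line in lines:
        m = IO_ERR_RE.search(line)
        if not m:
            continue  # timestamps and drop-celldisk messages are skipped
        key = (m.group("dev"), m.group("cdisk"))
        byte_offset = int(m.group("offset")) * SECTOR_BYTES
        summary[key].append((m.group("op"), byte_offset, int(m.group("size"))))
    return dict(summary)


SAMPLE = [
    "IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=65536 (in sectors) sz=4096 bytes] (errno: Input/output error [5])",
    "Mon Aug 29 08:01:47 2011",
    "IO Error on dev=/dev/sdk cdisk=CD_10_da01cel06 [op=RD offset=24453120 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])",
]

if __name__ == "__main__":
    for (dev, cdisk), errors in summarize_io_errors(SAMPLE).items():
        print(dev, cdisk, len(errors), "IO errors")  # /dev/sdk CD_10_da01cel06 2 IO errors
```

Errors spread across widely separated offsets of a single device, as in this log, point to the drive itself rather than to one bad region of data.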

The cell disk listing reports CD_10_da01cel06 as not present. All other hard disks and all flash disks are normal:

CellCLI> list celldisk
CD_00_da01cel06 normal
CD_01_da01cel06 normal
CD_02_da01cel06 normal
CD_03_da01cel06 normal
CD_04_da01cel06 normal
CD_05_da01cel06 normal
CD_06_da01cel06 normal
CD_07_da01cel06 normal
CD_08_da01cel06 normal
CD_09_da01cel06 normal
CD_10_da01cel06 not present
CD_11_da01cel06 normal
FD_00_da01cel06 normal
FD_01_da01cel06 normal
FD_02_da01cel06 normal
FD_03_da01cel06 normal
FD_04_da01cel06 normal
FD_05_da01cel06 normal
FD_06_da01cel06 normal
FD_07_da01cel06 normal
FD_08_da01cel06 normal
FD_09_da01cel06 normal
FD_10_da01cel06 normal
FD_11_da01cel06 normal
FD_12_da01cel06 normal
FD_13_da01cel06 normal
FD_14_da01cel06 normal
FD_15_da01cel06 normal
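A Python sketch that flags every cell disk whose status is not normal and separates hard disks (CD_*) from flash disks (FD_*). It assumes the name/status layout of the listing above, where the status may contain spaces ("not present"); the helper names are hypothetical.

```python
def parse_celldisks(text):
    """Map each cell disk name to its status from 'list celldisk' output."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("CellCLI>"):
            continue  # skip blank lines and the echoed prompt
        parts = line.split(None, 1)  # name, then the rest (status may contain spaces)
        if len(parts) == 2:
            status[parts[0]] = parts[1].strip()
    return status


def abnormal_disks(status):
    """Split non-normal cell disks into hard (CD_) and flash (FD_) groups."""
    hard = {n: s for n, s in status.items() if n.startswith("CD_") and s != "normal"}
    flash = {n: s for n, s in status.items() if n.startswith("FD_") and s != "normal"}
    return hard, flash


SAMPLE = """\
CellCLI> list celldisk
CD_09_da01cel06 normal
CD_10_da01cel06 not present
CD_11_da01cel06 normal
FD_00_da01cel06 normal
"""

if __name__ == "__main__":
    hard, flash = abnormal_disks(parse_celldisks(SAMPLE))
    print(hard)   # {'CD_10_da01cel06': 'not present'}
    print(flash)  # {}
```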

The disk controller battery status is also normal, which rules out the battery as the cause of the WriteThrough policy.

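The Management Service (MS) drops the write-back cache policy, as seen in ms_odl.trc. The trace shows MS acting on the MegaSAS disk controller cache for the failed disk: it finds preserved (pinned) cache on target id (LUN) 10, discards it with MegaCli64 -DiscardPreservedCache, tries to set the write-back (WB) cache again and reads back the WB caching state, then repeats the same sequence a few seconds later: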

[2011-09-20T14:21:55.018-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] Try to drop pinned cache on phys: 20:10
[2011-09-20T14:21:55.018-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] Try to drop pineed cache on slot: 0
[2011-09-20T14:21:55.085-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] Found preserved cache on target id(lun) : 10 Deleting the pinned cache. cmd - MegaCli64 -DiscardPreservedCache -L10 -a0 -nolog
[2011-09-20T14:21:55.086-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] Try to set WB cache.
[2011-09-20T14:21:55.135-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] get WB caching state.
[2011-09-20T14:21:59.666-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542919598:101907,0] Try to drop pinned cache on phys: 20:10
[2011-09-20T14:21:59.666-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542919598:101907,0] Try to drop pineed cache on slot: 0
[2011-09-20T14:21:59.733-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542919598:101907,0] Found preserved cache on target id(lun) : 10 Deleting the pinned cache. cmd - MegaCli64 -DiscardPreservedCache -L10 -a0 -nolog
[2011-09-20T14:21:59.734-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542919598:101907,0] Try to set WB cache.
[2011-09-20T14:21:59.783-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542919598:101907,0] get WB caching state.
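The repeated MS actions can be extracted from ms_odl.trc with a short Python sketch. It assumes the line layout shown above (a bracketed timestamp first, and the MegaCli command after "cmd - "); preserved_cache_events is a hypothetical name.

```python
import re
from collections import Counter

# Assumed layout: "[timestamp] ... Found preserved cache on target id(lun) : N ... cmd - <command>"
PRESERVED_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*Found preserved cache on target id\(lun\) : (?P<lun>\d+)"
    r".*cmd - (?P<cmd>.+)$"
)


def preserved_cache_events(lines):
    """Return (timestamp, lun, command) for each discard-preserved-cache attempt."""
    events = []
    for line in lines:
        m = PRESERVED_RE.search(line.strip())
        if m:
            events.append((m.group("ts"), int(m.group("lun")), m.group("cmd")))
    return events


SAMPLE = [
    "[2011-09-20T14:21:55.085-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] Found preserved cache on target id(lun) : 10 Deleting the pinned cache. cmd - MegaCli64 -DiscardPreservedCache -L10 -a0 -nolog",
    "[2011-09-20T14:21:55.086-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542914948:101906,0] Try to set WB cache.",
    "[2011-09-20T14:21:59.733-04:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.diskadp.MSDiskAdapterImpl] [tid: 16] [ecid: 10.66.220.45:24832:1316542919598:101907,0] Found preserved cache on target id(lun) : 10 Deleting the pinned cache. cmd - MegaCli64 -DiscardPreservedCache -L10 -a0 -nolog",
]

if __name__ == "__main__":
    events = preserved_cache_events(SAMPLE)
    # Repeated attempts on the same LUN suggest MS keeps retrying.
    print(Counter(lun for _, lun, _ in events))  # Counter({10: 2})
```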

Solution

Replace the failed hard disk drive.

MS automatically detects the new disk and changes the cache policy back to the normal WriteBack mode.
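After the replacement, one way to confirm that MS has restored the policy is to poll until every virtual drive reports WriteBack. In this Python sketch, get_policies is a hypothetical caller-supplied function returning {virtual drive: current write policy}, for example built by parsing MegaCli64 -LDInfo -LAll -aALL output; the default timeout and polling interval are arbitrary assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_writeback(
    get_policies: Callable[[], Dict[int, str]],
    timeout_s: float = 600.0,  # arbitrary assumption
    interval_s: float = 30.0,  # arbitrary assumption
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every virtual drive reports WriteBack; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        policies = get_policies()
        if policies and all(p == "WriteBack" for p in policies.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated readings: write-through until the new disk is picked up.
    readings = iter([{0: "WriteThrough", 1: "WriteThrough"}, {0: "WriteBack", 1: "WriteBack"}])
    print(wait_for_writeback(lambda: next(readings), sleep=lambda s: None))  # True
```

The sleep and clock parameters are injectable so the loop can be exercised without actually waiting.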


  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.