Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1362174.1: Exadata Compute Node RAID Controller Failed

Created from <SR 3-4598608111>

Applies to:
Exadata Database Machine V2 - Version: Not Applicable
Information in this document applies to any platform.

Symptoms

- On the affected compute node the filesystems become read-only.
- It's not possible to remount them as read/write:

      # mount -o remount,rw /

- MegaCli64 commands do not work correctly:

      # /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0

- The console log reports messages like:

      ADP_RESET_GEN2: retry time=3e8, hostdiag=a4
      megaraid_sas: FW was restarted successfully, initiating next stage...
      megaraid_sas: HBA recovery state machine, state 2 starting...
      printk: 9 messages suppressed.
      printk: 9 messages suppressed.
      megaraid_sas: out: controller is not in ready state
      megasas: waiting_for_outstanding: after issue OCR.
      megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
      megaraid_sas: pending commands remain even state = f0000000
      megaraid_sas: pending commands remain even after reset handling.
      megasas[0]: Dumping Frame Phys Address of all pending cmds in FW
      megasas[0]: Total OS Pending cmds : 0
      megasas[0]: 64 bit SGLs were sent to FW
      megasas[0]: Pending OS cmds in FW :
      megasas[0]: Frame addr :0x37f22800 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0, lba lo : 0x167727f, lba_hi : 0x0, sense_buf addr : 0x37f20500,sge count : 0x1
      .....
      0x7f77f400 : <3>megasas[0]: Dumping Done.
      megasas: failed to do reset
      sd 0:2:0:0: megasas: RESET -1140663 cmd=2a retries=0
      megasas: cannot recover from previous reset failures
      sd 0:2:0:0: megasas: RESET -1140663 cmd=2a retries=0
      megasas: cannot recover from previous reset failures
      sd 0:2:0:0: timing out command, waited 360s
      end_request: I/O error, dev sda, sector 23119751
      printk: 8 messages suppressed.
      Buffer I/O error on device sda1, logical block 2889961
      lost page write due to I/O error on sda1
      sd 0:2:0:0: rejecting I/O to offline device
      sd 0:2:0:0: rejecting I/O to offline device
      ...
      _journal_remove_journal_head: freeing b_committed_data
      __journal_remove_journal_head: freeing b_committed_data
      journal commit I/O error
      ext3_abort called.
      EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
      Remounting filesystem read-only

Cause

This is likely to be a failure of the LSI RAID Controller.

Solution

Replace the LSI controller on the affected compute node (6GIGABIT SAS RAID PCI EXPRESS HBA, B4 ASIC), then restart the compute node.

References

<BUG:9336229> - DB SERVER SHOWS READ ONLY FILESYSTEM AFTER DISK CONTROLLER GOES OFFLINE

Attachments

This solution has no attachment
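
The symptoms described in this note (filesystems remounted read-only and controller reset failures in the console log) can be checked with a short script. The following is only a minimal sketch, not part of the original solution: the function names `list_readonly_mounts` and `count_controller_failure_lines` are hypothetical, the default of `/proc/mounts` is an assumption about a Linux compute node, and the console log path must be supplied by the caller because the note does not name one. It deliberately avoids MegaCli64, since that tool does not respond correctly on an affected node.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and default paths are illustrative assumptions.

# Print the mount points of block-device filesystems mounted read-only.
# $1: mounts table to read (assumed default: /proc/mounts).
list_readonly_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    # Fields: device, mount point, fs type, options, dump, pass.
    # Keep only /dev/* devices whose option list contains a bare "ro".
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)ro(,|$)/ { print $2 }' "$mounts_file"
}

# Print how many lines of a saved console log match the controller-failure
# messages quoted in the Symptoms section.
# $1: path to a saved console log (hypothetical; supplied by the caller).
# Note: grep -c exits with status 1 when the count is 0.
count_controller_failure_lines() {
    local log_file="$1"
    grep -c -E \
        -e 'megasas: failed to do reset' \
        -e 'megasas: cannot recover from previous reset failures' \
        -e 'megaraid_sas: .*controller is not in ready state' \
        -e 'rejecting I/O to offline device' \
        "$log_file"
}
```

Run on the node, `list_readonly_mounts` with no argument reads `/proc/mounts`, and `count_controller_failure_lines` takes the path to a saved copy of the console log. Read-only mounts together with a non-zero match count are consistent with the failure described here, but they do not by themselves prove that the controller is at fault.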