Asset ID: 1-72-1392156.1
Update Date: 2012-09-04
Solution Type: Problem Resolution Sure
Solution 1392156.1: Pillar Axiom: NAS File Systems May Not Recover, with Possible Impending Data Loss, Following a Power Cycle
Related Items
- Pillar Axiom 300 Storage System
- Pillar Axiom 600 Storage System
- Pillar Axiom 500 Storage System
Related Categories
- PLA-Support>Sun Systems>DISK>Pillar Axiom>SN-DK: Ax600
Axiom NAS file system journals are placed in battery-backed memory on the buddy Slammer CU so that they can be recovered easily if the owning CU fails suddenly. An error in recovering these journals after a Slammer or system power loss, or after a power cycle, may result in data loss.
Applies to:
Pillar Axiom 500 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Pillar Axiom 600 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Pillar Axiom 300 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
N/A
Symptoms
NAS File Systems may not recover, with possible impending data loss, following a power cycle
Synopsis
During the recovery period of a filesystem following a power cycle, the filesystem may be marked offline with no actual journal recovery performed if the underlying physical storage (the Brick pool) is not yet available. In this offline state, the filesystem's journal may be overwritten by other filesystem data, causing subsequent data loss.
Affected Hardware and Software Versions
This issue affects all NAS or NAS/SAN Axioms running AxiomONE release 04.00.xx, 04.01.00 through 04.01.06, or 04.02.00. This defect does not affect SAN-only systems or any AxiomONE release prior to R4.0.
Changes
N/A
Cause
Problem
A software defect has been discovered that may lead to NAS file system corruption following a system power cycle, whether caused by a power outage or by a manual power cycle of the Axiom.
Specifically, during the recovery period of the filesystem after the power event, the underlying storage containing the low-level Virtual LUN, or vLUN (not to be confused with a SAN LUN), may not yet be available. In this case, the defect causes the filesystem to be marked offline, and the journal replay operation (necessary to bring the filesystem back into sync) is inadvertently skipped.
Also, the nature of this defect leads the system to believe the filesystem's journal, if present, is no longer active, allowing other filesystems to write over the space it once occupied. If the latter occurs, the filesystem will experience data loss.
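The failure sequence described above can be illustrated with a small model. This is only a sketch of the behavior as this note describes it, not AxiomONE code: the `FileSystem` class, its field names, and the `recover_with_defect` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSystem:
    """Hypothetical model of a NAS filesystem during post-power-cycle recovery."""
    name: str
    journal: List[str] = field(default_factory=list)  # entries awaiting replay
    online: bool = False
    journal_replayed: bool = False
    # When False, other filesystems may write over the journal's space.
    journal_space_protected: bool = True


def recover_with_defect(fs: FileSystem, storage_available: bool) -> FileSystem:
    """Model the defective recovery path described in this note (illustrative only)."""
    if not storage_available:
        # Defect: the filesystem is marked offline, journal replay is skipped,
        # and the journal is treated as inactive, so its space is unprotected.
        fs.online = False
        fs.journal_replayed = False
        fs.journal_space_protected = False
        return fs
    # Storage is available: replay the journal, then bring the filesystem online.
    fs.journal.clear()
    fs.journal_replayed = True
    fs.online = True
    return fs


if __name__ == "__main__":
    pending = FileSystem("fs1", ["write A", "write B"])
    print(recover_with_defect(pending, storage_available=False))
```

In the model, the unreplayed entries are still present after the defective path runs, but nothing protects them any longer. That unprotected window is where the data loss described above can occur.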
IMPORTANT!
Contact Pillar Data Systems Customer Support immediately if this issue is encountered.
Solution
Solution/Workaround
The fix for this issue is included in AxiomONE releases 04.01.07 and 04.02.01. Later releases such as 04.03, 04.05, and 04.06 also contain the fix. Upgrading to 04.02.01, or to the current recommended patch level above 04.02.01, will prevent this issue.
Axioms on releases 04.00.xx, 04.01.00 through 04.01.06, and 04.02.00 are subject to this defect.
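For administrators who track several systems, the release ranges above can be expressed as a simple check. The following is a minimal sketch, assuming release strings of the form `NN.NN.NN` with an optional leading `R`; the function names are hypothetical and this is not a Pillar-supplied tool. It also assumes that any release with a major version above 04 carries the fix, which this note does not state. The check applies only to NAS or NAS/SAN systems, since SAN-only systems are not affected.

```python
import re


def parse_release(release):
    """Parse an AxiomONE-style release string such as '04.01.06' into three ints."""
    match = re.fullmatch(r"[Rr]?(\d+)\.(\d+)\.(\d+)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(release):
    """Return True if the release falls in a range this note lists as affected."""
    major, minor, patch = parse_release(release)
    if major < 4:
        return False          # releases prior to R4.0 are not affected
    if major > 4:
        return False          # ASSUMPTION: not stated in this note
    if minor == 0:
        return True           # 04.00.xx
    if minor == 1:
        return patch <= 6     # 04.01.00 to 04.01.06; fixed from 04.01.07
    if minor == 2:
        return patch == 0     # 04.02.00; fixed from 04.02.01
    return False              # 04.03 and later contain the fix


if __name__ == "__main__":
    for rel in ("04.00.05", "04.01.06", "04.01.07", "04.02.00", "04.02.01", "04.05.00"):
        print(rel, "affected" if is_affected(rel) else "not affected")
```

A result of "not affected" from a script like this does not replace screening by the Support Center; it only mirrors the version ranges listed in this note.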
Pillar customers that utilize the callhome feature of the Axiom may be notified proactively by the Pillar World Wide Support Center if their system(s) have the potential to encounter this issue, as we will be scanning the weekly Periodic Callhome files that are received by the Pillar callhome infrastructure.
Customers who do not use the callhome feature and whose Axiom is below AxiomONE release 04.02.01 are urged to contact the Pillar World Wide Support Center and open a Service Request to have the system screened for this issue. If the system is determined to be at risk, an upgrade will be scheduled for remediation.
Common Questions
Question: How much risk is there?
Answer: Although this issue is triggered only when certain conditions are present, it is Pillar's position that any risk to data is too much. It is therefore strongly recommended that you take appropriate steps to remove that risk as soon as possible.
Question: How long have you known about this? Why didn’t we hear about this before?
Answer: This issue was discovered during internal testing very recently. Pillar is proactively issuing this alert following confirmation of risk.
Question: Is the fix non-disruptive?
Answer: Yes. Those customers already on AxiomONE release 4.0, 4.1, or 4.2 may upgrade non-disruptively. Contact the Support Center for your specific upgrade options.
Attachments
This solution has no attachment