Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type Sun Alert Sure Solution 1128605.1 : Firmware for RAID Controllers Causes Unscheduled Simultaneous Reboot of Controllers After 828.5 Days of Continuous Operation
Applies to:
Sun Storage 6540 Array - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Storage 6580 Array - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Storage 6780 Array - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Storage Flexline 380 Array - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Storage 6180 Array - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Information in this document applies to any platform.

SUNBUG: 6949589, 6872995
Date of Resolved Release: 18-Jun-2010

Description

A known issue with vxWorks RAID controller firmware for Sun StorageTek arrays (as listed in Section 2) may cause drives associated with host/IO volumes to experience write failures when the controllers reboot. This issue can occur after approximately 828.5 days of uptime, when vxWorks (by default) is scheduled for a simultaneous auto-reboot of the controllers.

Likelihood of Occurrence

This issue can occur on the following platforms:
This issue is not restricted to the above arrays, as this firmware may also be used with other arrays, servers, or switches. To determine the version of firmware on the controller, view the Common Array Manager (CAM) Storage System Summary page on the CAM host managing the array.

There is a timer in vxWorks (vxAbsTicks) that is one double word long (a 32-bit number, 0x00000000). cfgMonitorTask monitors this timer to avoid drive failure during I/O to the disk, and reboots the controller once vxAbsTicks reaches 0xff000000. When this timer rolls over from 0xffffffff to 0x00000000 (approximately 828.5 days), there is a possibility that, if volumes with host I/O exist, the associated drives will be failed with a write failure.

Possible Symptoms

RAID arrays that use software mirroring to mirror data between two arrays perform an unscheduled reboot at nearly the same time (at approximately 828.5 days of uptime), causing a write failure.

Workaround or Resolution

To avoid the unscheduled controller auto-reboot, reboot the controllers one at a time at any point between 1 day and 800 days of uptime. This restarts the counter before the vxAbsTicks rollover. With a proper failover environment, there should be no interruption of service.

Rebooting each controller before the vxAbsTicks rollover only restarts the counter: arrays running a 6.x firmware revision will still encounter the issue approximately 828 days after each controller reboot.

This issue is resolved in firmware 07.35.10.10 or later for the 25xx series and 07.15.11.12 or later for the 6xxx series. The fix changes the reboot schedule to different times for each controller without the need for a manual reboot. (The reboot date/time can still be scheduled manually.)

You cannot upgrade directly from 6.x firmware to 7.x. You must first upgrade to the firmware bundled in the upgrade utility, then use CAM to upgrade to the required firmware level.
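The timing figures above can be checked with a short calculation. The following is a minimal sketch, not controller code: it assumes a tick rate of 60 ticks per second, which this document does not state but which is consistent with a 32-bit counter rolling over at approximately 828.5 days. The constant and function names are hypothetical and chosen for illustration only.

```python
# Sketch only. TICKS_PER_SECOND is an ASSUMPTION (not stated in this document);
# it is chosen because 2**32 ticks at 60 Hz is approximately 828.5 days.
TICKS_PER_SECOND = 60
COUNTER_MODULUS = 2 ** 32        # vxAbsTicks is described as a 32-bit number
REBOOT_THRESHOLD = 0xFF000000    # value at which cfgMonitorTask reboots the controller
SECONDS_PER_DAY = 86_400


def ticks_to_days(ticks: int) -> float:
    """Convert a tick count to days under the assumed tick rate."""
    return ticks / TICKS_PER_SECOND / SECONDS_PER_DAY


def days_until_rollover(ticks: int) -> float:
    """Days remaining before a 32-bit counter at `ticks` wraps to zero."""
    if not 0 <= ticks < COUNTER_MODULUS:
        raise ValueError("tick value does not fit in 32 bits")
    return ticks_to_days(COUNTER_MODULUS - ticks)


def past_reboot_threshold(ticks: int) -> bool:
    """True once the counter has reached the 0xff000000 reboot threshold."""
    return ticks >= REBOOT_THRESHOLD


if __name__ == "__main__":
    # Full counter period under the assumed tick rate.
    print(f"Rollover after {ticks_to_days(COUNTER_MODULUS):.1f} days")
```

Given a vxAbsTicks value read from a controller, `days_until_rollover` gives a rough estimate of how long remains before the rollover, which can help when scheduling the staggered manual reboots described above. The estimate is only as good as the assumed tick rate.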
See also:
"Procedure to Upgrade the Sun StorageTek[TM] 6540 Array, 6140 Array or FLX380 Storage Array from Firmware 06.xx to 07.xx." <Document:1131593.1>
"Procedure to Upgrade the Sun StorageTek[TM] 2500 Series Array Controller Firmware from 06.xx to 07.xx." <Document:1319254.1>

The firmware matrices are available in the following documents:
<Document:1021780.1> Sun StorageTek 2510 (iSCSI) Firmware Matrix
<Document:1005365.1> Sun StorageTek 2530 (SAS) Firmware Matrix
<Document:1017877.1> Sun StorageTek 2540 (FC) Firmware Matrix
<Document:1011474.1> Sun StorageTek 6140 Firmware Matrix
<Document:1022296.1> Sun Storage 6180 Firmware Matrix
<Document:1009934.1> Sun StorageTek 6540 Firmware Matrix
<Document:1022298.1> Sun Storage 6580/6780 Firmware Matrix
<Document:1011551.1> Sun StorageTek Flexline 380 (FLX380) with Common Array Manager (CAM) Firmware Matrix

Patches

Modification History

18-Jun-2010: Document created, issue is Resolved
01-Jun-2010: Updated for minor formatting issues
15-Dec-2011: Update "Notes" in Workaround section
19-Dec-2011: Update Workaround section for additional information

vxWorks Detail

What is it?
- There is a timer in the firmware, specifically in vxWorks, called vxAbsTicks that is only one double word long (0x0000 0000). When this timer rolls over from 0xffff ffff to 0x0000 0000 (approximately 828.5 days), there is the possibility that, if there is host I/O to volumes, the associated drives will be failed with a write failure. This was discovered in 2003, and CR# 68447 was opened against the issue. The fix for that CR added a function called 'cfgMonitorTask' to the controller firmware that reboots the controller if the vxAbsTicks value is within 12 days of 828 days. This function has been present in firmware from 03.xx up to 06.60. You can monitor the timer using the following shell command:

    % vxAbsTicks
    vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f

What Happened?
- When the conversion from RC1 to RC2 was completed, the functionality in cfgMonitorTask was not ported into 07.xx CFW. This reintroduced the ungraceful vxAbsTicks timer rollover at approximately 828.5 days, with the possibility that, if there is host I/O to volumes, the associated drives will be failed with a write failure.

Where was it fixed?
- CR 138248 was added to the RC2 trunk prior to Emerald/Exmoor and is in all subsequent releases. It adds the proactive reboot of the controllers prior to the ungraceful vxAbsTicks timer rollover.

References

SUNBUG:6872995
SUNBUG:6949589

Attachments

This solution has no attachment