Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-77-1128605.1
Update Date:2011-12-19
Keywords:

Solution Type  Sun Alert Sure

Solution  1128605.1 :   Firmware for RAID Controllers Causes Unscheduled Simultaneous Reboot of Controllers After 828.5 Days of Continuous Operation  


Related Items
  • Sun Storage 6580 Array
  • Sun Storage Flexline 380 Array
  • Sun Storage 6540 Array
  • Sun Storage 6180 Array
  • Sun Storage 2510 Array
  • Sun Storage 2540 Array
  • Sun Storage 6780 Array
  • Sun Storage 2530 Array
  • Sun Storage 6140 Array
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • .Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
  Description
  Likelihood of Occurrence
  Possible Symptoms
  Workaround or Resolution
  Patches
  Modification History
  References


Applies to:

Sun Storage 6540 Array - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 6580 Array - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 6780 Array - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage Flexline 380 Array - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 6180 Array - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Information in this document applies to any platform.
________________________

SUNBUG:6949589, 6872995

Date of Resolved Release: 18-Jun-2010
________________________________

Description


A known issue with vxWorks RAID controller firmware for Sun StorageTek arrays (listed under "Likelihood of Occurrence" below) may cause drives associated with host I/O volumes to experience write failures when the controllers reboot. This issue can occur after approximately 828.5 days of uptime, when vxWorks (by default) is scheduled for a simultaneous auto-reboot of the controllers.

Likelihood of Occurrence


This issue can occur on the following platforms:
  • Sun StorageTek 2510 Array
  • Sun StorageTek 2530 Array
  • Sun StorageTek 2540 Array
  • Sun StorageTek 6140 Array
  • Sun StorageTek 6180 Array
  • Sun StorageTek 6540 Array
  • Sun StorageTek 6580 Array
  • Sun StorageTek 6780 Array
  • Sun StorageTek Flexline 380 Array
running vxWorks Controller Firmware versions 6.70.xx or earlier.

This issue is not restricted to the above arrays, as this firmware may also be used with other arrays, servers, or switches.

To determine the version of firmware on the controller, please view the Common Array Manager (CAM) Storage System Summary page of the CAM host managing the array.

There is a timer in vxWorks (vxAbsTicks) that is only a double word long (a 32-bit number, starting at 0x00000000). cfgMonitorTask monitors this counter to avoid drive failure during I/O to the disk, and reboots the controller once vxAbsTicks reaches 0xff000000. When the timer rolls over from 0xffffffff to 0x00000000 (after approximately 828.5 days), drives associated with any existing host I/O volumes may be failed with a write failure.
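
The 828.5-day figure follows from the width of the counter. The sketch below reproduces the arithmetic. It assumes a system clock rate of 60 ticks per second; this advisory does not state the rate, and 60 is simply the value that makes a 32-bit counter wrap after about 828.5 days. The names used here are illustrative only and are not part of the controller firmware.

```python
# Assumption: the system clock runs at 60 ticks per second. The advisory does
# not state the rate; 60 is the value consistent with a 32-bit counter
# wrapping after roughly 828.5 days.
TICKS_PER_SECOND = 60
SECONDS_PER_DAY = 86_400

ROLLOVER_TICKS = 2**32          # counter wraps from 0xffffffff to 0x00000000
REBOOT_THRESHOLD = 0xFF000000   # value at which cfgMonitorTask reboots the controller


def ticks_to_days(ticks: int) -> float:
    """Convert a tick count to days of uptime at the assumed tick rate."""
    return ticks / TICKS_PER_SECOND / SECONDS_PER_DAY


rollover_days = ticks_to_days(ROLLOVER_TICKS)
threshold_days = ticks_to_days(REBOOT_THRESHOLD)

print(f"counter rollover after about {rollover_days:.1f} days")
print(f"0xff000000 reached after about {threshold_days:.1f} days")
```

Under this assumption the 0xff000000 threshold falls a few days before the rollover itself.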

Possible Symptoms


RAID arrays that use software mirroring to mirror data between two arrays perform an unscheduled reboot at nearly the same time (at approximately 828.5 days of uptime), causing a write failure.

Workaround or Resolution


To avoid the (unscheduled) controller auto-reboot, alternately reboot the controllers at any time between 1 day and 800 days of uptime to restart the counter prior to the vxAbsTicks rollover. With a proper failover environment, there should be no interruption of service.
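
For planning purposes, the reboot window can be projected from a controller's last boot time. The following is a minimal sketch, not a supported tool: it assumes a 60 ticks-per-second clock (not stated in this advisory) and uses a hypothetical boot date. The function name reboot_window is illustrative only.

```python
from datetime import datetime, timedelta

TICKS_PER_SECOND = 60  # assumption: not stated in the advisory

# Time for a 32-bit tick counter to wrap at the assumed rate (about 828.5 days).
ROLLOVER = timedelta(seconds=2**32 / TICKS_PER_SECOND)
# Upper bound of the reboot window recommended in the workaround.
LATEST_SAFE_UPTIME = timedelta(days=800)


def reboot_window(last_boot: datetime) -> tuple[datetime, datetime]:
    """Return (latest recommended reboot time, projected rollover time)."""
    return last_boot + LATEST_SAFE_UPTIME, last_boot + ROLLOVER


# Hypothetical boot time, for illustration only.
boot = datetime(2010, 1, 1)
deadline, rollover = reboot_window(boot)
print("reboot this controller before:", deadline.date())
print("projected counter rollover:   ", rollover.date())
```

Run this once per controller, and schedule the reboots one controller at a time so that failover can cover each one.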

Rebooting each controller before the vxAbsTicks rollover only restarts the counter. Arrays running 6.x firmware will encounter the issue again approximately 828 days after each reboot of the controllers.

This issue is resolved in firmware 07.35.10.10 or later for the 25xx series and 07.15.11.12 or later for the 6xxx series. The fix changes the reboot schedule to different times for each controller without the need for a manual reboot. (The reboot date/time can still be scheduled manually.)
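
A firmware level read from CAM can be compared against these minimums with a simple numeric comparison. The sketch below assumes that firmware versions are always four dot-separated numeric fields that compare field by field; the family labels and the has_fix helper are illustrative names, not part of CAM.

```python
# Minimum fixed firmware per array family, as stated in this advisory.
MIN_FIXED = {
    "25xx": (7, 35, 10, 10),
    "6xxx": (7, 15, 11, 12),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '07.35.10.10' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def has_fix(family: str, firmware: str) -> bool:
    """Return True if the firmware is at or above the fixed level for the family."""
    return parse_version(firmware) >= MIN_FIXED[family]


print(has_fix("25xx", "07.35.10.10"))  # True: exactly the minimum fixed level
print(has_fix("6xxx", "06.60.22.10"))  # False: hypothetical 6.x version string
```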

You cannot upgrade directly from 6.x firmware to 7.x. You must first upgrade to the firmware bundled in the upgrade utility, then use CAM to upgrade to the required firmware level.

See also: "Procedure to Upgrade the Sun StorageTek[TM] 6540 Array, 6140 Array or FLX380 Storage Array from Firmware 06.xx to 07.xx" <Document:1131593.1> and "Procedure to Upgrade the Sun StorageTek[TM] 2500 Series Array Controller Firmware from 06.xx to 07.xx" <Document:1319254.1>.

The firmware matrices are available in the following documents:

<Document:1021780.1> Sun StorageTek 2510 (iSCSI) Firmware Matrix
<Document:1005365.1> Sun StorageTek 2530 (SAS) Firmware Matrix
<Document:1017877.1> Sun StorageTek 2540 (FC) Firmware Matrix
<Document:1011474.1> Sun StorageTek 6140 Firmware Matrix
<Document:1022296.1> Sun Storage 6180 Firmware Matrix
<Document:1009934.1> Sun StorageTek 6540 Firmware Matrix
<Document:1022298.1> Sun Storage 6580/6780 Firmware Matrix
<Document:1011551.1> Sun StorageTek Flexline 380 (FLX380) with Common Array Manager (CAM) Firmware Matrix

Patches

 

Modification History

18-Jun-2010: Document created, issue is Resolved
01-Jun-2010: Updated for minor formatting issues
15-Dec-2011: Update "Notes" in Workaround section
19-Dec-2011: Update Workaround section for additional information


vxWorks Detail
What is it? - There is a timer in the firmware, specifically
in vxWorks, called vxAbsTicks that is only a double word long
(0x0000 0000).  When this timer rolls over from 0xffff ffff
to 0x0000 0000 (approximately 828.5 days) there is the possibility
that if there is host I/O to volumes, the associated drives will
be failed with a write failure.  This was discovered in 2003, and
CR# 68447 was opened against the issue.  The CR added a function
called 'cfgMonitorTask' to the controller firmware that will
reboot the controller if the vxAbsTicks value is within 12 days
of 828 days.  This has been in the firmware from 03.xx up to
06.60.

You can monitor this using the following shell
command:
% vxAbsTicks
vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f
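
The value printed by the shell can be converted into uptime and time remaining before rollover. The sketch below parses the sample line shown above. It assumes a 60 ticks-per-second clock (not stated in this advisory) and that the output always has the form shown in the sample; parse_ticks is a hypothetical helper, not a firmware or CAM function.

```python
import re

TICKS_PER_SECOND = 60  # assumption: not stated in the advisory

# Sample line copied from the shell output shown above.
SAMPLE = "vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f"

# Layout assumed from the sample: symbol address, then the value in decimal and hex.
PATTERN = re.compile(
    r"vxAbsTicks\s*=\s*0x[0-9a-fA-F]+:\s*value\s*=\s*(\d+)\s*=\s*0x([0-9a-fA-F]+)"
)


def parse_ticks(line: str) -> int:
    """Extract the tick count, checking that the decimal and hex forms agree."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized vxAbsTicks output: {line!r}")
    decimal = int(match.group(1))
    if decimal != int(match.group(2), 16):
        raise ValueError("decimal and hexadecimal values disagree")
    return decimal


ticks = parse_ticks(SAMPLE)
uptime_hours = ticks / TICKS_PER_SECOND / 3600
remaining_days = (2**32 - ticks) / TICKS_PER_SECOND / 86_400

print(f"ticks = {ticks}")
print(f"uptime is about {uptime_hours:.2f} hours")
print(f"about {remaining_days:.1f} days until the counter rolls over")
```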

What Happened? - When the conversion from RC1 to RC2 was completed,
the functionality in cfgMonitorTask was not ported into 07.xx CFW.
This reintroduced the ungraceful vxAbsTicks timer rollover
at approximately 828.5 days, with the possibility that if there is
host I/O to volumes, the associated drives will be failed with a
write failure.

Where was it fixed? - CR 138248, which adds the proactive reboot of
the controllers prior to the ungraceful vxAbsTicks timer rollover,
was added to the RC2 trunk prior to Emerald/Exmoor and is in all
subsequent releases.

References

SUNBUG:6872995
SUNBUG:6949589

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.