Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1019002.1
Update Date: 2011-02-25
Keywords:

Solution Type  Sun Alert Sure

Solution  1019002.1 :   Collecting Support Data On Certain Arrays May Cause One or Both Array Controllers to Reboot  


Related Items
  • Sun Storage 6540 Array
  • Sun Storage Flexline 280 Array
  • Sun Storage 6130 Array
  • Sun Storage 6140 Array
  • Sun Storage Flexline 240 Array
  • Sun Storage Flexline 380 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

Previously Published As
231741


Bug Id
<SUNBUG: 6620880>

Product
Sun StorageTek Flexline 240 Array
Sun StorageTek Flexline 280 Array
Sun StorageTek Flexline 380 Array
Sun StorageTek 6130 Array
Sun StorageTek 6140 Array
Sun StorageTek 6540 Array

Date of Workaround Release
15-Feb-2008

Date of Resolved Release
16-Jun-2008

Collecting Support Data On Certain Arrays May Cause One or Both Array Controllers to Reboot

1. Impact

For Sun StorageTek 6130, 6140, 6540 and Flexline 240, 280, and 380 arrays, collecting support data may cause one or both array controllers to reboot. Results can range from device path failovers to loss of access due to both primary and secondary paths to a device being inaccessible.

2. Contributing Factors

This issue can occur on the following platforms and in the following releases:
    • Sun StorEdge 6130 arrays with array firmware 06.19.25.10 (or later)
    • Sun StorageTek 6140, 6540 arrays with array firmware 06.16.81.10 (or later)
    • Sun StorageTek Flexline 240, 280, 380 arrays with array firmware 06.19.25.00 (or later)
    • Sun StorageTek SANtricity Storage Manager 06.16 (or later)
    • Sun StorageTek Common Array Manager 6.0
      Notes:

      This issue can only occur during array support data collection using SANtricity or Common Array Manager (CAM), specifically during the "Capture State" function of the collection, which generates the stateCaptureData.dmp file.

      The following specific circumstances lead to this bug:

      1) The Host Port/HBA must be, or must have been, seen in the SAN by one or both controllers. This can be verified by checking whether it is listed in the WWPN drop-down in CAM, or as an un-aliased Host Port in SANtricity. This means that "fake" WWPNs that are aliased on the array, or proactively created before the SAN is zoned, WILL NOT cause this issue.

      2) The Host Port/Initiator must have an existing alias. This means that not only had the HBA been seen by the array, but at some point the user had created an alias for it to assign to a specific host for Volume-to-LUN mapping on the array.

      3) Since being aliased, as in condition 2, the Host Port/Initiator must have either:

      A) had its alias removed (unaliased), placing it in the "Free" list (available as a drop-down in CAM)
      or
      B) had the alias name changed

      4) Once conditions 1-3 are met, collecting "All Support Data", or specifically running the "State Capture" (SANtricity only), will cause the controllers to panic.
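
      Conditions 1-3 can be read as a simple boolean check per host port. The following Python sketch illustrates that reading only; the InitiatorHistory record and its field names are hypothetical and are not part of CAM or SANtricity, and the values would have to be filled in by hand from what the administrator knows about each WWPN.

```python
from dataclasses import dataclass


@dataclass
class InitiatorHistory:
    """Hypothetical record of what is known about one host port (WWPN)."""
    wwpn: str
    seen_by_controller: bool  # condition 1: seen in the SAN by a controller
    was_aliased: bool         # condition 2: an alias was created at some point
    alias_removed: bool       # condition 3A: unaliased, now in the "Free" list
    alias_renamed: bool       # condition 3B: the alias name was changed


def capture_is_risky(port: InitiatorHistory) -> bool:
    """True when conditions 1, 2 and (3A or 3B) all hold for this port."""
    return (
        port.seen_by_controller
        and port.was_aliased
        and (port.alias_removed or port.alias_renamed)
    )


def risky_ports(ports):
    """Return the WWPNs for which a state capture should be avoided."""
    return [p.wwpn for p in ports if capture_is_risky(p)]
```

      If risky_ports() returns any WWPN, condition 4 (running the state capture) is the step to avoid on affected firmware.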

3. Symptoms

      At a minimum, users will see device path fail-overs.  At worst, they will see a loss of access due to both primary and secondary paths to a device being inaccessible.

      This issue can only occur during array support data collection. Host message events showing SCSI or Fibre Channel messages indicating loss of access to one or both array controllers are the most common symptom associated with this issue. SANtricity or CAM may throw an error during collection, similar to the following (CAM error shown):
          ionShow 99 fails!
          devmgr.v0916api13.sam.jal.ManagementOperationFailedException:
          ManagementOperationFailedExceptionError 1007 - Could not communicate
          with the controller in slot A to complete this request.

      The controller(s) will reboot because a defined initiator or host port is not seen by the array at the time the data collection command is run.
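
      As a rough aid, saved collection output can be searched for the error text quoted above. The Python sketch below is an assumption-laden heuristic: the three patterns are taken only from the CAM example in this alert, and other releases or SANtricity may word the error differently.

```python
import re

# Patterns derived from the CAM error example in this alert (heuristic only).
SIGNATURES = (
    re.compile(r"ionShow \d+ fails!"),
    re.compile(r"ManagementOperationFailedException"),
    re.compile(r"Error 1007\b"),
)


def matching_lines(text):
    """Return the stripped lines of `text` that match any known signature."""
    return [
        line.strip()
        for line in text.splitlines()
        if any(pattern.search(line) for pattern in SIGNATURES)
    ]
```

      A non-empty result only suggests that this issue may have been hit; the controller exception log remains the place to confirm it.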

4. Workaround

      This issue can be avoided by not collecting support data when you know that:

      A) An Initiator/Host Port alias for a connected HBA has been changed

      B) An Initiator/Host Port alias has been deleted from the array. This is normal if you replace/remove an HBA as part of reallocation or hardware replacement.

      If either of the above applies, consider performing a controller reset to avoid an unplanned outage.

      Otherwise, individual collections of "Alarms", "Array Profiles", and "Major Event Logs" are possible within each of the management utilities.

      The following CLI commands can be run to obtain support data for CAM:
      service -d <array_name> -c print -t arrayprofile > arrayProfileSummary.txt

      service -d <array_name> -c print -t mel > majorEventLog.txt

      service -d <array_name> -c read -q nvsram region=0xEE > NVSRAMdata.txt

      service -d <array_name> -c print -t rls > readLinkStatus.csv

      sscs list alarm > alarms.txt
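
      The CAM commands above can be wrapped so that they are run the same way for every array. The Python sketch below only assembles the five command lines exactly as listed and, by default, prints them instead of running them; the function names and the dry_run flag are illustrative and not part of CAM.

```python
import subprocess


def cam_collection_commands(array_name):
    """Return (argv, output_file) pairs for the CAM commands listed above."""
    service = ["service", "-d", array_name, "-c"]
    return [
        (service + ["print", "-t", "arrayprofile"], "arrayProfileSummary.txt"),
        (service + ["print", "-t", "mel"], "majorEventLog.txt"),
        (service + ["read", "-q", "nvsram", "region=0xEE"], "NVSRAMdata.txt"),
        (service + ["print", "-t", "rls"], "readLinkStatus.csv"),
        (["sscs", "list", "alarm"], "alarms.txt"),
    ]


def collect(array_name, dry_run=True):
    """Print each command, or run it and redirect stdout to its output file."""
    for argv, outfile in cam_collection_commands(array_name):
        if dry_run:
            print(" ".join(argv), ">", outfile)
            continue
        with open(outfile, "wb") as fh:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(argv, stdout=fh, check=True)
```

      Calling collect("array01") prints the command lines for review; collect("array01", dry_run=False) would run them on a host where the CAM CLI is installed and on the PATH.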

      The following SMcli or script editor commands can be run under SANtricity:

      save controller [a] NVSRAM file="A-NVSRAMdata.txt"

      save controller [b] NVSRAM file="B-NVSRAMdata.txt"

      save allDrives logFile="driveDiagnosticData.bin"

      save storageArray configuration file="storageArrayConfiguration.txt" allConfig

      save storageArray allEvents file="majorEventLog.txt"

      save storageArray performanceStats file="persistentReservations.txt"

      save storageArray RLSCounts file="readLinkStatus.csv"

      show storageArray profile

5. Resolution

      This issue is addressed in the following releases:
      • Sun StorEdge 6130, 6140, 6540 arrays with array firmware 06.60.11.10 or later
      • Sun StorageTek Flexline 380 array with array firmware 06.60.11.10 or later
      • Sun StorageTek Flexline 240, 280 arrays with array firmware 06.60.11.20 or later
      • Sun StorageTek SANtricity Storage Manager 10.10 or later
      • Sun StorageTek Common Array Manager 6.1.0 or later
      The above firmware releases are available with CAM 6.1.0 at:

      http://www.sun.com/download/index.jsp

      View by Category -> Systems Administration -> Storage Management
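
      The affected and fixed firmware levels given in sections 2 and 5 can be combined into a quick exposure check. The Python sketch below assumes that "or later" means a numeric comparison of the four dotted fields, and that any level at or above the fixed release is safe; the model keys and function names are illustrative. It does not account for the management software version (SANtricity or CAM), which also matters.

```python
# First affected and first fixed firmware per model, as listed in this alert.
FIRST_AFFECTED = {
    "6130": "06.19.25.10",
    "6140": "06.16.81.10",
    "6540": "06.16.81.10",
    "Flexline 240": "06.19.25.00",
    "Flexline 280": "06.19.25.00",
    "Flexline 380": "06.19.25.00",
}
FIRST_FIXED = {
    "6130": "06.60.11.10",
    "6140": "06.60.11.10",
    "6540": "06.60.11.10",
    "Flexline 380": "06.60.11.10",
    "Flexline 240": "06.60.11.20",
    "Flexline 280": "06.60.11.20",
}


def parse_version(version):
    """Turn '06.60.11.10' into (6, 60, 11, 10) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def is_exposed(model, firmware):
    """True if firmware is in the affected range and below the fixed level."""
    level = parse_version(firmware)
    return (
        parse_version(FIRST_AFFECTED[model])
        <= level
        < parse_version(FIRST_FIXED[model])
    )
```

      For example, is_exposed("Flexline 240", "06.60.11.10") is True under these assumptions, because the fix for the Flexline 240 and 280 is listed as 06.60.11.20.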

      This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

      Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.


      Modification History
      16-Jun-2008: Updated Contributing Factors and Resolution sections; Now Resolved
      14-Jul-2008: Updated Workaround section for corrections


      Internal Comments
      Please send technical questions to the following email:
      [email protected]
      and CC the following persons:
      Internal Contributor/Submitter
      Internal Eng Responsible Engineer
      Internal Services Knowledge Engineer

      Note: Firmware 07.10.25.10 is the next revision up, but is not bundled with CAM. To obtain this firmware, a call with Sun Support is required.



      If stateCapture is required, please escalate the issue, and a tailored serial collection sequence will be made available.

      Controllers will report an exception in the exception log (excLogShow), as in the following example:

      ---- Log Entry #33 OCT-10-2007 08:27:55 AM ----
      Exception: Invalid Opcode
         pc:  0x00000780   (Unknown Program Counter)
      Registers:
         edi    = 1e330955   esi    =        0   ebp    = 1d549f08   esp    = 1d549ec4
         ebx    =       14   edx    =        0   ecx    =        8   eax    =        0
         eflags =      246   pc     =      780
      Stack Trace:
        53cdba vxTaskEntry         +a   : vkiTask (1d54a8cc, 0, 0, 0, 0, 0, 0, 0, 0, 0)
        4bb6eb vkiTask             +bb  : srcOpTask (1e33b2f0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1d54a6c0, 4bb640, 0, 0, eeeeeeee, 1d54a6a4, ...)
      1d9c325b srcOpTask           +db  : cmdProcess ([17b651d8, ffffffff, 1d54a640, 297, 1d54a8cc, ...]) ???
      1dbb32ff cmdProcess          +6f  : symSYMbolCommandHandler (17b651d8, &cmdE0, 0, 53cf13, 17b651d8, 3429334d)
      1dd9fd54 symSYMbolCommandHandler+44  : svciov_dispatch (17a8ba10, 17b651d8, 1d54a5a4, &cmdE0)
      1ddb26bf svciov_dispatch     +3cf : stateCapture_1 ([17a7b780, 17b651d8, 54, 17a7b780, &cmdE0, ...]) ???
      1dd9dd91 stateCapture_1      +41  : systemStateCapture ([17a7b75c, 1, a010241, 1ddb28b9, 0, ...]) ???
      1ddf5737 systemStateCapture  +6b7 : ionShow (63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
      1e0defa2 ionShow             +52  : showState__CQ23ion10IonManager ([1d264434, 1d54a39c, 1d54a390, 1e0def5f, &ionShow, ...]) ???
      1e0df0e6 showState__CQ23ion10IonManager+a6  : tditnShow__CQ23ion10IonManageri (1d264434, 0, 1ddf4fa0, 1d54a39c)
      1e103977 tditnShow__CQ23ion10IonManageri+a7  : tditn +340 ([0, 43108e, 1d54a310, 1e1038e4, 1, ...]) ???
      1e102f2e tditn               +42e : tditn +70 (0, 4, 0, 1e3308c7)
      ********
      Task Id:         0x1d54a6f0
      Name:            "symTask1"
      Status:          0x00 (ready)
      Options:         0x0004 (dealloc_stk)
      Priority:        125
      Stack base:      0x1d54a6f0
      Stack end:       0x1d5476f0 (adjusted for name)
      Stack size:      0x3000 (12288)
      Stack margin:    0x17c (380)
      Stack limit:     0x1d5479b8
      Pend queue:      0x17a7b498
      Last errno:      0x1c0001
      value = 1 = 0x1
      ->
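
      When reviewing an excLogShow capture, the stack trace can be checked for the state capture call path seen in the example above. The Python sketch below is a heuristic built only from that one example: it assumes each frame line starts with a lowercase hex address followed by a function name and a "+offset :" marker, and it treats the presence of both systemStateCapture and ionShow as the indicator. Other firmware levels may format the log differently.

```python
import re

# "<hex address> <function> +<hex offset> :" at the start of a frame line.
FRAME = re.compile(r"^\s*[0-9a-f]+\s+(\S+?)\s*\+[0-9a-f]+\s*:")


def frame_functions(log_text):
    """Return the function name of each stack frame line, in log order."""
    names = []
    for line in log_text.splitlines():
        match = FRAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def looks_like_state_capture_panic(log_text):
    """Heuristic: the trace passes through systemStateCapture and ionShow."""
    names = frame_functions(log_text)
    return "systemStateCapture" in names and "ionShow" in names
```

      Wrapped continuation lines are ignored because they do not start with an address followed by a function name and offset.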
      Internal Contributor/submitter
      [email protected]

      Internal Eng Responsible Engineer
      [email protected]

      Internal Services Knowledge Engineer
      [email protected]

      Internal Eng Business Unit Group
      NWS (Network Storage)

      Internal Escalation ID
      37959518, 37970982, 1-22943001, 65766342

      Internal Sun Alert & FAB Admin Info
      13-Feb-2008, david m: draft created, send for review
      14-Feb-2008, david m: revised draft considerable, resend for review
      16-Jun-2008, david m: CAM release, republish resolved


      Attachments
      This solution has no attachment
        Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.