Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000530.1
Update Date:2011-02-25
Keywords:

Solution Type  Sun Alert Sure

Solution 1000530.1: Creating Host Mappings on Sun StorageTek 5320, 6140 and 6540 Arrays May Cause Controller Reboots


Related Items
  • Sun Storage 6540 Array
  • Sun Storage 6140 Array
  • Sun Storage 5320 NAS Appliance
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

Previously Published As
200675


Product
Sun StorageTek 6140 Array
Sun StorageTek 5320
Sun StorageTek 6540 Array

Bug Id
<SUNBUG: 6494572>, <SUNBUG: 6496496>

Date of Workaround Release
03-JAN-2007

Date of Resolved Release
21-FEB-2007

Impact

While creating, modifying, or deleting initiators, hosts, host groups, or volume-to-LUN mappings on a Sun StorageTek 5320, 6140, or 6540 array, the array controller may reboot due to corruption in the array mapping database. As a result, monitoring and management of the array may be disabled in some cases.


Contributing Factors

This issue can occur on the following platforms:

  • Sun StorageTek 6140 Array without firmware 06.19.23.10
  • Sun StorageTek 6540 Array without firmware 06.19.23.10
  • Sun StorageTek 5320 Array without patch 119352-05 

Symptoms

This issue is typically seen during array installation, either during registration with Sun StorageTek Common Array Manager (CAM) or during host initiator creation before or after volume initialization. Common symptoms include the following:

1. Events showing array boot cycles or communication timeouts while initiators, hosts, host groups, or LUN mappings are being changed

2. A snoop trace of LAN traffic with a source or destination port of 2463 may show output similar to the following:

    26   0.00037 192.168.2.247 -> 192.168.2.245 TCP D=2463 S=32874 Push Ack=2133598775 Seq=2446933261 Len=4 Win=49640
    27   0.17254 192.168.2.245 -> 192.168.2.247 TCP D=32874 S=2463 Ack=2446933265 Seq=2133598775 Len=0 Win=8192
    28   0.00003 192.168.2.247 -> 192.168.2.245 TCP D=2463 S=32874 Push Ack=2133598775 Seq=2446933265 Len=40 Win=49640
    29   0.00007 192.168.2.245 -> 192.168.2.247 TCP D=32874 S=2463 Ack=2446933305 Seq=2133598775 Len=0 Win=8192
    30   6.49049 192.168.2.245 -> (broadcast)  ARP C Who is 192.168.2.245, 192.168.2.245 ?
    31  18.34879 192.168.2.247 -> 192.168.2.245 TCP D=2463 S=32874 Fin Ack=2133598775 Seq=2446933305 Len=0 Win=49640
    32   0.00011 192.168.2.245 -> (broadcast)  ARP C Who is 192.168.2.247, 192.168.2.247 ?
    33   0.00003 192.168.2.247 -> 192.168.2.245 ARP R 192.168.2.247, 192.168.2.247 is 0:a0:d1:d7:e3:fd
    34   0.00006 192.168.2.245 -> 192.168.2.247 TCP D=32874 S=2463 Rst Seq=2133598775 Len=0 Win=0
    35  37.04363 192.168.2.246 -> (broadcast)  ARP C Who is 192.168.2.246, 192.168.2.246 ?

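The output above shows the connection to the array controller (192.168.2.245, port 2463) exchanging data normally in packets 26 to 29 and then going silent. After the gap, the controller broadcasts an ARP request for its own address (packet 30), the host closes the connection with a FIN (packet 31), and the controller answers with a RST (packet 34). In other words, the connection was working and then timed out.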
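
Reading a longer trace by eye is tedious, so the reasoning above can be sketched in Python. This is a minimal illustration, not a Sun tool. It assumes the trace is in snoop's one-line summary format, with the second column holding the time since the previous packet (the trace above, re-joined one packet per line). The function name `analyze` and the 5-second idle threshold are arbitrary choices made for illustration.

```python
import re

# Trace from the Symptoms section, one packet per line (snoop summary format).
TRACE = """\
26   0.00037 192.168.2.247 -> 192.168.2.245 TCP D=2463 S=32874 Push Ack=2133598775 Seq=2446933261 Len=4 Win=49640
27   0.17254 192.168.2.245 -> 192.168.2.247 TCP D=32874 S=2463 Ack=2446933265 Seq=2133598775 Len=0 Win=8192
28   0.00003 192.168.2.247 -> 192.168.2.245 TCP D=2463 S=32874 Push Ack=2133598775 Seq=2446933265 Len=40 Win=49640
29   0.00007 192.168.2.245 -> 192.168.2.247 TCP D=32874 S=2463 Ack=2446933305 Seq=2133598775 Len=0 Win=8192
30   6.49049 192.168.2.245 -> (broadcast)  ARP C Who is 192.168.2.245, 192.168.2.245 ?
31  18.34879 192.168.2.247 -> 192.168.2.245 TCP D=2463 S=32874 Fin Ack=2133598775 Seq=2446933305 Len=0 Win=49640
32   0.00011 192.168.2.245 -> (broadcast)  ARP C Who is 192.168.2.247, 192.168.2.247 ?
33   0.00003 192.168.2.247 -> 192.168.2.245 ARP R 192.168.2.247, 192.168.2.247 is 0:a0:d1:d7:e3:fd
34   0.00006 192.168.2.245 -> 192.168.2.247 TCP D=32874 S=2463 Rst Seq=2133598775 Len=0 Win=0
35  37.04363 192.168.2.246 -> (broadcast)  ARP C Who is 192.168.2.246, 192.168.2.246 ?
"""

# packet number, delta time, source, destination, remainder of the summary line
LINE_RE = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+)\s+->\s+(\S+)\s+(.*)$")
# TCP destination/source port fields such as "D=2463" or "S=32874"
PORT_RE = re.compile(r"\b[DS]=(\d+)\b")


def analyze(trace, port=2463, gap_threshold=5.0):
    """Flag long idle gaps and FIN/RST packets on the given TCP port.

    Assumes the second column is the time since the previous packet,
    so elapsed time is the running sum over every line, ARP included.
    """
    elapsed = 0.0
    last_seen = None  # elapsed time of the previous packet on `port`
    findings = []
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        num, delta, src, _dst, rest = m.groups()
        elapsed += float(delta)
        if str(port) not in PORT_RE.findall(rest):
            continue  # ARP or traffic on other ports
        if last_seen is not None and elapsed - last_seen > gap_threshold:
            findings.append(f"packet {num}: {elapsed - last_seen:.2f} s idle on port {port}")
        for flag in ("Fin", "Rst"):
            if flag in rest.split():
                findings.append(f"packet {num}: {flag} from {src}")
        last_seen = elapsed
    return findings


if __name__ == "__main__":
    for finding in analyze(TRACE):
        print(finding)
```

On the sample trace, the sketch reports an idle gap of roughly 24.8 seconds before the FIN in packet 31, followed by the RST in packet 34. ARP lines are skipped when matching the port, but their deltas still count toward elapsed time, because each delta is measured from the previous packet of any kind.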

A serial connection to the array controller during the above activities may show a panic similar to the following:

    11/02/06-13:46:58 (GMT) (symTask1): PANIC: Assertion failed: hostPortCount == hostPortIdx, file spmSymbolObjectBundle.cc, line 385

Notes:

  1. The source file (*.cc) and line number in the message on your serial connection may differ from those shown above. However, if the referenced file name begins with "spm" and the array matches the Contributing Factors, suspect this bug.
  2. The presence of this issue on an array does not cause any loss of data access.
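
Note 1 amounts to a simple triage rule. Below is a minimal Python sketch of that rule. It assumes the console text contains panic messages in the "Assertion failed: ..., file ..., line ..." form shown above. The function name `spm_panic` is hypothetical. A match only flags a candidate, because the array must also match the Contributing Factors.

```python
import re

# "PANIC: Assertion failed: <expression>, file <source file>, line <number>"
PANIC_RE = re.compile(
    r"PANIC: Assertion failed:\s*(?P<expr>.+?),\s*file\s+(?P<file>[^,\s]+),\s*line\s+(?P<line>\d+)"
)


def spm_panic(console_text):
    """Return (file, line, expression) for the first assertion panic in an spm* file, else None."""
    # Collapse whitespace so a panic message wrapped across console lines still matches.
    flat = " ".join(console_text.split())
    for m in PANIC_RE.finditer(flat):
        if m.group("file").startswith("spm"):
            return m.group("file"), int(m.group("line")), m.group("expr")
    return None


if __name__ == "__main__":
    sample = """11/02/06-13:46:58 (GMT) (symTask1): PANIC: Assertion failed:
    hostPortCount ==
    hostPortIdx, file spmSymbolObjectBundle.cc, line 385"""
    print(spm_panic(sample))
```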

Workaround

If the issue occurs while you are creating or changing initiators, hosts, host groups, or LUN mappings, work around it by resetting the array configuration in one of the following ways:

Via CAM:

Click the array name, then click the "Reset Configuration" button.

Or, via sscs:

sscs reset array --> array_name

Note: The above workaround may not be applicable in all cases. Additional assistance may be required from your Sun Services Provider.


Resolution

This issue is addressed in the following releases:

  • Sun StorageTek 6140 Array with firmware 06.19.23.10 or later
  • Sun StorageTek 6540 Array with firmware 06.19.23.10 or later
  • Sun StorageTek 5320 Array with patch 119352-05 or later
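
The fixed levels are stated as minimums ("or later"), so checking an installed level is a simple ordered comparison. Below is a minimal Python sketch. It assumes controller firmware strings are four dot-separated numeric fields (like 06.19.23.10) and that patch IDs use the BASE-REV form (like 119352-05). The helper names are hypothetical. Fields are compared as integers rather than as strings, so the check does not depend on zero padding. The parser raises ValueError on SANtricity-style versions such as 09.19.G2.04, which contain letters.

```python
# Minimum fixed levels from the Resolution section.
MIN_FIRMWARE = "06.19.23.10"  # 6140 and 6540 arrays
MIN_PATCH = "119352-05"       # 5320


def firmware_tuple(version):
    """Turn a dotted, all-numeric firmware string such as '06.19.23.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def firmware_is_fixed(installed, minimum=MIN_FIRMWARE):
    """True if the installed controller firmware is at or above the fixed revision."""
    return firmware_tuple(installed) >= firmware_tuple(minimum)


def patch_is_fixed(installed, minimum=MIN_PATCH):
    """True if installed is the same base patch as minimum at an equal or higher revision."""
    base, rev = installed.split("-")
    min_base, min_rev = minimum.split("-")
    return base == min_base and int(rev) >= int(min_rev)


if __name__ == "__main__":
    print(firmware_is_fixed("06.19.23.10"))  # True: exactly the fixed release
    print(firmware_is_fixed("06.19.23.09"))  # False: hypothetical earlier revision
    print(patch_is_fixed("119352-04"))       # False: one revision short
```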

The above firmware revision is bundled with Common Array Manager (CAM) 5.1 and later, as well as SANtricity 09.19.G2.04 and later.

Common Array Manager 5.1 is available at:

http://www.sun.com/download/products.xml?id=45d3689d

Important Note: The firmware upgrade only prevents new occurrences of the mapping corruption; it does not repair existing corruption. If you have the symptoms described above, contact Sun Services immediately for corrective action. You will need to upgrade as part of this action.



Modification History
Date: 21-FEB-2007
  • State: Resolved
  • Updated Contributing Factors and Resolution sections

Date: 06-APR-2007
  • Updated Contributing Factors and Resolution sections


References

<SUNPATCH: 119352-05>

Previously Published As
102762
Internal Comments

This issue is due to corruption in the array host mapping database, and does *NOT* affect customer data.

This issue most commonly turns up during array registration with CAM. Therefore, we have recommended the array configuration reset as the most common workaround. In addition to the methods described above, you can turn on debug logging in CAM as follows:

Solaris:

Set log4j.rootLogger=DEBUG in

/var/opt/webconsole/webapps/se6130ui/WEB-INF/classes/log4j.properties

and restart the webserver:

    $ /usr/sbin/smcwebserver restart

Windows:

Set log4j.rootLogger=DEBUG in

%systemdrive%\Sun\WebConsole\var\opt\webconsole\webapps\log4j.properties

and reboot.
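
The Solaris and Windows steps both come down to changing one line in log4j.properties. Below is a minimal Python sketch of that edit. It assumes the file uses log4j 1.x properties syntax (`log4j.rootLogger=LEVEL, appender, ...`). As a design choice (an assumption, not part of the steps above), it replaces only the level and keeps any appender names already on the line, so existing log output is not dropped. It also saves a copy of the original file as `<path>.orig`. The function names are hypothetical. Restart the webserver or reboot afterwards, as described above.

```python
import re
import shutil

# Matches an uncommented "log4j.rootLogger = LEVEL, appender, ..." line (log4j 1.x syntax).
ROOT_LOGGER_RE = re.compile(
    r"^([ \t]*log4j\.rootLogger[ \t]*[=:][ \t]*)([^,\s]*)(.*)$", re.MULTILINE
)


def set_root_level(properties_text, level="DEBUG"):
    """Return properties_text with the root logger level replaced, keeping any appender list."""
    new_text, count = ROOT_LOGGER_RE.subn(
        lambda m: m.group(1) + level + m.group(3), properties_text
    )
    if count == 0:
        # No active rootLogger entry: append one.
        new_text = properties_text.rstrip("\n") + f"\nlog4j.rootLogger={level}\n"
    return new_text


def enable_debug(path):
    """Back up the properties file to <path>.orig, then rewrite it with the DEBUG level."""
    shutil.copy2(path, path + ".orig")
    # java.util.Properties reads .properties files as ISO-8859-1.
    with open(path, encoding="latin-1") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1") as f:
        f.write(set_root_level(text))
```

For example, on Solaris you would call `enable_debug("/var/opt/webconsole/webapps/se6130ui/WEB-INF/classes/log4j.properties")` with sufficient privileges to write the file.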

The "debug_se6130ui.log" file will show a message similar to the following:

    2006-11-09 15:12:44,371 [HttpProcessor[6789][3]] ERROR
    com.sun.netstorage.array.mgmt.cfg.core.impl.oz.RegistrationHelper -
    Exception stack and context listed below ****

    devmgr.versioned.jrpc.RPCError: TIMEOUT
    at devmgr.versioned.jrpc.TCPChannel.readFully(Unknown Source)
    at devmgr.versioned.jrpc.TCPChannel.receiveMsg(Unknown Source)
    at devmgr.versioned.jrpc.RPCClientTCP.transact(Unknown Source)
    at devmgr.versioned.jrpc.RPCClientGeneric.call(Unknown Source)
    at devmgr.versioned.jrpc.RPCClient.call(Unknown Source)
    at devmgr.versioned.symbol.SYMbolAPIClientV1.getObjectGraph
    (Unknown Source)
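
When the debug log is large, the signature above can be found with a short scan. Below is a minimal Python sketch. It assumes each log entry starts with a timestamp in the "2006-11-09 15:12:44,371" form, followed by its continuation and stack-trace lines. `registration_timeouts` is a hypothetical name. The function returns the timestamp of every ERROR entry that names RegistrationHelper and contains `RPCError: TIMEOUT`.

```python
import re

# A new log entry starts with a timestamp such as "2006-11-09 15:12:44,371".
ENTRY_START = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s")


def registration_timeouts(log_text):
    """Return timestamps of RegistrationHelper ERROR entries whose stack shows RPCError: TIMEOUT."""
    entries = []  # (timestamp, lines belonging to that entry)
    for line in log_text.splitlines():
        m = ENTRY_START.match(line)
        if m:
            entries.append((m.group(1), [line]))
        elif entries:
            entries[-1][1].append(line)  # continuation or stack-trace line
    return [
        stamp
        for stamp, lines in entries
        if "ERROR" in lines[0]
        and any("RegistrationHelper" in ln for ln in lines)
        and any("RPCError: TIMEOUT" in ln for ln in lines)
    ]
```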

The "excLogShow" output, in the "stateCaptureData.dmp" file or via a serial connection, will show a message similar to the following:

    ================= EXCEPTION LOG =================
    Serial number: 1T63078373
      Entry count: 11
    Wrap-arounds: 0
    First entry time: FEB-26-1970 07:08:35 AM
    Current date/time: NOV-15-2006 09:36:25 AM

To correct this issue once it has occurred:

***This requires an OUTAGE!***

1) Open a serial connection to either controller and get to the serial CLI command prompt ("->").
2) loadDebug
3) spmDbClear
4) sysReboot
5) Open a serial connection to the other controller.
6) sysReboot
7) The customer will then be required to reconfigure their initiators, hosts, host groups, and LUN mappings. Otherwise, they will have to reset the array configuration as described in the Workaround section of this document, which results in data loss.

The fix is in firmware release 06.19.xx.10, which is available via escalation to PTS/Backline.

The firmware upgrade does not correct any existing SPM database corruption; it only prevents new corruption from occurring. Customers who run into this corruption must still go through the procedure above before updating the firmware.


Internal Contributor/submitter
[email protected], [email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Escalation ID
1-20434975, 1-262238603, 1-20623234, 1-269973801, 37744840

Internal Resolution Patches
119352-05

Internal Sun Alert Kasp Legacy ID
102762

Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2007-01-03, 2007-02-21
Avoidance: Firmware
Responsible Manager: [email protected]
Original Admin Info: [WF 06-Apr-2007, dave m: update CF and Res, republish]
[WF 21-Feb-2007, karened: re-releasing as Resolved per [email protected]]
[WF 03-Jan-2007, dave m: review completed, send for release]
[WF 02-Jan-2007, dave m: draft created, send for review]

Product_uuid
8ac7dca5-a8bd-11da-85b4-080020a9ed93|Sun StorageTek 6140 Array
9d23ea64-a8be-11da-85b4-080020a9ed93|Sun StorageTek 5320
e35cfcfc-a31a-11da-85b4-080020a9ed93|Sun StorageTek 6540 Array

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.