Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000641.1
Update Date: 2011-02-24
Keywords:

Solution Type  Sun Alert Sure

Solution  1000641.1 :   Host Does Not See LUNs on StorEdge 6120/6320 After "Volslice Create/Remove" Commands When Connected to McData FC Switches  


Related Items
  • Sun Storage 6320 System
  • Sun Storage 6120 Array

Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

PreviouslyPublishedAs
200850


Product
Sun StorageTek 6120 Array
Sun StorageTek 6320 System

Bug Id
<SUNBUG: 5056021>

Date of Workaround Release
02-DEC-2004

Date of Resolved Release
09-NOV-2007

Impact

Using certain sscs(1M) commands, GUI actions, or commands run from telnet(1) to manage a Sun StorEdge 6120/6320 Array may cause loss of connectivity to one or more hosts when the array is attached via McData Fibre Channel (FC) switches (listed below) and the hosts use Host Bus Adapters (HBAs) with the Sun QLC HBA driver (listed below). These commands may cause multiple path failures, which could lead to a complete loss of host access to the array.


Contributing Factors

This issue can occur on the following platforms:

  • Sun StorEdge 6120/6320 Arrays without firmware 3.2.7

when connected to the following Fibre Channel switch models:

  • McData 4300/4500/6064/6140 switches running E/OS 06.02.00 or earlier

and the following Host Bus Adapters (HBA):

  • PCI Dual FC Network Adapter+ - Option 6757
  • PCI Single FC Host Adapter - Option 6799
  • PCI Dual FC Network Adapter+ - Option 6727
  • 2GB PCI Single FC Network Adapter - Option SG-XPCI1FC-QF2
  • 2GB PCI Dual FC Network Adapter - Option SG-XPCI2FC-QF2
  • Compact PCI (cPCI) Dual Channel FC - Option 6748

The described issue may occur in the configurations described above when the following sscs(1M) commands, or array/StorEdge 3900SL CLI commands are issued:

sscs(1M) commands:

  • sscs modify volgroup
  • sscs create volume
  • sscs create initiator
  • sscs create pool
  • sscs modify array
  • sscs add initgroup

StorEdge 6120/T3+ telnet(1) commands:

  • lun perm
  • hwwn
  • volslice
  • vol mount
  • sys mp_support

StorEdge 3900SL Service Processor (SP) CLI commands (the following menu options in the program "/opt/SUNWsecfg/runsecfg"):

  • 3) Configure Sun StorEdge T3+ Array(s)
  • 6) Modify Sun StorEdge T3+ Array Sys Parameters
  • 8) Manage Sun StorEdge T3+ Array LUN Slicing
  • 9) Manage Sun StorEdge T3+ Array LUN Masking

The following commands from the directory "/opt/SUNWsecfg/bin" on the Service Processor (SP):

  • createt3group
  • addtot3group
  • delfromt3group
  • rmt3group
  • createt3slice
  • rmt3slice
  • modifyt3config
  • savet3config
  • modifyt3params
  • sett3lunperm

Note: StorEdge 3900SL/6320 GUI actions equivalent to these commands may also cause this issue to occur.

The following "read-only" commands will NOT trigger the described issue:

  • lun perm list
  • hwwn list
  • hwwn listgrp
  • volslice list

sscs(1M) "read-only" commands:

  • sscs create volume
  • sscs remove volume

StorEdge 6120 "read-only" telnet(1) commands:

  • volslice
  • lun perm
  • hwwn

To determine if a system uses the Sun QLC HBA driver for its connection to an array, do the following:

Run the format(1M) command as the "root" user, and choose a LUN on each controller path for each 6120/6320 array from the output list:

# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
	  /ssm@0,0/pci@1a,700000/pci@2/SUNW,isptwo@4/sd@0,0
1. c0t6d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
	  /ssm@0,0/pci@1a,700000/pci@2/SUNW,isptwo@4/sd@6,0
--> 2. c3t60003BA27CC6B00040C472BF000262B0d0 <SUN-T4-0301 cyl 11705 alt 2 hd 7 sec 128>
	  /scsi_vhci/ssd@g60003ba27cc6b00040c472bf000262b0...
	  [lines omitted] ...
 Specify disk (enter its number): ^D
#

In the above example, we see an STMS device path for a StorEdge 6120 device on "c3", so we need to know which HBA(s) are used to access that STMS device. This is done by using the "luxadm display" command.

Repeat the following steps for each controller number on the system that is used to connect to StorEdge 6120/6320 arrays:

1) As the "root" user, run "luxadm display" on the STMS device path to see the physical HBA(s) used to access that STMS device. Build the path from the STMS device name reported by format(1M) (in this example, c3t60003BA27CC6B00040C472BF000262B0d0) by prefixing it with "/dev/rdsk/" and appending the suffix "s2". This assumes the standard Solaris label with a slice 2 is used on that device; if that is not the case in your environment, append the number of a slice that is in use on the STMS device:

# luxadm display /dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
DEVICE PROPERTIES for disk:
/dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
Vendor:               SUN
Product ID:           T4
Revision:             0301
Serial Num:           Unsupported
Unformatted capacity: 5121.812 MBytes
Write Cache:          Enabled
Read Cache:           Enabled
Minimum prefetch:   0x0
Maximum prefetch:   0x0
Device Type:          Disk device
Path(s):
/dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
/devices/scsi_vhci/ssd@g60003ba27cc6b00040c472bf000262b0:c,raw
-->   Controller
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0
Device Address              20030003ba27cc6b,6
Host controller port WWN    210000e08b0aadac
Class                       primary
State                       ONLINE
-->   Controller
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0
Device Address              20030003ba27cc63,6
Host controller port WWN    210100e08b2aadac
Class                       secondary
State                       STANDBY
#

2) From the above output of the "luxadm display" command, note the 2 physical HBA paths used for connecting to the StorEdge 6120 LUN:

/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0

Note how these device paths contain the string "SUNW,qlc". This means that these HBAs will be using the Sun QLC driver.
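
The check described above can be sketched in Python for cases where the "luxadm display" output has already been saved as text. This is a minimal illustration, not a Sun-supplied tool: the function names and the regular expression are assumptions, and the sketch only inspects text, it does not run luxadm itself.

```python
import re

# Matches a /devices/... physical path ending in an fp@ node, as printed
# under each "Controller" heading in the sample "luxadm display" output.
# The pattern is an assumption based on that sample only.
CONTROLLER_PATH = re.compile(r"^\s*(/devices/\S*/fp@\S+)\s*$")


def controller_paths(luxadm_output: str) -> list[str]:
    """Return the physical HBA paths found in saved "luxadm display" text."""
    paths = []
    for line in luxadm_output.splitlines():
        match = CONTROLLER_PATH.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def uses_qlc_driver(path: str) -> bool:
    """True when the device path contains the "SUNW,qlc" node name."""
    return "SUNW,qlc" in path


# Excerpt of the example output shown in this document.
SAMPLE = """\
Path(s):
/dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
/devices/scsi_vhci/ssd@g60003ba27cc6b00040c472bf000262b0:c,raw
Controller
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0
Class                       primary
State                       ONLINE
Controller
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0
Class                       secondary
State                       STANDBY
"""

for hba_path in controller_paths(SAMPLE):
    print(hba_path, "qlc" if uses_qlc_driver(hba_path) else "other driver")
```

Both controller paths in the example contain "SUNW,qlc", so both are reported as using the QLC driver.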


Symptoms

Should the described issue occur, "lun failover" messages and host messages from STMS will be displayed in the array syslog, reporting that LUNs are being offlined and that the paths to those LUNs are degraded due to the loss of one path. The messages are similar to the following:

[date time hostname] scsi: [ID 243001 kern.info] /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0 (fcp1):
[date time hostname] offlining lun=1f (trace=0), target=90100 (trace=2800004)
...
[date time hostname] Initiating failover for device ssd (GUID 60003ba27cc6b00040c473c3000525ab)
[date time hostname] mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g60003ba27cc6b00040c473d600076246 (ssd0) multipath status: degraded, path /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0 (fp1) to target address: 20030003ba27cc63,1f is offline. Load balancing: round-robin

Note: The above are examples only. On each system, the LUN numbers, target numbers and device paths will vary. To identify that this issue is being seen, check the target trace value ("trace=2800004" above) and the overall sequence of events, where many LUNs fail over and a path is reported to be "offline" after performing any of the commands listed in the Contributing Factors section.
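
As a rough aid for spotting this pattern in a saved message log, the following Python sketch searches for "offlining" entries carrying the target trace value 2800004 together with "multipath status: degraded" entries. The regular expressions are assumptions derived only from the example messages above, and the function names are illustrative; real message layouts may differ.

```python
import re

# Target trace value that the note above says to look for.
TARGET_TRACE = "2800004"

OFFLINING = re.compile(
    r"offlining lun=(\w+) \(trace=\w+\), target=(\w+) \(trace=(\w+)\)"
)
DEGRADED = re.compile(
    r"multipath status ?: degraded, path (\S+) \(\w+\) "
    r"to target address: (\S+) is offline"
)


def scan_messages(text):
    """Return ([(lun, target), ...], [offline_path, ...]) found in text."""
    flat = " ".join(text.split())  # undo line wrapping before matching
    offlined = [
        (m.group(1), m.group(2))
        for m in OFFLINING.finditer(flat)
        if m.group(3) == TARGET_TRACE
    ]
    degraded = [m.group(1) for m in DEGRADED.finditer(flat)]
    return offlined, degraded


def matches_symptoms(text):
    """True when both an offlined LUN and a degraded path are present."""
    offlined, degraded = scan_messages(text)
    return bool(offlined) and bool(degraded)


# The example messages from this document, with their original wrapping.
SAMPLE = """\
[date time hostname] scsi: [ID 243001 kern.info]
/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0 (fcp1):
[date time hostname] offlining lun=1f (trace=0), target=90100
(trace=2800004)
[date time hostname] mpxio: [ID 669396 kern.info]
/scsi_vhci/ssd@g60003ba27cc6b00040c473d600076246 (ssd0) multipath status
: degraded, path /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0 (fp1) to
target address: 20030003ba27cc63,1f is offline. Load balancing:
round-robin
"""

print(scan_messages(SAMPLE))
```

A match only indicates that the log resembles the example; the sequence of events still has to be correlated with the time the array command was issued.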


Workaround

There are two methods available to clear this condition. The first is to offline and online the affected port on the McData switch. Offlining/onlining the switch port can be done through the CLI or through the GUI. The second is to reset the SE6120 (which is not recommended since it takes a long time to reboot the array). The following describes the steps for offlining/onlining the switch port, first using the CLI and next using the GUI:

I. Port Online/Offline Using CLI:

  1. Telnet to the switch (default login/passwd is: Administrator/password)
  2. execute 'config port blocked <port#> true'
  3. execute 'config port blocked <port#> false'
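
The two CLI commands are always issued in the same order: block the port, then unblock it. The Python sketch below only builds the two command strings quoted in the steps above for a given port; it does not open a telnet session. The function name and the port number 4 are hypothetical.

```python
def port_bounce_commands(port: int) -> list[str]:
    """Return the block/unblock command pair for one switch port."""
    if port < 0:
        raise ValueError("port number must not be negative")
    return [
        f"config port blocked {port} true",   # take the port offline
        f"config port blocked {port} false",  # bring the port back online
    ]


# Hypothetical port number, for illustration only.
for command in port_bounce_commands(4):
    print(command)
```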

II. Port Online/Offline Using GUI:

  1. Start the EFCM GUI (default login/passwd is: Administrator/password) or SANpilot.
  2. Double click on the icon for the subject switch
  3. Double click on the UPM card that controls the port to offline/online
  4. Right click on the port and select Block Port (the box to the left should not be checked). Then select OK. Note the port LED turning off.
  5. Right click on the port and select Block Port (the box to the left should be checked now). Then select OK. Note the port LED turning on.

Once the above steps are completed and the proper status is displayed on the switch, running the "luxadm display" command for each affected LUN should show the primary paths in a state of "ONLINE" and the secondary paths in a state of "STANDBY".
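
This verification can be sketched in Python against saved "luxadm display" output. The sketch assumes the "Class" and "State" lines appear in pairs, as they do in the example output earlier in this document; the function names are illustrative, and the sample text is an excerpt of that example.

```python
import re

# "Class <value>" and "State <value>" lines, as in the example output.
FIELD = re.compile(r"^\s*(Class|State)\s+(\S+)\s*$")

# Path states this document expects after a successful port offline/online.
EXPECTED = {"primary": "ONLINE", "secondary": "STANDBY"}


def path_states(luxadm_output: str) -> list[tuple[str, str]]:
    """Return (class, state) pairs in the order they appear."""
    pairs = []
    current_class = None
    for line in luxadm_output.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Class":
            current_class = value
        elif current_class is not None:
            pairs.append((current_class, value))
            current_class = None
    return pairs


def paths_recovered(luxadm_output: str) -> bool:
    """True when every listed path has the state expected for its class."""
    pairs = path_states(luxadm_output)
    return bool(pairs) and all(EXPECTED.get(c) == s for c, s in pairs)


SAMPLE = """\
Controller
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0
Class                       primary
State                       ONLINE
Controller
/devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0
Class                       secondary
State                       STANDBY
"""

print(path_states(SAMPLE), paths_recovered(SAMPLE))
```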


Resolution

This issue is addressed on the following platforms:

  • Sun StorEdge 6120/6320 Arrays with firmware 3.2.7 (as delivered in patch 116931-23 or later)

with:

  • McData 4300/4500/6064/6140 switches running E/OS 8.02.00 (as delivered in patch 119554-01 or later)
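
The resolution criteria above amount to two version comparisons. The Python sketch below assumes that version strings are purely dotted numbers and that a later release always compares greater component by component; neither assumption is stated in this document, and the function names are illustrative. The firmware value "3.2.6" in the usage lines is a hypothetical example of an older release.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "06.02.00" into (6, 2, 0)."""
    return tuple(int(part) for part in text.split("."))


# Levels named in the Resolution section.
FIXED_ARRAY_FIRMWARE = parse_version("3.2.7")
FIXED_EOS = parse_version("8.02.00")


def is_addressed(array_firmware: str, eos_version: str) -> bool:
    """True when both the array and the switch are at or above the fixed levels."""
    return (
        parse_version(array_firmware) >= FIXED_ARRAY_FIRMWARE
        and parse_version(eos_version) >= FIXED_EOS
    )


print(is_addressed("3.2.7", "8.02.00"))   # both at the fixed levels
print(is_addressed("3.2.6", "8.02.00"))   # hypothetical older array firmware
print(is_addressed("3.2.7", "06.02.00"))  # switch still at an affected E/OS level
```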


Modification History
Date: 09-NOV-2007
  • Updated Contributing Factors and Resolution sections
  • State: Resolved


References

<SUNPATCH: 119554-01>
<SUNPATCH: 116931-23>

Previously Published As
101609

Internal Comments

The following bugs are related to the issue and are driving changes in different areas:

  • Bug 5056021: 6120 array is doing LOGO with 000000 S_ID when hwwn is added .....
  • Bug 5089209: Host does not see luns after SE6210 volslice create/remove W/ McData Switch
  • Bug 6179794: introduce delay to completing hwwn/volslice/lun perm commands to avoid high frequency of link events
  • Bug 5072558: qlc responds with FLOGI after receiving LOGO
  • Sun Alert 57609: The Use of Certain sscs(1M) Commands, Array/StorEdge 3900SL CLI Commands, or Certain StorEdge 3900SL/6320 GUI Actions to Manage a Sun StorEdge 3900SL/6120/6320/T3+ Array, Attached via Certain Brocade Switches, May Cause Loss of Connectivity to a Host(s)
  • Bug 5109873: Host does not see luns after SE6210 volslice create/remove W/ McData Switch
  • Bug 5068068: Host offlines SE6x20 target when multiple hwwns are added on the target side
  • Info Doc 78347: Host does not see LUNs after SE6120(T4) volslice create/remove with McData Switch

Here is a summary of the problem from Info Doc 78347:

"Host loses access to LUNs after a volslice create or remove is executed on the 6120 array. The configuration in which this issue shows up is when one zone is created on the McData switch with two hosts and both of the SE6120 host ports are connected to the same zone. When a volslice create/remove is executed, the frontend ISP2300 interface initiates and completes the NOS/OLS/LR/LRR handshake with the switch, then the SE6120 sends an FLOGI and before the switch sends the ACC, it sends RSCNs to the registered hosts indicating the SE6120 state change. As a result of receiving the RSCNs, the host sends the PLOGI to the SE6120; this is not acknowledged because it did not receive the ACC for the FLOGI. This condition seems to confuse the HBA (which happens to be QLogic) and causes the host not to see the LUNs when a format is executed.

Note: This configuration has been tested without any issues on the Brocade switch. Also, the above described issue shows up with McData E/OS firmware revs 4.02.00 to 6.01.00 on all switch modules (4300, 4500, 6064, 6140)."

This is a summary of the issues from McData:

- storage port sending frames with bad S-ID

After the port bounces from a config change there is a 5 millisecond window between when the port goes offline and when it sends out a FLOGI. The trace shows that the storage receives a PLOGI from the HBAs and sends an LS_ACC back with a source ID of 00-00-00, which is not routable. The storage port should wait for the FLOGI accept for its address before sending the LS_ACC.

- hba not properly recovering the sequence after the storage ports bounce

In the trace showing the failure, the storage port and the switch get synced up and stable. The HBA sends a PLOGI to the array for the second time (the first attempt fails because the frames come in prior to the FLOGI ACC and the storage port ACCs with a bad S-ID), then a PRLI, but never does a Report LUNs or anything beyond the PRLI. Also, the HBA never sends an ABTS for the original PLOGIs sent before the FLOGI ACC.

- the switch is hitting a timing issue between the route tables being updated and the name server servicing requests, so frames are routed to the storage port before the FLOGI ACC goes out (this is not against the spec). Also, the switch drops the FLOGI from the second storage port due to using the software port state - PR#45428 - fixed in E/OS 7.0.
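
The first McData finding above (a storage port accepting a PLOGI before it has received its own FLOGI accept, and therefore replying with source ID 00-00-00) can be illustrated with a small Python check over a simplified frame list. The trace format, the sender labels, the frame kind names and the addresses 610013 and 610413 are all hypothetical; this is not the format of any real analyzer output.

```python
from typing import NamedTuple

UNASSIGNED_ID = "000000"  # source ID of a port that has no fabric address yet


class Frame(NamedTuple):
    sender: str  # hypothetical label: "storage", "switch" or "hba"
    kind: str    # hypothetical label: "FLOGI", "FLOGI_ACC", "PLOGI", "LS_ACC"
    s_id: str    # 24-bit source ID written as six hex digits


def premature_accepts(trace: list[Frame]) -> list[Frame]:
    """Return storage-port LS_ACC frames sent before its FLOGI was accepted."""
    logged_in = False
    bad = []
    for frame in trace:
        if frame.sender == "storage" and frame.kind == "FLOGI":
            logged_in = False  # the port is (re)starting fabric login
        elif frame.sender == "switch" and frame.kind == "FLOGI_ACC":
            logged_in = True   # the port now has a routable address
        elif frame.sender == "storage" and frame.kind == "LS_ACC":
            if not logged_in or frame.s_id == UNASSIGNED_ID:
                bad.append(frame)
    return bad


# Hypothetical trace: the first PLOGI is accepted too early, the second is not.
TRACE = [
    Frame("storage", "FLOGI", "000000"),
    Frame("hba", "PLOGI", "610013"),
    Frame("storage", "LS_ACC", "000000"),
    Frame("switch", "FLOGI_ACC", "fffffe"),
    Frame("hba", "PLOGI", "610013"),
    Frame("storage", "LS_ACC", "610413"),
]

print(premature_accepts(TRACE))
```

Only the accept sent before the FLOGI accept is flagged, which matches the behavior the storage-port finding asks to be avoided.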


Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Escalation ID
1-3879461

Internal Resolution Patches
119554-01, 116931-23

Internal Sun Alert Kasp Legacy ID
101609, 57689 (Sun Alert)

Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2004-12-02, 2007-11-09
Avoidance: Patch, Workaround
Responsible Manager: null
Original Admin Info: [WF 09-Nov-2007, dave m: sent email for update, resolved with patches, rerelease]
This document has been imported from KMS Creator and may need adjustment before re-publishing.

This imported document has been reviewed/adjusted by:
Review Name:
Review Date:

Original KMS Creator attributes below:

--- PLEASE DO NOT MAKE ANY CHANGES BELOW THIS LINE! ---

Sun Alert ID: 57689
Synopsis: Host Does Not See Luns on StorEdge 6120/6320 After "Volslice Create/Remove" Commands When Connected to McData FC Switches
Category: Availability
Product: Sun StorEdge 6120/6320 Arrays
BugIDs: 5056021
Avoidance: Workaround
State: Engineering Complete
Date Released: 02-Dec-2004
Date Closed:
Date Modified:
Escalation IDs: 1-3879461
Pending Patches:
Resolution Patches:
FIN:
FCO:
Date Submitted: 23-Nov-2004
Submitter: [email protected]
Responsible Engineer: [email protected]
Responsible Manager:
CTE group: NWS
Responsible Writer: [email protected]
Distribution: Contract SunSolve

Workflow History:

WF State: Issued, 02-Dec-2004, David Mariotto
WF Note: sending for release


WF State: Draft, 01-Dec-2004, David Mariotto
WF Note: Approved by McData per Joseph Poon
waiting on final approval from Joe for synopsis change
release today

WF State: Draft, 30-Nov-2004, David Mariotto
WF Note: final corrections per Chessin - waiting on McData
approval for release.

WF State: Draft, 29-Nov-2004, David Mariotto
WF Note: sending for review

WF State: Draft, 24-Nov-2004, David Mariotto
WF Note: email to submitter for McData reference

WF State: Draft, 23-Nov-2004, David Mariotto
WF Note: Article created.

Exported from KMS Creator Sat May 21 09:14:54 2005 GMT, [email protected]
Internal SA-FAB Eng Submission
Host Does Not See Luns on StorEdge 6120/6320 After "Volslice Create/Remove" Commands When Connected to McData FC Switches

Product_uuid
2cd2e7d2-2980-11d7-9c3f-c506fe37b7ef|Sun StorageTek 6120 Array
4de60cc2-a08e-4610-b8bf-6a1881cb59c6|Sun StorageTek 6320 System


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.