Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1019005.1
Update Date:2011-02-17
Keywords:

Solution Type  Sun Alert Sure

Solution  1019005.1 :   Persistent Reservation Commands Processed Slowly on Sun StorageTek Arrays May Cause Loss of Access or Timeouts to Filesystems  


Related Items
  • Sun Storage 6540 Array
  • Sun Storage 6140 Array
  • Sun Storage Flexline 380 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

PreviouslyPublishedAs
231801


Bug Id
<SUNBUG: 6639618>

Product
Sun StorageTek 6140 Array
Sun StorageTek 6540 Array
Sun StorageTek Flexline 380 Array

Date of Preliminary Release
15-Feb-2008

Date of Resolved Release
10-Sep-2008

Persistent Reservation Commands Processed Slowly on Sun StorageTek Arrays May Cause Loss of Access or Timeouts to Filesystems

1. Impact


Sun StorageTek 6140, 6540, and Flexline 380 arrays attached to servers running cluster software that uses the SCSI-3 persistent reservation commands PRIN (Persistent Reservation IN) or PROUT (Persistent Reservation OUT) may process those commands slowly. Affected hosts may see severe performance degradation, loss of filesystem access (general or limited to specific applications), and command timeouts on devices serviced by the affected arrays.

2. Contributing Factors


This issue can occur on the following platforms:
  • Sun StorageTek 6140 Array running firmware that does not include the fix delivered in 06.60.11.10 (maintenance update), 07.10.25.10 (feature update), or later
  • Sun StorageTek 6540 Array running firmware that does not include the fix delivered in 06.60.11.10 (maintenance update), 07.10.25.10 (feature update), or later
  • Sun StorageTek Flexline 380 Array running firmware that does not include the fix delivered in 06.60.11.20, 07.10.25.10, or later
connected to servers running:
  • Sun Cluster 3.x (for Solaris 8, 9 or 10) with three (3) or more nodes in the cluster
  • Windows 2000/2003 using cluster software
Note: All versions of Sun Cluster 3.x and earlier that use SCSI-3 reservation commands may be affected by this issue.
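
The firmware levels listed above can be checked mechanically. The following Python sketch is illustrative only and is not part of this Sun Alert: the model keys, the function names, and the rule that a version is compared against the fixed level of its own major train (06.xx or 07.xx) are assumptions. It does not check the host-side conditions (cluster software, node count).

```python
# Fixed firmware levels copied from the Contributing Factors list.
# Model keys ("6140", "6540", "FLX380") are hypothetical labels.
FIXED_LEVELS = {
    "6140": {6: (6, 60, 11, 10), 7: (7, 10, 25, 10)},
    "6540": {6: (6, 60, 11, 10), 7: (7, 10, 25, 10)},
    "FLX380": {6: (6, 60, 11, 20), 7: (7, 10, 25, 10)},
}


def parse_version(text):
    """Turn a string such as '06.60.11.10' into the tuple (6, 60, 11, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(model, firmware):
    """Return True if the firmware is below the fixed level for its train.

    Assumption: a version on a train not listed in this alert is compared
    against the lowest fixed level known for that model.
    """
    version = parse_version(firmware)
    levels = FIXED_LEVELS[model]
    fixed = levels.get(version[0], min(levels.values()))
    return version < fixed


if __name__ == "__main__":
    for model, fw in [("6140", "06.19.25.16"), ("FLX380", "06.60.11.10")]:
        state = "affected" if is_affected(model, fw) else "not affected"
        print(model, fw, state)
```

Versions are compared as integer tuples rather than as strings so that fields with and without leading zeros compare consistently.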

3. Symptoms


Two basic symptoms can occur:

On the host:

1) The hosts show that filesystems experience hangs, performance loss, or loss of access.  Sun Clusters typically show a SCSI timeout on one or more LUNs, or cause filesystems to unmount.

Solaris Example from /var/adm/messages:

Nov 15 16:53:06 myhost
/scsi_vhci/ssd@g600a0b8000325ad200004c07471f48c6 (ssd719):
Command Timeout on path /pci@11,700000/SUNW,emlxs@0/fp@0,0 (fp1)
Nov 15 16:53:11 myhost scsi: [ID 107833 kern.warning] WARNING:
/scsi_vhci/ssd@g600a0b8000325ad200004c07471f48c6 (ssd719):
Nov 15 16:53:11 myhost      SCSI transport failed: reason 'timeout':giving up
Nov 15 16:53:11 myhost Cluster.scdpmd: [ID 977412 daemon.notice] The
state of the path to device: /dev/did/rdsk/d62s0 has changed to FAILED

Windows Cluster Example:

17/12/2007      17:25:13        ClusSvc Error   Resource Monitor
1145    N/A     myhost   Cluster resource Disk BD1 (L:)
timed out. If the pending timeout is too short for this resource,
consider increasing the pending timeout value.
17/12/2007      17:25:13        ClusSvc Error   Resource Monitor
1145    N/A     myhost   Cluster resource Disk F: timed
out. If the pending timeout is too short for this resource, consider
increasing the pending timeout value.

2) The array shows NO ERRORS in the array Event Log during the period of time that the host system(s) had their symptoms.


Notes:
  1. Typically, SCSI-3 Persistent Reservations are issued during volume/LUN management on the array, cluster failover, and array path failover.  Any of these actions may trigger this issue.
  2. Other standalone or clustered hosts using the same array can be impacted by this issue, as all IO to the array from any host will be affected.
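
To look for the Solaris host-side symptom in bulk, a log file can be scanned for the message patterns shown in the /var/adm/messages example above. The following Python sketch is only an illustration: the function name and regular expressions are assumptions derived from that single example, and real message formats may differ between Solaris releases and drivers.

```python
import re

# Patterns derived from the Solaris example in this document (assumed to be
# representative; adjust for your own logs).
DEVICE_RE = re.compile(r"(/scsi_vhci/ssd@g[0-9a-fA-F]+)")
TIMEOUT_RE = re.compile(r"SCSI transport failed: reason 'timeout'")
PATH_FAILED_RE = re.compile(r"(/dev/did/rdsk/\S+) has changed to FAILED")


def scan_messages(lines):
    """Count transport timeouts per device and collect failed DID paths."""
    timeouts = {}
    failed_paths = []
    last_device = None
    for line in lines:
        device = DEVICE_RE.search(line)
        if device:
            # The device name precedes the failure reason in the log.
            last_device = device.group(1)
        if TIMEOUT_RE.search(line) and last_device:
            timeouts[last_device] = timeouts.get(last_device, 0) + 1
        failed = PATH_FAILED_RE.search(line)
        if failed:
            failed_paths.append(failed.group(1))
    return timeouts, failed_paths


if __name__ == "__main__":
    with open("/var/adm/messages", errors="replace") as handle:
        counts, failed = scan_messages(handle)
    for device, count in sorted(counts.items()):
        print(count, "timeout(s) on", device)
    for path in failed:
        print("path FAILED:", path)
```

Timeouts alone do not confirm this issue. They should be correlated with a clean array Event Log for the same time window, as described in symptom 2.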

4. Workaround


There is no workaround for this issue.

5. Resolution

For 6140, 6540, and Flexline 380 arrays staying with the 06.xx firmware

Firmware 06.60.11.10 is bundled with Sun StorageTek Common Array Manager (CAM) 6.1.0 or later.

CAM can be downloaded from

http://www.sun.com/downloads or 
http://www.sun.com/download/index.jsp?tab=2#S

For 6140, 6540, and Flexline 380 arrays managed with Sun StorageTek SANtricity that need firmware 06.60.11.10 or later

Please contact Sun Support for the firmware release bundle.  You will also need a new copy of SANtricity 10.10, which can be ordered from:  https://www2.sun.de/dct/forms/reg_us_1508_643_0.jsp

For 6140, 6540, and Flexline 380 arrays going to the 07.10 firmware feature release

07.10.25.10 firmware is a feature release and requires a service call for an upgrade.  It is not bundled with CAM or SANtricity.

CAM 6.1.0 or later is required
SANtricity 10.10 or later is required

Please contact Sun Support if you require the 07.10 feature release update.

This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.


Modification History
29-May-2008: Updated Contributing Factors and Resolution sections
10-Sep-2008: Updated Resolution section; Resolved


Internal Comments
Please send technical questions to the following email:
[email protected]
and CC the following persons:
Internal Contributor/Submitter
Internal Eng Responsible Engineer
Internal Services Knowledge Engineer

For 07.10 firmware requests, please see FAB 236969 (http://sunsolve.sun.com/search/document.do?assetkey=1-63-236969-1):
Crystal firmware 7.10.xx.xx upgrades require a special Upgrade Utility to install.

Additional data that can be reviewed for issue validation (should it be required):

  1. The customer meets the Contributing Factors section of this document
  2. The customer experiences timeouts, filesystem issues, loss of access, or application failure
  3. There are no errors in the Major Event Log during the time of the loss of access
  4. Serial session output of "srcAnalyze 1,1,1" shows a PRIN or PROUT command taking on the order of 1 second to 2 minutes to complete

The serial output can be gathered by using manual access to the serial shell, or by using the "selsiser" command available at:

http://pts-storage.west/products/SE6130/Tools.html

Simply place the command alone in a file and use it as the script file for running commands.

SAMPLE USAGE (see readme file for more details):

selsiser -f "scriptfile" -o speed=9600 -o cntr=se6540 -o logfile=logfile6540 -p /dev/term/b

###Start scriptfile###
srcAnalyze(1,1,1,0,0,0,0,0)
###End scriptfile###

Example srcAnalyze output:

>>>>>>>>>>>>>> RUN COMMAND >>>>>>>>
srcAnalyze(1,0,0,0,0,0,0,0,0,0)

Total ops: 63

 1
 SrcOp:3f9af3e8  IopId:-1971022329  Hst:14  Vol:1f  HstLun: e  Qtype:20  Qtag:01c50021
 Times===> Start:59695743  Cur:59695921  Elapsed:2966 ms
 CDB: 5e 00 00 00 00 00 00 01 08 00
  Owner: RAID Engine  Location: Unknown
 Flags : 0x00020012 Delivered Active AutoSns

 2
 SrcOp:1f9b5e60  IopId:2141383  Hst:11  Vol:1a  HstLun:29  Qtype:20  Qtag:02fe819b
 Times===> Start:  475638  Cur:  476625  Elapsed:  16 sec
 CDB: 5f 06 00 00 00 00 00 00 18 00
  Owner: High-level driver  Location: Unknown
 Flags : 0x0006001e Delivered RecvData Pause Active AutoSns NonCacheBfr

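The commands are identified by the first byte of the CDB (the SCSI operation code, in hexadecimal):

5e = PRIN  Persistent Reservation IN
5f = PROUT Persistent Reservation OUT
28 = Read (READ(10))
2a = Write (WRITE(10))
2e = Write and Verify (WRITE AND VERIFY(10))
12 = Inquiry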

Other commands will also appear in the srcAnalyze output.  They too may be
listed as waiting on the queue for a long time, comparable to the PRIN or
PROUT commands.
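
When the captured serial log is long, the PRIN and PROUT entries can be picked out with a short script. The following Python sketch is an illustration under stated assumptions, not a supported tool: the function name and the 1000 ms default threshold are hypothetical, and the parser only understands the "Elapsed: N ms" and "Elapsed: N sec" forms seen in the example output above. Entries whose elapsed time is printed in any other form are skipped rather than guessed at.

```python
import re

# First CDB byte (operation code) for the two persistent reservation commands.
OPCODES = {0x5E: "PRIN", 0x5F: "PROUT"}

ELAPSED_RE = re.compile(r"Elapsed:\s*(\d+)\s*(ms|sec)")
CDB_RE = re.compile(r"CDB:\s*((?:[0-9a-fA-F]{2}\s*)+)")


def slow_reservations(text, threshold_ms=1000):
    """Return (command, elapsed_ms) for PRIN/PROUT ops at or above threshold."""
    hits = []
    elapsed_ms = None
    for raw in text.splitlines():
        # Strip the literal "^M" residue left by some serial captures.
        line = raw.replace("^M", "").strip()
        match = ELAPSED_RE.search(line)
        if match:
            value, unit = int(match.group(1)), match.group(2)
            elapsed_ms = value * 1000 if unit == "sec" else value
            continue
        match = CDB_RE.search(line)
        if match:
            if elapsed_ms is not None:
                opcode = int(match.group(1).split()[0], 16)
                name = OPCODES.get(opcode)
                if name and elapsed_ms >= threshold_ms:
                    hits.append((name, elapsed_ms))
            # Reset so a stale time is never paired with the next entry.
            elapsed_ms = None
    return hits


if __name__ == "__main__":
    # "logfile6540" is the log file name used in the sample usage above.
    with open("logfile6540", errors="replace") as handle:
        for name, ms in slow_reservations(handle.read()):
            print(name, "took", ms, "ms")
```

The parser relies on the "Times===>" line appearing before the "CDB:" line within each entry, as it does in the example output.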



Internal Contributor/submitter
[email protected]

Internal Eng Responsible Engineer
[email protected], [email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Escalation ID
44195345 http://radspweb.holland.sun.com/webteamcgi/task.cgi?key=44195345
71085332 http://rad-spweb.central.sun.com/webteamcgi/task.cgi?key=71085332
71204936 http://rad-spweb.central.sun.com/webteamcgi/task_closure.cgi?key=71204936

Internal Sun Alert & FAB Admin Info
14-Feb-2008, david m: draft created, sent for review
15-Feb-2008, david m: send for release
29-May-2008, david m: updated CF and Res sections for firmware release
10-Sep-2008, david m: update Res section, now resolved, updated INTERNAL section for support


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.