Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1010521.1
Update Date:2010-05-31
Keywords:

Solution Type  Problem Resolution Sure

Solution  1010521.1 :   Sun StorEdge[TM] 3310/3320/3510/3511: Running sccli command may result in SCSI timeouts  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage RAID Manager (RM6) Software
  • Sun Storage A3000 Array
  • Sun Storage 3310 Array
  • Sun Storage 3320 SCSI Array
  • Sun Storage A1000 Array
  • Sun Netra st A1000 Array
  • Sun Storage 3511 SATA Array
  • Sun Storage A3500 FC Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - Other

PreviouslyPublishedAs
214456


Symptoms
This problem appears on hosts connected to both Sun StorEdge[TM] A3x00/A1000 arrays (hereafter referred to as Sonoma) and Sun StorEdge[TM] 3310/3510/3511 arrays (hereafter referred to as Minnow).
To manage the Minnow array(s) from the host, you need the Sun StorEdge Configuration Service software. One of its components is:
  SUNWsccli -> Sun StorEdge 3000 Family CLI

which manages the Sun StorEdge 3310/SE351x arrays through a command line interface.

SCSI timeouts may be reported for the LUNs of the Sonoma array when the following conditions are met:

  • SCCLI is invoked on the host to manage the Minnow array(s).
  • Sonoma arrays are connected to the same host.
  • One of the following is true:
    1. /usr/bin/osa/bin/parityck is being run against the LUNs, either directly or from the GUI of the RAID Manager software (RM6). Note: RAID Manager software manages the Sonoma devices.
    2. /usr/bin/osa/bin/parityck is being run at the same time from both hosts against the same LUNs on the Sonoma array (where the configuration includes multi-host connections to the Sonoma array).
    3. There is a heavy I/O load on the LUNs of the Sonoma array.

Note: This problem happens only if you invoke SCCLI without any options.

The following are examples of error messages generated when invoking SCCLI on a system where parityck is running for LUN 0:

Example 1: These messages are generated if the Sonoma array is connected to an SBus system with an ISP1000U host bus adapter (HBA), which uses the isp HBA driver:

Apr  7 14:22:31 unix: WARNING: /sbus@1f,0/QLGC,isp@0,10000/sd@5,0 (sd5):
Apr  7 14:22:31  SCSI transport failed: reason 'timeout': retrying command
Apr  7 14:22:33 unix: WARNING: /sbus@1f,0/QLGC,isp@0,10000/sd@5,0 (sd5):
Apr  7 14:22:33  Error for Command: verify                  Error Level: Retryable
Apr  7 14:22:33 unix:    Requested Block: 0                         Error Block: 0
Apr  7 14:22:33 unix:    Vendor: Symbios                            Serial Number:   <    M@k
Apr  7 14:22:33 unix:    Sense Key: Unit Attention
Apr  7 14:22:33 unix:    ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Example 2: These messages are generated if the Sonoma array is connected to a PCI-based system with a host bus adapter (HBA) that uses the glm driver:

Mar 14 01:41:39 scsi: [ID 365881 kern.notice] /pci@8,700000/scsi@2 (glm2):
Mar 14 01:41:39        Cmd (0x16fda10) dump for Target 5 Lun 0:
Mar 14 01:41:39 scsi: [ID 365881 kern.notice] /pci@8,700000/scsi@2 (glm2):
Mar 14 01:41:39                cdb=[ 0x2f 0x8 0x1 0x35 0x7d 0x95 0x0 0x7f 0xff 0x0 ]
Mar 14 01:41:39 scsi: [ID 365881 kern.notice] /pci@8,700000/scsi@2 (glm2):
Mar 14 01:41:39        pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
Mar 14 01:41:39 scsi: [ID 365881 kern.notice] /pci@8,700000/scsi@2 (glm2):
Mar 14 01:41:39        pkt_scbp=0x0 cmd_flags=0xe1
Mar 14 01:41:39 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/scsi@2 (glm2):
Mar 14 01:41:39        Disconnected tagged cmd(s) (1) timeout for Target 5.0
Mar 14 01:41:39 genunix: [ID 408822 kern.info] NOTICE: glm2: fault detected in device; service still available
Mar 14 01:41:39 genunix: [ID 611667 kern.info] NOTICE: glm2: Disconnected tagged cmd(s) (1) timeout for Target 5.0
Mar 14 01:41:39 glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Mar 14 01:41:39 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/scsi@2/sd@5,0 (sd35):
Mar 14 01:41:39        SCSI transport failed: reason 'reset': retrying command
Mar 14 01:41:39 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/scsi@2/sd@5,0 (sd35):
Mar 14 01:41:39        SCSI transport failed: reason 'timeout': retrying command
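
To check whether a host has logged messages like the ones in the two examples, the system log can be scanned for the timeout strings. The following is a minimal Python sketch, not a Sun-supplied tool. The function name find_scsi_timeouts is hypothetical, and the patterns are copied from the example messages above, so other HBA drivers may need additional patterns.

```python
import re

# Both patterns are copied from the example messages above; other HBA
# drivers may word their timeout reports differently.
DEVICE_LINE = re.compile(r"WARNING: (/\S+) \((\w+)\):")
TIMEOUT_LINE = re.compile(
    r"SCSI transport failed: reason 'timeout'"
    r"|Disconnected tagged cmd\(s\) \(\d+\) timeout for Target"
)


def find_scsi_timeouts(lines):
    """Return one (device_path, instance) pair per timeout message found.

    The drivers print the device on a WARNING line and the reason on the
    following line, so each timeout is attributed to the last device seen.
    """
    hits = []
    last_device = (None, None)
    for line in lines:
        device = DEVICE_LINE.search(line)
        if device:
            last_device = (device.group(1), device.group(2))
        elif "NOTICE:" in line:
            # genunix repeats the glm timeout as a NOTICE; skip the repeat.
            continue
        elif TIMEOUT_LINE.search(line):
            hits.append(last_device)
    return hits
```

The function takes any iterable of log lines, for example the lines of an open log file. 'reset' retries are deliberately not counted, because the examples show them following a timeout on the same target.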

The root cause of the problem appears to be one of the following:

  • When SCCLI is invoked without arguments, it scans all the devices in the device tree, causing the SCSI timeouts shown above.

or

  • The settings Monitor_ParityTime and Monitor_ParityDay in the rmparams
    file are set to the same values on both hosts, so both hosts run parityck against the same storage at the same time. Do not run parityck against the same storage from more than one host at the same time.
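
The second cause can be checked by comparing the two settings across the hosts. The sketch below is illustrative only and makes two assumptions: that rmparams holds one KEY=VALUE setting per line with '#' starting a comment, and that the contents of each host's file have already been read into a string. The function names are hypothetical.

```python
PARITY_KEYS = ("Monitor_ParityTime", "Monitor_ParityDay")


def parse_rmparams(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def parity_schedules_collide(rmparams_a, rmparams_b):
    """Return True when both hosts set both parity keys to identical values."""
    a = parse_rmparams(rmparams_a)
    b = parse_rmparams(rmparams_b)
    return all(k in a and k in b and a[k] == b[k] for k in PARITY_KEYS)
```

If the function returns True for the two hosts, changing Monitor_ParityTime or Monitor_ParityDay on one of them keeps the scheduled parity checks from overlapping.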


Resolution
The problem described above does not occur if one of the following is true:
  • SCCLI is invoked out of band by specifying the IP address of the Minnow array.
  • SCCLI is invoked with the device path of the SE3x10/SE351x device. For example, # sccli /dev/rdsk/c1t0d0s2
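
Scripts that call SCCLI can enforce this rule by refusing to build a command line that does not name one array. The following Python sketch shows one way to do that, under these assumptions: the helper name sccli_argv is hypothetical, a target counts as a device path if it starts with /dev/, and anything else must parse as an IP address. Any further arguments are passed through unchanged and are not validated.

```python
import ipaddress


def sccli_argv(target, *extra_args):
    """Build an argv list for sccli that always names one array explicitly.

    Raises ValueError instead of allowing a bare 'sccli' invocation, which
    would scan every device in the device tree.
    """
    if not target:
        raise ValueError("refusing to run sccli without a target")
    if not target.startswith("/dev/"):
        # Not a device path, so it must be an IP address (out-of-band).
        ipaddress.ip_address(target)  # raises ValueError if malformed
    return ["sccli", target, *extra_args]
```

For example, sccli_argv("/dev/rdsk/c1t0d0s2") yields the same command line as the device-path example above, and the resulting list can be handed to subprocess.run on the host.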


Additional Information
A new version of the 3310/3510/3511 management software, SUNWsscs 2.0, is now available for download from the Sun Download Center.

In the 2.0 software, the separate SUNWsccli package no longer exists; it has been replaced by a single SUNWsscs package that contains everything. Testing has also shown that this problem is less likely to occur with SUNWsscs 2.0, so users should upgrade to 2.0 to minimize the chance of hitting it.

# pkginfo -l SUNWsscs
   PKGINST:  SUNWsscs
      NAME:  Sun StorEdge(tm) Configuration Service
  CATEGORY:  application
      ARCH:  sparc
   VERSION:  2.4.0,REV=2007.07.04.18.18
   BASEDIR:  /opt
    VENDOR:  Sun Microsystems, Inc.
      DESC:  Sun StorEdge(tm) Configuration Service
    PSTAMP:  2007/07/04 at 18:18
  INSTDATE:  Nov 11 2009 08:24
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:      352 installed pathnames
                   8 shared pathnames
                  23 directories
                  21 executables
               24994 blocks used (approx)
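
To confirm that a host already runs the 2.0 or later software, the VERSION field can be read from output like the listing above. This is a small Python sketch with hypothetical function names. It assumes the version string has the form shown above, dotted numbers optionally followed by ",REV=...".

```python
def pkginfo_version(output):
    """Return the dotted version from 'pkginfo -l' output, or None if absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "VERSION":
            # Drop the ",REV=..." suffix, keeping only the dotted version.
            return value.strip().split(",")[0]
    return None


def is_at_least(version, minimum=(2, 0)):
    """Compare the leading numeric fields of a dotted version with minimum."""
    parts = tuple(int(p) for p in version.split(".")[:len(minimum)])
    # Pad short versions such as "2" so they compare as (2, 0).
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum
```

With the listing above as input, pkginfo_version returns the string 2.4.0, which is_at_least accepts as 2.0 or later.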

RAID Manager 6.22 Software
RAID Manager 6.22.1
Sun StorageTek 3510 FC Array
Sun StorageTek 3310 SCSI Array
Sun StorageTek 3320 SCSI Array
Netra st A1000 Array
Sun StorageTek A3500 Array
Sun StorageTek A3000
Sun StorageTek 3511 SATA Array

A1000, A3500, SE3310, SE3510, SE3511, sccli, SCSI error, timeout, parityck, cluster, multihost, glm
Previously Published As
75533

Change History
Date: 2010-04-26
User Name: 79977
Action: Currency check
Comment: Added product 3320 as per CL [email protected]
Verified Keywords - ok

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.