Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1002435.1
Update Date:2010-07-15
Keywords:

Solution Type  Problem Resolution Sure

Solution  1002435.1 :   Disconnecting any SCSI cable in a dual-hosted StorEdge[TM] D1000 causes SCSI bus errors on the remaining host due to loss of termination.


Related Items
  • Sun Netra st D1000 Array
  • Sun Storage D1000 Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - Other

PreviouslyPublishedAs
203408


Applies to:

Sun Storage D1000 Array
Sun Netra st D1000 Array
All Platforms

Symptoms

Fatal SCSI bus errors occurred in one of two SunCluster[TM] hosts while the other host was down.

The following messages were observed:

(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@9,0 (sd53):
(timestamp) test unix: SCSI transport failed: reason 'reset': retrying command
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@8,0 (sd52):
(timestamp) test unix: SCSI transport failed: reason 'reset': retrying command
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@0,0 (sd45):
(timestamp) test unix: SCSI transport failed: reason 'reset': retrying command
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@1,0 (sd46):
(timestamp) test unix: SCSI transport failed: reason 'reset': retrying command
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000 (isp3):
(timestamp) test unix: Received unexpected SCSI Reset
:
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@9,0 (sd53):
(timestamp) test unix: Error for Command: write(10) Error Level: Retryable
(timestamp) test unix: Requested Block: 37131864 Error Block: 37131864
(timestamp) test unix: Vendor: SEAGATE Serial Number: -x-x-x-x-x
(timestamp) test unix: Sense Key: Unit Attention
(timestamp) test unix: ASC: 0x29 (), ASCQ: 0x2, FRU: 0x2
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@0,0 (sd45):
(timestamp) test unix: Error for Command: read(10) Error Level: Retryable
(timestamp) test unix: Requested Block: 70729813 Error Block: 70729813
(timestamp) test unix: Vendor: SEAGATE Serial Number: -x-x-x-x-x
(timestamp) test unix: Sense Key: Unit Attention
(timestamp) test unix: ASC: 0x29 (), ASCQ: 0x2, FRU: 0x2
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@1,0 (sd46):
:
:
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000 (isp3):
(timestamp) test unix: Target 8 disabled wide SCSI mode
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000 (isp3):
(timestamp) test unix: Target 0 reducing transfer rate
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000 (isp3):
(timestamp) test unix: Target 1 reducing transfer rate
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000 (isp3):
(timestamp) test unix: Target 9 reducing transfer rate
:
:
(timestamp) test unix: WARNING: /sbus@49,0/QLGC,isp@0,10000/sd@1,0 (sd46):
(timestamp) test unix: Error for Command: write Error Level: Fatal
(timestamp) test unix: Requested Block: 18 Error Block: 19
(timestamp) test unix: Vendor: SEAGATE Serial Number: -x-x-x-x-x
(timestamp) test unix: Sense Key: Hardware Error
(timestamp) test unix: ASC: 0x44 (internal target failure), ASCQ: 0x0, FRU: 0xbe
:
:
(timestamp) test vxvm:vxconfigd: Offlining config copy 1 on disk c1t0d0s2:
(timestamp) test vxvm:vxconfigd: Reason: Disk write failure
:
:
(timestamp) test vxvm:vxconfigd: Detached plex data1-01 in volume data1
(timestamp) test vxvm:vxconfigd: Detached plex data4-01 in volume data4

If data redundancy is not properly configured, an event like the one above can result in data loss and/or service downtime.
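The messages above follow a recognizable progression: bus resets, then targets dropping wide mode or transfer rate, then fatal errors and detached plexes. As a rough illustration only, the following Python sketch counts those message types in a list of log lines. The match strings are copied from the messages quoted above; the category names, the function names, and the heuristic in looks_like_bus_degradation() are assumptions made for this example and are not a Sun diagnostic tool.

```python
import re
from collections import Counter

# Match strings are taken from the messages quoted in this document.
# The category labels on the left are illustrative names only.
PATTERNS = {
    "transport_reset": re.compile(r"SCSI transport failed: reason 'reset'"),
    "unexpected_reset": re.compile(r"Received unexpected SCSI Reset"),
    "wide_disabled": re.compile(r"Target \d+ disabled wide SCSI mode"),
    "rate_reduced": re.compile(r"Target \d+ reducing transfer rate"),
    "fatal_error": re.compile(r"Error Level: Fatal"),
    "plex_detached": re.compile(r"Detached plex \S+ in volume \S+"),
}


def summarize_scsi_messages(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_bus_degradation(counts):
    """Assumed heuristic: at least one reset AND one negotiation downgrade."""
    resets = counts["transport_reset"] + counts["unexpected_reset"]
    downgrades = counts["wide_disabled"] + counts["rate_reduced"]
    return resets > 0 and downgrades > 0


if __name__ == "__main__":
    sample = [
        "test unix: SCSI transport failed: reason 'reset': retrying command",
        "test unix: Received unexpected SCSI Reset",
        "test unix: Target 0 reducing transfer rate",
    ]
    summary = summarize_scsi_messages(sample)
    print(dict(summary), looks_like_bus_degradation(summary))
```

A positive result only indicates that the log resembles the pattern shown here. Resets and rate reductions can have other causes, so the cabling and termination still need to be checked physically.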

Changes

In the customer's situation, during maintenance of one of the two clustered hosts, customer or service personnel unwittingly disconnected a host SCSI cable from one of the D1000's host ports (either the far-right or far-left port). The remaining host, still connected to the D1000, began experiencing recurring and varied SCSI bus errors that eventually turned fatal. Because this host's applications were live, I/O was active, and the disks were managed by VxVM, the fatal bus errors caused active plexes to be detached.

Cause

The StorEdge[TM] D1000 is a JBOD whose enclosure can be configured as a single SCSI bus connected to a single host SCSI bus adapter with access to all of the disks, or as two separate buses connected to two separate hosts, each with access to half of the enclosed disks. A third configuration allows two separate but "clustered" hosts (running Sun Cluster software) to share the entire enclosure and all of its disks on a single SCSI bus.

In this dual-hosted (clustered) configuration, the single bus is connected to two SCSI HBAs on separate hosts, and SCSI bus termination is provided by the host connections, because the D1000 requires external bus termination. Disconnecting either host cable therefore leaves that end of the bus unterminated.
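The termination requirement can be pictured with a very small model. The Python sketch below is a simplification for illustration, not a description of the D1000 hardware: it assumes each end of the shared bus is in one of three states, and that an attached host cable supplies termination as described above for this configuration. The names PortState and bus_is_terminated are hypothetical.

```python
from enum import Enum


class PortState(Enum):
    """State of one end (host port) of the shared single bus. Illustrative only."""
    HOST_CABLE = "host cable attached"      # assumed to supply termination via the host
    TERMINATOR = "external terminator installed"
    OPEN = "nothing attached"               # this end of the bus is unterminated


def bus_is_terminated(left, right):
    """The bus is properly terminated only if neither end is left open."""
    return left is not PortState.OPEN and right is not PortState.OPEN


if __name__ == "__main__":
    # Normal dual-hosted operation: both hosts cabled.
    print(bus_is_terminated(PortState.HOST_CABLE, PortState.HOST_CABLE))
    # One host cable pulled and nothing installed in its place.
    print(bus_is_terminated(PortState.HOST_CABLE, PortState.OPEN))
    # One host cable replaced by an external terminator.
    print(bus_is_terminated(PortState.HOST_CABLE, PortState.TERMINATOR))
```

In this model, only the second case, a pulled cable with nothing in its place, leaves the bus unterminated, which is the situation described in this document.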

For configuration diagrams, refer to "Sun StorEdge A1000 & D1000 Installation, Operation & Service Manual (805-2624)".

For a description of the cabling, refer to Document 1018089.1 "Sun StorEdge[TM] D1000 array cabling and address" (formerly SunSolve Doc 18451).

Solution

To resolve the problem, reconnect the SCSI cable. If the cable needs to remain disconnected for an extended period of time, install an external SCSI terminator in place of the cable.

The connected host may need to be rebooted so that the SCSI devices are re-scanned and properly recognized, since devices may have had their sync speed reduced or may have been marked offline.

To avoid this situation, disconnect a SCSI cable only when BOTH hosts are shut down, or at least ensure that no application is accessing the D1000 while the cable is disconnected.


Internal Comments
For internal Sun use only.
Service Request ID: 10775412

Escalation ID: 1-13232796
Solution ID: 1-13411978

D1000, SunCluster, dual-host, scsi termination
Previously Published As
83317

Change History
Date: 2005-12-05
User Name: 95826
Action: Approved
Comment: - fixed typo
- verified metadata
- changed review date to 2006-12-05
- checked for TM - none added
- checked audience : contract
Publishing
Version: 3
Date: 2005-12-05
User Name: 95826
Action: Accept
Comment:
Version: 0

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.