Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-73-1332352.1
Update Date:2011-12-27
Keywords:

Solution Type  FAB (standard) Sure

Solution  1332352.1 :   Customers configuring HW RAID with 16-Slot Disk Backplanes on a SPARC T3-1 system could corrupt data.  


Related Items
  • SPARC T3-1
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun FAB




In this Document
  Symptoms
  Changes
  Cause
  Solution


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: FABs are available to Internals and Partners only

Applies to:

SPARC T3-1 - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.
__________

SUNBUG: 6999386, 6999411, 6999436, 6952042

Affected X-Options:

SE3Y5BB1Z - 16-Slot Disk Backplane

Symptoms

A single disk drive can be configured in multiple HW RAID arrays on SPARC T3-1 systems with 16-Slot Disk Backplanes. Systems with the 16-slot backplane need two firmware updates to partition the disks correctly between the two controllers, and the data must be restored after the firmware updates.

It is possible to create two HW RAID volumes, one on each onboard LSI 2008 disk controller, that contain the same physical disk, as the example below shows. Slot 2:14 is a member of the volume on both controllers.

Controller 0

------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 2
Volume ID : 905
Status of volume : Okay (OKY)
RAID level : RAID1
Size (in MB) : 285148
Physical hard disks :
PHY[0] Enclosure#/Slot# : 2:14
PHY[1] Enclosure#/Slot# : 2:15


Controller 1

------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 2
Volume ID : 905
Status of volume : Okay (OKY)
RAID level : RAID1
Size (in MB) : 285148
Physical hard disks :
PHY[0] Enclosure#/Slot# : 2:13
PHY[1] Enclosure#/Slot# : 2:14

This occurs because both LSI 2008 SAS disk controllers connect to ALL disks through a SAS expander chip located on the SPARC T3-1 16-Slot Disk Backplane. Each controller can therefore access every disk independently, yet has no visibility of the HW RAID configuration on the other controller. This is not an issue when the underlying disks are used directly, because Solaris provides multipath access through the SCSI VHCI driver. With HW RAID volumes, however, the same physical region of a disk can be used in more than one mounted file system.
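To check a 16-slot system for this condition, one could compare the volume member lists reported by each controller. The following is a minimal Python sketch, not an Oracle or LSI tool. It assumes the output of `sas2ircu 0 display` and `sas2ircu 1 display` has already been captured as text, and that volume members appear as `PHY[n] Enclosure#/Slot# : E:S` lines, as in the Controller 0 / Controller 1 excerpt above. The names `volume_members` and `shared_disks` are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches IR volume member lines such as "PHY[0] Enclosure#/Slot# : 2:14".
# The line format is an assumption taken from the sas2ircu display excerpt.
MEMBER_RE = re.compile(r"^\s*PHY\[\d+\]\s+Enclosure#/Slot#\s*:\s*(\d+:\d+)\s*$")


def volume_members(display_output: str) -> set[str]:
    """Return every Enclosure:Slot listed as a member of an IR volume."""
    members = set()
    for line in display_output.splitlines():
        match = MEMBER_RE.match(line)
        if match:
            members.add(match.group(1))
    return members


def shared_disks(outputs: dict[int, str]) -> dict[str, list[int]]:
    """Map each Enclosure:Slot used by volumes on 2+ controllers to those controllers."""
    owners = defaultdict(list)
    for controller, text in sorted(outputs.items()):
        for slot in volume_members(text):
            owners[slot].append(controller)
    return {slot: ctls for slot, ctls in owners.items() if len(ctls) > 1}


# Member lists copied from the Controller 0 / Controller 1 example.
ctl0 = "IR volume 2\nPHY[0] Enclosure#/Slot# : 2:14\nPHY[1] Enclosure#/Slot# : 2:15\n"
ctl1 = "IR volume 2\nPHY[0] Enclosure#/Slot# : 2:13\nPHY[1] Enclosure#/Slot# : 2:14\n"
print(shared_disks({0: ctl0, 1: ctl1}))  # {'2:14': [0, 1]}
```

Any non-empty result means at least one physical disk belongs to HW RAID volumes on both controllers, which is the unsafe state described in this FAB.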

This is not an issue on SPARC T3-1 systems with an 8-Slot Disk Backplane, because each of the two onboard LSI 2008 disk controllers is directly connected to only 4 of the disks and has no connectivity to the 4 disks connected to the other controller. Since there is no expander on the 8-Slot Disk Backplane, it is not a multi-initiator configuration.

Creating IR volumes in Solaris with "sas2ircu" (the LSI utility) does not remove the member disks from /dev.

An IR HW RAID volume of two or more disks (i.e. RAID0, RAID1 or RAID1E) is created and enabled, but because the disks used in the volume are not removed, they:

- Are still listed by format
- Remain active disks
- Can be newfs'd, mounted and used alongside the HW RAID volume that was just created from them.

=================================================================================

# sas2ircu 0 create raid0 max 2:3 2:4 stripe-slot3-4
LSI Corporation SAS2 IR Configuration Utility.
Version 4.00.00.00 (2009.10.12)
Copyright (c) 2009 LSI Corporation. All rights reserved.

You are about to create an IR volume.

WARNING: Proceeding with this operation may cause data loss or data
corruption. Are you sure you want to proceed (YES/NO)? yes

WARNING: This is your last chance to abort this operation. Do you wish
to abort (YES/NO)? no
Please wait, may take up to a minute...
SAS2IRCU: Volume created successfully.
SAS2IRCU: Command CREATE Completed Successfully.
SAS2IRCU: Utility Completed Successfully.

=================================================================================
# sas2ircu 0 status
LSI Corporation SAS2 IR Configuration Utility.
Version 4.00.00.00 (2009.10.12)
Copyright (c) 2009 LSI Corporation. All rights reserved.

Background command progress status for controller 0...
IR Volume 1
Volume ID : 904
Current operation : None
Volume status : Enabled
Volume state : Optimal
Physical disk I/Os : Not quiesced
IR Volume 2
Volume ID : 905
Current operation : None
Volume status : Enabled
Volume state : Optimal
Physical disk I/Os : Not quiesced
SAS2IRCU: Command STATUS Completed Successfully.
SAS2IRCU: Utility Completed Successfully.

========================================================
# format
Searching for disks...done

c0t5000C5001D0C49C7d0: configured with capacity of 279.38GB
c0t5000C5001D096987d0: configured with capacity of 279.38GB
c2t3CE5552A2C028206d0: configured with capacity of 555.97GB


AVAILABLE DISK SELECTIONS:
0. c0t5000C5001D0C3D9Bd0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000c5001d0c3d9b
1. c0t5000C5001D0C49C7d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625> <<----
/scsi_vhci/disk@g5000c5001d0c49c7
2. c0t5000C5001D0CA947d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000c5001d0ca947
3. c0t5000C5001D0CACD3d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000c5001d0cacd3
4. c0t5000C5001D0D1283d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000c5001d0d1283
5. c0t5000C5001D0D2857d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000c5001d0d2857
6. c0t5000C5001D096987d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625> <<-----
/scsi_vhci/disk@g5000c5001d096987
7. c0t5000CCA00A02E4A8d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a02e4a8
8. c0t5000CCA00A02F5D0d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a02f5d0
9. c0t5000CCA00A02F114d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a02f114
10. c0t5000CCA00A4BFC38d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a4bfc38
11. c0t5000CCA00A0100B0d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a0100b0
12. c0t5000CCA00A01014Cd0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a01014c
13. c0t5000CCA00A4984C8d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca00a4984c8
14. c1t4d0 <ATA-MARVELLSD88SA02-D10R cyl 23435 alt 2 hd 16 sec 128>
/pci@400/pci@1/pci@0/pci@4/scsi@0/iport@10/disk@p4,0
15. c2t3CE5552A2C028206d0 <LSI-LogicalVolume-3000 cyl 65533 alt 2 hd 128 sec 139> <-----
/pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0
16. c2t34266AB7BB43E993d0 <LSI-LogicalVolume-3000 cyl 65533 alt 2 hd 64 sec 139>
/pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w34266ab7bb43e993,0
17. c3t4d0 <ATA-MARVELLSD88SA02-D10R cyl 23435 alt 2 hd 16 sec 128>
/pci@400/pci@2/pci@0/pci@4/scsi@0/iport@10/disk@p4,0
============================================================================
# sas2ircu 0 display
LSI Corporation SAS2 IR Configuration Utility.
Version 4.00.00.00 (2009.10.12)
Copyright (c) 2009 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS2008
BIOS version : 0.00.00.00
Firmware version : 5.00.00.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 0
Maximum physical devices : 831
Concurrent commands supported : 1871
Slot : Unknown
Segment : 0
Bus : 1024
Device : 0
Function : 0
RAID Support : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
Volume ID : 904
Volume Name : stripe-slot3-4
Status of volume : Okay (OKY) <<-----
RAID level : RAID0
Size (in MB) : 570296
Physical hard disks :
PHY[0] Enclosure#/Slot# : 2:3
PHY[1] Enclosure#/Slot# : 2:4
============================================================================
Console messages from creation...

May 12 12:22:03 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:03 wgs48-116 PhysDiskNum 2 with DevHandle 0xe in slot 0 for enclosure with handle 0x0 is now , active, write cache enabled
May 12 12:22:03 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:03 wgs48-116 PhysDiskNum 3 with DevHandle 0xf in slot 0 for enclosure with handle 0x0 is now , active, write cache enabled
May 12 12:22:05 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:05 wgs48-116 Volume 2 is now , enabled, inactive
May 12 12:22:05 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:05 wgs48-116 Volume 0 is now , enabled, active
May 12 12:22:05 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:05 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:05 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:05 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d096987 (sd12):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d096987 (sd12):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d0c49c7 (sd13):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d0c49c7 (sd13):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number

The SPARC T3-1 with a 16-Slot Disk Backplane has two SAS2 controllers connected to up to sixteen disks through a single expander (a multipath configuration), so both controllers share access to the same disks. When you create a HW RAID volume on one controller, the other controller is not notified, and its path to the member disks keeps working. This is why the physical disks in the HW RAID volume remain available to the OS.

After the system reboots, the non-RAID SAS2 controller rescans all disks and finds the two physical disks carrying RAID metadata. The two drives are then marked inactive and become invisible to the OS.

Impact


There is a possible risk to data integrity if the customer uses Solaris commands to configure HW RAID on the LSI 2008 controllers and a single disk drive is configured in multiple HW RAID arrays.

After a reboot, a customer can also "lose" access to disks that are configured in multiple RAID volumes.

SPARC T3-1 systems with a 16-Slot Disk Backplane ship from the factory with both onboard LSI 2008 disk controllers connected to the backplane, which is a multi-initiator configuration.

LSI's Integrated RAID (IR) firmware does not support multi-initiator configurations when a volume is present. IR may be used WITHOUT volumes in a multi-initiator environment, but the OEM must then ensure that the user does not create volumes.

Changes

Contributing Factors

SPARC T3-1 systems with a 16-Slot Disk Backplane that does not have "zoning" enabled.

Cause

Root Cause

SPARC T3-1 systems with a 16-Slot Disk Backplane ship from the factory with both onboard LSI 2008 disk controllers connected to the backplane. This is a multi-initiator configuration in which both controllers have connectivity to all of the disks in the system. As a result, a single disk drive can be configured in multiple HW RAID volumes using the onboard disk controllers.

There is a possible risk to data integrity if the customer uses Solaris commands to configure HW RAID on the LSI 2008 controllers and a single disk drive is configured in multiple HW RAID volumes.

In addition, because the onboard LSI 2008 disk controllers run Integrated RAID (IR) firmware, LSI (an Oracle partner) has deemed this configuration with a 16-Slot Disk Backplane unsupported, and corrective action is needed to rectify the issue.

Solution

Workaround

No workaround is available. See the Resolution section below.

Resolution

The remediation below is normally performed by the customer. The intent of this FAB is therefore to tell the service representative what to advise the customer to do should they encounter this issue.

Back up the data on the system, then install Patch 147034-01 from My Oracle Support (MOS), which updates the LSI SAS2 Expander firmware to force "zoning" on the 16-Slot Disk Backplane. Next, install system firmware 8.1.0.c (Patch 147315-02 or later) on the SPARC T3-1 system, and then restore the data as detailed below.

Note: See README.147034-01 in Patch 147034-01 for detailed step-by-step procedures,
with examples, for implementing the fix.

1. Back up the customer data on the SPARC T3-1 system.

2. Apply Patch 147034-01 from My Oracle Support (MOS), available at
   <SunPatch:147034-01>. It updates the LSI SAS2 Expander firmware to enable
   zoning on the 16-Slot Disk Backplane.

Patch 147034-01 causes the backplane to be partitioned into two disk zones, which
have the following characteristics:

- Zone A consists of backplane slots 0 through 7. Disks in zone A are managed
exclusively by onboard SAS-2 controller 0. Disks in zone A are visible only
to each other and to controller 0. Disks in zone A are not visible to any
devices in zone B.

- Zone B consists of backplane slots 8 through 15. Disks in zone B are managed
exclusively by onboard SAS-2 controller 1. Disks in zone B are visible only
to each other and to controller 1. Disks in zone B are not visible to any
devices in zone A.

The patch contains two expander firmware images: the manufacturing image and
the base firmware image. Both images must be updated.

To update the expander firmware, use the 'fwupdate' command from the Hardware
Management Pack (HMP) 2.1.1 for SPARC.

See "Oracle Hardware Management Pack 2.1.1 Supports SPARC Platforms" in the T3-1
Product Notes for more information.

3. Apply system firmware Patch 147315-02 (system firmware 8.1.0.c) or later on the
   SPARC T3-1 system. It is available at <SunPatch:147315-02>.

4. Configure any RAID volumes on the T3-1 system.

Consider the following when setting up RAID volumes:

- Because there are now 2 zones of 8 disks each, a HW RAID volume created on
the onboard LSI 2008 disk controllers cannot contain more than 8 disks.

- A HW RAID volume created on the onboard LSI 2008 disk controllers cannot
contain disks from both zones.

5. Restore the data on the SPARC T3-1 system.
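The zone layout from step 2 and the volume rules from step 4 can be expressed as a small pre-check to run before creating a volume. This is an illustrative Python sketch only. It assumes the sas2ircu slot number (the number after the colon in `2:3`) matches the backplane slot numbering 0-15 used in the zone descriptions, and that controller numbers 0 and 1 are the two onboard SAS-2 controllers. `zone_of`, `visible_controllers` and `check_volume` are hypothetical helper names, not part of any Oracle or LSI utility.

```python
from __future__ import annotations

# Hypothetical model of the zoning enforced by Patch 147034-01:
# zone A = backplane slots 0-7 (controller 0), zone B = slots 8-15 (controller 1).
ZONE_SLOTS = {"A": range(0, 8), "B": range(8, 16)}
ZONE_CONTROLLER = {"A": 0, "B": 1}


def zone_of(slot: int) -> str:
    """Return the zone ("A" or "B") that a 16-slot backplane slot belongs to."""
    for zone, slots in ZONE_SLOTS.items():
        if slot in slots:
            return zone
    raise ValueError(f"slot {slot} is not a 16-slot backplane slot (0-15)")


def visible_controllers(slot: int, zoned: bool) -> set[int]:
    """Onboard controllers that can reach a slot, before and after zoning."""
    zone = zone_of(slot)  # validates the slot number in both cases
    if not zoned:
        return {0, 1}  # expander without zoning: multi-initiator access
    return {ZONE_CONTROLLER[zone]}


def check_volume(controller: int, slots: list[int]) -> list[str]:
    """List the step 4 rules a proposed onboard HW RAID volume breaks (empty = OK)."""
    problems = []
    if len(slots) < 2:
        problems.append("an IR volume needs two or more disks")
    if len(set(slots)) != len(slots):
        problems.append("the same slot is listed more than once")
    if len(slots) > 8:
        problems.append(f"{len(slots)} disks requested; each zone holds only 8")
    try:
        zones = {zone_of(s) for s in slots}
    except ValueError as err:
        problems.append(str(err))
        return problems
    if len(zones) > 1:
        problems.append("disks come from both zone A (0-7) and zone B (8-15)")
    elif zones:
        zone = next(iter(zones))
        owner = ZONE_CONTROLLER[zone]
        if controller != owner:
            problems.append(f"zone {zone} disks are managed only by controller {owner}")
    return problems


print(visible_controllers(12, zoned=False))  # {0, 1}
print(visible_controllers(12, zoned=True))   # {1}
print(check_volume(0, [3, 4]))               # []
print(check_volume(0, [7, 8]))               # ['disks come from both zone A (0-7) and zone B (8-15)']
```

The check also flags a volume requested on the wrong controller (for example slots 8 and 9 on controller 0), since after zoning each zone is managed exclusively by one controller.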

References

BugID: 6999386, 6999411, 6999436, 6952042
Resolution Patches: 147315-02, 147034-01


For information about FAB documents, their release processes, implementation strategies and billing information, click here. Please note that this is an Internal Only link.

In addition to the above you may email:

   [email protected]



Contacts

@ Contributor: [email protected]
@ Responsible Engineer: [email protected], [email protected]
@ Responsible Manager: [email protected]
@ Business Unit Group: Systems Group-SVS
    (SPARC Volume Systems, Horizontal Systems (includes T2000/Ontario))


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.