Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution

1332352.1: Customers configuring HW RAID with 16-Slot Disk Backplanes on a SPARC T3-1 system could corrupt data.
Oracle Confidential (PARTNER). Do not distribute to customers
Applies to:
SPARC T3-1 - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.
__________
SUNBUG: 6999386, 6999411, 6999436, 6952042

Affected X-Options:
SE3Y5BB1Z - 16-Slot Disk Backplane

Symptoms

A single disk drive can be configured in multiple HW RAID arrays on SPARC T3-1 systems with 16-Slot Disk Backplanes. 16-slot trays need two firmware updates to handle partitioning disks correctly across multiple controllers, and a data restore is required after the firmware updates.

It is possible to create two HW RAID volumes on different onboard LSI 2008 disk controllers that contain the same physical disk, as the example below shows.

Controller 0
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 2
  Volume ID                               : 905
  Status of volume                        : Okay (OKY)
  RAID level                              : RAID1
  Size (in MB)                            : 285148
  Physical hard disks                     :
  PHY[0] Enclosure#/Slot#                 : 2:14
  PHY[1] Enclosure#/Slot#                 : 2:15

Controller 1
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 2
  Volume ID                               : 905
  Status of volume                        : Okay (OKY)
  RAID level                              : RAID1
  Size (in MB)                            : 285148
  Physical hard disks                     :
  PHY[0] Enclosure#/Slot#                 : 2:13
  PHY[1] Enclosure#/Slot#                 : 2:14

In this example the disk in Enclosure#/Slot# 2:14 is a member of a RAID1 volume on both controllers.

This occurs because both LSI 2008 SAS disk controllers have connectivity to ALL disks through a SAS expander chip located on the SPARC T3-1 16-Slot Disk Backplane. Each controller can access every disk independently but has no visibility of the HW RAID configuration on the other controller. When the underlying disks are used directly this is not an issue, because Solaris provides multipath access through the SCSI VHCI driver. With HW RAID volumes, however, the same physical region of a disk can end up in multiple mounted filesystems.
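The overlap described above can be checked mechanically. The following is a minimal sketch in Python, not an Oracle or LSI tool: it assumes the "IR Volume information" text has already been captured for each controller and that member disks appear as "Enclosure#/Slot# : E:S" lines, as in the listings in this document. The function names member_slots and shared_slots are hypothetical.

```python
import re
from collections import defaultdict

# Matches member-disk lines such as "PHY[0] Enclosure#/Slot# : 2:14".
# Assumption: the text follows the layout of the listings in this document.
SLOT_RE = re.compile(r"Enclosure#/Slot#\s*:\s*(\d+):(\d+)")


def member_slots(display_text):
    """Return the set of (enclosure, slot) pairs listed as volume members."""
    return {(int(e), int(s)) for e, s in SLOT_RE.findall(display_text)}


def shared_slots(outputs_by_controller):
    """Map each (enclosure, slot) used on more than one controller to those controllers."""
    owners = defaultdict(set)
    for controller, text in outputs_by_controller.items():
        for slot in member_slots(text):
            owners[slot].add(controller)
    return {slot: sorted(ctrls) for slot, ctrls in owners.items() if len(ctrls) > 1}


# Excerpts of the two IR volume listings from the Symptoms section.
CONTROLLER_0 = """
IR volume 2
  Volume ID : 905
  RAID level : RAID1
  PHY[0] Enclosure#/Slot# : 2:14
  PHY[1] Enclosure#/Slot# : 2:15
"""
CONTROLLER_1 = """
IR volume 2
  Volume ID : 905
  RAID level : RAID1
  PHY[0] Enclosure#/Slot# : 2:13
  PHY[1] Enclosure#/Slot# : 2:14
"""

if __name__ == "__main__":
    conflicts = shared_slots({0: CONTROLLER_0, 1: CONTROLLER_1})
    for (enclosure, slot), ctrls in sorted(conflicts.items()):
        print(f"disk {enclosure}:{slot} is in volumes on controllers {ctrls}")
```

Run against the two excerpts, the sketch reports the disk in slot 2:14 as a member of volumes on controllers 0 and 1. An empty result only means that no overlap was found in the text supplied; it does not show that the configuration is supported.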
This is not an issue on SPARC T3-1 systems with an 8-Slot Disk Backplane, because each of the two onboard LSI 2008 disk controllers is directly connected to only 4 of the disks and has no connectivity to the 4 disks connected to the other controller. Since there is no expander on the 8-Slot Disk Backplane, it is not a multi-initiator configuration.

Creating IR volumes in Solaris with "sas2ircu" (LSI utility) does not remove the member disks from /dev. An IR HW RAID volume is created and enabled with two or more disks (i.e. RAID0, RAID1 or RAID1E), but because the disks used in the volume are not removed:
- They are listed in FORMAT
- They remain active disks
- They can be newfs'd/mounted and used alongside any HW RAID volume just created that uses them.

=================================================================================
# sas2ircu 0 create raid0 max 2:3 2:4 stripe-slot3-4
LSI Corporation SAS2 IR Configuration Utility.
Version 4.00.00.00 (2009.10.12)
Copyright (c) 2009 LSI Corporation. All rights reserved.
You are about to create an IR volume.
WARNING: Proceeding with this operation may cause data loss or data corruption. Are you sure you want to proceed (YES/NO)? yes
WARNING: This is your last chance to abort this operation. Do you wish to abort (YES/NO)? no
Please wait, may take up to a minute...
SAS2IRCU: Volume created successfully.
SAS2IRCU: Command CREATE Completed Successfully.
SAS2IRCU: Utility Completed Successfully.
=================================================================================
# sas2ircu 0 status
LSI Corporation SAS2 IR Configuration Utility.
Version 4.00.00.00 (2009.10.12)
Copyright (c) 2009 LSI Corporation. All rights reserved.
Background command progress status for controller 0...
IR Volume 1
  Volume ID                               : 904
  Current operation                       : None
  Volume status                           : Enabled
  Volume state                            : Optimal
  Physical disk I/Os                      : Not quiesced
IR Volume 2
  Volume ID                               : 905
  Current operation                       : None
  Volume status                           : Enabled
  Volume state                            : Optimal
  Physical disk I/Os                      : Not quiesced
SAS2IRCU: Command STATUS Completed Successfully.
SAS2IRCU: Utility Completed Successfully.
========================================================
# format
Searching for disks...done
c0t5000C5001D0C49C7d0: configured with capacity of 279.38GB
c0t5000C5001D096987d0: configured with capacity of 279.38GB
c2t3CE5552A2C028206d0: configured with capacity of 555.97GB
AVAILABLE DISK SELECTIONS:
  0. c0t5000C5001D0C3D9Bd0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000c5001d0c3d9b
  1. c0t5000C5001D0C49C7d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625> <<----
     /scsi_vhci/disk@g5000c5001d0c49c7
  2. c0t5000C5001D0CA947d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000c5001d0ca947
  3. c0t5000C5001D0CACD3d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000c5001d0cacd3
  4. c0t5000C5001D0D1283d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000c5001d0d1283
  5. c0t5000C5001D0D2857d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000c5001d0d2857
  6. c0t5000C5001D096987d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625> <<-----
     /scsi_vhci/disk@g5000c5001d096987
  7. c0t5000CCA00A02E4A8d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a02e4a8
  8. c0t5000CCA00A02F5D0d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a02f5d0
  9. c0t5000CCA00A02F114d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a02f114
  10. c0t5000CCA00A4BFC38d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a4bfc38
  11. c0t5000CCA00A0100B0d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a0100b0
  12. c0t5000CCA00A01014Cd0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a01014c
  13. c0t5000CCA00A4984C8d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
     /scsi_vhci/disk@g5000cca00a4984c8
  14. c1t4d0 <ATA-MARVELLSD88SA02-D10R cyl 23435 alt 2 hd 16 sec 128>
     /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@10/disk@p4,0
  15. c2t3CE5552A2C028206d0 <LSI-LogicalVolume-3000 cyl 65533 alt 2 hd 128 sec 139> <-----
     /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0
  16. c2t34266AB7BB43E993d0 <LSI-LogicalVolume-3000 cyl 65533 alt 2 hd 64 sec 139>
     /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w34266ab7bb43e993,0
  17. c3t4d0 <ATA-MARVELLSD88SA02-D10R cyl 23435 alt 2 hd 16 sec 128>
     /pci@400/pci@2/pci@0/pci@4/scsi@0/iport@10/disk@p4,0
============================================================================
# sas2ircu 0 display
LSI Corporation SAS2 IR Configuration Utility.
Version 4.00.00.00 (2009.10.12)
Copyright (c) 2009 LSI Corporation. All rights reserved.
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS2008
  BIOS version                            : 0.00.00.00
  Firmware version                        : 5.00.00.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 831
  Concurrent commands supported           : 1871
  Slot                                    : Unknown
  Segment                                 : 0
  Bus                                     : 1024
  Device                                  : 0
  Function                                : 0
  RAID Support                            : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
  Volume ID                               : 904
  Volume Name                             : stripe-slot3-4
  Status of volume                        : Okay (OKY) <<-----
  RAID level                              : RAID0
  Size (in MB)                            : 570296
  Physical hard disks                     :
  PHY[0] Enclosure#/Slot#                 : 2:3
  PHY[1] Enclosure#/Slot#                 : 2:4
============================================================================

Console messages from creation...
May 12 12:22:03 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:03 wgs48-116 PhysDiskNum 2 with DevHandle 0xe in slot 0 for enclosure with handle 0x0 is now , active, write cache enabled
May 12 12:22:03 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:03 wgs48-116 PhysDiskNum 3 with DevHandle 0xf in slot 0 for enclosure with handle 0x0 is now , active, write cache enabled
May 12 12:22:05 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:05 wgs48-116 Volume 2 is now , enabled, inactive
May 12 12:22:05 wgs48-116 scsi: /pci@400/pci@1/pci@0/pci@4/scsi@0 (mpt_sas0):
May 12 12:22:05 wgs48-116 Volume 0 is now , enabled, active
May 12 12:22:05 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:05 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:05 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:05 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d096987 (sd12):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d096987 (sd12):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d0c49c7 (sd13):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /scsi_vhci/disk@g5000c5001d0c49c7 (sd13):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number
May 12 12:22:12 wgs48-116 scsi: WARNING: /pci@400/pci@1/pci@0/pci@4/scsi@0/iport@v0/disk@w3ce5552a2c028206,0 (sd22):
May 12 12:22:12 wgs48-116 Corrupt label; wrong magic number

The SPARC T3-1 has two SAS2 controllers connected to up to sixteen disks through one expander (a multipath configuration). The two controllers share access to the same disks. When a HW RAID volume is created on one controller, the other controller is not notified, and its path to the member disks remains usable. This is why the physical disks in the HW RAID volume stay available to the OS. After the system reboots, the non-RAID SAS2 controller rescans all disks and finds the two physical disks with RAID metadata. The two drives are then marked inactive and become invisible to the OS.

Impact

There is a possible risk to data integrity if a customer uses Solaris commands to configure HW RAID on the LSI 2008 controllers and a single disk drive ends up configured in multiple HW RAID arrays. A customer can also "lose" access to disks that are configured in multiple RAID volumes after a reboot.

SPARC T3-1 systems (with 16-Slot Disk Backplane) ship from the factory with both onboard LSI 2008 disk controllers connected to the backplane, which is a multi-initiator configuration. LSI's Integrated RAID (IR) firmware does not support multi-initiator with a volume present. However, IR may be used WITHOUT volumes present in a multi-initiator environment. The OEM must ensure the user does not create volumes in this case.

Changes

Contributing Factors

SPARC T3-1 systems with a 16-Slot Disk Backplane that does not have "zoning" enabled.

Cause

Root Cause

SPARC T3-1 systems (with 16-Slot Disk Backplane) ship from the factory with both onboard LSI 2008 disk controllers connected to the backplane, which is a multi-initiator configuration where both controllers have connectivity to all of the disks in the system. A single disk drive can therefore be configured in multiple HW RAID volumes using the onboard disk controllers. There is a possible risk to data integrity if a customer uses Solaris commands to configure HW RAID on the LSI 2008 controllers and a single disk drive ends up configured in multiple HW RAID volumes.
Also, because the onboard LSI 2008 disk controllers have Integrated RAID (IR) firmware, this configuration with a 16-Slot Disk Backplane has been deemed unsupported by one of Oracle's partners (LSI), and corrective action is needed to rectify the issue.

Solution

Workaround

No workaround available - see Resolution section below.

Resolution

The remediation below is normally done by the customer. The intent of this FAB is therefore to instruct the service representative on what to tell the customer to do should they encounter this issue.

Back up the data on the system and install Patch 147034-01 from My Oracle Support (MOS), which updates the LSI SAS2 Expander firmware to force "zoning" on the 16-Slot Disk Backplane. Install system firmware 8.1.0.c (147315-02 or later) on the SPARC T3-1 system and then restore the data per the details below.

Note: See README.147034-01 from Patch 147034-01 for detailed step-by-step procedures for implementing the fix, accompanied by examples.

1. Back up the customer data on the SPARC T3-1 system.

References

For information about FAB documents, their release processes, implementation strategies and billing information, click here. Please note that this is an Internal Only link.
Attachments

This solution has no attachment