Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1390444.1
Update Date:2012-01-27
Keywords:

Solution Type  Problem Resolution Sure

Solution  1390444.1 :   T3-1 : Sub-optimal I/O write performance on multi initiator disks  


Related Items
  • SPARC T3-1
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T3
  • .Old GCS Categories>Sun Microsystems>Servers>CMT Servers




In this Document
  Symptoms
  Cause
  Solution
  References


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: This process is intended for Oracle and Partner support personnel only

Created from <SR 3-4908878978>

Applies to:

SPARC T3-1 - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.

Symptoms

T3-1 systems have dual LSI disk controllers. When configured with the 16-slot SAS backplane, they provide multi-initiator access to all internal drives, allowing multipath access via MPxIO when SAS zoning is not enabled.

By default, MPxIO uses round-robin load balancing, which means I/O is spread across both target ports. This can lead to poor write performance when the cache is overrun, as in this 1 GB sequential write to a ZFS file system on a Seagate drive:
# time dd if=/dev/zero of=/seagate/1g bs=1024k count=1024
1024+0 records in
1024+0 records out

real 0m52.638s
user 0m0.009s
sys 0m2.072s
#
This is only seen when writing to disks under ZFS control; raw and UFS writes complete faster. For example, a raw write to a Seagate disk:
# time dd if=/dev/zero of=/dev/rdsk/c0t5000C5000A913CEFd0 bs=1024k count=1024
1024+0 records in
1024+0 records out

real 0m14.370s
user 0m0.007s
sys 0m0.339s
#
During testing, Seagate drives were found to perform significantly worse than Hitachi models. The same ZFS write to a Hitachi drive:
# time dd if=/dev/zero of=/hitachi/1g bs=1024k count=1024
1024+0 records in
1024+0 records out

real 1.2
user 0.0
sys 1.2
#
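To put the timings above in perspective, the 'real' times can be converted to throughput. This is a small POSIX shell sketch using awk; it assumes each run wrote 1024 MB (1024 records of 1024 KB) and uses only the two Seagate figures, whose 'real' values are in seconds. The function name throughput is illustrative only.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): megabytes written / elapsed seconds.
throughput() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mb / s }'
}

# Each dd run wrote 1024 records of 1024k = 1024 MB.
echo "Seagate, ZFS: $(throughput 1024 52.638) MB/s"   # 19.45 MB/s
echo "Seagate, raw: $(throughput 1024 14.370) MB/s"   # 71.26 MB/s
```

On these figures the ZFS write to the Seagate disk runs at roughly a quarter of the raw write rate (about 3.7 times slower).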

Cause

Sequential I/O is not streamed when writes are sent down multiple paths using round-robin load balancing; this causes poor write throughput.

Solution

The fix is to switch to logical-block (LBA) based load balancing. With this policy, I/O to a contiguous range of logical blocks (a region) is pushed down a single path; the size of each region is determined by the region-size variable defined in the scsi_vhci configuration.
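The difference between the two policies can be modelled in a few lines of shell. This is a simplified model and an assumption, not the scsi_vhci source: under logical-block balancing the region index is taken as the starting block number shifted right by region-size, and the path as that index modulo the number of online paths; under round-robin the path simply alternates per I/O. The function names lb_path and rr_path are hypothetical.

```shell
#!/bin/sh
# Simplified model (assumption): path chosen for an I/O starting at
# block $1, with region-size $2 and $3 online paths.
lb_path() {
  echo $(( ($1 >> $2) % $3 ))
}

# Round-robin model: I/O number $1 alternates across $2 paths.
rr_path() {
  echo $(( $1 % $2 ))
}

# Sequential 1 MB writes (2048 blocks of 512 bytes each), region-size 16, 2 paths.
for io in 0 1 31 32 33; do
  blk=$(( io * 2048 ))
  echo "io $io: logical-block -> path $(lb_path $blk 16 2), round-robin -> path $(rr_path $io 2)"
done
```

In this model the first 32 sequential 1 MB writes (io 0 to 31) all go down path 0 under logical-block balancing, whereas round-robin switches path on every write.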

First, add the following to /kernel/drv/scsi_vhci.conf for the impacted drive. In this example we use Seagate drives (ST product ID):
device-type-mpxio-options-list =
"device-type=SEAGATE ST", "load-balance-options=logical-block-options";
logical-block-options="load-balance=logical-block", "region-size=16";
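If you prefer to script the change, the following sketch appends the entries above to a scsi_vhci.conf after taking a backup. It is a hypothetical helper, not an Oracle-supplied tool; it assumes a POSIX shell (for example ksh on Solaris) and refuses to run if an uncommented device-type-mpxio-options-list entry already exists, since a second definition should be merged by hand instead. Pass the path of a copy to try it out.

```shell
#!/bin/sh
# Hypothetical helper: append logical-block options for Seagate (ST) drives.
# $1 = path to scsi_vhci.conf (defaults to the Solaris location).
add_lb_options() {
  conf=${1:-/kernel/drv/scsi_vhci.conf}

  # An existing uncommented entry must be edited manually instead.
  if grep '^ *device-type-mpxio-options-list' "$conf" >/dev/null; then
    echo "device-type-mpxio-options-list already set in $conf" >&2
    return 1
  fi

  # Keep a copy for rollback before modifying the file.
  cp "$conf" "$conf.orig" || return 1

  cat >> "$conf" <<'EOF'
device-type-mpxio-options-list =
"device-type=SEAGATE ST", "load-balance-options=logical-block-options";
logical-block-options="load-balance=logical-block", "region-size=16";
EOF
}
```

For example, add_lb_options /kernel/drv/scsi_vhci.conf, followed by the reboot described below. The original file is left as scsi_vhci.conf.orig.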
'region-size' determines how much data goes down a single path; 'region' in this instance refers to a range of blocks on the multipathed device. The parameter is a power-of-2 exponent: 2 raised to region-size gives the number of 512-byte blocks, which divided by 2 gives kilobytes and then by 1024 gives megabytes. For example:

2 to the power of 16 = 65,536 blocks / 2 = 32,768 KB / 1024 = 32 MB region size, meaning 32 MB of data is sent down a single path before switching to the second path.
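The same arithmetic as a small shell function, assuming 512-byte blocks as in the calculation above; region_mb is a hypothetical name.

```shell
#!/bin/sh
# Convert a region-size exponent to megabytes, assuming 512-byte blocks.
# Integer arithmetic: exponents below 11 (under 1 MB) print 0.
region_mb() {
  blocks=$(( 1 << $1 ))    # 2^region-size blocks
  kb=$(( blocks / 2 ))     # 512-byte blocks -> KB
  echo $(( kb / 1024 ))    # KB -> MB
}

region_mb 16    # 32
region_mb 18    # 128
```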

Tuning of region-size depends on customer requirements and load; the default is 18, which gives a 128 MB region size (2^18 = 262,144 blocks / 2 = 131,072 KB / 1024 = 128 MB). This contrasts with sequential writes under round-robin, which ping-pong between paths on every block-sized write.
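When tuning, it can help to work backwards from a desired region size. This hypothetical helper returns the smallest region-size whose region is at least the requested number of megabytes, again assuming 512-byte blocks (1 MB = 2048 blocks).

```shell
#!/bin/sh
# Hypothetical tuning helper: smallest exponent with 2^exp blocks >= $1 MB.
region_size_for_mb() {
  want_blocks=$(( $1 * 2048 ))   # 1 MB = 2048 blocks of 512 bytes
  exp=0
  while [ $(( 1 << exp )) -lt "$want_blocks" ]; do
    exp=$(( exp + 1 ))
  done
  echo "$exp"
}

region_size_for_mb 32     # 16
region_size_for_mb 128    # 18 (the default)
region_size_for_mb 100    # 18, rounded up to the next power of 2
```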

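Reboot the system for the changes to take effect, then review the dmesg and mpathadm output to confirm the correct policy is applied. The dmesg line also reports the region-size in use, so check that it matches the configured value: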
0. c0t5000C5000A913CEFd0 <SEAGATE-ST930003SSUN300G-0D70-279.40GB>
/scsi_vhci/disk@g5000c5000a913cef

Dec 29 19:33:40 t3-1 genunix: [ID 834635 kern.info] /scsi_vhci/disk@g5000c5000a913cef (sd6) multipath status: optimal, path /pci@400/pci@2/pci@0/pci@4/scsi@0/iport@f (mpt_sas2) to target address: w5000c5000a913ced,0 is online Load balancing: logical-block, region-size: 18

# mpathadm list lu | grep -i 5000C5000A913CEF
/dev/rdsk/c0t5000C5000A913CEFd0s2

# mpathadm show lu /dev/rdsk/c0t5000C5000A913CEFd0s2
Logical Unit: /dev/rdsk/c0t5000C5000A913CEFd0s2
mpath-support: libmpscsi_vhci.so
Vendor: SEAGATE
Product: ST930003SSUN300G
Revision: 0D70
Name Type: unknown type
Name: 5000c5000a913cef
Asymmetric: no
Current Load Balance: logical-block <<<<<< LBA enabled
Logical Unit Group ID: NA
Auto Failback: on
Auto Probing: NA

Paths:
Initiator Port Name: 5080020000d6ab40
Target Port Name: 5000c5000a913ced
Override Path: NA
Path State: OK
Disabled: no

Initiator Port Name: 5080020000d6ab41
Target Port Name: 5000c5000a913ced
Override Path: NA
Path State: OK
Disabled: no

Target Ports:
Name: 5000c5000a913ced
Relative ID: 0
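The two checks above can be scripted. The following sketch defines two hypothetical filters: lb_enabled succeeds if 'mpathadm show lu' output reports logical-block balancing, and logged_region_size extracts the region-size from the scsi_vhci multipath status message. The message format is taken from the sample output in this document and may differ between releases.

```shell
#!/bin/sh
# Succeeds if 'mpathadm show lu' output on stdin reports logical-block.
lb_enabled() {
  grep 'Current Load Balance: *logical-block' >/dev/null
}

# Prints the region-size from scsi_vhci multipath status lines on stdin.
logged_region_size() {
  sed -n 's/.*Load balancing: logical-block, region-size: \([0-9]*\).*/\1/p'
}

# Example use on the host:
#   mpathadm show lu /dev/rdsk/c0t5000C5000A913CEFd0s2 | lb_enabled && echo "LBA enabled"
#   dmesg | logged_region_size
```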

Multi-initiator access is disabled on systems with SAS zoning enabled, so this issue only affects systems prior to 147034-01 being applied, or where the zoning firmware has been updated but zoning has been disabled via 'zoningcli'.

References

<BUG:6930636> - USE LBA LOAD BALANCING TO WORK AROUND 6929352 AND SIMILAR
<NOTE:1332352.1> - Customers configuring HW RAID with 16-Slot Disk Backplanes on a SPARC T3-1 system could corrupt data.

  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.