Troubleshooting Disk Performance Issues on Sun Storage Flexline 240/280/380 and Sun Storage 2500 and 6000 Arrays

Asset ID:	1-75-1411763.1
Update Date:	2012-07-18
Keywords:

Solution Type Troubleshooting Sure

Solution 1411763.1 : Troubleshooting Disk Performance Issues on Sun Storage Flexline 240/280/380 and Sun Storage 2500 and 6000 Arrays

Applies to:

Sun Storage 6130 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 380 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 240 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6180 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2540-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Purpose

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Storage Disk 2000, 3000, 6000 RAID Arrays & JBODs Community.

The purpose of this document is to help users identify potential performance issues in the Sun Storage Flexline 240/280/380 and Sun Storage 2500 and 6000 Arrays. Understand that performance issues are not covered under a support contract unless there is a hardware failure. For further information regarding this please see Document 166650.1 Working Effectively With Global Customer Support.

Troubleshooting Steps

1. Identify Fault.

We need to identify any critical Storage Array faults that may be occurring by using <Document 1021057.1> How to verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 and J4000 Critical Faults via the User Interface. If you require help in identifying the Storage Array type first follow <Document 1021066.1> Verify Sun Storage[TM] Array type via User Interface.

The following table lists the most common faults that may cause a potential performance problem on the Storage Array. Find the fault that is listed in your Sun Storage Common Array Manager (CAM) Alarms page or Sun StorageTek SANtricity Recovery Guru and proceed to the step listed next to it. If there are no faults found or the fault listed is not found in the table below please go to Step 9.

Critical Fault Found in CAM Alarm	Critical Fault Found in SANtricity Recovery Guru	Remedy
REC_BATTERY_NEAR_EXPIRATION	Battery Nearing Expiration	Go to step 2
REC_FAILED_BATTERY	Battery Failed	Go to step 2
REC_OFFLINE_CTL	Controller Failed or Offline	Go to step 3
REC_FAILED_DRIVE	Drive Failed	Go to step 4
REC_LOST_REDUNDANCY_DRIVE	Individual Drive Lost Redundancy	Go to step 4
REC_VOLUME_HOT_SPARE_IN_USE	Hot Spare In Use	Go to step 4
REC_DRIVE_BYPASSED_CAUSE_UNKNOWN	Drive Bypassed	Go to step 4
REC_DEGRADED_VOLUME	Volume Degraded	Go to step 4
REC_FAILED_VOLUME_INTERRUPTED_WRITE	Volume Failed Due to Interrupted Write	Go to step 4
REC_FAILED_VOLUME	Volume Failed	Go to step 4
REC_NON_PREFERRED_PATH	Volume Not On Preferred Path	Go to step 5
REC_FAILED_TRANSCEIVER_MODULE	Failed SFP/GBIC	Go to step 6
REC_FAILED_ESM	Failed ESM	Go to step 7
REC_FAILED_DRIVE_SCSI_CHANNEL	Failed Drive Channel	Go to step 8
REC_CHANNEL_DEGRADED	Degraded Drive Channel	Go to step 8
REC_CHANNEL_FAILED	Failed Drive Channel	Go to step 8
REC_LOST_REDUNDANCY_DRIVE	Drive Path Redundancy Lost	Go to step 8
REC_LOST_REDUNDANCY_TRAY	Drive Tray Path Redundancy Lost	Go to step 8
REC_LOST_REDUNDANCY_ESM	ESM Path Redundancy Lost	Go to step 8
Any Fault Not Listed Above	Any Fault Not Listed Above	Go to step 9
No Fault Listed	No Fault Found	Go to step 9
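
The table lookup can be sketched in a few lines of Python. This is only an illustration transcribed from the table above: the names `FAULT_STEPS`, `FALLBACK_STEP` and `next_steps` are hypothetical, and nothing here is part of CAM or SANtricity. Because REC_LOST_REDUNDANCY_DRIVE is listed twice in the table (steps 4 and 8), each code maps to a list of candidate steps.

```python
# Fault-to-step lookup transcribed from the table above (illustrative only).
# REC_LOST_REDUNDANCY_DRIVE appears twice in the table, so every code maps
# to a list of candidate steps rather than a single step.
FAULT_STEPS = {
    "REC_BATTERY_NEAR_EXPIRATION": [2],
    "REC_FAILED_BATTERY": [2],
    "REC_OFFLINE_CTL": [3],
    "REC_FAILED_DRIVE": [4],
    "REC_LOST_REDUNDANCY_DRIVE": [4, 8],
    "REC_VOLUME_HOT_SPARE_IN_USE": [4],
    "REC_DRIVE_BYPASSED_CAUSE_UNKNOWN": [4],
    "REC_DEGRADED_VOLUME": [4],
    "REC_FAILED_VOLUME_INTERRUPTED_WRITE": [4],
    "REC_FAILED_VOLUME": [4],
    "REC_NON_PREFERRED_PATH": [5],
    "REC_FAILED_TRANSCEIVER_MODULE": [6],
    "REC_FAILED_ESM": [7],
    "REC_FAILED_DRIVE_SCSI_CHANNEL": [8],
    "REC_CHANNEL_DEGRADED": [8],
    "REC_CHANNEL_FAILED": [8],
    "REC_LOST_REDUNDANCY_TRAY": [8],
    "REC_LOST_REDUNDANCY_ESM": [8],
}

# Unlisted faults, or no faults at all, lead to step 9.
FALLBACK_STEP = 9


def next_steps(fault_codes):
    """Return the sorted, de-duplicated steps to visit for the given fault codes."""
    steps = set()
    for code in fault_codes:
        steps.update(FAULT_STEPS.get(code.strip().upper(), [FALLBACK_STEP]))
    return sorted(steps) if steps else [FALLBACK_STEP]
```

For example, passing the fault codes copied from the CAM Alarms page returns the list of steps to work through, with step 9 as the fallback when nothing matches.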

2. Battery Faults.

A battery fault can cause write cache to become disabled. Write cache will not be re-enabled until the battery has been replaced and fully charged. A newly replaced battery can take upwards of 12 hours to recharge. Select one or more of the following documents for troubleshooting procedures, depending on your Storage Array model and the fault found in the table in Step 1.
- 25xx, 25xx-M2, 6x80 - <Document 1021054.1> Troubleshooting Sun Storage[TM] Array SMART Battery Faults.
- FLX2x0, FLX380, 6130, 6x40 - <Document 1392919.1> Troubleshooting Sun Storage[TM] Array non-SMART Battery Faults.
If the battery issue has been resolved and no further faults are found on the storage array, yet the performance issue persists, or if the battery issue is still unresolved, please go to step 9.

This section is Internal Only.
NOTE for L1/L2 engineers - Sometimes it is best to check the output of ccmShowState (07.xx.xx.xx) and ccmStateAnalyze (06.xx.xx.xx) in the stateCaptureData.dmp file during a battery issue (and sometimes even if there is no perceived battery issue) to see whether both controllers agree on the state of the cache for each volume and whether Mirroring is enabled. Always check the output for both controllers.

Example for 06.xx with Cache properly enabled for all volumes:

    Controller A
    Executing ccmStateAnalyze(99,0,0,0,0,0,0,0,0,0):
    Controller: B
    Array Mode: A/A
    Controller Mode: Active
    Alternate Controller Mode: Active
    Controller Flags: BPR ABPR BOK ABOK ACMA
    Battery Status: OK <---Sometimes this will show as "Unknown" when there is an issue.
    Alternate Battery Status: OK <---Sometimes this will show as "Unknown" when there is an issue.
    CHECK-IN STATUS:
    Local Checked In
    Alternate Checked In
    Mirror Device Open <---May show closed due to problems.
    VOL 0: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 1: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 2: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 3: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 4: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 5: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 6: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 7: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 8: Flags: 0x04e6 RCA WCE WCA CME CMA
    VOL 9: Flags: 0x04e6 RCA WCE WCA CME CMA

    RCA = Read Cache Active
    WCE = Write Cache Enabled
    WCA = Write Cache Active
    CME = Cache Mirroring Enabled
    CMA = Cache Mirroring Active

Example for 7.xx with Cache properly enabled for all volumes:

    Controller A
    Executing ccmShowState(0,0,0,0,0,0,0,0,0,0) on controller A:
    Controller: A
    # Volumes Mirroring: 128
    # Volumes w/ECD: 0
    MirrorReady: Yes <---Check here to see if it says No.
    AltMirrorReady: Yes <---Check here to see if it says No.
    BatteryEnabledLocally: Yes
    BatteryEnabledByAlt: Yes
    Battery Status: Okay <---May show "Unknown" when there is an issue.
    Alt Battery Status: Okay <---May show "Unknown" when there is an issue.
    CacheDeviceFlags:
    Dev 0: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 1: Owned Open RCE RCA WCE WCA CME CMA
    Dev 2: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 3: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 4: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 5: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 6: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 7: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 8: Unowned Open RCE RCA WCE WCA CME CMA
    Dev 9: Unowned Open RCE RCA WCE WCA CME CMA
    . . etc

    RCE = Read Cache Enabled
    RCA = Read Cache Active
    WCE = Write Cache Enabled
    WCA = Write Cache Active
    CME = Cache Mirroring Enabled
    CMA = Cache Mirroring Active

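
When a dump holds many volumes, the flag comparison can be scripted. The following is a minimal sketch, not a supported tool: it assumes the stateCaptureData.dmp text lists one "VOL n:" (06.xx) or "Dev n:" (7.xx) entry per line with the flag abbreviations shown in the examples above. The function name and the choice of required flags (WCE, WCA, CME, CMA) are assumptions made for illustration.

```python
import re

# Flags that every volume/device line carries in the healthy examples above.
# Treating exactly these four as "required" is an assumption for this sketch.
REQUIRED_FLAGS = {"WCE", "WCA", "CME", "CMA"}

# Matches "VOL 3: Flags: 0x04e6 RCA ..." (06.xx) and
# "Dev 3: Unowned Open RCE RCA ..." (7.xx).
LINE_RE = re.compile(r"^\s*(VOL|Dev)\s+(\d+):\s*(.*)$")


def volumes_missing_cache_flags(text):
    """Return {volume_number: sorted_missing_flags} for lines lacking a required flag."""
    problems = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # Not a per-volume line; skip headers, legends, etc.
        tokens = set(match.group(3).split())
        missing = REQUIRED_FLAGS - tokens
        if missing:
            problems[int(match.group(2))] = sorted(missing)
    return problems
```

Run it once on the section of the dump for each controller and compare the two results; an empty dictionary from both controllers suggests write cache and cache mirroring are active on every listed volume.
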
3. Controller Faults.

A controller that is offline or failed will cause all volumes to fail over to the alternate controller. This will also cause write cache to become disabled until the controller is placed back online. Please refer to <Document 1021113.1> Sun Storage[TM] Arrays: Troubleshooting RAID Controller Failures. If this does not resolve the problem proceed to step 9.
4. Drive and Volume Faults.

A single drive fault in a RAID-1, RAID-5, RAID-6 or RAID-10 volume group should not cause a significant performance impact on the storage array, but the volume group will remain degraded until the failed drive has been reconstructed to an available Hot Spare. A double drive fault in a RAID-5 volume group (triple drive fault in RAID-6) can potentially lead to a failed volume group (and loss of access to data), depending on the time of failure and Hot Spare availability. If there is a drive or volume problem, it is best to resolve the issue and retest for any further performance problems. Please refer to the following troubleshooting procedure.
- <Document 1021055.1> Troubleshooting Sun Storage[TM] 2500 and 6000 Array Disk Failures.
If the drive issue has been resolved and no further faults are found on the storage array, yet the performance problem persists, or if the drive issue is still unresolved, please go to step 9.
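
The relationship between RAID level and the number of failed drives can be summarized in a short sketch. The RAID-5 and RAID-6 thresholds follow the text above; the RAID-1 and RAID-10 value is the general worst case (both failures in the same mirror pair) and is an assumption, since this document does not state it. `failed_drives` means drives that have failed and have not yet been reconstructed to a Hot Spare. The names used are hypothetical.

```python
# Guaranteed number of concurrent drive failures a volume group survives.
# RAID-5 and RAID-6 follow the text above; the RAID-1/RAID-10 figure is the
# general worst case and is an assumption of this sketch.
GUARANTEED_TOLERANCE = {"RAID-1": 1, "RAID-5": 1, "RAID-6": 2, "RAID-10": 1}


def volume_group_state(raid_level, failed_drives):
    """Classify a volume group by its count of failed, not yet reconstructed drives."""
    tolerance = GUARANTEED_TOLERANCE[raid_level]
    if failed_drives == 0:
        return "optimal"
    if failed_drives <= tolerance:
        return "degraded"
    # Beyond the guaranteed tolerance the group may fail outright.
    return "possible volume failure"
```

For instance, `volume_group_state("RAID-6", 2)` is still "degraded", while a third unreconstructed failure moves the group into the range where volume failure is possible.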
5. Volume Not On Preferred Path Fault.

A Volume Not On Preferred Path fault indicates that one or more volumes have failed over to the non-owning controller of the Storage Array. When a volume fail over event occurs, the fail over request takes priority in the array, and write cache and write cache with mirroring for the affected volume(s) are momentarily disabled until the move has finished. A single fail over event on a single volume will not cause a performance issue. However, if fail over requests are streaming to the array, performance can be severely impacted. This can occur for numerous reasons, including any of the following: storage array controller failure, the storage array host type configuration used, the storage array volume-to-LUN mapping used, fibre paths (cables and SFPs), switches, host HBA physical issues, HBA configuration, Failover Driver (Solaris MPXIO, Windows MPIO/DSM or RDAC, Linux RDAC, AIX Cambex and SunDAC and VMware) configuration and so on. Please refer to the following document for troubleshooting Volume Not On Preferred Path issues for all Sun Storage 6000 and 2500 Storage Arrays and StorageTek Flexline Arrays.
- <Document 1136186.1> Troubleshooting Sun StorageTek 2500, Sun Storage 2500-M2, Sun Storage 6000: Volume Not On Preferred Path.
If the Volume Not On Preferred Path issue has been resolved and no further faults are found on the storage array, yet the performance issue persists, or if the Volume Not On Preferred Path problem is still unresolved, please go to step 9.
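
To distinguish a single fail over from fail over requests streaming to the array, it can help to count how many events cluster in a short period. The sketch below works on a plain list of event timestamps that you have already extracted from your own logs; it does not parse any array log format. The five-minute default window and the function name are assumptions, and this document gives no threshold for what counts as "streaming".

```python
from datetime import datetime, timedelta


def max_events_in_window(timestamps, window=timedelta(minutes=5)):
    """Return the largest number of events that fall within any single window."""
    events = sorted(timestamps)
    best = 0
    start = 0
    for end, current in enumerate(events):
        # Advance the left edge until the span fits inside the window.
        while current - events[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best
```

A result of 1 means the fail overs were isolated; a large count inside one window points toward the repeated fail over pattern described above and is worth noting in the service request.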
6. Transceiver/SFP/GBIC Faults.

A failed Transceiver/SFP/GBIC on a drive channel will cause a loss of path redundancy for one or more external drive trays. On a host channel it may cause volumes to fail over to the alternate path or non-owning controller if a second host port on the same controller is not being used. Please refer to step 9 for this issue.
7. ESM Fault.

A failed ESM/IOM in an external drive tray will cause a loss of path redundancy for one or more external drive trays. Please refer to step 9 for this issue.
8. Drive Path Redundancy Lost Related Faults.

If redundancy has been lost on a drive path refer to <Document 1388897.1> Troubleshooting Sun Storage[TM] 2500 and 6000 Array Drive Tray Lost Redundancy Events.

If the Drive Path Redundancy Lost issue has been resolved and no further faults are found on the storage array, yet the performance issue persists, or if the Drive Path Redundancy Lost problem is still unresolved, please go to step 9.
9. Open a Service Call with Oracle Support for further research.

If all critical faults have been resolved and the performance issue still remains, or if a critical fault cannot be resolved, a Service Call should be opened with Oracle Support. To provide a complete picture of the perceived performance problem, collect information about the nature of the issue, including a synopsis of the performance problem being seen and all logs from the Storage Array, Switches and Operating System being used. Understand that a performance problem in general is not an easy issue to tackle, because it can be caused by many issues that may not be disk storage related.

Please refer to the following documents for Storage Array support data collection.
- If using CAM please collect the Storage Array Support Data using <Document 1002514.1> Collecting Sun Storage Common Array Manager Support Data.
- If using SANtricity please collect the Storage Array Support Data using <Document 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek SANtricity Storage Manager.
Please refer to the following documents for SAN Switch support data collection.
- If using Brocade switches use <Document 1003754.1> Brocade: What logs are required to troubleshoot a Brocade switch?.
- If using Qlogic switches use <Document 1270583.1> Qlogic Switch - What logs are required to troubleshoot a Qlogic Fibre Channel switch.
- If using McData switches use <Document 1006133.1> How to collect McData switch information.
And please refer to the following documents depending on which Operating System is being used on the affected server/host.
- If using Solaris use <Document 1273941.1> How To Collect and Send Explorer Data to Oracle SAN Support.
- If using SuSE Linux Enterprise Systems use <Document 1010057.1> How to Gather Information on SuSE Linux Enterprise Systems.
- If using Red Hat Enterprise Linux use <Document 1010058.1> How to Gather Information on Red Hat Enterprise Linux Systems.
- If using MS Windows use <Document 1006608.1> Microsoft Windows(R) operating System: How to obtain troubleshooting information for storage issues.
Lastly a few questions should be answered specifically for the performance problem.
- Indicate the date and time it was first seen.
- If possible identify specific arrays, hosts and host LUNs experiencing the issue.
- Has this system performed properly in the past?
- Has anything changed recently in the Host, SAN or Storage environment?
- If known, what type of I/O is being driven to the Storage Array?
- What software application is being used that is experiencing the slow down?
- How is the performance issue being measured and can a sample be provided?
- Are all affected file systems less than 80% full?
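
The last question can be answered quickly on the host. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function names are hypothetical and the 80% limit is taken from the question above. Note that the percentage is computed as used/total, which can differ slightly from what `df` reports on file systems that reserve blocks.

```python
import shutil


def percent_used(path):
    """Return the percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # Pseudo file systems can report a zero size.
    return 100.0 * usage.used / usage.total


def filesystems_at_or_over(paths, limit=80.0):
    """Return {path: percent_used} for paths whose file system is at or above `limit`."""
    report = {}
    for path in paths:
        used = percent_used(path)
        if used >= limit:
            report[path] = round(used, 1)
    return report
```

Pass the mount points of the affected file systems; an empty result means all of them are below the limit, and any entries returned can be included in the service request.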

References

<NOTE:1021055.1> - Troubleshooting Sun Storage[TM] 2500 and 6000 RAID Array Disk Failures
<NOTE:1388897.1> - Troubleshooting Sun Storage[TM] 2500 and 6000 Array Drive Tray Lost Redundancy Events
<NOTE:1021057.1> - How to verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 and J4000 Critical Faults via the User Interface
<NOTE:1021066.1> - Verify Sun Storage[TM] Array Type via the User Interface
<NOTE:1021113.1> - Sun Storage[TM] Arrays: Troubleshooting RAID Controller Failures
<NOTE:1066650.1> - WLS 9.2 MP3-PATCH REQUEST
<NOTE:1136186.1> - Troubleshooting Sun StorageTek 2500, Sun Storage 2500-M2, Sun Storage 6000: Volume Not on Preferred Path
<NOTE:1270583.1> - Qlogic Switch - What logs are required to troubleshoot a Qlogic Fibre Channel switch
<NOTE:1273941.1> - SAN: How To Collect and Send Explorer Data to Oracle SAN Support
<NOTE:1392919.1> - Troubleshooting Sun Storage[TM] Array non-SMART Battery Faults
<NOTE:1002514.1> - Collecting Sun Storage Common Array Manager Array Support Data
<NOTE:1003754.1> - Brocade: What logs are required to troubleshoot a Brocade switch?
<NOTE:1006133.1> - How to collect McData switch information
<NOTE:1006608.1> - Microsoft Windows(R) operating system: How to obtain troubleshooting information for storage issues
<NOTE:1010057.1> - How to gather information on SuSE Linux Enterprise Systems
<NOTE:1010058.1> - How to Gather Information on Red Hat Enterprise Linux Systems
<NOTE:1014074.1> - Collecting Support Data for Arrays Using Sun StorageTek SANtricity Storage Manager
<NOTE:1021054.1> - Troubleshooting Sun Storage[TM] Array SMART Battery Faults

Attachments

This solution has no attachment