Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1473515.1
Update Date:2012-10-01

Solution Type  Technical Instruction Sure

Solution  1473515.1 :   Pillar Axiom: How to Run Pitman on AxiomOne R5.X Systems  


Related Items
  • Pillar Axiom 600 Storage System
  • Pillar Axiom 500 Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Pillar Axiom>SN-DK: Ax600




In this Document
Goal
Fix
References


Applies to:

Pillar Axiom 600 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Pillar Axiom 500 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Goal

This document defines the procedure for running Pitman Diagnostics, the internal Private Interconnect (PI) fabric diagnostic utility, on a field system as part of an installation, a significant hardware reconfiguration, or a health check.  This diagnostic is superior to PI Stats because it sends traffic on all paths simultaneously, at levels higher than can typically be achieved with the host-generated traffic that PI Stats relies on.  The Axiom can remain online during the run; in its default mode the diagnostic does not interfere with data traffic.  The diagnostic does not use user file systems or LUNs, so it has no effect on user data.  Pitman Diagnostics can only be run on AxiomONE release 05.00.xx and higher.

Summary


This procedure requires an Axiom system running AxiomONE release 05.00.xx or higher.  It uses the Pitman Diagnostics commands that are built into the AxiomONE software.  The commands can be run from either the GUI or the CLI.

Overview

This procedure invokes commands that are part of the Axiom software to generate traffic on the Private Interconnect (PI) Fibre Channel fabric connections.  This includes the connections from each Slammer control unit PIM to the buddy control unit PIM and to all attached Bricks.  Pitman diagnostics should be run as part of a post-installation validation, as part of a post-maintenance validation when significant changes have been made to the PI network, or as part of a system health check.

  • The initial run of the commands should last 5 minutes.  This will identify any immediate PI issues.  If the test is free of errors, proceed with a 15-minute test.
  • If the initial 5-minute run produces errors, reseat the PI cables in the path producing the errors and run another 5-minute test.  If that test runs clean, a full 30-minute test must then run with no errors reported.
  • If the second 5-minute test fails after reseating the cables in the chain, contact the Support Center for assistance in identifying the hardware to replace.  This will usually be cables, RAID Controllers, or Private Interconnect Modules.  You may be asked to disconnect a given subset of the hardware to isolate difficult problems.
  • After replacing any hardware, run another 5-minute test.  If that test fails, contact the Support Center for assistance.  If the 5-minute test succeeds, run a 30-minute test, which must complete without errors.
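The escalation rules above can be summarized as a small decision function.  The following Python sketch is illustrative only: the function name next_action and its arguments are hypothetical and are not part of the AxiomONE software.

```python
def next_action(first_5min_clean, retest_5min_clean=None):
    """Return the next step of the Pitman test plan described above.

    first_5min_clean  -- True if the initial 5-minute run reported no errors.
    retest_5min_clean -- result of the 5-minute rerun performed after
                         reseating cables or replacing hardware
                         (None if that rerun has not been done yet).
    """
    if first_5min_clean:
        # A clean initial run is followed by the 15-minute test.
        return "run 15-minute test"
    if retest_5min_clean is None:
        # First failure: reseat the cables in the failing path and retest.
        return "reseat PI cables in the failing path and rerun 5-minute test"
    if retest_5min_clean:
        # A clean rerun must be confirmed by an error-free 30-minute test.
        return "run 30-minute test; it must complete with no errors"
    # The rerun failed as well: hardware isolation needs Support.
    return "contact the Support Center"
```

For example, next_action(False, True) returns the 30-minute confirmation step.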

 

Fix

Pitman Diagnostics Testing Runs in the GUI

NOTE: Although Pitman can be run while data is being served to hosts, it is recommended to schedule the test for a time when the I/O load on the Axiom is low.
NOTE: A more intrusive Pitman diagnostic can be run using the special maxtraffic option outlined below.  Never use the maxtraffic option while there is customer I/O on the system: the amount of traffic it generates interferes with client host access, so it requires a maintenance window.
IMPORTANT: If you move or replace cables on an Axiom with optical PI cabling, always remove and reseat the SFP as well.  If only the optical cable is removed and replaced or re-inserted, the connection may not come back online.  The PIM internal fabric switch scans all SFPs every 4 seconds, and changes are detected by that scan.  The link status change caused by removing and reseating or replacing only the optical cable is not enough to invoke the Private Interconnect port database topology scan.  As a matter of practice, always remove and re-insert the SFP rather than just the cable.

  

 

  1. Open the Axiom Storage Service Manager to view the GUI of the Axiom.  The hardware should be green and normal before proceeding.
  2. Select the Support Tab.
  3. Select System Trouble in the list.
  4. Right-click in the right-hand window and choose Run Pitman Diagnostics.  Alternatively, this can be invoked from the Action menu.
  5. In the command parameter window type SetAutoModeOff and click Execute.  This sets the operational state to manual mode.
  6. Next type TrafficGenOn mode=auto peer=all and click Execute.  This will start the traffic generator in auto mode.  This mode automatically selects the initiator and target ports to equally distribute the generated traffic through a particular FC network.
  7. Start the statistics monitor by typing StartRecordingStats interval=x duration=x.  The interval and duration can be changed for each test being run.  The time is listed in seconds.
    1. For the 5-, 15-, and 30-minute tests, the commands would look like the following.  The interval value sets how often, in seconds, the system records statistics during the test.

      StartRecordingStats interval=60 duration=300
      StartRecordingStats interval=180 duration=900
      StartRecordingStats interval=360 duration=1800
  8. Once the time has expired on each test, run StopRecordingStats.
  9. Collect the data: type GetMarksDb RecordCount=all and click Execute.  After a few moments the output appears in the window.  Select all of the text and copy and paste it into a text editor, where it can be saved, zipped, and emailed to Pillar Support for evaluation.
  10. Once the file is safely captured run TrafficGenOff to stop the extra traffic on the PI network.

 

Repeat steps 7 through 9 if additional testing runs are needed.
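The StartRecordingStats commands for the three test lengths can be generated from a small lookup table.  This is a minimal sketch that only builds the command text to type into the command parameter window.  The interval and duration values are the ones listed in step 7; the names TEST_PROFILES and start_recording_command are hypothetical.

```python
# Interval/duration pairs in seconds, copied from the examples in step 7.
# Keys are the test lengths in minutes.
TEST_PROFILES = {
    5: (60, 300),
    15: (180, 900),
    30: (360, 1800),
}


def start_recording_command(minutes):
    """Build the StartRecordingStats command for a 5-, 15-, or 30-minute test."""
    if minutes not in TEST_PROFILES:
        raise ValueError("supported test lengths are 5, 15, and 30 minutes")
    interval, duration = TEST_PROFILES[minutes]
    return f"StartRecordingStats interval={interval} duration={duration}"
```

In each documented profile the duration is five times the interval, and the duration is the test length in minutes multiplied by 60.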

 


NOTE: The use of maxtraffic mode is disruptive to data access.  A maintenance window must be scheduled when using maxtraffic mode.

 

Special Handling Instructions for Disruptive Pitman, using the maxtraffic parameter

The maxtraffic mode requires Axiom Release 05.02.04 and higher.  The maxtraffic mode sends traffic to a single destination port at a time, but changes that port every two seconds.

CAUTION:  The maxtraffic mode generates enough internal traffic to interfere with host access. Do not attempt to run maxtraffic mode unless user traffic is stopped. 

Releases 05.02.04 through 05.03.05 have a software issue in the switching of destination ports.  As a result, maxtraffic mode will only send traffic to a single, non-selectable port on these versions.  Do not use maxtraffic on releases from 05.02.04 up to and including 05.03.05.

The fix for this issue is in 05.03.06.  REF: Bug 14346013
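The release constraints above can be checked before a maxtraffic run is planned.  The following Python sketch assumes the release is given as a full three-part string such as 05.03.06; the function names parse_release and maxtraffic_status and the returned labels are hypothetical.

```python
def parse_release(release):
    """Turn a release string such as '05.03.06' into a tuple of integers."""
    return tuple(int(part) for part in release.split("."))


def maxtraffic_status(release):
    """Classify a release according to the maxtraffic rules stated above."""
    version = parse_release(release)[:3]
    if version < (5, 2, 4):
        # maxtraffic mode requires release 05.02.04 or higher.
        return "unsupported"
    if version <= (5, 3, 5):
        # 05.02.04 through 05.03.05: destination port switching issue.
        return "affected"
    # 05.03.06 and later contain the fix.
    return "ok"
```

For example, maxtraffic_status("05.03.05") returns "affected", while maxtraffic_status("05.03.06") returns "ok".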

  1. Open the Axiom Storage Service Manager to view the GUI of the Axiom.  The hardware should be green and normal before proceeding.
  2. Select the Support Tab.
  3. Select System Trouble in the list.
  4. Right-click in the right-hand window and choose Run Pitman Diagnostics.  Alternatively, this can be invoked from the Action menu.
  5. In the command parameter window type SetAutoModeOff and click Execute.  This sets the operational state to manual mode.
  6. Next type TrafficGenOn mode=auto peer=all maxtraffic=on and click Execute.  This starts the traffic generator in auto mode with the maxtraffic option, which forces maximum traffic.
  7. Start the statistics monitor by typing StartRecordingStats interval=x duration=x.  The interval and duration can be changed for each test being run.  The time is listed in seconds.
    1. For the 5-, 15-, and 30-minute tests, the commands would look like the following.  The interval value sets how often, in seconds, the system records statistics during the test.

      StartRecordingStats interval=60 duration=300
      StartRecordingStats interval=180 duration=900
      StartRecordingStats interval=360 duration=1800
  8. Once the time has expired on each test, run StopRecordingStats.
  9. Collect the data: type GetMarksDb RecordCount=all and click Execute.  After a few moments the output appears in the window.  Select all of the text and copy and paste it into a text editor, where it can be saved, zipped, and emailed to Pillar Support for evaluation.
  10. Once the file is safely captured run TrafficGenOff to stop the extra traffic on the PI network.

 

Repeat steps 7 through 9 if additional testing runs are needed.

Every Slammer CU provides a diagnostic and monitoring interface that shows which port the traffic generator is currently sending traffic to.  This information should change approximately every 2 seconds unless options are given that restrict the traffic.  The utility can be used to see how much traffic is being generated, to which port, and how long it has been since the target port last switched.  The interface is at /pi/pitm/traffic/getPortsInfo.  To use it, run "cat getPortsInfo".  The Slammer will indicate that the information has been dumped to /tmp/getPortsInfo.out.  Then run "cat /tmp/getPortsInfo.out" to read the information.

 

Pitman Diagnostic Testing Runs using the CLI


Download the CLI from the Utilities section of the Support tab in the Axiom GUI, or from the HTTP page at the Axiom IP address.  Once the CLI is installed, run the following commands to obtain the same information as above.

  1. pcli.exe submit -H <AxiomIP> PerformPitmanCommand CommandParameter="setAutoModeOff"  This sets the operation state to manual.
  2. pcli.exe submit -H <AxiomIP> PerformPitmanCommand CommandParameter="TrafficGenOn mode=auto peer=all"  This will start the traffic generator in auto mode.  This mode automatically selects the initiator and target ports to equally distribute the generated traffic through a particular FC network.
  3. pcli.exe submit -H <AxiomIP> PerformPitmanCommand CommandParameter="StartRecordingStats interval=x duration=x"  The interval and duration can be changed for each test being run.  The time is listed in seconds.
    1. For the 5-, 15-, and 30-minute tests, the commands would look like the following.  The interval value sets how often, in seconds, the system records statistics during the test.

      StartRecordingStats interval=60 duration=300
      StartRecordingStats interval=180 duration=900
      StartRecordingStats interval=360 duration=1800
  4. Once the time has expired on each test, run pcli.exe submit -H <AxiomIP> PerformPitmanCommand CommandParameter="StopRecordingStats"
  5. To collect the statistics for review, run this command and redirect its output to a text file.  pcli.exe submit -H <AxiomIP> -u <username> -p <password> PerformPitmanCommand CommandParameter="GetMarksDb RecordCount=all" > testing_5min.txt
  6. Once the text file has been generated the traffic can then be turned off.  pcli.exe submit -H <AxiomIP> PerformPitmanCommand CommandParameter="TrafficGenOff"
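The CLI steps above can be assembled programmatically before being run by hand or from a script.  The following Python sketch only builds the argument lists and executes nothing.  It uses only the pcli.exe arguments shown in the steps above; the function names pcli_args and pitman_sequence and the address 192.0.2.10 are hypothetical, and login handling (the -u and -p options shown in step 5) is omitted.

```python
def pcli_args(axiom_ip, pitman_command, pcli="pcli.exe"):
    """Build the argument list for one PerformPitmanCommand request."""
    return [
        pcli, "submit", "-H", axiom_ip,
        "PerformPitmanCommand",
        # Passed as a single argument, matching the quoted form in the steps.
        f"CommandParameter={pitman_command}",
    ]


def pitman_sequence(axiom_ip, interval, duration):
    """Return the six pcli invocations of one test run, in order.

    The operator still has to wait `duration` seconds between the
    StartRecordingStats and StopRecordingStats steps, and redirect the
    GetMarksDb output to a file.
    """
    commands = [
        "setAutoModeOff",
        "TrafficGenOn mode=auto peer=all",
        f"StartRecordingStats interval={interval} duration={duration}",
        "StopRecordingStats",
        "GetMarksDb RecordCount=all",
        "TrafficGenOff",
    ]
    return [pcli_args(axiom_ip, command) for command in commands]


# Print the command lines for a 5-minute test against an example address.
for args in pitman_sequence("192.0.2.10", 60, 300):
    print(" ".join(args[:5]), f'"{args[5]}"')
```

Keeping each command as a list of arguments avoids quoting problems with the spaces inside the CommandParameter value if the lists are later handed to a process launcher.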

 

Interpreting the Database output File


Each connection is listed with the number and kinds of errors shown below it.  Here is a sample output.  The important values to note are the crcErrRate, crcCnt, and ItwCnt fields.  Nonzero values in these fields indicate errors on that path.

============[IN:FC0]=========
* (0x2108000b080466e2) SLAMMER-01:CU0
* PIM_SOC422
* (perGB)crcErrRate=0
*            crcCnt=0
*            ItwCnt=0
*           dstbCnt=0
* rdBytes=8191934584
* wrBytes=8199160832
* (sample: cnt=27 totalSec=299)
============[OUT:FC0]========
|
============[IN:FC2]=========
* (0x2309000b080466ea) SLAMMER-01:CU1
* PIM_SOC422
* (perGB)crcErrRate=0
*            crcCnt=0
*            ItwCnt=0
*           dstbCnt=0
* rdBytes=8151931904
* wrBytes=8145507456
* (sample: cnt=27 totalSec=299)
============[OUT:FC2]========
|
============[IN:FC0]=========
* (0x2109000b080466ea) SLAMMER-01:CU1
* PIM_SOC422
* (perGB)crcErrRate=0
*            crcCnt=0
*            ItwCnt=0
*           dstbCnt=0
* rdBytes=8204316672
* wrBytes=8221593728
* (sample: cnt=27 totalSec=299)
============[OUT:FC0]========
|
============[IN:FC2]=========
* (0x2308000b080466e2) SLAMMER-01:CU0
* PIM_SOC422
* (perGB)crcErrRate=0
*            crcCnt=0
*            ItwCnt=0
*           dstbCnt=0
* rdBytes=7711291452
* wrBytes=7690093568
* (sample: cnt=27 totalSec=299)
============[OUT:FC2]========

Because the file is quite lengthy, it is recommended to search it for the respective fields to find errors quickly.  Below is a sample command that can be used:

egrep "crcCnt=|ItwCnt=|dstbCnt=" <file_name.txt> | grep -v =0
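The egrep command lists the nonzero counters but not the port they belong to.  The following Python sketch reports the port and node alongside each nonzero counter.  It assumes the output follows the layout of the sample above (an [IN:...] banner, a node line with a hexadecimal identifier, then name=value lines); the function name find_errors and the regular expressions are illustrative and have not been checked against every AxiomONE release.

```python
import re

# Counters treated as error indicators, taken from the sample output above.
COUNTERS = ("crcErrRate", "crcCnt", "ItwCnt", "dstbCnt")

PORT_RE = re.compile(r"=+\[IN:(\w+)\]=+")                # e.g. ====[IN:FC0]====
NODE_RE = re.compile(r"\*\s*\((0x[0-9a-fA-F]+)\)\s*(\S+)")  # e.g. * (0x21...) SLAMMER-01:CU0
COUNTER_RE = re.compile(r"(\w+)=(\d+(?:\.\d+)?)")        # e.g. crcCnt=0


def find_errors(text):
    """Return (port, node, counter, value) for every nonzero error counter."""
    errors = []
    port = node = None
    for line in text.splitlines():
        match = PORT_RE.search(line)
        if match:
            # A new [IN:...] banner starts the next connection block.
            port, node = match.group(1), None
            continue
        match = NODE_RE.search(line)
        if match:
            node = match.group(2)
            continue
        for name, value in COUNTER_RE.findall(line):
            if name in COUNTERS and float(value) != 0:
                errors.append((port, node, name, value))
    return errors
```

To check a saved file such as testing_5min.txt, read its contents into a string and pass that string to find_errors; an empty list means no nonzero error counters were found.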

Troubleshooting tips for PI cabling


  1. If there is any result other than zero errors, reseat the suspect cabling, re-run the 5-minute test, and review the output again.  If errors are still present, contact Support for assistance in hardware isolation.
  2. If the errors detected are at the head of a brick string, all cables within that string should be reseated according to the appropriate wiring guide and utilizing the current cabling best practices.
  3. Be careful to always use the physical position of each Slammer and Brick, and not the logical name from the GUI, when installing or checking cables.  The name displayed in the GUI is a user-settable logical name only.  The physical or hardware name is based entirely on the physical position of the hardware in the cabinet, as shown in the Wiring Diagrams, and has no mandatory relationship with the name displayed in the GUI or system logs.  You may be asked to create a mapping of the physical name to the GUI name.
  4. Do not use tie wraps to secure the cables.  Velcro straps are the only recommended cable management.  Do not fasten the Velcro straps tightly.
  5. Minimum Bend Radius is 2 inches (4 inches diameter) for the fibre channel cables.  Be careful when working with the cables to never bend a cable to a radius smaller than 2 inches.
  6. Do NOT pull the cables taut.  Leave a large curve in all installed cables to prevent damage to the cable or connectors.
  7. Do NOT route Power Cables over the Fibre Channel Cables.  The power cables are heavy enough to cause damage to the Fibre Channel Cables if the power cables rest on the fibre cables.  Do not rest any bundles of cables on any Fibre Channel cable.
  8. Do NOT support loops of the Fibre Channel cables by their own weight.  No more than half a meter of Fibre Channel cable should ever be supported by the weight of that cable.
  9. Make sure there are no twists or knots in the Fibre Channel or Ethernet cables.
  10. Do NOT support the Slammer to Brick or Brick to Brick Fibre Channel Cables with the Brick RAID Controller Crossover Cables.  This will cause connection problems for the Crossover Cables.
  11. Check the Cabling at least Twice.  It is strongly recommended that at least two people perform a full cabling audit to make sure that no cabling errors are introduced.  The best audit is to have one person provide the connections for one end of each cable, then have someone trace that cable and provide the connection for the other - to be checked against the wiring diagram.

 

References

NOTE:1473492.1 - Pillar Axiom: How to interpret a PITMAN output from a PSG_PITMAN_EVENT_DIAGNOSTIC_RESULTS_AVAILABLE Callhome event

  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.