Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution

1000792.1 : Failure to properly tighten System/Motherboard or PDB on the Sun Fire X4100 and X4200 can result in a system outage or thermal event.
PreviouslyPublishedAs 201070

Product
Sun Fire X4100 Server
Sun Fire X4100 M2 Server
Sun Fire X4200 M2 Server
Sun Fire X4200 Server

Date of Resolved Release 03-JAN-2007

Impact
Major system outage and damage may result, as well as potential customer safety implication, extended downtime and high cost of recovery.

Contributing Factors
When servicing the Sun Fire X4100 or X4200 platforms for System Board or Power Distribution Board issues, it is IMPORTANT that care and diligence are applied to the bus bar assembly. The following FRU Part Numbers are impacted by this issue:

Affected Part Numbers   Description
____________            ___________
501-7261                System Board, Sun Fire X4100
501-7644                System Board, RoHS, Sun Fire X4100
501-7513                System Board, RoHS, Sun Fire X4100
501-6974                System Board, Sun Fire X4200
501-7645                System Board, RoHS, Sun Fire X4200
501-7514                System Board, RoHS, Sun Fire X4200
501-7590                System Board, RoHS, Sun Fire X4200 M2
501-7668                System Board, RoHS, Sun Fire X4100 M2
501-6920                DC Power Distribution Board, Sun Fire X4100/X4200
Symptoms
Failure to properly tighten the System/Motherboard or the DC Power Distribution Board bus bar connections on the Sun Fire X4100 and X4200 has resulted in major thermal events on customer systems, which in turn require Sun to provide full system exchanges. These events can be characterized by a burning smell and/or failure to power on, or in some instances severe smoke exhausting from the product.

Workaround

Resolution
*** DO NOT SERVICE THESE FRUs WITHOUT THE PROPER TOOLS ***

Only attempt to remove and refit the System/Motherboard and/or the Power Distribution Board in the systems indicated above using the correct tools, preferably a properly calibrated torque driver and bits. DO NOT use pliers to try to secure the nut on the bus bar.

Where the Galaxy Motherboard has the closed acorn style nuts (Sun p/n 240-4779-01), replace them with the open flange nuts (Sun p/n 240-5984-01) included in the Motherboard/PDB FRU kit. A picture showing both the old acorn and new flange nut types can be seen at the following URL:

http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/Nut_compare.pdf

If using a torque driver, select one that is adjustable over at least the range of 7 to 20 in-lbs, with an accuracy of better than 6%, and which accepts 1/4" bits for use with the nut and screwdriver bits specified below. All screws should be torqued to the factory setting of 7.5 in-lbs (0.847385 Newton Meters) and all nuts to 18 in-lbs (2.03372 Newton Meters). Be aware that the new flange type nut offers greater resistance due to its non-metallic insert; auto-torque drivers may therefore torque out prematurely due to the initial resistance.

If using a nut driver, use an 8mm Nut Driver with a 1/4" Hex Bit for a chuck type collet. These are available from local hardware distributors. For the screw side, use a #2 Phillips Hex Bit with a 3 inch length x 1/4" Hex Shank.
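As a cross-check on the factory torque values quoted above, the inch-pound to newton-meter conversion can be sketched as follows. This is only an illustration and not part of the service procedure. The function and constant names are assumptions introduced here; the conversion factor is derived from the standard definitions of the pound, standard gravity and the inch.

```python
# Illustrative sketch only: all names below are hypothetical, not from the FAB.
POUND_IN_KG = 0.45359237      # international avoirdupois pound, in kilograms
STANDARD_GRAVITY = 9.80665    # standard gravity, in m/s^2
INCH_IN_METERS = 0.0254       # international inch, in meters

# One inch-pound (lbf-in) of torque expressed in newton-meters.
IN_LB_TO_NM = POUND_IN_KG * STANDARD_GRAVITY * INCH_IN_METERS


def in_lb_to_nm(torque_in_lb: float) -> float:
    """Convert a torque from inch-pounds to newton-meters."""
    return torque_in_lb * IN_LB_TO_NM


def driver_covers_range(driver_min: float, driver_max: float,
                        required_min: float = 7.0,
                        required_max: float = 20.0) -> bool:
    """Return True if a torque driver's adjustable range (in inch-pounds)
    spans the 7 to 20 inch-pound range asked for in the Resolution text."""
    return driver_min <= required_min and driver_max >= required_max


if __name__ == "__main__":
    # Factory settings quoted in the bulletin: 7.5 for screws, 18 for nuts.
    for label, value in (("screws", 7.5), ("nuts", 18.0)):
        print(f"{label}: {value} in-lb = {in_lb_to_nm(value):.5f} N*m")
```

The computed values agree with the newton-meter figures given in the bulletin to within rounding.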
After replacing or servicing the product, the field service representative must test the integrity of the bus bar connection by running the diagnostic released to validate this part of the product.

The bus bar Diagnostic and ReadMe file can be downloaded from the following Internal Only link:

http://nsgrelease.sfbay/galaxy12/releases/G12x-SW1.3-rc38/ops/061215/

For Service Partners without SWAN access, the bus bar Diagnostic and ReadMe files can be downloaded from the following links:

+ busbar Diagnostic: http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/busbar
+ busbar ReadMe: http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/busbar.README

A PASS result from this diagnostic should be demonstrated after servicing ANY part of the system. Completion of the test should be recorded in the radiance notes.

Note: This diagnostic should NOT be left with the customer!

How to install and run the BusBar Test:

1) Copy the latest busbar tool to the service processor coredump directory.

   scp busbar sunservice@<sp_ip>:/coredump <cr>
   where <sp_ip> is the target IP address
   .....continue connection (yes/no)? yes <cr>
   password:changeme <cr>

2) ssh into the targeted system.

   ssh sunservice@<sp_ip> <cr>
   password:changeme <cr>
   cd /coredump

3) Run the busbar test (see "How to install/run BusBar Test example" below).

   # ./busbar <loopcnt> <system name>

   loopcnt - The number of times you wish busbar to run. If this value is 0, busbar will run forever. The recommendation is to set this to '1' when executing the diag in the field.

   system name - The machine type to test. Below is a list of systems known to busbar.

   system name
   g1   = Galaxy1
   g2   = Galaxy2
   g1e  = Galaxy1e
   g2e  = Galaxy2e
   g1f  = Galaxy1f
   g2f  = Galaxy2f
   cnst = Constellation

4) After the test, reboot the SP to restore normal SP functionality and state.

   # /etc/init.d/reboot <cr>

Test Description:

The busbar diagnostic was developed to find systems with poor busbar connections. This is done by reading the 12 volt sensor twice.
The first reading is taken with the system in reset and the fans spun down, so as to minimize the load on the system. The second reading is taken with the system running and the fans at their highest rpm, so as to maximize the load on the system. The two readings are compared; if the difference between them is greater than 5%, there may be an issue with the bus bar connection and an error will be generated.

How to install/run BusBar Test example:

$ cd cygwin
$ cd /busbar/
$ cd busbartest/
$ scp busbar sunservice@10.6.78.122:/coredump          ****** STEP # 1******
The authenticity of host '10.6.78.122 (10.6.78.122)' can't be established.
RSA key fingerprint is 55:c9:05:b4:84:f2:33:6a:26:0b:22:cd:67:ca:02:9e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.6.78.122' (RSA) to the list of known hosts.
sunservice@10.6.78.122's password:
busbar                                   100%  211KB 211.2KB/s   00:00
$ ssh sunservice@10.6.78.122          ****** STEP # 2******
sunservice@10.6.78.122's password:
[(flash)root@SUNSP0003BAF20AF0:~]# cd /coredump
[(flash)root@SUNSP0003BAF20AF0:/coredump]# ls
0ABGA037.ROM          busbar      spdiag.log
0ABGA938.ROM          hdt         still_mounted
50_busbar_spdiag.log  messages
bbar.sh               spdiag
[(flash)root@SUNSP0003BAF20AF0:/coredump]# ./busbar 1 g1          ****** STEP # 3******
Parsing command line . . .
Storing test info in spdiag.log
Machine Type: G1
killall: Could not kill pid '685': No such process
killall: cdserver: no process killed
killall: fdserver: no process killed
Stopping IPMI Stack....Done.
sh: /etc/init.d/cmm: not found
execute BusBar
Shutting down system
Powering system on
Running BusBar test on mb.v_+12 voltage

MINIMUM LOAD 12V   MAXIMUM LOAD 12V   RESULT
----------------   ----------------   ------
+12.12V            +12.12V            PASS

DONE.
Please, reboot SP to get normal SP functionality and its state back.
[(flash)root@SUNSP0003BAF20AF0:/coredump]# /etc/init.d/reboot          ****** STEP # 4******
Rebooting...
The system is going down NOW !!
Sending SIGTERM to all processes.
Connection to 10.6.78.122 closed by remote host.
Connection to 10.6.78.122 closed.
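The pass/fail comparison described in the Test Description can be sketched as below. This only illustrates the 5% rule and is not the busbar tool's actual code. The function name is hypothetical, and measuring the difference relative to the minimum-load reading is an assumption, since the bulletin does not say which reading serves as the reference.

```python
# Illustrative sketch only: not the busbar diagnostic's real implementation.
def busbar_result(min_load_volts: float, max_load_volts: float,
                  threshold_pct: float = 5.0) -> str:
    """Compare the two 12 V sensor readings and return "PASS" or "FAIL".

    min_load_volts: reading with the system in reset and fans spun down.
    max_load_volts: reading with the system running and fans at full rpm.
    Assumption: the percentage is taken relative to the minimum-load reading.
    """
    if min_load_volts <= 0:
        raise ValueError("minimum-load reading must be positive")
    diff_pct = abs(min_load_volts - max_load_volts) / min_load_volts * 100.0
    # A difference greater than the threshold suggests a poor connection.
    return "FAIL" if diff_pct > threshold_pct else "PASS"


if __name__ == "__main__":
    # Readings taken from the example session in this bulletin.
    print(busbar_result(12.12, 12.12))
    # Hypothetical readings showing a sag of roughly 5.9% under load.
    print(busbar_result(12.12, 11.40))
```

With the readings from the example session (+12.12V under both loads) the difference is 0%, which is consistent with the PASS result shown there.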
Modification History
Date: 30-JAN-2007
Date: 05-APR-2007
Previously Published As 102770

Internal Comments
Definition of collet: a cone-shaped chuck used for holding cylindrical pieces in a lathe.

Related Information
Internal Contributor/submitter [email protected]
Internal Eng Business Unit Group KE Authors
Internal Eng Responsible Engineer [email protected]
Internal Services Knowledge Engineer [email protected]
Internal Kasp FAB Legacy ID 102770

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-01-17
Avoidance: Service Procedure
Responsible Manager: [email protected]
Original Admin Info:
WF - Initial draft started on 1/5/07 and had to recreate this asset due to a bug in the way xoptions were being displayed. Awtg external link to the referenced Diagnostic and ReadMe - Joe
WF - Submitter reviewed and requested minor changes to Resolution section on 1/10/07. Will release with internal only link and update once the external link to the Diag and ReadMe are provided - Joe
WF - sent off to extended review on 1/10/07 - Joe
WF - added ECO reference per Mike Persichetty on 1/10/07 - Joe
WF - added link to Nut_compare pic now that sdpsweb server is back up 1/17/07 - Joe
WF - sending to publish on 1/17/07 - Joe
WF - after publication sponsor requested TNS host diag and readme files for Partners to have access to via SPE. Put files up on sdpsweb and added links to these two files and republished FAB.
WF - added more specific diag instructions and example in Corrective Action section - Joe 1/30/07
WF - corrected open flange nut part number in Corrective Action section. - Joe Apr/05/07

Product_uuid
54e2ac49-df71-11d9-89e6-080020a9ed93|Sun Fire X4100 Server
5b03d0ed-216d-11db-a023-080020a9ed93|Sun Fire X4100 M2 Server
c15f7881-216e-11db-a023-080020a9ed93|Sun Fire X4200 M2 Server
c6e795ef-df6f-11d9-89e6-080020a9ed93|Sun Fire X4200 Server

Attachments
This solution has no attachment