Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
Solution 1004364.1: Sun Fire[TM] Midrange Server: Safari Port Error may be caused by a resetting SC
Previously Published As: 206035
Applies to:
  Sun Netra 1280 Server
  Sun Netra 1290 Server
  Sun Fire V1280 Server
  Sun Fire 4800 Server
  Sun Fire 6800 Server
  All Platforms

Symptoms
A reset of the System Controller (SC) or LOM on a Sun Fire[TM] V1280, E2900, 3800, 4800, 4810, E4900, 6800, or E6900, or on a Netra[TM] 1280 or 1290, may cause false hardware failures.

Cause
If such a reset happens during a JTAG scan, hardware may be disabled because of Safari Port Errors.

Solution
Hardware errors seen after an SC reset are usually false. Look for SC resets with any of these SBBC Reset Reason(s):
  Peer Reset
  Watchdog Reset
  SC Reset Button
  Software Reset
Resolution
There is currently no fix for the hardware errors seen after a reset, because the errors are false. The hardware disabled immediately after the SC reset is good and can be re-enabled. The errors occur because the SC was interrupted during a critical operation, so the real resolution is to find and fix the root cause of the SC reset. Gather an explorer with scextended or 1280extended, and as much information as possible about the customer's SC network configuration. The serial console output from the system controller at the time the reset occurred is also very useful.

Relief/Workaround
Known causes of SC resets are:
Additional Information
The most important thing to check is the messages. Look for messages indicating a reset before the failed hardware is called out. Reset messages include the SBBC Reset Reason(s) Peer Reset, Watchdog Reset, SC Reset Button, and Software Reset. Errors of the type seen after an SC reset are normally the result of failed hardware, so the first inclination is to replace the hardware. Hardware may be replaced several times before the problem is diagnosed correctly, giving the impression of a quality issue. If the hardware errors are preceded by a reset, they are the result of that reset and NOT of bad hardware. Not all resets cause Safari errors; the errors depend on what the SC was doing at the time of the reset.

Example of an SC reset in /var/adm/messages on an E2900:
  Mar 3 06:53:57 sppuap01 lw8: [ID 128070 kern.notice] Main, up 94 days 08:01:42, Memory 6,968,824
  Mar 3 10:06:01 sppuap01 lw8: [ID 190882 kern.notice] Unretrieved lom log history follows ...
  Mar 3 10:06:01 sppuap01
  Mar 3 10:06:03 sppuap01 lw8: [ID 650827 kern.notice] 3/3/07 7:04:14 AM Boot: ScApp 5.20.2, RTOS 45
  Mar 3 10:06:03 sppuap01 lw8: [ID 811040 kern.notice] 3/3/07 7:04:16 AM SBBC Reset Reason(s): Peer Reset, Watchdog Reset

Below are some of the hardware errors that may appear.

From showerrorbuffer:
  ErrorData[0]
  Date: Fri Aug 17 20:54:58 PDT 2007
  Device: /RP0/dx1
  ErrorID: 0x31273023
  Register: Safari Port Error Status 3[0x22] : 0x00000004
    SafPar [02:02] : 0x1 Safari input parity error
  ErrorData[7]
  Date: Fri Aug 17 20:54:59 PDT 2007
  Device: /SB2/sdc0
  ErrorID: 0x60171010
  Register: SafariPortError0[0x200] : 0x00000002
    ParSglErr [01:01] : 0x1 ParitySingle error

CHS error example:
  Component : SB2
  Time Stamp : Fri Aug 17 23:55:36 EDT 2007
  New Status : FAULTY
  Old Status : OK
  Event Code : 01000006 (unrecognized value)
  Initiator : SCAPP
  Message : 1.E2900.FAULT.ASIC.DX.SERD.SAF_IN_PAR_ERR.31271023.20-3.2.5406679000200

showlogs example:
  Fri Aug 17 20:54:23 v1280-lom lom: [ID 434738 local0.error] /N0/SB2 encountered the first error
  Fri Aug 17 20:54:23 v1280-lom lom: [ID 277478 local0.error] DxSbAsic reported first error on /N0/SB2
  Fri Aug 17 20:54:23 v1280-lom lom: [ID 357707 local0.error] /SB2/sdc0: SafariPortError0[0x200] : 0x00020002
    AccParSglErr [17:17] : 0x1
    ParSglErr [01:01] : 0x1 ParitySingle error
  Fri Aug 17 20:54:23 v1280-lom lom: [ID 539903 local0.error] >>> SafariPortError1[0x210] : 0x00028002
    AccParSglErr [17:17] : 0x1
    ParSglErr [01:01] : 0x1 ParitySingle error
    FE [15:15] : 0x1

Third-party security scanning software is known to cause Peer Reset / Watchdog Reset events. This issue has been seen with products from several companies that exploit a weakness in the ssh version on the SC; scanners seen causing it include Nessus and Forescout. A workaround in this case is to disable scanning of the SC and/or switch the SC connection type to telnet. The CR for this network-induced reset is SunBug 6539431.
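The triage rule in Additional Information (an SBBC reset logged before the Safari errors means those errors are probably false) can be sketched as a small Python scanner. This is a hypothetical helper, not a Sun tool: the function name `triage`, the error-line patterns, and treating file order as time order are all assumptions based only on the sample messages in this article. It has no time window, so any earlier reset in the input counts; it only helps decide which errors deserve a closer look and does not replace reading the messages.

```python
import re

# SBBC reset reasons listed in this article. A reset with any of them can
# leave false Safari/parity errors behind it.
RESET_REASONS = ("Peer Reset", "Watchdog Reset", "SC Reset Button", "Software Reset")

# Assumed patterns for hardware-error lines, taken only from the sample
# messages in this article; extend them for other error text.
ERROR_PATTERN = re.compile(r"SafariPortError|Safari Port Error|encountered the first error")


def triage(lines):
    """Label each hardware-error line by whether an SBBC reset was logged earlier.

    Returns (line_number, label, text) tuples. label is "after-reset" when an
    SBBC reset with a listed reason appeared earlier in the input (file order
    is treated as time order, with no time window), else "no-prior-reset".
    """
    reset_seen = False
    results = []
    for lineno, line in enumerate(lines, start=1):
        if "SBBC Reset Reason(s):" in line and any(r in line for r in RESET_REASONS):
            reset_seen = True
            continue
        if ERROR_PATTERN.search(line):
            label = "after-reset" if reset_seen else "no-prior-reset"
            results.append((lineno, label, line.rstrip()))
    return results


# Demo: two lines copied from the examples above. They come from different
# systems and are paired here only to exercise the logic.
sample = [
    "Mar 3 10:06:03 sppuap01 lw8: [ID 811040 kern.notice] 3/3/07 7:04:16 AM "
    "SBBC Reset Reason(s): Peer Reset, Watchdog Reset",
    "Fri Aug 17 20:54:23 v1280-lom lom: [ID 434738 local0.error] "
    "/N0/SB2 encountered the first error",
]
for lineno, label, text in triage(sample):
    print(lineno, label)  # prints: 2 after-reset
```

To scan a real file, pass an open file object, for example `triage(open("/var/adm/messages", errors="replace"))`. Errors labeled "after-reset" are the ones to compare against the reset time before any hardware is replaced.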
Keywords: Peer, reset, SafariPortError, sbbc, ParSglErr, ParitySingle, Watchdog, ssh, serengeti, lw8, lom, quality

Previously Published As 90665

Change History
  Date: 2007-12-22  User Name: 71396  Action: Approved  Version: 7
    Comment: Performed final review of article. No changes required. Publishing.
  Date: 2007-12-20  User Name: 71396  Action: Accept  Version: 0
  Date: 2007-12-20  User Name: 103287  Action: Approved  Version: 0
    Comment: Gave the doc some formatting to make it look better, added links, etc. Changes look good. Ready to go. Josh
  Date: 2007-12-19  User Name: 24214  Action: Approved  Version: 0
    Comment: Added internal note with a pointer to an action plan for next time someone sees this.

Attachments
This solution has no attachment.