Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1001848.1 : Sun Fire[TM] 12K/15K Server: Terminating a hpost reset/recovery loop test cycle on a domain
PreviouslyPublishedAs 202532

Symptoms

A failed hardware component can cause a Sun Fire[TM] 12K/15K Server (SF12K/SF15K) domain to loop in hpost while dsmd tries to identify the failed component through hpost testing. When this happens, the hpost level is raised (more testing) after every unsuccessful boot attempt, so each successive hpost run takes longer to complete.

However, a domain can appear to be looping in hpost while the hpost level is NOT incrementing, i.e. hpost 16 is executed every time. This can indicate a problem during boot rather than a hardware failure; the most common cause is an incorrect boot-device or boot path.

Resolution

The following example shows the recommended course of action to remedy this issue.

First, keyswitch off the domain:

has-sc0:sms-svc:4> setkeyswitch -d A off

Wait for all boards to power down. Then, from the SC as the sms-svc user, set the auto-boot? parameter to false with setobpparams. The following example changes the OBP parameters for domain A:

has-sc0:sms-svc:2> setobpparams -d A auto-boot?=false

Note: It is recommended that the setobpparams command be run even if showobpparams already shows that 'auto-boot?' is set to false.

Then run showobpparams to confirm that the change has been made:

has-sc0:sms-svc:3> showobpparams -d A
auto-boot?=false
diag-switch?=true
fcode-debug?=false
use-nvramrc?=true
security-mode=none

Now, keyswitch the domain back on:

has-sc0:sms-svc:5> setkeyswitch -d A on

After powering on, the domain may go through a quick (-Q) hpost, which may fail, depending on the previous failure cause. After the next hpost, the domain will go to OBP. Standard troubleshooting practices can now be followed to determine the cause: check post logs for hardware failures, etc.
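The procedure above can be summarized as a short Python sketch. It is illustrative only: it does not run anything on the SC. It builds the command strings in the order given in the Resolution section and checks text shaped like the showobpparams example output for auto-boot?=false. The function names (recovery_commands, parse_obpparams, auto_boot_disabled) are hypothetical helpers introduced here, and the one-name=value-per-token output format is an assumption based solely on the example shown in this document.

```python
def recovery_commands(domain):
    """Return the SMS commands from the Resolution section, in order.

    The strings are only built, never executed. 'domain' is the domain
    tag, e.g. "A" as in the example above.
    """
    return [
        f"setkeyswitch -d {domain} off",               # power the domain down first
        f"setobpparams -d {domain} auto-boot?=false",  # run even if already false
        f"showobpparams -d {domain}",                  # confirm the change
        f"setkeyswitch -d {domain} on",                # bring the domain back up
    ]


def parse_obpparams(text):
    """Parse whitespace-separated name=value tokens into a dict.

    Assumption: the output looks like the showobpparams example in this
    document (one name=value pair per token). Tokens without '=' are skipped.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if sep:
            params[name] = value
    return params


def auto_boot_disabled(text):
    """True if the parsed output reports auto-boot?=false."""
    return parse_obpparams(text).get("auto-boot?") == "false"


# Example output copied from the Resolution section.
SAMPLE_OUTPUT = """auto-boot?=false
diag-switch?=true
fcode-debug?=false
use-nvramrc?=true
security-mode=none"""

if __name__ == "__main__":
    for command in recovery_commands("A"):
        print(command)
    print("auto-boot disabled:", auto_boot_disabled(SAMPLE_OUTPUT))
```

The check is written against captured text rather than a live command so that it can be applied to saved console output; any real automation would need to handle the wait for all boards to power down, which this sketch does not model.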
Note: Please consult the man pages for more information on showobpparams, setobpparams, and setkeyswitch.

Product
Sun Fire 12K Server
Sun Fire 15K Server

Internal Comments
See http://has.central for more information on hpost levels and timing.

auto-boot, hpost, loop test, setobpparams

Previously Published As 71598

Change History
Date: 2007-10-02 User Name: 97961 Action: Approved Comment: - Converted to STM formatting for better readability - Applied trademarking where it is missing - Corrected use of trademarking Version: 4
Date: 2007-10-02 User Name: 97961 Action: Accept Comment: Version: 0
Date: 2007-10-02 User Name: 101984 Action: Approved Comment: Done review and added some changes, mainly to verbatim. Technical content is correct. Thanks Morgan Version: 0
Date: 2007-10-01 User Name: 101984 Action: Accept Comment: Version: 0
Date: 2007-10-01 User Name: 125045 Action: Approved Comment: Back to Tech Review! Version: 0
Date: 2007-10-01 User Name: 125045 Action: Rejected Comment: had to fix my own error - Left the has.central link in the public section. Doh. Version: 0
Date: 2007-10-01 User Name: 125045 Action: Accept Comment: Version: 0
Date: 2007-10-01 User Name: 125045 Action: Approved Comment: Updated ordering of procedure for greater edge case coverage, also fixed a few typos and moved from internal to contract. Version: 0
Date: 2007-10-01 User Name: 125045 Action: Update Started Comment: update for minor typos and ordering. also changing ordering for keyswitch / setobpparams Version: 0
Date: 2003-11-10 User Name: 43660 Action: Approved Comment: Minor grammatical changes. Changed title to be consistent with other docs. Version: 0
Date: 2003-11-10 User Name: 116819 Action: Approved Comment: Changed problem description text to differentiate boot failures from post failures. Version: 0
Date: 2003-11-09 User Name: 106757 Action: Approved Comment: Need review Version: 0
Date: 2003-10-19 User Name: 116819 Action: Rejected Comment: Clarify when to keyswitch Version: 0
Date: 2003-10-19 User Name: 106757 Action: Approved Comment: Please review Version: 0
Date: 2003-09-24 User Name: 103287 Action: Rejected Comment: Several reasons to send this document back to draft stage. I have emailed the author with detailed comments and offered to work with him if he desires. Here are the reasons: 1) Title should be changed to reflect the real purpose of the article, which isn't that some domains go into a HPOST loop, but in fact how to stop the HPOST loop. 2) The problem description that this HPOST loop (incremental POST level) behavior is most commonly caused by a disk access problem is not correct. Disk access problems are at OBP, therefore would result in domain panic, or the domain to stay at OBP. Problems at HPOST cause the incremental post level loop cycle (as described in Infodoc 48395). 3) I think the focus of the article should be regarding interrupting the HPOST loop process as described by turning off the error reset recovery flag. But, it needs to be explained why we want to do this. Like, we already know why the domain can't bringup properly and want to interrupt the hposts so we can blacklist a part to get the domain back up quickly. The main problem is that the document describes the disk access problem as causing the hpost loop, but in fact it should be something in HPOST which causes the issue. Second, the title needs to better reflect the true nature of the document. Ultimately this document should be a resource on how to disable the error/reset recovery bit, allowing the administrator to manually intervene. Reference to Infodoc 48395 should also be included. Version: 0
Date: 2003-09-23 User Name: 106757 Action: Approved Comment: Ready to review Version: 0
Date: 2003-09-23 User Name: 106757 Action: Created Comment: Version: 0

Product_uuid
077fd4c5-df8f-4320-ad69-7d01603a674d|Sun Fire 12K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server

Attachments
This solution has no attachment