Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure

Solution 1004879.1 : Sun Fire[TM] 3800, 48x0, 6800, E2900, E4900, E6900, v1280 or Netra[TM] 1280, or 1290: Resetting a component's CHS status using setchs

Previously Published As 206842
Applies to:
Sun Fire E6900 Server
Sun Fire E4900 Server
Sun Fire 3800 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Fire 4810 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Fire V1280 Server
All Platforms

Goal

Description

This document describes how to re-enable a component that has been marked Faulty or Suspect by Component Health Status (CHS). This document is relevant to the Sun Fire[TM] 3800, 4800, 4810, 6800, E2900, E4900, E6900, v1280 or Netra[TM] 1280, 1290 family of systems.

The System Controller (SC) or lom command showchs might report Faulty or Suspect component(s) similar to the following example:

    lom>showchs

Prtdiag may also reflect component(s) as failed or disabled, such as the following example:

    Fru Operational Status:

Important Notes: Disabled hardware should be investigated by Support Services prior to resetting any CHS status for any component(s).
Solution

Procedure to reset a component's CHS status.

1. As stated before, a support engineer should first validate that the component's CHS status should be reset. The support engineer should analyze the data and determine whether the CHS status should be reset or whether the component should be replaced. Assuming that a Support Services engineer has verified that the CHS status needs to be reset, the following options exist to reset its status:

2. If the system is running ScApp < 5.20.15 (i.e. 5.20.14 or lower; 5.21.x IS lower):

The CHS status can only be reset by the Support Services engineer using Sun Shared Shell, if that option exists. This is because the setchs command is ONLY available in a restricted access mode on the SC, for which a service engineer is required to perform the procedure. You are encouraged to upgrade ScApp to avoid this inconvenience (see STEP 3). If Shared Shell is not an option for a particular site, a field engineer must be dispatched, but this may involve Time and Material charges depending on contract terms.

3. If the system is running ScApp 5.20.15 or higher:

The CHS status can be reset from the SC or lom prompt by anyone who can log in to the Main SC; no special access is required. Perform the following steps:

    6800-sc1:SC> showchs -b
    6800-sc1:SC> setchs -s OK -r "service_request_number" -c SB3/p0
    6800-sc1:SC> showchs -b

4. If a component was marked Faulty, it will not be back in the configuration until it is run through POST. The component must be 'DR'd' (Dynamic Reconfiguration) out and then back into the domain, or the domain must be rebooted (sometimes known as 'keyswitched') to prompt this testing. Assuming the component passes POST, it should be configured back into the domain. Contact Support Services if this presents any problems. Make sure to provide the console log showing the POST execution so they can diagnose any issues that remain.
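The version rule in steps 2 and 3 is easy to misapply, because 5.21.x sorts numerically above 5.20.15 yet is treated as lower. A minimal Python sketch of that decision follows. The function names are hypothetical and not part of ScApp; the version string is assumed to be three dot-separated numeric fields (such as 5.20.15 or 5.13.0009); and since this document does not say how releases other than those it names should be handled, the sketch simply applies the comparison as written.

```python
import re

# Threshold taken from this document: setchs works in normal mode on
# ScApp 5.20.15 or higher, with 5.21.x explicitly counted as lower.
NORMAL_MODE_MIN = (5, 20, 15)


def parse_scapp_version(text):
    """Turn a string such as '5.20.15' or '5.13.0009' into a tuple of ints.

    Assumption: exactly three dot-separated numeric fields.
    """
    match = re.fullmatch(r"\s*(\d+)\.(\d+)\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised ScApp version: {text!r}")
    return tuple(int(part) for part in match.groups())


def setchs_in_normal_mode(version_text):
    """Return True if, per this document, setchs needs no service mode."""
    major, minor, patch = parse_scapp_version(version_text)
    # Special case stated in the document: 5.21.x IS lower than 5.20.15.
    if (major, minor) == (5, 21):
        return False
    return (major, minor, patch) >= NORMAL_MODE_MIN


if __name__ == "__main__":
    for version in ("5.13.0009", "5.20.14", "5.20.15", "5.21.3"):
        mode = "normal mode" if setchs_in_normal_mode(version) else "service mode required"
        print(f"ScApp {version}: {mode}")
```

A check like this only selects which branch of the procedure applies. It does not replace the Support Services review required in step 1.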
Internal Only

Instructions for Support Service engineers

Engineers should validate why the component's status has been marked CHS Faulty or Suspect prior to resetting its status. Utilize Document 1010056.1 to validate whether the component that is currently disabled, Faulty, Suspect, or Missing is defective or not. If it is defective, the FRU should be replaced instead of having its status reset.

If it is determined that a component's CHS status needs to be reset, do so depending on which version of ScApp is installed.

If the system is running ScApp 5.20.15 or higher:

Follow the procedure documented in the public section of this knowledge article (STEP 3 up above). Customers can reset the CHS status themselves using setchs in 'normal mode' (without a service mode password) on ScApp 5.20.15 or higher.

If the system is running ScApp 5.20.14 or lower (5.21.x IS lower):

You MUST generate a service mode password and then reset the status of the device for the customer. The customer should not be given access to service mode themselves if at all possible. You should make every attempt to perform this procedure yourself using Sun Shared Shell. If you need to reset the status using service mode, perform the following steps:

1. Obtain the System Controller's HostID, ScApp version, and RTOS version. To obtain this information, enter a carriage return in place of the password three times:

    Connected to Hostname-sc.
    Escape character is '^]'.
    Enter Password:          <--- Enter Return Here
    Invalid password.
    Enter Password:          <--- Enter Return Here
    Invalid password.
    Enter Password:          <--- Enter Return Here
    Invalid password.
    HostID: 83195a96
    ScApp version: 5.13.0009
    RTOS version: 23

2. Generate a Service Mode password. Take the information from step 1 and visit the Service Mode Password Generator to generate a service mode password. A backup is here: Backup Service Mode Password Generator

3. Utilize Sun Shared Shell to connect to the customer's system and perform the reset procedure.
Where Sun Shared Shell cannot be used, follow the recommendations in Document 1010655.1 and directly supervise the customer's use of this access to reset the CHS status.

4. Verify what is currently marked as Suspect or Faulty.

    6800-sc1:SC[service]> showchs -b
    Component       Status
    --------------- --------
    SB3/p0          Faulty

5. Reset the CHS status of the component in question.

    6800-sc1:SC[service]> setchs -s OK -r "service_request_number" -c SB3/p0

6. Validate that the component's status has been reset.

    6800-sc1:SC[service]> showchs -b
    Component       Status
    --------------- --------

7. The component must have POST executed to return it to service. This can be accomplished by executing a setkeyswitch on or a DR operation. When performing this action, monitor POST to ensure that no errors are detected on this newly reset device.

Background information on why you might need to reset CHS status.

There may be times when a good component, such as a CPU or system board, is marked as faulty. Here are some reasons good components get marked as bad:

Example1: CR 4868106 - Upgrading to 5.15.0 without following upgrade procedures can lead to a "ParitySingle error" and a CHS disabled SB.

Example2: POST fails test ID 6.1, with an error like:

    ERROR: TEST=Memory Tests,SUBTEST=Memory Addressing ID=61.1

In this situation, the CPU is failed in order to disable the memory it controls, but the CPU is fine. It is the memory DIMM(s) which need to be replaced. For the case of bad memory, here is what you need to do:

- Use setchs to re-enable the CPU.
- Verify with showchs that the pending status is 'ok'.
- DR the SB out and replace the memory.
- DR the SB back in.

When you reinsert the SB, the local tests will be sufficient to make the CHS status 'current' vs 'pending'. You don't need to do a setkeyswitch off on the domain!

Example3: Some repair action that is later corrected, but the component is left marked as failed by CHS.
For instance, one recent example involved a customer moving memory around on the board. It was unknown exactly which DIMMs were involved, or how many times the customer tried to setkeyswitch the domain on, etc. One of the CPUs on the board was marked as CHS failed. It was highly possible that a DIMM had been mis-seated or something to that effect. The DIMMs were reseated and the CPU was re-enabled. The system board was tested to the satisfaction of the engineers involved, which saved an unnecessary board replacement.

These cases are not the only ones where a good component can be marked as faulty. The point is, question the recent history of the machine and any maintenance activity when you have a CHS disabled component, before proceeding to re-enable it.

A word of caution - do not just 'blindly' re-enable a component, since the system disabled it for a reason. When in doubt, seek the advice of a senior engineer by collaborating with the next level of technical support.

Previously Published As 72066

Attachments
This solution has no attachment