Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 1006249.1: Sun Fire[TM] 12K/15K/E20K/E25K: Solaris[TM] 8 domain hangs on resuming JNI I/O device driver
Previously Published As 208764
Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

***Checked for relevance on 17-Feb-2011***

Symptoms

On a Sun Fire[TM] 12K/15K/E20K/E25K domain, an attempt to DR out a System Board that contains permanent memory results in the domain hanging. The last entry in the domain's /var/adm/messages file or console log indicates that the domain is attempting to resume an I/O device driver. More than an hour later, the domain is still unresponsive and remains at this device resumption stage.

NOTE: When DR'ing a System Board that contains permanent memory out of a domain, the Solaris[TM] OS is suspended temporarily so that the permanent memory can be reallocated to other System Board resources that will remain in the domain. This suspension is not the "hang" described in this document.

The domain's console log shows the following (SB7 is the board that contains permanent memory):

  Aug 5 01:05:46 2004 root@domainA # cfgadm -v -c disconnect SB7
  Aug 5 01:05:49 2004 System may be temporarily suspended, proceed (yes/no)? yes
  Aug 5 01:05:49 2004 request suspend SUNW_OS
  Aug 5 01:05:51 2004 request suspend SUNW_OS done
  Aug 5 01:05:51 2004 request delete capacity (4 cpus)
  Aug 5 01:05:51 2004 request delete capacity (1048576 pages)
  Aug 5 01:05:51 2004 request delete capacity SB7 done
  . . .
  Aug 5 01:09:11 2004 resuming pci108e,8001@3d,600000 (aka pcisch)
  Aug 5 01:09:11 2004 resuming JNI,FCR@1,1 (aka jnic146x)
  Aug 5 01:09:12 2004 resuming JNI,FCR@1 (aka jnic146x)

In the example above, the domain was forced to the OBP via the reset command.

Cause

DR'ing out a System Board that does not contain permanent memory works without any problem. Because that operation requires no Solaris[TM] OS suspension, the I/O device driver does not need to be resumed, and therefore the domain does not hang.

This is important, because it strongly suggests that DR itself is not the root cause of the problem.
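As a rough aid for confirming the symptom from a saved copy of the console log, the following sketch prints the driver named in the last "resuming" entry. It assumes the entries have the same "resuming <device> (aka <driver>)" shape as the excerpt in the Symptoms section. The function name last_resumed_driver and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: report the driver named in the last "resuming ... (aka <driver>)" entry.
# Assumption: log entries look like the console log excerpt in this article.
last_resumed_driver() {
    # Reads log text on stdin, extracts the driver alias from every
    # matching line, and prints only the final one.
    sed -n 's/.*resuming .*(aka \([^)]*\)).*/\1/p' | sed -n '$p'
}

# Example usage (hypothetical file name):
#   last_resumed_driver < /var/tmp/domainA-console.log
```

If the driver printed is the same one each time the domain hangs, that driver (and the driver for whatever device is attached behind it) is the place to start looking.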
DR in general works fine. In this case, the actual problem is how DR of permanent memory (which requires Solaris[TM] OS suspension) interacts with the resumption of the device driver in question.

Solution

The solution to this specific case is to make sure that the kernel patches and the st driver patch are at certain revisions or higher, in order to take advantage of the cfgadm fixes contained in the kernel patches and the specific st driver fixes contained in the st driver patch:
See below for a brief description of why the st driver patch is part of the fix for this particular case.

Prior to DR'ing out a System Board that contains permanent memory, one must modunload the st driver (assuming a tape device is attached to the domain). The procedure is as follows:

  # modinfo | grep tape
  144 10301cab 19c8c 33 1 st (SCSI tape Driver 1.218)
  # modunload -i 144
  # cfgadm -c unconfigure SBXX

Assuming the st driver patch and the Solaris Kernel Jumbo Patches (KJP) are up to date, this DR should work without problems. If the st driver patch or the KJP is downrev, however, the DR might hang as shown in the "Symptoms" section of this article.

After applying the st driver fixes and the KJP, and confirming that the st driver is unloaded, the DR unconfigure of the System Board containing permanent memory should complete with little delay.

This problem may occur with drivers other than the JNI driver described above, and the fix may be slightly different in your situation.

In this specific case, the domain was configured with the following hardware/software:
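The modunload step in the procedure above depends on reading the module ID from the modinfo output by eye. The sketch below shows one way to pick it out automatically. It assumes that column 1 of the modinfo output is the module ID and column 6 is the module name, as in the line quoted in the procedure. The function name st_module_id is hypothetical.

```shell
#!/bin/sh
# Sketch: find the module ID of the st driver in modinfo-style output.
# Assumption: column 1 is the module ID and column 6 is the module name,
# as in the modinfo line quoted in this article.
st_module_id() {
    # Reads modinfo output on stdin; prints the ID of the module named "st".
    awk '$6 == "st" { print $1 }'
}

# Example usage on the domain (as root, before the DR operation):
#   id=`modinfo | st_module_id`
#   [ -n "$id" ] && modunload -i "$id"
```

Matching on the exact module name rather than on the word "tape" avoids picking up other modules whose description happens to mention tape.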
As you can see, the st driver is included in the fix because the card is attached to a tape device. If a site had trouble resuming a device driver attached to a disk device, the sd driver would be suspect.

Internal Comments

Related Documents
<Document: 1010363.1> "Sun Fire[TM] 12K/15K/E20K/E25K Servers: Dynamic Reconfiguration Considerations"
<Document: 1001683.1> "Sun Fire[TM] 12K/15K/E20K/E25K: Location and Relocation of Kernel for DR Operations"

Article written as a result of Radiance case 64199977, Escalation ID 1-2917411

DR, dynamic reconfiguration, cfgadm, rcfgadm, JNI, HBA, I/O, resume, permanent memory, 12k, 15k, 20k, 25k

Previously Published As 77660

Attachments
This solution has no attachment