Asset ID: 1-77-1001016.1
Update Date: 2011-02-09
Keywords:
Solution Type: Sun Alert Sure Solution
1001016.1: Sun Fire 12K/15K/E20K/E25K Domains Running Solaris 8 2/04 May Experience Bus Error When Using Dynamic Reconfiguration
Related Items
- Sun Fire E25K Server
- Sun Fire E20K Server
- Sun Fire 12K Server
- Sun Fire 15K Server
Related Categories
- GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
- GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved
Previously Published As
201342
Product
Sun Fire 12K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire E25K Server
Bug Id
<SUNBUG: 6532060>
Date of Workaround Release
24-JUL-2007
Date of Resolved Release
20-Jun-2008
1. Impact
When Dynamic Reconfiguration (DR) is used to detach the board hosting
the permanent memory of a Sun Fire 12K/15K/E20K/E25K domain that runs
Solaris 8 2/04 and includes one or more HsPCI+ assemblies, the domain
may be interrupted by a "Safari Bus Error", causing a domain outage.
2. Contributing Factors
This issue can occur on the
following platforms:
SPARC Platform
- Sun Fire 12K/15K/E20K/E25K domains running Solaris 8 2/04 without
patch 116962-13
Note: Sun Fire 12K/15K/E20K/E25K domains running Solaris 9 and
10 are not affected by this issue.
This issue will only occur if both of the following conditions are true:
- A Dynamic Reconfiguration (DR) detach is attempted on the board
hosting the permanent memory (kernel)
- One or more HsPCI+ boards are installed in the domain
To determine whether the domain includes HsPCI+ assemblies, run the
following command on the System Controller (SC):
sms-svc% showboards -v -d 0 | grep HPCI
IO0 On HPCI+ Active Passed 0
IO1 On HPCI+ Active Passed 0
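The check above can also be scripted against saved 'showboards' output. The following is a minimal sketch in Python, not part of the original alert; the function name is hypothetical, and the column layout (slot, power, board type, board status, test status, domain) is assumed from the sample output shown above.

```python
def hspci_plus_slots(showboards_output):
    """Return the slots whose board type column reads HPCI+."""
    slots = []
    for line in showboards_output.splitlines():
        fields = line.split()
        # Assumed layout (from the sample above):
        # slot, power, board type, board status, test status, domain
        if len(fields) >= 3 and fields[2] == "HPCI+":
            slots.append(fields[0])
    return slots


# Sample text copied from the example in this alert.
sample = """\
IO0 On HPCI+ Active Passed 0
IO1 On HPCI+ Active Passed 0
"""

print(hspci_plus_slots(sample))  # ['IO0', 'IO1']
```

A non-empty result means the second condition listed above is met for that domain.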
To determine whether the board to be detached hosts the permanent
(kernel) memory, run the following command in the domain, as in the
following example:
May 10 10:21:08 2007 root# cfgadm -av | grep perm
May 10 10:21:10 2007 SB1::memory
connected configured ok
base address 0x1e000000000, 8388608 KBytes total, 2313832 KBytes permanent
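To find the board with permanent memory from captured 'cfgadm -av' output, a short parser can help. This is a sketch under assumptions: the function name is hypothetical, and it relies only on the two text patterns visible in the example above ("SBn::memory" and "N KBytes permanent"), tolerating the timestamp prefixes and the line wrapping seen in the transcript.

```python
import re

# Patterns taken from the example output above.
AP_RE = re.compile(r"\b(SB\d+)::memory\b")
PERM_RE = re.compile(r"(\d+)\s+KBytes permanent")


def permanent_memory_boards(cfgadm_output):
    """Map board name -> permanent KBytes for boards hosting kernel memory."""
    boards = {}
    current = None  # most recent SBn::memory attachment point seen
    for line in cfgadm_output.splitlines():
        ap = AP_RE.search(line)
        if ap:
            current = ap.group(1)
        perm = PERM_RE.search(line)
        if perm and current is not None:
            kbytes = int(perm.group(1))
            if kbytes > 0:
                boards[current] = kbytes
    return boards


# Sample text copied from the example in this alert.
sample = """\
May 10 10:21:10 2007 SB1::memory
connected configured ok
base address 0x1e000000000, 8388608 KBytes total, 2313832 KBytes permanent
"""

print(permanent_memory_boards(sample))  # {'SB1': 2313832}
```

In the example, SB1 is reported, so detaching SB1 would trigger the copy/rename operation described in the Symptoms section.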
3. Symptoms
During the copy/rename operation, the domain will experience a "Safari
Bus Error" causing a domain outage, as in the following example:
May 10 10:26:01 2007 root# cfgadm -c disconnect SB1
May 10 10:26:21 2007 System may be temporarily suspended, proceed (yes/no)? yes
May 10 10:26:30 2007 May 10 10:26:23 DATA01 dr: OS unconfigure dr@0:SB1::cpu0
May 10 10:26:32 2007 May 10 10:26:25 DATA01 dr: OS unconfigure dr@0:SB1::memory
May 10 10:28:14 2007
May 10 10:28:14 2007 DR: checking devices...
May 10 10:28:14 2007 DR: suspending user threads...
May 10 10:28:15 2007 DR: suspending kernel daemons...
May 10 10:28:15 2007 DR: suspending drivers...
May 10 10:28:15 2007 suspending pci108e,c416@2 (aka sbbc)
May 10 10:28:15 2007 suspending pci100b,35@0 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@1 (aka ce)
May 10 10:28:15 2007 suspending sd@8,0
May 10 10:28:15 2007 suspending sd@9,0
May 10 10:28:15 2007 suspending pci1000,b@2 (aka glm)
May 10 10:28:15 2007 suspending pci1000,b@2,1 (aka glm)
May 10 10:28:15 2007 suspending pciclass,060400@1 (aka pci_pci)
May 10 10:28:15 2007 suspending pci108e,1101@3,1 (aka eri)
May 10 10:28:15 2007 suspending pciclass,0c0310@3,3 (aka ohci)
May 10 10:28:15 2007 suspending pciclass,060400@1 (aka pci_pci)
May 10 10:28:15 2007 suspending pci108e,8002@1c,700000 (aka pcisch)
May 10 10:28:15 2007 suspending pci100b,35@0 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@1 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@2 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@3 (aka ce)
May 10 10:28:15 2007 suspending pciclass,060400@1 (aka pci_pci)
May 10 10:28:15 2007 suspending pci108e,8002@1c,600000 (aka pcisch)
May 10 10:28:15 2007 Safari bus error: CSR=0155555501c01e77 ErrCtrl=f8000000000003e0
May 10 10:28:15 2007 IntrCtrl=80000000000fc017 ErrLog=0000000000080000
May 10 10:28:15 2007 ECC_Ctrl=8000000000000000
May 10 10:28:15 2007 UE_AFSR=000001025b890138 UE_AFAR=0000088276090900
May 10 10:28:15 2007 CE_AFSR=0000000d86890111 CE_AFAR=0000014296d76a00
May 10 10:28:15 2007 FirstErrLog=0000000000080000 FirstErrorAddr=0000000000000000
May 10 10:28:15 2007 LeafStatus=0000000000000000
May 10 10:28:15 2007 panic[cpu3]/thread=2a10034fd20: Safari bus error: CSR=0155555501c01e77 ErrCtrl=f8000000000003e0
May 10 10:28:15 2007 IntrCtrl=80000000000fc017 ErrLog=0000000000080000
May 10 10:28:15 2007 ECC_Ctrl=8000000000000000
May 10 10:28:15 2007 UE_AFSR=0000010
May 10 10:28:16 2007 syncing file systems... done
In the above example, the CSR value identifies one of the HsPCI+
assemblies installed in the domain (in this case,
CSR=0155555501c01e77 ==> IO0/P0).
Typically, 'dsmd.hwconfig' and 'dsmd.dump' files are also generated as
a result. Running 'redx' against the 'dsmd.hwconfig' dump file reports
a parity error in the internal memory of the I/O controller identified
by the CSR value:
redxl> shioc 0 1 0
xmits IO00/P0 (0.1.0) Component ID = 34651049 TO_2.1
...
Safari_Err_Log[63:0] = 00000000.00080000
Safari_1st_Err_Log[63:0] = 00000000.00080000
Safari_Err_Enbl[63:0] = F8000000.000003E0
Safari_Err_Int_Enbl[63:0] = 80000000.000FC017
ErrLog[19]: 1E Intrupt Internal Parity Error in PCI-B Leaf Logic
1st_Err_Data[59:0] = 0000000.00000000
...
...
Note: Data is displayed
from the currently loaded dump file.
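The ErrLog value in the panic message (0000000000080000) has only bit 19 set, which matches the "ErrLog[19]" line in the 'redx' output above. The following sketch, which is not part of the original alert, extracts ErrLog from console text and lists the set bits. The function name is hypothetical, and only the meaning of bit 19 is taken from this alert; no other bit definitions are assumed.

```python
import re

# \b keeps this from matching the separate "FirstErrLog=" field.
ERRLOG_RE = re.compile(r"\bErrLog=([0-9a-fA-F]{16})\b")

# Only bit 19 is documented in this alert (from the redx output above).
KNOWN_BITS = {19: "Internal Parity Error in PCI-B Leaf Logic"}


def errlog_bits(console_text):
    """Return the bit positions set in the first ErrLog value found."""
    match = ERRLOG_RE.search(console_text)
    if match is None:
        return []
    value = int(match.group(1), 16)
    return [bit for bit in range(64) if (value >> bit) & 1]


# Sample text copied from the panic message in this alert.
sample = """\
Safari bus error: CSR=0155555501c01e77 ErrCtrl=f8000000000003e0
IntrCtrl=80000000000fc017 ErrLog=0000000000080000
FirstErrLog=0000000000080000 FirstErrorAddr=0000000000000000
"""

for bit in errlog_bits(sample):
    print(bit, KNOWN_BITS.get(bit, "not described in this alert"))
```

A console log showing bit 19 set during a DR detach is consistent with the signature described here, but confirmation still requires the 'redx' analysis of the dump file.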
4. Workaround
Until the patch can be applied (or the system is upgraded to a later
Solaris OS version), avoid detaching the system board that hosts the
kernel memory on domains that run Solaris 8 2/04 and include HsPCI+
assemblies.
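The conditions from the Contributing Factors section can be combined into a single pre-check before a DR detach is attempted. This is an illustrative sketch only; the function and parameter names are hypothetical, and the caller is assumed to have gathered each input separately.

```python
from typing import Optional

FIXED_REVISION = 13  # patch 116962-13, per the Resolution section


def dr_detach_at_risk(is_solaris_8_2_04: bool,
                      patch_116962_revision: Optional[int],
                      has_hspci_plus: bool,
                      board_has_permanent_memory: bool) -> bool:
    """True when every condition described in this alert applies."""
    unpatched = (patch_116962_revision is None
                 or patch_116962_revision < FIXED_REVISION)
    return (is_solaris_8_2_04 and unpatched
            and has_hspci_plus and board_has_permanent_memory)


# Unpatched Solaris 8 2/04 domain with HsPCI+, detaching the kernel board.
print(dr_detach_at_risk(True, None, True, True))  # True
# Same domain after patch 116962-13 has been applied.
print(dr_detach_at_risk(True, 13, True, True))    # False
```

When the function returns True, the workaround above applies: the detach of that board should be postponed.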
5. Resolution
This issue is addressed in the following release:
- Solaris 8 with patch 116962-13 or later
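Whether a domain already has the fix can be checked from saved 'showrev -p' output. The sketch below is not from the original alert. It assumes each installed patch appears on a line beginning "Patch: <id>-<rev>", the function names are hypothetical, and the sample lines (including the package name) are made up for illustration. It does not detect a later patch that obsoletes 116962.

```python
import re


def patch_revision(showrev_output, patch_id="116962"):
    """Return the highest installed revision of patch_id, or None."""
    pattern = re.compile(
        r"^Patch:\s+" + re.escape(patch_id) + r"-(\d+)\b", re.MULTILINE)
    revisions = [int(rev) for rev in pattern.findall(showrev_output)]
    return max(revisions) if revisions else None


def has_fix(showrev_output):
    """True if patch 116962 is installed at revision 13 or later."""
    revision = patch_revision(showrev_output)
    return revision is not None and revision >= 13


# Hypothetical sample lines in the assumed 'showrev -p' layout.
sample = """\
Patch: 116962-09 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
Patch: 116962-13 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
"""

print(patch_revision(sample), has_fix(sample))  # 13 True
```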
Modification History
20-Jun-2008: Updated Contributing Factors and Resolution sections; now
Resolved
References
<SUNPATCH: 116962-13>
Previously Published As
103016
Internal Comments
Internal Eng Business Unit Group
SSG ES (Enterprise Systems)
Internal Escalation ID
1-21332753, 1-21669795, 1-21893336, 1-21890882, 1-21805750
Internal Sun Alert Kasp Legacy ID
103016
Internal Resolution Patches
116962-13
Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2007-07-24
Avoidance: Workaround