Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1003585.1
Update Date: 2012-02-27
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003585.1 :   Sun Fire [TM] 12K/15K: Cards Fail to be Configured in I/O Board Slot 1 if Slot 3 is Populated  


Related Items
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • .Old GCS Categories>Sun Microsystems>Servers>High-End Servers

Previously Published As
205060


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
All Platforms

Symptoms

Adding a Crystal-2A card (2Gb/sec PCI Dual FC Network Adapter) to a domain's configuration using cfgadm results in errors, followed by a domain panic.

#  cfgadm -c configure pcisch0:e09b1slot1
- error message - "Hardware specific error"

Then /var/adm/messages or the domain console shows a panic:

WARNING: pcisch-0: PCI fault log start:
PCI excessive retry error
PCI error ocurred on device #6
pcisch-0: PBM AFSR=0x20800003.20000000 dwordmask=0 bytemask=3
pcisch-0: PCI primary error (8):
Excessive Retries
pcisch-0: PCI secondary error (8):
Excessive Retries
pcisch-0: PBM AFAR 0.00124040:
WARNING: [AFT1] Bus Error (BERR) Event detected by CPU291
Privileged Data Access at TL=0, errID 0x000001d0.41c71c38
AFSR 0x00100800.00000000 AFAR 0x00000451.00124000
Fault_PC 0x10372f4
...
panic[cpu291]/thread=2a1004d7d40: [AFT1] errID 0x000001d0.41c71c38 BERR Error(s)
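
When several domains need to be checked, a saved copy of /var/adm/messages or the console log can be searched for this signature. The following Python sketch is illustrative only: the function name has_retry_panic_signature is hypothetical, and the three marker strings are copied from the excerpt above, so logs whose wording differs will not match.

```python
# Marker strings copied from the log excerpt in this document.
# Assumption: a matching log contains all three, in any order.
SIGNATURE_MARKERS = (
    "PCI excessive retry error",
    "[AFT1] Bus Error (BERR) Event",
    "panic[cpu",
)


def has_retry_panic_signature(log_text):
    """Return True if every marker string occurs somewhere in log_text."""
    return all(marker in log_text for marker in SIGNATURE_MARKERS)
```

A match only shows that the log resembles the one in this document; the slot checks in the Solution section are still needed to confirm that the document applies.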

Cause

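Replacing the I/O card does not resolve the problem: even with a new card, the cfgadm error and panic persist. The card itself is not at fault; the cause is the I/O cassette design problem described in the Solution section below.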

Solution

Resolution steps:

1. Confirm that the card in question is a Crystal-2A card (the problem has also been seen on a third-party "Venus" card and a QLogic card, but this document specifically covers the Crystal-2A card):

https://support.oracle.com/handbook_private/Devices/Fibre_Channel/FIBRE_Dual_2GB_FC_AL.html

2. Confirm this card's location is in I/O board Slot 1 (c5v0 slot; Top Right Slot).

3. Confirm that another card (type unimportant) is installed in the same I/O board, Slot 3 (c5v1; Top Left Slot).

If all three conditions are true, this document applies.
If any one of them is not true, this document does not apply.
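
The three checks above amount to a simple all-or-nothing test, which the following Python sketch spells out. The function name document_applies and its parameters are hypothetical; the values would come from a manual inspection of the I/O board, not from any Sun tool.

```python
def document_applies(card_type, card_slot, slot3_populated):
    """Return True only when all three resolution-step checks pass.

    card_type       -- card name as identified in step 1
    card_slot       -- I/O board slot number holding that card (step 2)
    slot3_populated -- True if any card sits in Slot 3 of the same board (step 3)
    """
    is_crystal_2a = card_type.strip().lower() in ("crystal-2a", "crystal2a")
    in_slot_1 = card_slot == 1
    return is_crystal_2a and in_slot_1 and bool(slot3_populated)
```

For example, document_applies("Crystal-2A", 1, True) returns True, while the same card in Slot 2, or in Slot 1 with Slot 3 empty, returns False.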

Ultimately, the problem is based on Bug ID 4830665, "hsPCI board does not implement 33Mhz slots M66EN signal correctly." The bug is a design problem in 5-volt I/O cassettes with a hardware revision lower than 09; it prevents cards in I/O board Slot 1 from being configured properly if Slot 3 of the same I/O board is populated with any card (type unimportant). The bug was filed for issues with "Venus" I/O cards.

Also:

  • Bug ID 4993711 exists for this same issue with QLogic cards.
  • Bug ID 4987200 "Crystal2a's may not initialize properly if installed in slot1 w/slot3 populated," was filed on Crystal-2A cards for the same issue.

The resolution is to use I/O cassettes with part number 501-5600-09 or a higher revision.
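
The revision comparison can be sketched in Python as follows. This is an illustration, not a Sun utility: the function name cassette_has_fix is hypothetical, and it assumes the part number is written exactly in the NNN-NNNN-RR form used in this document, with the last two digits being the revision.

```python
import re

FIXED_BASE = "501-5600"   # cassette part number named in this document
MIN_FIXED_REV = 9         # revision 09 or higher carries the fix


def cassette_has_fix(part_number):
    """Return True if a 501-5600 cassette is at revision 09 or higher."""
    match = re.fullmatch(r"(\d{3}-\d{4})-(\d{2})", part_number.strip())
    if match is None:
        raise ValueError("unrecognized part number format: %r" % part_number)
    base, rev = match.groups()
    if base != FIXED_BASE:
        raise ValueError("not a %s cassette: %r" % (FIXED_BASE, part_number))
    return int(rev) >= MIN_FIXED_REV
```

With these assumptions, "501-5600-08" is reported as needing replacement, and "501-5600-09" or "501-5600-10" as already fixed.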


Relief/Workaround

If replacing the I/O cassette is not currently an option, move the affected Crystal-2A card to a different I/O slot (any slot other than Slot 1). Alternatively, if possible, remove the card in Slot 3 of the same I/O board and leave Slot 3 empty.

Additional Information
This solution is intended for use by Sun IT and Sun IT Partner Engineers only.

Bug ID 4987200 shows that a symptom of this issue appears at OpenBoot(R) PROM (OBP) on the domain console.
Note: This message is logged only in the domain's console log, not in /var/adm/messages, and appears only when the OBP variable diag-switch? is set to true. When diag-switch? is set to false, the "Probing" messages are not logged to the console.
From the bug:
The hard failure always shows up in OBP as:
*******************************************
Probing /pci@5d,600000 Device 1  Nothing there
When OBP does find the cards the output is always:
**************************************************
Probing /pci@5d,700000 Device 1  SUNW,qlc fp disk SUNW,qlc fp disk
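
A saved console log can be scanned for probes that came back empty, as in the following Python sketch. The function name empty_probes is hypothetical, and the regular expression assumes the exact line layout shown in the two examples above. Because OBP also reports "Nothing there" for slots that really are empty, a hit matters only for a slot known to hold a card.

```python
import re

# Assumed layout, taken from the two example lines in this document:
#   Probing <device path> Device <number>  <result text>
PROBE_LINE = re.compile(r"Probing\s+(\S+)\s+Device\s+(\d+)\s+(.*\S)")


def empty_probes(console_text):
    """Return (device_path, device_number) for each probe that found nothing."""
    found = []
    for line in console_text.splitlines():
        match = PROBE_LINE.search(line)
        if match and match.group(3) == "Nothing there":
            found.append((match.group(1), int(match.group(2))))
    return found
```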

Another symptom of this issue, not already documented in the bug, is an error message that may be logged during the boot process:
May 16 20:58:21 2004 WARNING: POST status for card in /IO9/C5V0 is good but OBP failed to probe it!
NOTE: The I/O board location in error may change depending on your domain configuration, but the slot (c5v0) will be the same.
This message states that HPOST (Hardware Power On Self Test) configured the I/O card into the domain successfully (the postlogs will confirm this), but OBP cannot find or configure the card in the specified slot of the domain.
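
To pull the I/O board number and slot out of this warning, a pattern such as the one in the following Python sketch can be used. The function name find_probe_failures is hypothetical, and the pattern is derived solely from the single example message above, so other wordings or location formats will not match.

```python
import re

# Pattern based on the one example message in this document.
PROBE_FAILURE = re.compile(
    r"POST status for card in /IO(\d+)/(C\d+V\d+) "
    r"is good but OBP failed to probe it"
)


def find_probe_failures(log_text):
    """Return (io_board_number, slot_name) for each matching warning."""
    return [(int(board), slot) for board, slot in PROBE_FAILURE.findall(log_text)]
```

For the example message, this yields I/O board 9 and slot C5V0. A slot of C5V0 is consistent with the Slot 1 location described in the resolution steps.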


Product
Sun Fire 12K Server
Sun Fire 15K Server

Internal Section

FCO A0246-1 covers the 5-volt I/O cassettes, part number 501-5600-08 and below: https://support.us.oracle.com/handbook_internal/fin-fco/1-6-A0246-1-1.html

Reference Radiance case 64077490, Apollo Escalation 1-1031362, which encountered all the symptoms described above in this document. After replacing the cassettes with rev 09, all symptoms ceased and cfgadm operations worked smoothly.

Keywords: 12k, 15k, 12K, 15K, Crystal2A, Crystal-2A, crystal-2a, cfgadm, OBP, I/O cassette, 5volt

Previously Published As 76434


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.