Replacement disk not recognized on X4540

Asset ID:	1-72-1386502.1
Update Date:	2012-05-11
Keywords:

Solution Type Problem Resolution Sure

Solution 1386502.1 : Replacement disk not recognized on X4540

Applies to:

Sun Fire X4540 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4500 Server - Version Not Applicable to Not Applicable [Release N/A]
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on x86 (32-bit)

Symptoms

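The issue: Under very rare circumstances, a replacement disk is not recognized by the system. When cfgadm -al is run, the customer sees an sd instance name (c2::sd30 in this case) in the Attachment Point (Ap_Id) column instead of the actual target (c2::dsk/c2t4d0). In the raidctl -l output below, target 0.4.0 is missing from controller 2: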

adcmohtsm01{root}% raidctl -l
Controller: 1
Disk: 0.0.0
'
'
Disk: 0.7.0
Controller: 2
Disk: 0.0.0
Disk: 0.1.0
Disk: 0.2.0
Disk: 0.3.0
Disk: 0.5.0
Disk: 0.6.0
Disk: 0.7.0
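To find the missing target without reading the listing by eye, the sketch below reads raidctl -l output and prints the target numbers absent from one controller. It takes the target from the middle field of the Disk: entries shown above and assumes eight targets (0-7) per controller; missing_targets and MAX_TARGET are hypothetical names. It needs bash and a POSIX awk; on Solaris 10, /usr/bin/awk is the old awk, so substitute nawk if it misbehaves.

```bash
#!/bin/bash
# Sketch: print the target numbers missing from one controller in `raidctl -l`.
# ASSUMPTION: targets 0..MAX_TARGET are expected on every controller.
MAX_TARGET=7

missing_targets() {
    # $1: controller number as printed by raidctl, e.g. 2
    awk -v ctrl="$1" -v max="$MAX_TARGET" '
        $1 == "Controller:" { cur = $2; next }
        $1 == "Disk:" && cur == ctrl {
            split($2, f, ".")    # "0.4.0" -> target number is f[2]
            seen[f[2]] = 1
        }
        END {
            for (t = 0; t <= max; t++)
                if (!(t in seen)) print t
        }'
}
```

For the listing above, raidctl -l | missing_targets 2 prints 4, which points at c2t4d0 as the replaced disk.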

Discrepancy in the cfgadm -al output:

Ap_Id Type Receptacle Occupant Condition
c2 scsi-bus connected configured unknown
c2::dsk/c2t0d0 disk connected configured unknown
c2::dsk/c2t1d0 disk connected configured unknown
c2::dsk/c2t2d0 disk connected configured unknown
c2::dsk/c2t3d0 disk connected configured unknown
c2::dsk/c2t5d0 disk connected configured unknown
c2::dsk/c2t6d0 disk connected configured unknown
c2::dsk/c2t7d0 disk connected configured unknown
c2::sd30 disk connected unconfigured unknown <<<<<<<<<<<<<
(The sd instance number is recorded as the Attachment Point ID: this is the stale entry.)
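A stale entry can be picked out of cfgadm -al output by its shape: a healthy disk is listed as cN::dsk/cNtXdY, while the stale entry carries an sd instance name. The hypothetical find_stale_apids function below prints every Ap_Id of the form cN::sdNN; it is a sketch based on the single example in this document, not an exhaustive check.

```bash
#!/bin/bash
# Sketch: print Ap_Ids that carry an sd instance name (cN::sdNN) instead of
# the usual cN::dsk/cNtXdY form, i.e. the stale entries described above.
find_stale_apids() {
    awk '$1 ~ /^c[0-9]+::sd[0-9]+$/ { print $1 }'
}
```

For the output above, cfgadm -al | find_stale_apids prints c2::sd30.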

This issue was specific to one customer and is considered an administration issue, but it is worth documenting.

Changes

The customer had replaced a disk.

Cause

Cause 1.
Quite possibly, the disk replacement was not done according to the documented procedure
(refer to the links below for the procedure).
The customer should have first unconfigured the disk and, once the Ready-to-Remove LED was
lit, removed the disk and replaced it with the new disk.

After the new disk is inserted, verify that the blue LED on the disk turns off within one minute. If it does not, you can have the OS re-enumerate device nodes and links by typing:
# devfsadm -C

Cause 2.
Possible, unconfirmed causes:
A bug in the disk firmware, the disk controller (LSI) firmware, or the sd driver or cfgadm patches.

Solution

Most customers will not accept downtime, so the following steps attempt to recover from this situation without a reboot:

Step a.
     - Pull out the disk.
     - devfsadm -Cv                      [clears all stale entries]
     - cfgadm -c unconfigure c2::sd30    [unconfigures the stale device]

Step b.
     - Insert the disk.
     - devfsadm -v                       [scans for new devices]
     - Check the format output to see whether the disk is detected.
     - cfgadm -c configure c2            [configures all the underlying devices on controller 2]
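Steps a and b can be sketched as a small bash script. By default it only prints the commands (DRY_RUN=1) so the sequence can be reviewed first; with DRY_RUN=0 it runs them and pauses at each physical step. The Ap_Ids c2 and c2::sd30 come from this document; run, pause, step_a, step_b and DRY_RUN are hypothetical names. format < /dev/null is used as a common idiom to list the visible disks and exit instead of waiting at the disk-selection prompt.

```bash
#!/bin/bash
# Sketch of Steps a and b. ASSUMPTION: the Ap_Ids below are the ones from
# this document (controller c2, stale entry c2::sd30); substitute your own.
CTRL=c2
STALE=c2::sd30
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Physical step: print it in dry-run mode, otherwise wait for the operator.
pause() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "# $1"
    else
        printf '%s, then press Enter: ' "$1"
        read -r _
    fi
}

step_a() {
    pause "Pull out the disk"
    run devfsadm -Cv                     # clean up stale /dev entries
    run cfgadm -c unconfigure "$STALE"   # unconfigure the stale Ap_Id
}

step_b() {
    pause "Insert the disk"
    run devfsadm -v                      # scan for new devices
    run format </dev/null                # list disks; check the new one appears
    run cfgadm -c configure "$CTRL"      # configure all devices on the controller
}
```

Source the file and call step_a and step_b to review the commands; then set DRY_RUN=0 and call them again to perform the recovery.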

If Steps a and b fail:

Step c.
Redo the steps below once again:
1. # cfgadm -c unconfigure c2::sd30        [unconfigures the device; if this fails, try step 2]
2. # cfgadm -x remove_device c2::sd30      [forceful removal]
3. # cfgadm -al                            [check that c2::sd30 is gone]
4. # cfgadm -x insert_device c2::dsk/c2t4d0
   If the above command fails, run:
4a. # cfgadm -x insert_device c2
   Verify that the device c2::dsk/c2t4d0 has been added by running cfgadm -al.
5. # cfgadm -c configure c2::dsk/c2t4d0
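The checks in Step c (that c2::sd30 is gone after step 3 and that c2::dsk/c2t4d0 is listed after step 4) can be scripted as follows. apid_state prints the Occupant column for an Ap_Id, or absent if cfgadm -al does not list it; step_c chains the numbered steps with their fallbacks and stops with a message where this document calls for a reboot. The function names are hypothetical, the Ap_Ids are the ones from this document, and the cfgadm -x functions may prompt interactively, so run it from a terminal.

```bash
#!/bin/bash
# Print the Occupant column for Ap_Id $1 from `cfgadm -al` on stdin,
# or "absent" if that Ap_Id is not listed.
apid_state() {
    awk -v id="$1" '
        $1 == id { print $4; found = 1; exit }
        END { if (!found) print "absent" }'
}

step_c() {
    # ASSUMPTION: values from this document; replace with your own.
    local ctrl=c2 stale=c2::sd30 target=c2::dsk/c2t4d0

    # Steps 1-2: unconfigure the stale Ap_Id, falling back to forceful removal.
    cfgadm -c unconfigure "$stale" || cfgadm -x remove_device "$stale"

    # Step 3: the stale Ap_Id must be gone before continuing.
    if [ "$(cfgadm -al | apid_state "$stale")" != absent ]; then
        echo "$stale is still listed; a reboot is required" >&2
        return 1
    fi

    # Steps 4/4a: insert by target Ap_Id, falling back to the controller.
    cfgadm -x insert_device "$target" || cfgadm -x insert_device "$ctrl"

    # Verify that the target Ap_Id now appears in cfgadm -al.
    if [ "$(cfgadm -al | apid_state "$target")" = absent ]; then
        echo "$target did not appear after insert_device" >&2
        return 1
    fi

    # Step 5: configure the new disk.
    cfgadm -c configure "$target"
}
```

A non-zero return from step_c means the reboot described below is needed.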

If Step c also fails, the customer is requested to reboot the server (a cold reboot is preferred).

Why:
Because devfsadm has not responded even after the unconfigure command was performed,
the corruption persists and the cx::sdxx entry is not removed.

This shows that something is wrong in the cfgadm state, so the only resolution is to
reboot the server or, preferably, cold power-cycle it.

If the issue is not resolved after a reboot or cold power cycle, engage Oracle Support.

TSC is requested to engage the drivers team, SN-DK Storage Drivers (MOS group).

Why: Because this points to a driver problem.

Important Document Links
Replacing a drive that has not been explicitly failed by ZFS
Server Maintenance Documentation

Note 1:
Issues of this type are usually caused either by an incorrect procedure during disk replacement or by a genuine driver bug (SN-DK Storage Drivers).

Customers should be advised to keep the following up to date:
- System firmware, disk controller firmware, and disk firmware.
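Before comparing against the latest published firmware, the disk firmware revision currently installed can be read from iostat -En, whose per-device entry starts with the cXtYdZ name and includes a Revision: field. The sketch below assumes that output layout; disk_revisions is a hypothetical name, and system and controller firmware are not covered.

```bash
#!/bin/bash
# Sketch: print "<device> <firmware revision>" for each disk in `iostat -En`.
# ASSUMPTION: each entry starts with the device name on the line containing
# "Soft Errors:", and a later line contains "Revision: <rev>".
disk_revisions() {
    awk '
        /Soft Errors:/ { dev = $1 }
        /Revision:/ {
            for (i = 1; i < NF; i++)
                if ($i == "Revision:") { print dev, $(i + 1); break }
        }'
}
```

Usage: iostat -En | disk_revisions, then compare each revision with the level published for that disk model.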
