Resetting a failed drive in a Linux mdadm RAID array

Today I was greeted with a failed drive in an mdadm RAID array. The drive had thrown some transient errors and was kicked out of the array, but testing showed that it still seemed to work just fine.

Hard disks. Image by Martin Abegglen (https://www.flickr.com/photos/twicepix/3333710952)

The following procedure removes the drive from the array, removes it from the system, re-probes for disks, and then re-adds the drive to the array(s).

  • Remove the failed drive from the array; in this case, it was /dev/sdb:

    • mdadm --manage --set-faulty /dev/md0 /dev/sdb1
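
  • Depending on the kernel and mdadm version, the now-faulty member may also need to be removed explicitly before it can be re-added later; if mdadm still lists it as a member, remove it with:

    • mdadm --manage /dev/md0 --remove /dev/sdb1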

  • Make sure nothing on this disk is being used (mounts, other arrays, etc)
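
    • For example (assuming the disk is still /dev/sdb): lsblk /dev/sdb should show an empty mountpoint column, and grep sdb /proc/mounts should print nothing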

  • Reseat the drive, either physically, or virtually using the following commands:

    • echo 1 > /sys/block/sdb/device/delete

    • echo "- - -" > /sys/class/scsi_host/host1/scan

  • Check that the drive is detected again and that it works correctly

    • check dmesg output, or look at /proc/partitions

    • try running: pv /dev/sdb > /dev/null to read the entire disk and discard the output
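
    • optionally, if smartmontools is installed, a SMART self-test gives extra confidence: smartctl -t short /dev/sdb, then check the result with smartctl -a /dev/sdb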

  • Re-add the drive to the array(s)

    • mdadm /dev/md0 -a /dev/sdb1

    • cat /proc/mdstat
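
    • to follow the rebuild as it progresses, something like watch -n5 cat /proc/mdstat works nicely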

That should do the trick…