Resetting a failed drive in a Linux mdadm RAID array

Today I was greeted with a failed drive in an mdadm RAID array. The drive had thrown some transient errors and was kicked out of the array, but testing showed that it still seemed to work just fine.

Hard disks. Image by Martin Abegglen (https://www.flickr.com/photos/twicepix/3333710952)

The following procedure removes the drive from the array, removes it from the system, re-probes for disks, and then re-adds the drive to the array(s).

  • Remove the failed drive from the array; in this case, it was /dev/sdb:

    • mdadm --manage --set-faulty /dev/md0 /dev/sdb1
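
  • Depending on the kernel and mdadm version, the now-faulty member may also need to be removed explicitly before it can be re-added later; if mdadm still lists it as a member, remove it with:

    • mdadm --manage /dev/md0 --remove /dev/sdb1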

  • Make sure nothing on this disk is being used (mounts, other arrays, etc)
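
    • For example (assuming the disk is still /dev/sdb): lsblk /dev/sdb should show an empty mountpoint column, and grep sdb /proc/mounts should print nothing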

  • Reseat the drive, either physically, or virtually using the following commands:

    • echo 1 > /sys/block/sdb/device/delete

    • echo "- - -" > /sys/class/scsi_host/host1/scan

  • Check that the drive is detected again and that it works correctly

    • check dmesg output, or look at /proc/partitions

    • try running: pv /dev/sdb > /dev/null to read the entire disk and discard the output
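
    • optionally, if smartmontools is installed, a SMART self-test gives extra confidence: smartctl -t short /dev/sdb, then check the result with smartctl -a /dev/sdb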

  • Re-add the drive to the array(s)

    • mdadm /dev/md0 -a /dev/sdb1

    • cat /proc/mdstat
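
    • to follow the rebuild as it progresses, something like watch -n5 cat /proc/mdstat works nicely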

That should do the trick…