Resetting a failed drive in a Linux mdadm RAID array
Today I was greeted with a failed drive in an mdadm RAID array. The drive had some transient errors and was kicked out of the array, but testing showed that it still seemed to work just fine.
The following procedure removes the drive from the array, removes it from the system, re-probes for disks, and then adds the drive back into the array(s).
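Before starting, it helps to confirm which drive actually failed and which array it belongs to; a quick check (assuming the array is /dev/md0, as in the rest of this post):
# overview of all md arrays; a failed member shows up marked with (F)
cat /proc/mdstat
# detailed state of one array, including which devices are faulty
mdadm --detail /dev/md0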
-
Remove the failed drive from the array; in this case it was /dev/sdb:
mdadm --manage --set-faulty /dev/md0 /dev/sdb1
Make sure nothing else on this disk is in use (mounts, other arrays, etc.).
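Marking the device faulty only flags it as a failed member; to actually take it out of the array it also has to be removed (same device names assumed as above):
# remove the now-faulty member so the kernel releases the device
mdadm --manage /dev/md0 --remove /dev/sdb1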
-
Reseat the drive, either physically or by using the following commands:
echo 1 > /sys/block/sdb/device/delete
echo "- - -" > /sys/class/scsi_host/host1/scan
-
Check that the drive is detected again and works correctly:
check dmesg output, or look at /proc/partitions
try reading the whole disk, for example: pv /dev/sdb > /dev/null
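If the drive has SMART support, its self-reported health is worth a look too (assuming smartmontools is installed):
# dump SMART health status, attributes, and the error log
smartctl -a /dev/sdb
# optionally run a short self-test; results appear in the output above
smartctl -t short /dev/sdb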
-
Re-add the drive to the array(s):
mdadm /dev/md0 -a /dev/sdb1
cat /proc/mdstat
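The rebuild can take a while; to keep an eye on the resync progress (assuming watch is available):
# refresh the array status every two seconds until recovery finishes
watch cat /proc/mdstat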
That should do the trick…