Resetting failed drive in linux mdadm raid array

Today I was greeted with a failed drive in an mdadm raid array. The drive had some transient errors and was kicked out of the array, but testing showed that the drive still seemed to work just fine.

The following procedure will remove the drive from the array, remove it from the system, re-probe for disks, and then re-add the drive to the array(s). A consolidated sketch of the commands follows the list.
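
Before touching anything, it helps to confirm which drive was kicked out and what state the array is in. These checks are read-only; /dev/md0 and the drive name are just the examples used in this post, so substitute your own:

    cat /proc/mdstat            # failed members are marked with (F)
    mdadm --detail /dev/md0     # per-array view, shows which device is faulty
    dmesg | tail -n 50          # kernel messages around the time of the failure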

  • Mark the failed drive as faulty and remove it from the array; in this case it was /dev/sdb:
    • mdadm --manage /dev/md0 --set-faulty /dev/sdb1
    • mdadm --manage /dev/md0 --remove /dev/sdb1
  • Make sure nothing on this disk is being used (mounts, other arrays, etc)
  • Reseat the drive, either physically or by deleting and re-scanning it with the following commands:
    • echo 1 > /sys/block/sdb/device/delete
    • echo "- - -" > /sys/class/scsi_host/host1/scan (use the hostN entry that matches the controller the drive is attached to)
  • Check that the drive is detected again and that it works correctly
    • check dmesg output, or look at /proc/partitions
    • try reading the whole drive, for example: pv /dev/sdb > /dev/null (or dd if=/dev/sdb of=/dev/null bs=1M)
  • Re-add the drive to the array(s)
    • mdadm /dev/md0 -a /dev/sdb1
    • cat /proc/mdstat
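
Put together, the whole sequence looks roughly like the sketch below. The array (/dev/md0), member partition (/dev/sdb1), whole disk (/dev/sdb) and SCSI host (host1) are only the examples from this post; adjust them to your system and double-check each step before running it:

    # Mark the member faulty and remove it from the array
    mdadm --manage /dev/md0 --set-faulty /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # Make sure nothing else uses the disk, then delete it from the SCSI layer
    echo 1 > /sys/block/sdb/device/delete

    # Rescan the controller so the drive is detected again
    echo "- - -" > /sys/class/scsi_host/host1/scan

    # Verify the drive came back and is readable end-to-end
    cat /proc/partitions
    pv /dev/sdb > /dev/null

    # Re-add the member and check the array status
    mdadm --manage /dev/md0 --add /dev/sdb1
    cat /proc/mdstat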

That should do the trick…
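
If you want to wait for the rebuild to finish before calling it done, something along these lines works (again assuming /dev/md0):

    watch cat /proc/mdstat      # live view of the recovery progress
    mdadm --wait /dev/md0       # blocks until any resync/recovery has finished
    mdadm --detail /dev/md0     # should now report a clean, non-degraded array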

2 comments to Resetting failed drive in linux mdadm raid array

  • Luka

    I will not lose any data?

  • sig-io

    If all goes well, you should still have your data intact: these commands only deal with the presence of the disks and the state of the array, not with the data on the drives themselves.

    If the data is still intact, this procedure can help bring it back online. You should, however, really know what you are doing and what these commands do before performing any kind of invasive operation on your raid arrays. Running the wrong commands can ruin your chances of a successful recovery.
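
    One read-only way to see what mdadm itself knows before running any of the above is to examine the member's superblock and compare it with the array's view (again, /dev/sdb1 and /dev/md0 are just the example names):

        mdadm --examine /dev/sdb1    # superblock on the member: array UUID, events counter, device state
        mdadm --detail /dev/md0      # the array's view, for comparison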