ZFS endless resilvering

http://serverfault.com – I have a large (> 100 TB) ZFS (FUSE) pool on Debian that lost two drives. As the drives failed, I replaced them with spares until I could schedule an outage and physically replace the bad disks. When I took the system down and replaced the drives, the pool started resilvering as expected, but when it reaches about 80% complete (which usually takes about 100 hours), it starts over. I'm not sure whether replacing two drives at once created a race condition, or whether the pool is so large that the resilver runs long enough for other system processes to interrupt it and cause it to restart, but th (HowTos)
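
One way to pin down exactly when the resilver starts over is to log the progress figure periodically and look for drops. The following is only a rough sketch, not something from the original post: it assumes the saved `zpool status` text contains a fragment like "79.50% done" (the exact wording varies between ZFS versions), and the names `parse_percent_done` and `find_restarts` are made up for illustration.

```python
import re

# Assumption: the status text contains something like "79.50% done".
# The exact wording differs between ZFS implementations and versions.
PERCENT_DONE = re.compile(r"(\d+(?:\.\d+)?)%\s+done")


def parse_percent_done(status_text):
    """Return the resilver percentage found in status_text, or None."""
    match = PERCENT_DONE.search(status_text)
    return float(match.group(1)) if match else None


def find_restarts(samples, tolerance=1.0):
    """Return indexes of samples where progress dropped by more than tolerance.

    samples is a list of percentages (or None for polls with no figure),
    in the order they were collected.
    """
    restarts = []
    previous = None
    for index, value in enumerate(samples):
        if value is None:
            continue  # no progress figure in this poll; skip it
        if previous is not None and value < previous - tolerance:
            restarts.append(index)
        previous = value
    return restarts


if __name__ == "__main__":
    # Hypothetical readings collected over several days.
    readings = [10.0, 45.2, 79.5, 0.3, 12.0]
    print(find_restarts(readings))
```

Lining up the timestamps of the flagged samples with cron jobs, snapshots, or other scheduled activity may help narrow down whether something on the system is interrupting the resilver.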