I've been alerted [scared] by @jeyare to the potential for a RAID 5 (SHR-1) array to fail during a rebuild because of an unrecoverable read error (URE) when reading disk sectors. The likelihood of this increases with the size of the array. In RAID 5 the missing data has to be reconstructed by combining the remaining disks' data with the associated parity, so the rebuild is exposed to UREs: all of the surviving data (real and parity) has to be read without error.
So now I'm in a dilemma for when I want to add more storage (5-bay NAS): add a fourth 8TB IronWolf to the existing SHR-1, or start a new SHR-1 with two 8TB drives. It comes down to the longer recovery time of one bigger array versus splitting the risk across two, plus the fact that the chance of a URE during a rebuild grows with array size. What needs to be backed up is backed up, and what doesn't isn't, so rebuilding from scratch is the last resort. I'd rather mitigate the risk as best I can.
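To put rough numbers on the two options, here is a back-of-envelope sketch. It rests on assumptions that are not established above: a URE rate of 1 per 10^15 bits read (a typical NAS-drive datasheet maximum, so a worst-case figure rather than a measured one), independent errors, every surviving disk read in full during a rebuild, and an existing array of three 8TB disks. The function name and scenario labels are made up for illustration.

```python
import math

def p_ure_during_rebuild(surviving_disks, disk_tb, ure_per_bit=1e-15):
    """Chance of at least one URE while reading every surviving disk in full."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # TB -> bytes -> bits
    # 1 - (1 - p)^n, computed via log1p/expm1 to avoid rounding at tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Number of disks that must be read after one disk fails (assumed layouts)
scenarios = {
    "current 3 x 8TB SHR-1": 2,
    "expanded 4 x 8TB SHR-1": 3,
    "separate 2 x 8TB SHR-1 (mirror)": 1,
}
for name, survivors in scenarios.items():
    print(f"{name}: {p_ure_during_rebuild(survivors, 8):.1%}")
```

Under those assumptions the figures come out at roughly 12%, 17% and 6% respectively. Real drives usually do much better than the datasheet maximum, so treat these as a pessimistic way to compare the options rather than as a prediction.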
Now to the subject: running data scrubbing on the storage pool seems a wise thing to do while the array is healthy. It can take a long time to complete and can affect overall performance, but it checks and repairs the RAID, so hopefully that reduces the risk somewhat. The more frequently it runs, the higher the likelihood that all data will be accessible.
I've now set up a data scrubbing schedule that pauses during normal NAS usage hours. It'll be interesting to see how long it takes to complete.
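For a first guess at the duration, a minimal sketch, assuming the disks are scrubbed in parallel (so the run time is set by one disk's capacity), a sustained average of 150 MB/s per disk, and an 8-hour window each night. Both figures are placeholders of mine, not measurements from this NAS, and the function name is hypothetical.

```python
import math

def scrub_estimate(disk_tb, mb_per_s, window_hours_per_day):
    """Return (total scrub hours, number of daily windows needed)."""
    run_hours = disk_tb * 1e12 / (mb_per_s * 1e6) / 3600
    windows = math.ceil(run_hours / window_hours_per_day)
    return run_hours, windows

hours, windows = scrub_estimate(disk_tb=8, mb_per_s=150, window_hours_per_day=8)
print(f"about {hours:.1f} h of scrubbing, spread over {windows} nightly windows")
```

With those placeholder values that is about 14.8 hours, or two nights. Normal NAS activity during the window would stretch it further.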
I'm looking for more knowledgeable people to advise.