Bad sectors while repairing RAID 5 volume. Advice needed!

NAS
918+
Operating system
  1. Windows
Mobile operating system
  1. Android
Hi, looks like I am unlucky.

I am in the process of replacing all 4 disks. (DS 918+)
After each replacement I run a volume repair.
The first 3 disks were replaced with no problems during the repair.
After inserting the 4th and last disk and starting the volume repair, the process is extremely slow and has been crawling at around 26%. It constantly shows bad sector notifications for disk[3].

There was also a 'file system notification' window that suggested I reboot. It gave the choice to 'ignore' this, or to reboot with or without a 'remap'.
However, just before restarting, Synology informs me that a repair is still in progress and that restarting is not recommended.

What should I do? Wait and see till rebuild is complete? Or restart in the meantime?

Final question. Can I assume that disk[3] is literally disk number 3? Or is it index 3 of a zero-based array, i.e. the fourth disk?
(As a former programmer I am used to arrays running from 0 to x.)

Thanks for your help!
 
For now I would wait and see if it finishes, plus submit a Synology support ticket.

Disk[3] would be Drive 3 as in this, and is the third drive bay (from the left, looking at SM's Overview image) ...
1660570474852.png
 
Update:
Disk manager already reports 708 bad sectors so far while trying to repair the array after inserting the 4th disk.
The disks are brand new, by the way.

I don't get why DSM did not report any problems when the array rebuild for disk 3 completed.
Only now, while repairing the array with the 4th disk inserted, is there a massive number of errors on disk 3.
-- post merged: --

Thanks! I already submitted a support ticket at Synology. Unfortunately they are not as quick as all the helpful, knowledgeable people here.

Do you have any comments on my update ?
 
Before attempting another repair, ensure your backups are good. If using btrfs format, run a file scrub as well to ensure data integrity.
I use ext4. I performed a scrub before the entire swapping procedure. Should I also run a scrub after each disk replacement (that is, after each volume repair has finished)?
 
Okay extra episode of 'project disaster':
Bad sectors were also reported on disk 4.

Disk 3 has already 1300+ bad sectors reported by the storage manager.

BUT storage manager did not report the bad sectors found on disk 4??

Meanwhile, the 'notification center' shows a log in which the bad sectors are mentioned.

How is this inconsistency possible?

4 brand new Western Digital disks replacing 4 faulty Western Digital disks. (I should have listened to those of you who suggested buying a better brand.)
 
When your rebuild is "complete", shut down the NAS, and test the drives using WD's Data Lifeguard software, on a PC.

With those kinds of error counts, the drives do seem suspect.

FWIW, nearly all my drives are WDs (mostly Reds, and some shucked Whites). Only had one drive failure, and it was an Ironwolf.
 
One thing to keep in mind is that array rebuilds and expansion do stress the entire array while those events process. One would think that brand new drives would be very low risk for bad sectors/failure. Sounds like bad luck. Sorry for your experience.
 
I have been in contact with Synology. Not the best experience so far. I had to turn off a stable, working NAS to prove once more that there were bad sectors on the two disks, using the manufacturer's diagnostic tool. And guess what: the extended drive test failed. What a surprise!

Then I placed the disks back into the NAS. After restarting I could not access DSM anymore.
I tried the power button to power down, but even that seems to be too much to ask.

Any advice on how to proceed? Or is it a lost cause, and should I just start with a fresh install and a fresh box of tissues? :cry::)
 
I do hope you have an external backup? If it were me, I'd be returning the WD Reds and considering a different brand, or at the very least RMA'ing the faulty drives with the vendor you purchased them from or WD directly.
 
I replaced disk 3 first.

During the repair process, the system showed a file system notification. See attachment.



Should I remap after reboot?

And if yes, should I reboot after repair of disk 3 has finished? Or should I first focus on replacing disk 4 and *then* do a reboot with a remap?
 

Attachment: zhapiaBoJSA-NKgVOwPWlJdXc4_6nKHklOWTWxx9DPk.png
Don't know, sorry. And a google for 'synology dsm "remap the drive after reboot"' doesn't really help either. It finds people asking the same question and no real answer.

How is the repair progressing? Did you have to start from a fresh storage pool, or were you able to somehow keep the original?
 
Thank you for asking!

"Remapping the disk runs a task of writing/reading data to every sector of the suspect disk. If a sector comes back with a bad read the NAS will tell the drive to mark the sector as bad and use a spare sector from the pool of hidden spares that all drives have. "

I contacted Synology about this. The support engineer had to pass this question to development. They just advised me to do so.
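
To make the quoted explanation a bit more concrete, here is a toy model in Python of what such a remap pass does conceptually. It is only a sketch: `ToyDisk`, `remap_pass` and the sector numbers are made up for illustration, and it says nothing about how DSM or the drive firmware actually implement this. Note that a remapped sector becomes usable again, but whatever data was stored in the unreadable sector is still gone, which is why affected files end up in a log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Toy model of a drive with a hidden pool of spare sectors (hypothetical)."""
    sector_count: int
    spare_count: int
    bad: set = field(default_factory=set)         # physically unreadable sectors
    remapped: dict = field(default_factory=dict)  # logical sector -> spare index

    def read_ok(self, sector: int) -> bool:
        # A remapped sector is served from its spare, so it reads fine again.
        return sector in self.remapped or sector not in self.bad

    def remap(self, sector: int) -> bool:
        # Hand out the next unused spare; fail once the pool is exhausted.
        if len(self.remapped) >= self.spare_count:
            return False
        self.remapped[sector] = len(self.remapped)
        return True


def remap_pass(disk: ToyDisk) -> list:
    """Visit every sector once; return the sectors that could not be remapped."""
    unrecovered = []
    for sector in range(disk.sector_count):
        if not disk.read_ok(sector) and not disk.remap(sector):
            unrecovered.append(sector)
    return unrecovered


# Three bad sectors but only two spares: the third one stays bad.
disk = ToyDisk(sector_count=1000, spare_count=2, bad={10, 500, 900})
leftover = remap_pass(disk)
print("remapped:", sorted(disk.remapped), "unrecovered:", leftover)
```

In this toy run the first two bad sectors get a spare and the third does not, because the spare pool is empty by then. A real drive has far more spares, but the pool is finite in the same way.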

As a result, the log lists all the files that are (or might be?) corrupted. There are only something like 30 to 40 of them, which seems like good news. But I am suspicious. Both drives had a total of 1300+ bad sectors.
Anyway, I just have to restore these files from a backup, and I expect the Synology to be in the same state as just before the disk errors occurred.

But there is a bigger problem now. I ran a binary file comparison between the (Hyper) backup and the current state of the NAS (after repair), that is, after the faulty drives had been swapped and after the remap procedure. The file comparison proves that the files listed in the Synology log as damaged (as a result of the remap) are indeed binary different from the backup. (I even tried a second file comparison tool, and it shows exactly the same binary differences.)

*BUT* the file comparison also proves that many more files on the (Hyper) backup are binary unequal to the current state of the files on the NAS.

This makes me wonder whether Synology's remap procedure, and more specifically its logging of files affected by bad sectors, is faulty, or whether Hyper Backup is faulty and not capable of creating a binary-equal backup of the source files. (As I mentioned before, I was swapping SMR disks for CMR, and the backup was made before any of the SMR disks was swapped.) So I made the Hyper Backup while the system was still in a good state, and I wonder to what extent I can rely on this backup right now.

At this stage I don't know what to trust. Should I rely on the current state of the system after the remap and just restore the affected files listed in the remap log?

And that is not even the last thing that puzzles me. Let's say I have 50% of the data in the cloud. If I sync this data to a PC and compare it with both the data on the NAS and the Hyper Backup, some files still remain binary different.
This is weird. I hoped that this final step would prove which copy I can rely on.

Any ideas why any of the three copies of the data might have some files that are binary unequal?
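
One way to structure the three-way comparison is a simple majority vote per file: if two copies agree and one differs, the odd one out is the most likely suspect. Below is a minimal Python sketch under that assumption. The function name `classify`, the copy names and the digest strings are all hypothetical, and the digests are expected to come from whatever hash tool was already used.

```python
from collections import Counter


def classify(copies: dict) -> dict:
    """copies maps a copy name to {relative_path: hex_digest}.

    Returns {relative_path: verdict}. A verdict is 'all equal',
    'outlier: <name>' when exactly one copy disagrees with the rest,
    'missing in: <names>' when some copy lacks the file,
    or 'no majority' otherwise.
    """
    verdicts = {}
    all_paths = set().union(*(d.keys() for d in copies.values()))
    for path in sorted(all_paths):
        missing = [name for name, d in copies.items() if path not in d]
        if missing:
            verdicts[path] = "missing in: " + ", ".join(missing)
            continue
        digests = {name: d[path] for name, d in copies.items()}
        top, top_n = Counter(digests.values()).most_common(1)[0]
        if top_n == len(copies):
            verdicts[path] = "all equal"
        elif top_n == len(copies) - 1 and top_n >= 2:
            odd = next(name for name, h in digests.items() if h != top)
            verdicts[path] = f"outlier: {odd}"
        else:
            verdicts[path] = "no majority"
    return verdicts


# Hypothetical digests, shortened for readability.
copies = {
    "nas":    {"a.jpg": "d1", "b.doc": "d2", "c.mp4": "d5", "e.txt": "d8"},
    "backup": {"a.jpg": "d1", "b.doc": "d3", "c.mp4": "d6", "e.txt": "d8"},
    "cloud":  {"a.jpg": "d1", "b.doc": "d2", "c.mp4": "d7"},
}
result = classify(copies)
for path, verdict in result.items():
    print(path, "->", verdict)
```

A majority is only a hint, not proof: a file that was legitimately edited after the backup was taken will also show up as different, so modification times are worth checking alongside the verdicts.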
 
The remap explanation makes some sense and was one of the suggested meanings I found when googling.
I’m no expert, far from it. But when doing a Get Info on a file copied from Mac internal storage to a USB HDD, I have noticed that there is usually a difference in the absolute bytes used. Most likely this is a result of the file systems on the particular devices.
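
The difference in "bytes used" can be illustrated with a small Python sketch that reports both the logical size of a file (the length of its content) and the space the file system has allocated for it. The helper name `sizes` is made up for this example, and `st_blocks` is only available on POSIX systems, so the allocated figure is `None` elsewhere.

```python
import os
import tempfile


def sizes(path: str) -> tuple:
    """Return (logical_bytes, allocated_bytes_or_None) for a file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems; it is absent on Windows.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


# Write exactly 1000 bytes to a temporary file and inspect it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1000)
    tmp_path = tmp.name

logical, allocated = sizes(tmp_path)
print("logical:", logical, "allocated:", allocated)
os.remove(tmp_path)
```

The allocated figure depends on the file system and its block size, so it can differ between two devices holding the same file. A binary or hash comparison only looks at the content, so a difference in space used on disk would not by itself make two copies binary unequal.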

Of course differences could be corruption but then I would think that for the system to notice it would have to actively check and re-catalogue the used storage (how would it know unless checksums etc are used, but that’s on the whole file) or if the file is copied or moved around. Generally you’ll find corrupted files when you want to use them. So do you want to check them?

Btrfs has bit-rot detection and snapshots, which may be a reason to move from ext4?
 
I played around with file comparison tools (Double Commander and WinMerge). And I played around with MD5 hashes.

Conclusion: for some reason Hyper Backup is unable to back up properly to a USB-connected external disk. Hyper Backup reports that it made a proper backup, but for some files both the binary comparison and the hash comparison fail.

And that is not very helpful as I cannot determine anymore which of the two I can rely on.
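
For anyone who wants to repeat this kind of check without a GUI tool, here is a minimal Python sketch that hashes every file in two directory trees with MD5 and reports the differences. The function names are invented for this example and the demo uses throwaway temporary folders rather than real NAS or backup paths. It is not how Double Commander or WinMerge work internally.

```python
import hashlib
import os
import tempfile


def md5_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def tree_digests(root: str) -> dict:
    """Map each file's path relative to root to its MD5 digest."""
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            out[os.path.relpath(full, root)] = md5_of(full)
    return out


def compare_trees(src: str, dst: str) -> dict:
    """Report files missing on either side and files whose content differs."""
    a, b = tree_digests(src), tree_digests(dst)
    return {
        "only_in_src": sorted(a.keys() - b.keys()),
        "only_in_dst": sorted(b.keys() - a.keys()),
        "different": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


# Demo with two temporary folders: one identical file, one altered file.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for root in (src, dst):
        with open(os.path.join(root, "ok.bin"), "wb") as f:
            f.write(b"same content")
    with open(os.path.join(src, "photo.bin"), "wb") as f:
        f.write(b"original")
    with open(os.path.join(dst, "photo.bin"), "wb") as f:
        f.write(b"origXnal")
    report = compare_trees(src, dst)

print(report)
```

A report like this only says that two copies differ, not which of them is the good one, so it still needs a third reference copy or known-good checksums to settle that.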
 
So you are using the USB single-version task type? I do use that as a get-out-of-jail for quick access to some data, but it's never been needed (yet!).

I use the multi-version task types and they don't create directly accessible cloned files/folders. But I still have both task types directed to USB drives connected to a 'backup' DS NAS. I have been able to restore from the multi-version backup when I rebuilt the DS1520+ with btrfs file system, so far haven't encountered a problem from using those files.

Maybe you could ask Synology Support why there is a difference between the different variants of a file?
 