Currently on DS918+: my plan for storage expansion & request for comments/constructive feedback


Hey everyone, I’m a beginner to intermediate user of Synology DS918+ so please forgive my ignorance or lack of knowledge in many areas of this whole space of data redundancy, backups, NAS hardware and home networking/cybersecurity - I’m here to ask for help from this knowledgeable community about how to best go about my current situation. Thank you in advance to all of you who make it till the end and provide your valuable feedback, knowledge and experience and to all the trolls - skip this one out please; thanks! :) Get ready for a long read and without further ado, let’s jump straight in.

Background:

Bought my first Synology DS918+ in 2019 and populated it with 4× WD 12TB Ultrastar DC HC520 SATA HDDs and created one RAID6 Volume on EXT4 filesystem (had no clue back then that BTRFS might be a better idea - didn’t know any better at the time...). Been running regular monthly data scrubs, auto S.M.A.R.T. tests and extended drive tests and thus far there were no errors detected on HDDs or anything else that my DS918+ would complain about in terms of RAID volume or drive health.

The main issue is that I’m currently at 82% (17.8 TB/21.7 TB) of usable capacity on my RAID 6 Pool. I currently still have about 7 TB of data spread across 5 different hard drives and SSDs and would like to move all that to RAID to protect the data from disk failure - some of the drives are 5+ years old and I now have mental bandwidth (and courage to create this Reddit post) and adequate amount of money saved purposely to sort the current situation properly.

My aim:

I had a couple of HDD failures with data loss in the last 15 years (resulting in almost complete loss of data) and getting a Synology DS918+ in 2019 was a game changer as I built my first RAID 6 array and was able to move all of the data I had back then to DS918+. I am aware that DS918+ by itself does not replace backup and I am also aware I have not yet implemented a proper 3-2-1 backup strategy because of the lack of time and finances to do it (will address that later in this post). To accommodate about 7 TB extra data and move them to redundancy storage, here’s my plan:
  1. Buy 8-bay Synology DS1821+
  2. Buy brand new 5× WD Gold 22TB drives
  3. Upgrade stock 4GB RAM with these 2×32GB ECC RAM sticks: https://www.amazon.com/gp/product/B08MT3SLN6/ (other Reddit posts indicate that these should work well), run a memory test to check that the RAM is OK, then proceed to inserting the new HDDs
  4. Insert 4 drives and save 5th drive as a hot spare in case one of the drives fails, leave other 4 bays to populate them later on (once I save up enough money for purchase of the next 4×22TB WD Gold HDDs)
  5. Run Synology HDD db script (https://github.com/007revad/Synology_HDD_db) to prevent Synology annoying me with drive incompatibility messages and reduced functionality that comes with that
  6. Once Synology HDD db script successfully does its magic and Synology DSM is happy with new drives being fully compatible, then run extended tests on all 4 drives - wait for that to complete without an error on all drives
  7. Create SHR2 BTRFS Volume (44 TB usable space) and run extended tests on all 4 drives again - just to be 100% certain
  8. Copy DS918+ 17.8 TB Volume to DS1821+ with software that would do file verification of copied data and wouldn’t change file & folder creation dates (I was looking at rclone https://rclone.org/ - any suggestions on which other software with similar reliability and GUI would also do the job properly?)
  9. Use the remainder of approx. 26.2 TB on newly created 44TB BTRFS Volume to copy approx. 7TB of data currently spread across different HDDs/SSDs using same or similar method as in previous step
  10. I would leave the remainder of approx. 19.2 TB for more data to come (my current estimations based on current data production are 10TB in the next 5-7 years which would also leave me with about 20% of volume space for proper hybrid RAID functioning) and a couple of VMs + Docker containers (to keep this post shorter, I’ll open a separate topic for discussion around that)
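
To sanity-check the capacity figures in steps 7 to 10, here is a minimal sketch of the arithmetic. It assumes equal-sized drives (so SHR2 simply gives up two drives' worth of space) and ignores filesystem overhead; the function names `shr2_usable_tb` and `tb_to_tib` are made up for illustration and are not part of any Synology tool.

```python
# Capacity bookkeeping for the plan above. All figures (in TB) come from the post.
DRIVE_TB = 22        # size of each WD Gold drive
DRIVES_IN_POOL = 4   # drives in the initial SHR2 pool
PARITY_DRIVES = 2    # SHR2 gives up two drives' worth of space (assumes equal-sized drives)


def shr2_usable_tb(drive_tb, drives, parity=PARITY_DRIVES):
    """Nominal usable capacity, ignoring filesystem overhead (an assumption)."""
    return drive_tb * (drives - parity)


def tb_to_tib(tb):
    """Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes)."""
    return tb * 10**12 / 2**40


usable = shr2_usable_tb(DRIVE_TB, DRIVES_IN_POOL)  # step 7: new pool
after_ds918 = usable - 17.8                        # step 8: copy the DS918+ volume
after_loose = after_ds918 - 7                      # step 9: copy the loose HDDs/SSDs
after_growth = after_loose - 10                    # step 10: estimated growth over 5-7 years
free_pct = 100 * after_growth / usable

print(f"usable: {usable} TB (about {tb_to_tib(usable):.1f} TiB)")
print(f"left after DS918+ copy: {after_ds918:.1f} TB")
print(f"left after loose drives: {after_loose:.1f} TB")
print(f"left after growth: {after_growth:.1f} TB ({free_pct:.1f}% free)")
```

One caveat: the 21.7 TB reported for the existing pool (two 12 TB drives' worth of usable space) suggests DSM may be counting in binary units, in which case the new pool would probably show up as closer to 40 than 44, and the margins above would be somewhat tighter than they look.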
Use case - now and plans for the future:
  • Current use case:
I use the DS918+ primarily as an archive NAS for critical data (family videos, photos and work-related large video and other data files). One user of DS918+ (no kids or other tech-unsavvy users fiddling around those two PCs), just storage, no bells and whistles (e.g. mail server, VMs, Docker containers etc.) - NAS is merely a device to look after RAID6 and disks. NAS is connected to PC and laptop via Cat6 UTP cabling (via home router) from which files are copied onto NAS via Windows Network Drives feature. Files are uploaded remotely onto PC via FTPS file server, where they are thoroughly scanned, primarily with ESET NOD32 Antivirus (I also use Malwarebytes and submit hashes of potentially dodgy files directly to VirusTotal for screening), and I also use Quad9 as default DNS provider across my networks to reduce risk of any viruses/malware/ransomware. UPS-es are connected consecutively as shown on the diagram below.
[Diagram: current_setup.png]

  • Future use case:
Similar as currently but I would use 8-bay DS1821+ where I currently have DS918+ and I would relocate DS918+ to a different location as per diagram below. After this intro (hopefully clear enough), please see my questions below regarding the proposed setup.
[Diagram: future_setup.png]

My questions (labelled A to E):

A)
Applying 3-2-1 backup rule in practice, here’s what my short- to mid-term plan is:

  • Move all my data to the newly created SHR2 BTRFS 44TB Volume spread across 4×22TB WD Golds on DS1821+ which would be my first (primary) copy of data.
  • Second copy of crucial data would at this stage remain on the original DS918+; would that be wise given that the volume has less than 20% free space? What is the lowest percentage of free space at which Synology still does what it’s supposed to do (maintain a healthy RAID6) and parity calculations are done properly? Would a very low percentage of remaining space on the volume in any way affect RAID6 parity calculations? I read somewhere on Reddit that 15% free space is the lowest still-tolerable level for RAID to function normally.
  • Third copy for all my data might be an overkill so I would appreciate community sanity check here: Third copy of data would eventually be created as I get a hold of at least 4×22TB WD Gold HDDs (would aim to squeeze in 1 spare 22TB WD Gold as hot spare) and stick them into the remaining 4 bays in DS1821+; once that would be done, I would create a mirror of Volume 1 (first pool of 4×22TB WD Gold HDDs) to Volume 2 (second pool of 4×22TB WD Gold HDDs). Would that be a complete overkill and if so, what would be reasonable and acceptably more risky alternatives for a second copy of the 44TB Volume 1? By acceptably more risky alternatives I want to acknowledge that any RAID higher than 1 is better than no RAID (no fault tolerance) so I guess simply a huge copy of data to 22TB WD Gold sitting in an external enclosure would do the job. I also acknowledge that there may be a different alternative that I am not aware of and would greatly appreciate community feedback on this.
  • I would sell current 4×12TB drives and use that money again towards buying 4×22TB WD Gold HDDs (again, would aim to squeeze in 1 spare) and repeat steps 5 and 6 as above on DS918+ to create a third 44TB SHR2 BTRFS pool. I would then sync DS918+ with DS1821+ mirrored volumes and move DS918+ to a different location with fast FTTH connection for incremental backups (via Tailscale or similar but that’s a story for another time). That would ultimately conclude my 3-2-1 backup plan. Again, all comments and suggestions on alternative and equally sound/“safe” approaches to this would be greatly appreciated.
B) In terms of HDDs - what are the main differences between WD Gold Series and WD Ultrastar Series? Are all the drives in both series CMR drives and helium filled? I was unable to find a proper technical breakdown of differences and similarities between two Series and they seem to be the same (just different sticker and target audience) but for some reason there’s a higher price tag on Golds - does anyone have any insider info on what’s really the main difference (and if so - is it worth the extra $$?) between Gold and Ultrastar Series?

C) What are the potential risks of upgrading RAM from stock Synology 4GB ECC RAM to Non-Synology 2×32GB ECC RAM? Suppose I get the sticks, run a memory test and it turns out ok after a first run. Are there any other risks in the long term in terms of RAM failing or causing data corruption? Is it a case of binary type of outcome - it either works flawlessly or it doesn’t at all or is there a spectrum of working/working up to some point (e.g. up to, say, 48GB used and then there could be some corrupted memory cells on RAM that would be undetected by standard memory test and the machine would reboot amidst some data reading/writing/disk scrubbing/ testing)? I would really appreciate some valuable insights around this - so far, a couple of people upgraded their RAMs in DS1821+ to 64GB and have had no issues thus far but I would like to know more about some theoretical scenarios of what could go wrong so I can do my own risk assessment around whether that upgrade would be worth the money/risk.

D) What would be the optimal BTRFS file system settings for archival/mostly storage purposes (data that are rarely changed so revisions of files would not be frequent)? I don’t want too much overhead (parity/checksum?) data for file healing as I believe that SHR2 pool with regular scrubbing and extended disk tests would take care of that part equally well, or am I not fully understanding the benefits of BTRFS? I would appreciate some thoughts around that.

E) The idea behind 2 consecutively/successively connected UPS-es is clearly that if one fails, the other one takes over (I had one such situation before, luckily without any noticeable consequences, and I’m a bit paranoid because of that). UPS 1 is connected to the power plug and then UPS2 directly to UPS1 and Synology, both computers, my home router and ISP modem to UPS2. What are the potential flaws of this setup and what can I do to improve power redundancy?

Thanks again for all the valuable feedback peeps, much appreciated in advance! :)
 
do you keep running extended disk tests for a month or populating drives/array with actual data (e.g. moving a backed up copy of large amount of data to new array)
Extended tests on all drives, and then actual data transfers and data usage

data scrubbing cannot detect bit rot but data scrubbing on BTRFS can
You got it. Ofc, plus all that was said in terms of Synology apps and their need to "live" on the BTRFS volume.
 

Perfect, thank you for the info regarding the tests! 👍

So it's not possible to detect bit rot in RAID6 + EXT4 configuration or does bit rot concern apply to single disk instances of EXT4?
 
There are various factors that can cause bit rot such as physical degradation, electromagnetic radiation, or even software issues (bugs for example). Rot is not something that can be tied to a single disk or an array of drives, but a gradual decay or corruption of stored data.

Meaning, data integrity is a priority as is having a working backup.
 
Understood, thank you!

Bitrot is possible regardless of format type (yes, even on Windows and MacOS). To repair bitrot requires btrfs formatting AND drive redundant volumes.
Thank you Telos, now that really makes a highly compelling case to use BTRFS and minimum of RAID1 setup for data integrity!

To both you and Rusty - how do you detect bit rot in a particular file? Is it simply a case of the file not opening (in the respective software, e.g. mp4 file in VLC, docx file in Word) or are there other means of figuring out that bit rot happened?
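
One filesystem-independent way to spot this kind of silent change is to keep a checksum manifest of the archive and re-check it later. Below is a minimal sketch of that idea, not something from this thread: the helper names `sha256_of`, `build_manifest` and `changed_files` are made up for illustration, and a changed hash only tells you the content differs, not why. For archive files that should never change, a different hash is a strong hint of corruption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Files present in both manifests whose content hash differs."""
    return sorted(
        name
        for name, digest in old_manifest.items()
        if name in new_manifest and new_manifest[name] != digest
    )
```

The same comparison works for checking a copy: build one manifest on the source and one on the destination, and `changed_files(source_manifest, dest_manifest)` should come back empty if the transfer was faithful.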
 
a highly compelling case to use BTRFS and minimum of RAID1 setup for data integrity!
Corollaries:
  • With one drive redundancy, before swapping a failing drive, or replacing a drive with a larger drive, consider a full scrub to ensure data integrity prior to drive change out. Otherwise, bitrotted data will be the baseline for the new volume.
  • Some advocate that your backups be bitrot protected as well. Backing up to a USB drive cannot assure this.
Regarding ext4 (hopefully this does not conflict with @Rusty's explanation).... data scrubbing on ext4 is basically a "RAID scrub" which verifies that the parity matches the stored data, and relies on the drive reporting a read error on a sector to know which bits (data or parity) are corrupted, so that they can be recovered from the other. If the drive does not report an error but returns bad data (bitrot), the scrub will see two conflicting sets of data. By default, the scrub presumes that the data is correct and the parity is wrong, and recomputes the parity. Practically speaking... this is a coin toss as to whether the correct data or the bitrotted data is preserved.
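
The difference can be illustrated with a toy model. This is a simplified sketch, not how md or Btrfs is actually implemented: it uses a single XOR parity block (RAID5-style rather than RAID6) and `zlib.crc32` as a stand-in for the filesystem's per-block checksum. All names and block contents are made up for illustration.

```python
import zlib


def xor_parity(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks (single-parity toy model)."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


# Two data blocks and the parity written alongside them.
block_a = b"family-video-001"
block_b = b"work-footage-002"
parity = xor_parity(block_a, block_b)

# A checksumming filesystem also records a checksum at write time.
checksum_a = zlib.crc32(block_a)

# Silent corruption: one bit flips in block_a and the drive reports no error.
rotted_a = bytes([block_a[0] ^ 0x01]) + block_a[1:]

# A plain RAID scrub notices that data and parity no longer agree...
mismatch = xor_parity(rotted_a, block_b) != parity
# ...but parity alone cannot say whether the data or the parity is the bad one.

# With a stored checksum, the bad block can be identified and rebuilt from parity.
needs_repair = zlib.crc32(rotted_a) != checksum_a
rebuilt_a = xor_parity(parity, block_b)
repaired_ok = zlib.crc32(rebuilt_a) == checksum_a

print(mismatch, needs_repair, repaired_ok)
```

In this model, a scrub that simply recomputes parity from `rotted_a` would make the corruption permanent, whereas the checksum lets the repair pick `rebuilt_a`, which matches the original block.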
 
@Telos thank you, very insightful information - I'll definitely be scrubbing my drives before any intentional drive replacements! USB drives cannot be formatted to BTRFS I guess?

EXT4 data scrubbing just triggers what RAID would do in case of drive failure but in this instance it's "on demand"?

In case of bit rot detected by RAID it's the drive that has to report bad sector but in case of BTRFS it doesn't rely on hard drive information per se but uses BTRFS-related parity calculation on top of the actual underlying RAID?

I hope I didn't mix things up but this is how I now understand the difference - in my analogy, BTRFS is just an extra set of "airbags" on a filesystem level on top of existing RAID to help reduce the unfortunate event of bit rot. Please correct me if I'm wrong with my understanding of the differences.

And it's very useful to know that RAID assumes data to be correct by default which means that data is used as "gold standard" for comparison of what right looks like. Meaning that corruption in data is propagated across all drives should it occur (software error, ransomware etc.), is that correct?
 