Script for faster rebuilds

I have owned 2 Synology devices for many years and have done RAID rebuilds or expansions numerous times. Unfortunately, the Synology GUI reports progress only for the MD device currently being processed. When it finishes that MD device and advances to the next, the progress resets back to 0%. It can be frustrating to wait, thinking the entire process is about to complete, only to see the progress indicator go back to 0%.

I don't recall where I originally read the tidbit I'm about to share, but I found it quite helpful. The idea is to log into your device and access the terminal. From the shell, you can then run some basic commands to get more information about the rebuild or expansion in progress. Since the Synology has a limited set of shell commands, it can be a little tricky to construct a command line that provides useful output. I created such a command line, have enhanced it over the years, and would like to share it with anyone who would like more info about the rebuild/expansion of their device(s).

To get to the terminal/shell, SSH in and switch to root:
- "ssh admin@IP", password "YourAdminPassword"
- "sudo su root -", password "YourAdminPassword"

Once logged in, then copy and paste this very long command line into the terminal/shell window:

Code:
t1=0; t=0; for i in $(seq 1 100000); do h=$(($t/3600)); s=$(($t%3600)); m=$(($s/60)); if [ $m -lt 10 ]; then m=0$m; fi; s=$(($s%60)); if [ $s -lt 10 ]; then s=0$s; fi; for k in $(ls -d /sys/block/md*); do mdnum="${k/*md}"; stat=$(cat /proc/mdstat | sed -n "/^md$mdnum/,/^\s*$/p" | grep speed); if [[ $stat ]]; then if [[ $t1 -eq 0 ]]; then t1=$(date +%s); fi; d=$(date -d \@$t1 "+%Y-%m-%d/%T"); echo "$d $h:$m:$s md$mdnum:$stat"; scs="$k/md/stripe_cache_size"; if [ -e $scs ]; then $(echo 16384 > $scs); fi; break; fi; done; t2=$(date +%s); sleep $((30-$(($t2-$t1)))); let t=t+30; let t1=t1+30; done

This command line will report:

the current date/time, elapsed time, MD#, a progress bar, the operation (i.e. recovery/reshape), % progress for the current MD#, (completed/total blocks) for that MD#, the ETA for the current MD# to finish, and the operation speed.

The output looks similar to this:

2021-08-02/13:37:29 22:30:00 md8: [====>................] reshape = 24.3% (951929340/3906982848) finish=778.0min speed=63299K/sec

In addition to reporting more detailed progress, this script also tweaks the MD# stripe_cache_size to 16384, which, in all circumstances I've seen, has always improved the operational speed, sometimes quite significantly.
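If you want to check or apply just this one tweak by hand, outside the reporting loop, a minimal sketch looks like this (assuming md2 is the rebuilding array; substitute your own MD number, and note the path only exists for RAID levels that keep a stripe cache, i.e. RAID 4/5/6):

Code:
# Check and raise the stripe cache for one array by hand (run as root).
md=md2                                       # assumption: md2 is the array being rebuilt
scs=/sys/block/$md/md/stripe_cache_size      # only present on RAID 4/5/6 arrays
if [ -e "$scs" ]; then
     echo "current value: $(cat $scs)"
     echo 16384 > "$scs"                     # same value the long command line writes
     echo "new value:     $(cat $scs)"
fi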

Note: Bear in mind the ETA will fluctuate if the operation speed changes. It's best to limit any other activity on the device to maintain higher operation speeds.

Hope you find this useful!

Also, as I've tweaked and enhanced this command line countless times, I'm always looking for further improvements. Feel free to suggest/recommend changes.
 
Additional changes that reduce the total command line length:

Code:
t1=0; t=0; for i in $(seq 1 100000); do for k in $(ls -d /sys/block/md*); do mdnum="${k/*md}"; stat=$(cat /proc/mdstat | sed -n "/^md$mdnum/,/^\s*$/p" | grep speed); if [[ $stat ]]; then if [[ $t1 -eq 0 ]]; then t1=$(date +%s); fi; df=$(date -d \@$t1 "+%Y-%m-%d/%T"); ef=$(date -d\@$t -u "+%H:%M:%S"); echo "$df $ef md$mdnum:$stat"; scs="$k/md/stripe_cache_size"; if [ -e $scs ]; then $(echo 16384 > $scs); fi; break; fi; done; t2=$(date +%s); sleep $((30-$(($t2-$t1)))); let t=t+30; let t1=t1+30; done
 
Here's the quandary... If I understood exactly what this script does, I probably could have written it myself, but since I don't, I don't see the wisdom in blindly applying it to my machine. Maybe others will do so.

OTOH, there's a trade-off between faster rebuilds, and machine responsiveness during rebuilds. So while I'd love to see the rebuild time halved, I still want unfettered, usable access to the NAS.

That all said... congrats on having the skills, perseverance, and fortitude to work this out. Even if I had the skills, I would not have the patience or time to bring this together for what is, fortunately, a relatively infrequent event in my experience.
 
These command-line commands do not change any setting established in the DSM GUI. They do temporarily change a setting in the NAS OS storage component called MD (Multi-Disk), providing more memory to cache more of the MD data stripes. This improves efficiency by reducing redundant IO operations needed to process each MD data stripe. Depending on the number of drives, the data stripes can be too large to fit within the default MD stripe cache size. By increasing this by a relatively small margin versus the total memory in the NAS, significant overall speed increases can be achieved. The NAS typically resets this value back to the default periodically, which is why the commands repeatedly set it to a larger size while they are running.
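For anyone who wants only the speed tweak without the progress reporting, a stripped-down loop along these same lines (a sketch; run as root, stop with Ctrl-C) would keep re-applying the larger value each time the NAS resets it:

Code:
# Keep re-applying the larger stripe cache to the array currently rebuilding.
while true; do
     for k in /sys/block/md*; do
          mdnum="${k/*md}"
          # only touch the array that /proc/mdstat shows as recovering/reshaping
          if sed -n "/^md$mdnum/,/^\s*$/p" /proc/mdstat | grep -q speed; then
               scs="$k/md/stripe_cache_size"
               if [ -e "$scs" ]; then echo 16384 > "$scs"; fi
          fi
     done
     sleep 30
done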

Another point to consider: these commands do not restrict concurrent use of the NAS, other than creating more IO operations. Assuming the default behavior of favoring user responsiveness over rebuild time, the trade-off shouldn't change noticeably. I have tested the impact of user interactions on rebuild time. What may seem like small user interactions can have a huge impact on rebuild times. I've seen cases where downloading a file at 50KB/s reduced rebuild speeds to less than 10% of the undisturbed speed. It's for this reason that I try to plan my rebuilds, allotting time for dedicated operation.
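To see this effect for yourself, a simple way is to print the speed line from /proc/mdstat at a fixed interval while you generate some user traffic (a minimal sketch; the 10-second interval is arbitrary):

Code:
# Print the rebuild speed every 10 seconds to observe the impact of user IO.
while true; do
     date
     grep speed /proc/mdstat
     sleep 10
done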
 
Here is an unflattened version of the commands, with comments, so no one has to feel they're using something blindly.

The main points of these commands:
  • loop through the MD data segments
  • extract and filter segment statistics from the MD software component
  • only display updates while a rebuild is in progress
  • capture timestamps to display and precisely control/maintain the progress-update refresh interval (so the timing doesn't drift)
  • increase the MD stripe cache memory size of the currently rebuilding segment (to boost rebuild performance; the NAS will reset this)

Code:
t1=0                                                # initialize date/time counter
t=0                                                 # initialize elapsed time counter
while [[ 1 ]]; do                                   # create very long running loop
     for k in $(ls -d /sys/block/md*); do           # loop through all NAS MD segments
          mdnum="${k/*md}"                          # capture MD segment#
          stat=$(cat /proc/mdstat | sed -n "/^md$mdnum/,/^\s*$/p" | grep speed)  # query NAS for segment's stats
          if [[ $stat ]]; then                      # only if this segment currently recovering/reshaping ?
                                                    # also prevents displaying updates until rebuild in progress
               if [[ $t1 -eq 0 ]]; then             # the very first iteration ?
                    t1=$(date +%s)                  # capture the starting date/time
               fi
               df=$(date -d \@$t1 "+%F/%T")         # format the date/time
               h=$(($t/3600))                       # total elapsed hours (that doesn't wrap)
               ef=$h:$(date -d\@$t -u "+%M:%S")     # format the elapsed time
               echo "$df $ef md$mdnum:$stat"        # display the progress output line
               scs="$k/md/stripe_cache_size"        # shorten the MD stripe cache variable name
               if [ -e $scs ]; then                 # only if the NAS has a variable for this MD stripe cache ?
                    $(echo 16384 > $scs)            # set a larger memory cache
               fi
               break                                # skip remaining MD segments since NAS only does 1 at a time
          fi
     done
     t2=$(date +%s)                                 # get current date/time
     sleep $((30-$(($t2-$t1))))                     # progress displayed every 30 secs, compute
                                                    # how much time remains & sleep that time
     let t=t+30                                     # advance elapsed time by 30 secs display interval
     let t1=t1+30                                   # advance date/time by 30 secs display interval
done

If anything is unclear, feel free to ask for more info.
 
The previous update caused the elapsed time to wrap at 24 hrs. This update corrects that. (The unflattened view above has been updated.)

Code:
t1=0; t=0; while [[ 1 ]]; do for k in $(ls -d /sys/block/md*); do mdnum="${k/*md}"; stat=$(cat /proc/mdstat | sed -n "/^md$mdnum/,/^\s*$/p" | grep speed); if [[ $stat ]]; then if [[ $t1 -eq 0 ]]; then t1=$(date +%s); fi; df=$(date -d \@$t1 "+%F/%T"); h=$(($t/3600)); ef=$h:$(date -d\@$t -u "+%M:%S"); echo "$df $ef md$mdnum:$stat"; scs="$k/md/stripe_cache_size"; if [ -e $scs ]; then $(echo 16384 > $scs); fi; break; fi; done; t2=$(date +%s); sleep $((30-$(($t2-$t1)))); let t=t+30; let t1=t1+30; done

On a side note for those wondering, I'm in the middle of yet another rebuild, so this is when I typically review and find ways to update/enhance these commands.
 
@Robbie - These commands operate at the OS level and shouldn't affect the operation of other software running on the NAS, so DSM at any level should be unaffected. The commands only alter one Multi-Disk (MD) setting, the RAID stripe cache size, and this gets automatically reset to the default value. The rest is simply data gathering, formatting, reporting, and managing the script's reporting interval.
 
@PetaBytes the idea about the custom script is great. Thx for that.

except for one part of the code:

In addition to reporting more detailed progress, this script also tweaks the MD# stripe_cache_size to 16384, which, in all circumstances I've seen, has always improved the operational speed, sometimes quite significantly.

Background:
Stripe cache size is the size of the cache responsible for assembling a complete stripe of data, ready to be written to disk, including parity data (RAID 4, 5 or 6). And here is the problem: the cache is responsible for handling unwritten parity data.
The size of the stripe cache depends on additional parameters, e.g. the dirty-bit cache size in memory. And there is an additional factor when it is used: the writeback cache.

These values must be set by the RAID developer in strict dependence on the RAID architecture and the known STRIPE WIDTH and STRIPE DEPTH, to keep a balanced environment for write security/speed.
Here, Stripe Width is the number of drives in the operated RAID group.
Stripe Depth is the size in bytes of the writes performed by the controller (defined in the FS setup).

So when your default stripe cache size is about 1kB, your default block size is 4kB, and your Stripe Width is 5 (5 disks in RAID5), it means 1024 x 4096 x 5 = 20MB of memory used, which can be simplified as 20 / 5 = 4MB per disk.
Some may think that, since a single disk has a 64MB cache, it is possible to increase the stripe cache up to 16,384 bytes. But this attitude is wrong, because the full cache of the HDD would then be used just for the stripe cache.
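To make those figures concrete, here is the same formula (stripe cache value x block size x stripe width) worked out in shell, first for the default described above and then for the 16384 that the script writes; the 5-disk RAID5 is just the example from this paragraph:

Bash:
entries=1024; block=4096; width=5            # example values from the paragraph above
echo "default: $(( entries * block * width / 1024 / 1024 )) MB total ($(( entries * block / 1024 / 1024 )) MB per disk)"
entries=16384                                # value the script writes
echo "raised:  $(( entries * block * width / 1024 / 1024 )) MB total ($(( entries * block / 1024 / 1024 )) MB per disk)"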

Synology uses a 4kB default block size. You can check it with:
Bash:
sudo blockdev --getbsz </dev/sda1>
where </dev/sda1> is the device of your choice (from the devices list).
Then, theoretically, the maximum possible stripe cache value you could use is 16,384 bytes for the 4kB block size.

Synology uses device-mapper-based RAID, which provides a bridge between DM and MD: it allows the MD RAID drivers to be accessed using a device-mapper interface. For an ext4 FS this is straightforward.

Problems may occur when interfering with the custom-defined settings of the BTRFS/device-mapper kind of RAID used by Synology, which are never described or published; then no one except a few Synology developers has a clue about the strict dependencies and any customisation defined by users.

You can check the current stripe cache size with:
Bash:
cat /sys/block/<mdx>/stripe_cache_size
where <mdx> is the MD device of your choice: md0, md1, ...

This can be a dangerous game, especially when it comes to affecting parity writing in these types of RAID, even during a rebuild. You can't change just a single variable in this complex environment. Or rather, you can, to get more speed, obviously. The question is whether you will also keep the protective purpose of the RAID. Only hard stress tests and long-term observation can confirm the right stripe cache value for your setup. Don't use any "golden" values.
Of course, you can also blindly hit the right value, which is fine as far as the achieved performance goes.
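One rough way to do that observation, rather than trusting a single number, is to step through a few candidate values and watch how the reported speed responds to each (a sketch only; md2 and the candidate values are just examples, and each value needs time to settle before the speed is meaningful):

Bash:
# Try a few stripe cache values and record the reported rebuild speed for each.
for val in 1024 4096 8192 16384; do
     echo "$val" > /sys/block/md2/md/stripe_cache_size
     sleep 300                               # let the speed settle for ~5 minutes
     echo -n "stripe_cache_size=$val -> "
     grep speed /proc/mdstat
done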

Think about it.

Maybe I'm wrong; then we can discuss the differing points. This topic is interesting.
 
Jeyare,
You bring up several good points, plus you introduce BTRFS into the mix. I will tell you honestly that my 2 NAS have only ever used EXT4. I'm in the process of upgrading to DSM 7, which may offer the option to use BTRFS. I guess what I'm trying to say is that at this point I have no ability to "try" BTRFS; all of my earlier descriptions were purely using EXT4. One thing I have found is that the stripe_cache_size does get automatically reset to a lower value, which I assume is the default value for that NAS. (I have no way of knowing if other NAS models use different values.) The point here is that the cache size is only altered while the script is running, and then it's reset back to normal. So the effects are only transient, and the effects in other areas are also transient.

Now, the idea that changing the stripe cache size may conflict with non-visible assumptions/variables/constants implemented by the coder is possible. But short of going in and analyzing the code on the NAS, there is no way to know. I would need to do more research to learn whether internal code factors have been set by the developer, and whether those could be broken by temporarily changing the stripe cache size to realize potentially substantial reductions in rebuild times.

This script has worked flawlessly for me using EXT4 and has reduced the time to expand my 45 & 87 TB areas to ~48 & 42 respectively. Without the script, these processes would have easily taken double the rebuild time. The script works well in my 2 environments. Others have reported changing this setting and others, and I've yet to hear of a data-loss scenario. So if there are problems, I hope they get reported back, and I will make every attempt to correct the issue.

Jeyare, if you find a case where this causes problems, just let me know and I will work the issue.
 
