Hyper Backup to USB - no deduplication?

I'm doing a Hyper Backup to a USB drive (destination: ext4, compressed, encrypted, no backup rotation, 8TB drive) for offsite storage, about 5 TB of data. The initial backup takes a long time (expected). But the second backup on that same job again takes a long time, and then runs out of space. I've tried both "Local Folder & USB" and "Local Folder & USB (single version)", with the same results. My only alternative is to blow the job away and re-create it every time. Labour intensive.

This puzzles me. There is not much change to the data between the two backups, so I would expect a result similar to another backup we do ("Remote NAS device", compressed, not encrypted, no rotation) to another ext4 NAS connected to the same LAN (in another building); a different data set, but a lot of data. Initial backup: a long time, as expected. Second and subsequent runs: very little time or growth in the destination, presumably due to de-dupe, as advertised.

Is there a setting I'm missing? Is this by design? Is there a way to make it de-dupe (e.g., putting the USB drive on the remote NAS)?

Obviously for offsite backup purposes, the drive needs to be dismounted and removed.

Thanks in advance.
 
Don't use single version; that way you'll be able to keep older versions of files.
You are missing something.
After the first run, each time you back up, only the new and changed data will be sent to the backup destination.
A silly question: you aren't creating a new backup task every time, are you?

Dedupe is a different thing from incremental backup, even if it can sometimes (e.g., when files are moved) help minimize the "new" data; see the sketch below.
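
To make the distinction concrete, here is a minimal Python sketch of the two ideas; it is purely illustrative and not how Hyper Backup is actually implemented:

```python
# Toy illustration (NOT Hyper Backup's real algorithm) of the difference:
# incremental backup decides WHAT to copy (changed files/blocks),
# deduplication decides WHETHER copied data is already in the store.

import hashlib

def incremental_candidates(current_mtimes, previous_mtimes):
    """File-level incremental: pick files whose mtime changed since last run."""
    return [name for name, mtime in current_mtimes.items()
            if previous_mtimes.get(name) != mtime]

def dedupe_store(blocks, store):
    """Dedupe: keep only blocks (bytes) whose content hash is not yet stored."""
    new_blocks = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            new_blocks += 1
    return new_blocks  # data actually written, even if many files "changed"
```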
 
Thanks for your reply, kiriak.

Understood, I don't care about older versions; I only need one version on the offsite disk. I'm just saying that I've tried both types with similar results, which implies that the de-dupe that is supposed to kick in for the non-single-version type doesn't seem to be making any difference, and I can't figure out why.

"After the first run, each time you back up, only the new and the changed data will be send to the backup" - exactly what I expect. Then why, when it seems there is not much data change on the source, is HB adding 50-90% of the original size on the incremental?

HB advertises "block-level incremental backup". The data I'm backing up to the USB drive is a Veeam backup. The biggest file is approx. 4TB, and its "updated" date changes nightly. With file-level incremental backup, every time I run the backup this file would have to be re-written in full to the USB drive. From the evidence, that seems to be what is happening, but it is not what I would expect from block-level incremental.
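
For what it's worth, here is roughly what block-level incremental should mean for a file like that. This is a generic sketch with an assumed block size, not Synology's documented behaviour:

```python
# Generic sketch of block-level change detection on a huge file.
# BLOCK_SIZE is an assumption; Hyper Backup's real chunking is undocumented.

import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB blocks

def block_hashes(path):
    """Hash the file in fixed-size blocks."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def changed_blocks(old_hashes, new_hashes):
    """Only these block indices should need re-writing to the target."""
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]
```

If only a few blocks of the 4TB file change nightly, the increment should be small; re-writing the whole file is file-level behaviour.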
 
What is the initial data?
Could the Veeam backup cause extensive block-level changes?

From their site, among other things, it says:

  • Avoid low-end NAS and appliances: If you have one unavoidably, use iSCSI instead of NAS protocols.
  • Generally avoid CIFS and SMB: If it is a share backed by an actual Windows Server, add that instead.
etc.

 
Thanks for the Veeam info, kiriak, but Veeam itself is working just fine with our DS1517+. No issues there.

My issue is with a subsequent HB backup of that VEEAM-generated data to a USB drive.

In theory, it is possible that Veeam creates extensive block-level changes, but that seems unlikely, since a typical nightly Veeam incremental takes very little time; if Veeam were making a lot of block-level changes to the file residing on the NAS, over the LAN, that would necessarily take a lot of time.
 
To follow your description:
- the first backup instance is based on Veeam, target is the NAS
- the second backup instance is an HB backup of the first instance, target is a USB device connected to the NAS

If that's the correct status of your backup operation, then here is my conclusion:

- Multiplying a backup operation across two different (block-based) technologies must lead to performance failures or future restoration issues. Take it seriously. It's outside any pragmatic strategy.
- What about standardizing your backup processes?
 
@jeyare thanks for your reply.

Your understanding is accurate, and I appreciate your critique of our backup operation. We do have several other levels of backup that I haven't mentioned, since they are not relevant to my question.

I expect any reputable backup tool, including HB, to make an accurate copy of whatever data it backs up, and to be just as accurate incrementally, regardless of the mechanism (block-level vs file-level) it uses. Backup systems have to be accurate and trustworthy; otherwise, don't use them. Are you suggesting HB is not trustworthy?

FYI, the reason we are doing this, in addition to other backup levels, is that all of our other backup targets are online in some way. We recently talked with a company that was targeted by a very sophisticated crypto attack. Even though the company had multiple levels of backup, including cloud, the attackers gained access to the server itself, and since the cloud backup was online, they were able to wipe out the cloud backup as well as encrypt all local online backups, leaving the company in a vulnerable position. One sure way to prevent this scenario is to have a recent offline copy. We chose to rotate several encrypted USB external hard drive copies of our Veeam backup offsite, so that if an external drive were physically stolen the data would still be inaccessible. The HB solution seemed ideal.

My initial question re: the apparent lack of de-dupe has not been answered, and maybe never will be. I would like a solution to our dilemma. I'm continuing to do some testing to see if I can figure one out, but if there's a better way to solve the backup-then-take-target-offline scenario, I'm all ears.
 
First question:
What is the source (or sources) for the current Veeam process? Please define specific points, for a better understanding of the scope.

Second:
What is the current backup policy:
- defined per source, or for groups of sources?
- periods, triggers, if any?

What are your primary drivers:
- time to restore (RTO)?
- speed of the backup?
- cost?

Nothing is free, and nothing is 100% safe; there are just a few steps that get you near 100%.

Last:
Re dedupe on the Syno HB side: no one except the author of the algorithm knows how it really works in HB. I don't like white papers. What does work, and needs time to tune, is the genuinely block-level backup in HB; I've seen it in my backup increments. Most of them are processed during the night silence, so time isn't my enemy in that period.
A second point of view is AB4B (Active Backup for Business); there, dedupe is really calculated.
So the right tool for the right target is necessary.

Regarding attackers:
- no one can guarantee 100% safety
- but you can request a penetration test to find vulnerabilities in your existing operation
- the question is whether data that "valuable" should be covered by Syno at all, because Syno has no support for business users; just independent forums like this one or the German Syno forum can help you. That is a real shame for a vendor that likes to create enterprise-level NASes, with a DSM system that has no standard backup of the entire system itself.

Don't take it personally, but sometimes I need to really understand the background. I don't like shooting into the dark.
 
I appreciate the discussion, jeyare; it's friendly and only meant to help.

For sure, we've hardened our security after learning how this other company was hacked. Our security was good; it's better now. Nevertheless, in the cat-and-mouse game of hacking-for-profit, there are ever-increasing levels of sophistication, and a good backup strategy should assume the need to recover after a successful crypto attack. The company that was hacked thought that cloud backup, being off premises, would save them. Not so. Hackers can't encrypt or delete something they can't access, so recent, rotating offline backups are the last line of defense we hope never to use.

For our backups, our Syno NAS units are merely storage for Veeam, except for the HB to USB drive as previously described. Veeam uses one NAS (DS1517+) as the primary backup target, a second NAS (DS1513+) on the same LAN in another building for Veeam Backup Copy, and a third fireproof NAS (ioSafe 214).

HB policy: Local Folder and USB > usbshare1 > [folders to back up] > compress backup data > enable client-side encryption > no schedule (manual run at present) > no integrity check (at present) > no backup rotation. USB drive is ext4.

I was hoping to get some wisdom from community members who are farther down the Syno journey than me before needing to do deep testing. I've loved and used Syno for more than a decade, so I'm not a newbie. Anyway, going deeper on this one: I'm in the middle of a multi-day test right now. On a smaller subset of the Veeam data (approx. 450GB), I've set up an HB job exactly like the one to the USB drive, except targeting a remote NAS. After two runs, with an update of a 300GB Veeam backup file between runs, it is clear that block-level de-dupe is active: a minor increment in size between runs, and a fast second run. I'm now running HB on the same subset to the USB drive to see if I get the same result. Is it possible that HB can't handle block-level de-dupe on a 4TB file? Or is block-level de-dupe only active for a remote NAS and not local USB? We'll see, maybe. I will post results.
 
How is Veeam connected to the NASes?
I hope via NFS; then you have it connected as a volume on the NAS.
DSM treats a connected USB/eSATA target as just another shared folder; there isn't the same relationship it has with a volume.
It seems we may have the reason.

Btw, successful crypto attacks (ransomware) are based on a lack of security policies:
- no snapshots/replicas in the backup scenarios
- a strong appetite for clicking on every single email
- low real utilization of working resources, leaving people time to visit everything on the web
- a lack of firewall security, by which I mean deep packet inspection and WAN/local/LAN in/out rules
- ...

Yes, your blood pressure must spike when you see that your data are definitely lost. Only bad experiences can make us better. Some of us.
 
Veeam is connected to the NAS via SMB only. We looked at NFS, but that was some time ago, and at the time the decision was that SMB was adequate. So it is a shared-folder scenario.

Completely agree with you re: crypto attacks. But even mighty Microsoft was recently hacked. Not crypto, but nevertheless... And if hackers gain the same access to all of the online resources in your environment that you have, they can encrypt or eliminate every snapshot/replica you have online. So my attitude is: prepare so that it won't happen, but plan the backup strategy as if it will. A company's data is its lifeblood.
 
Uff, Veeam with SMB is not a great idea, as also described here:
Try to find time to test NFS.

Back to block-level storage: in the Syno environment there is just one and only option: iSCSI.
What about this scenario:
- Veeam backup to an iSCSI LUN. Faster than anything mentioned, and a really pure block-level operation.
- Backup of the LUN to an external device (or another NAS).
 
@kiriak, Veeam is backing up VMware VMs.

@kiriak, the Veeam blog you reference is very informative, thank you. While it says to "generally avoid" SMB, our Veeam/NAS backup strategy was set up long before that blog was published. I wish we had had such good information when the strategy was designed. Now that it's in place, it's obviously a bit of work and disruption to change it. It may not be ideal, but it ain't broke. And Veeam/backup/NAS is only one of many hats I wear here, so finding time is a challenge.

Nevertheless, I agree with your suggestions @kiriak and @jeyare re: NFS/iSCSI and appreciate them. In a few months we're going to be doing an upgrade to our network infrastructure, so that might be good timing for such a change. In the meantime, some testing to detail the strategy.

Your suggestion @jeyare of Veeam to iSCSI, and then a NAS backup (HB, or USB Copy) of the LUN to external storage, looks like a promising solution to our requirement for a recent offline backup.
 
Update on test results, confirming that de-dupe does in fact work for both network and local USB HB backups with the largest file at 318GB. Test conditions: two HB jobs set up exactly the same (compressed, encrypted, no rotation), approx. 418GB, 15 files, largest 318GB; one to a USB external drive, one to a remote NAS. Between the first and second runs, a Veeam backup was run, modifying the largest file but likely not changing many of the actual blocks in it.

First run: size of backup: 388GB (both). Time: network, 3 hours; USB, 2.5 hours.
Second run: size of backup: 418GB (both). Time: network, 45 min; USB, 39 min.
A third run gave similar results.

Contrast this with the problem that prompted this thread: backup of approx. 6.2 TB of source data to a 7.22TB (8TB nominal) USB external drive. The first run completes in over 24 hours, with approx. 5.2 TB on the external disk. The second run does not complete; after running for approx. 24 hours, it runs out of space on the external disk.

Cause: HB can't de-dupe a 4 TB file? HB needs more than 2TB of cache space on the external drive to de-dupe the 4TB file? Unknown. I will be getting a larger external HDD (12TB nominal; enough for a 5.2TB backup? Hope so!) and will see what results.
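
A back-of-envelope check with the figures above (a sketch using this thread's numbers) shows why the second run fills the disk if the 4TB file is re-written wholesale rather than de-duped:

```python
# Rough space math from this thread's figures (approximate, decimal TB).
capacity_tb  = 7.22   # usable size of the "8TB" USB drive
first_run_tb = 5.2    # on-disk size after the initial backup
big_file_tb  = 4.0    # the nightly-modified Veeam file

free_tb = capacity_tb - first_run_tb        # ~2.0 TB free
shortfall_tb = big_file_tb - free_tb        # ~2.0 TB short
print(f"free: {free_tb:.2f} TB, shortfall if the 4TB file "
      f"is fully re-written: {shortfall_tb:.2f} TB")
```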
 
Sounds reasonable. DD (dedupe) needs a cache to recalculate the DD blocks, i.e., checksums of all incoming blocks from the source. It is also necessary to keep space for the encryption cache, and even for compression.

From standard best practice (Syno doesn't explain it for HB) there is a general formula:
DD cache (GB) = source (GB) / 1000
so for your 5200 GB you need just 5.2GB of DD cache (take this as a general explanation; Syno's algorithm may be different),
and
you need 2x the DD cache size available in RAM = 10.4GB.
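
As a sketch of that rule of thumb (again, a generic best-practice figure, not a published Synology formula):

```python
# jeyare's rule of thumb: DD cache = source / 1000, with 2x the cache in RAM.
def dd_cache_gb(source_gb: float) -> float:
    return source_gb / 1000.0

source_gb = 5200.0                 # ~5.2 TB source from this thread
cache_gb = dd_cache_gb(source_gb)  # 5.2 GB on-disk dedupe cache
ram_gb = 2 * cache_gb              # 10.4 GB RAM available during the run
print(f"cache: {cache_gb:.1f} GB, RAM: {ram_gb:.1f} GB")
```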

My other concern is this:
6.2TB source on the NAS (from Veeam) to 5.2TB on USB, with the transfer taking over 24 hours. The first round (NAS to USB) is a pure sequential write, so 5.2TB / 24h = an average of 60MB/s. That is OK for a file-level transfer, but really slow for a block-level transfer.
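
A quick check of that average (decimal units assumed):

```python
# 5.2 TB written in ~24 hours -> average throughput in MB/s.
tb_written, hours = 5.2, 24.0
mb_per_s = tb_written * 1_000_000 / (hours * 3600)
print(f"average: {mb_per_s:.0f} MB/s")  # ~60 MB/s
```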
Questions:
- What is the NAS source setup (drives, RAID)?
- What is the USB target setup (enclosure vendor, drive)?
- How much RAM is in the NAS, how much is available during the backup, and what is its utilization?
- CPU utilization?
 
Thanks @jeyare. I'm in the middle of another test with the large Hyper backup, this time to the larger USB HDD. I will provide details, including answers to your questions, when I have something to report there.

On the issue of "generally avoid SMB" for Veeam, I tested 3 identical Veeam backups of a small-ish VMware VM with the backup repository connected to a test NAS (RS815+) via CIFS (shared drive), NFS 4.1 (Veeam 10's connector), and iSCSI (VMware iSCSI connector, thin provisioned on the NAS, thick provisioned VMFS on the VM). The Veeam server is a Windows 7 VMware VM on an HP server.

Network: dual GbE on each of the HP server and the NAS (bonded), so essentially 2Gb of bandwidth available. Standard MTU. While monitoring the bond on the NAS, I saw 1Gb exceeded occasionally, but it usually hovered around that range.
NAS drives: 3x10TB IronWolf, SHR (RAID5) (yes, RAID5, tsk tsk; this is just for testing).
The 1st run is a full backup; the 2nd run is incremental. Size of the transfer to the NAS: 40GB on the first run, 1GB on the second. Time-to-complete (min:sec) gives a simple indication of relative performance.

CIFS: 1st run: 10:20 2nd run: 2:41
NFS: 1st run: 13:44 2nd run: 5:50
iSCSI: 1st run: 11:17 2nd run: 5:51
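
As a sanity check on reading those times as min:sec, here is the effective rate of the CIFS full run (a rough calculation, decimal units):

```python
# 40 GB in 10:20 (620 s) over the CIFS path.
gb, seconds = 40, 10 * 60 + 20
print(f"{gb * 1000 / seconds:.0f} MB/s")  # ~65 MB/s, plausible on a ~1Gb link
```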

So in our environment, for this test, which I think is reasonably representative, CIFS actually performed best: not by an overwhelming margin, but certainly not the worst, which "generally avoid" might suggest we should see. Do you know if there's another reason, besides performance, why Veeam would recommend against CIFS? Would jumbo frames help NFS and iSCSI more than CIFS? At a quick glance, I think not.
 
You did an excellent job.
The magic of network tuning is based on a try/error strategy. I have a few sites in operation, and each has a different LAN setup (including the packet size used for transfers). Jumbo frames are mandatory for iSCSI. Check it also via the CLI.

You need to be 100% sure that your end-to-end network topology is prepared for the exact defined packet size (1500 vs. jumbo). Otherwise, especially for iSCSI, you will get slow performance. Make sure your dynamic LACP is really fixed, too.

What I found now is really crazy for me. Now I'm speechless 🤭
This totally changes my point of view on future iSCSI Syno operation. WTF!
This is thanks to the "fantastic" strategy of a single DSM for the entire NAS range (from the j class to ...).

It's from official Synology support:

The LUNs which were directly on the storage pool were known as block-level LUNs. After DSM 6.2, our developers removed the option to create block-level LUNs. Ones which previously existed will continue to work.

The block-level LUNs did not perform as well as file-level LUNs, so you should actually be better off with a file-level LUN. We would recommend making sure you are on the latest version of DSM, creating a btrfs volume, then creating a thick provisioned file-level LUN with the default features chosen. We recommend not creating LUNs which consume more than about 90-95% of the volume, to prevent the possibility of the volume filling to capacity, since the system does need some space on the volume for system files and general operation; it is also possible for a LUN to consume slightly more than the allocated amount due to fragmentation.

You mentioned "it will need to be formatted ReFS but on the NAS the Volume that the LUN would be on is BTRFS and that (even if it would appear to work) seems like it'd be asking for trouble..."

This shouldn't be asking for any trouble. The NAS just obeys the commands from the iSCSI initiator; it does not actually read the data or even know what filesystem is on it, or if there is a filesystem on it at all. So the choice of filesystem will make no difference on a block- or file-level LUN.
 
