Consideration: solution for unattended backup of docker environment

NAS: Synology, TrueNAS
Operating system: Linux, Windows
Expectations: Disaster recovery covered

1. Backup and Restore the entire Docker node from a single host to another (just necessary data).

2. For containers created from:
- the Synology shell or the Docker package
- Portainer stacks (docker-compose)

3. Syno docker environment is stored here:
- images: /volume1/@docker/image
- volumes: /volume1/@docker/volumes
- networks: /volume1/@docker/network
- containers: /volume1/@docker/containers

4. There are also mounted volumes, depending on your architecture, e.g. /volume1/docker/path to mounted volumes. This is the simplest part of the problem. I don't need a discussion about it.

-----------------
Consideration:

A. You can't use the Synology snapshot package for these directories (the Synology @docker tree)

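B. An rsync of them over SSH, run as a defined and scheduled task, is the simplest method (archive mode, which is recursive and preserves timestamps and permissions; compressed; human-readable sizes). Note that -e takes the remote shell as its argument, so it can't be bundled in the middle of the other flags:
Bash:
rsync -azh -e ssh <source directory> <user>@<backup host>:<target directory>
Useful for the backup (e.g. x times per day) and also for the restore (in case of disaster).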

C. You can also create a backup of the settings from Portainer, but it will save mainly your Compose files, the Portainer DB, the certificates used and the setup of the connected Docker nodes. When you have a backup of the entire Portainer container (rsync), you have it all in your pocket.

So, this is also open for discussion:
- why back up images when you can pull them from the repo? The answer lies in the problems between versions (e.g. InfluxDB)
- when a container-by-container backup/restore is chosen instead of the entire Docker environment, there is the problem of how to script an exact image + volume + mapped volume backup into a single file, e.g. a TAR.

My way - backup of a defined container:
First - docker export doesn't work, because according to the Docker documentation:
docker export does not export the contents of volumes associated with the container
So 'docker save' is the better way. I tested a few different ways and this is a merge of them:
1. You need to stop the container:
Bash:
docker stop <container name or ID>
2. You need to create a new image from the existing container:
Code:
docker commit <container name> <new image name>
3. Then you need to save the new image to a file somewhere, preferably as a TAR:
Code:
docker save -o <new image name.tar> <new image name>

And here is the hardest point, where I don't have a useful solution so far: how to keep the mapped volumes together with the container IDs?

4. Both files (images.tar and volumes.tar) need to be transferred to the backup environment.

Done. Tested only with the image. Even better, you can use this backup for migration purposes (from the source host to another host):
docker load -i <new image name.tar>
Then you can create a new container from it.
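The stop / commit / save steps above could be wrapped into one function for a scheduled task. This is only a sketch: the function name `backup_container_image`, the `DOCKER` override variable and the `backup/<name>:<date>` tag scheme are assumptions made for illustration. It still does not capture the data in volumes, and restarting the container at the end is my addition.

```shell
#!/bin/bash
# Sketch only: snapshot a container (image layers + its writable layer) into a tar file.
# DOCKER may be pointed at a wrapper or another binary; this override is hypothetical.
DOCKER="${DOCKER:-docker}"

# backup_container_image <container name> <target dir>
backup_container_image() {
  local container="$1" target_dir="$2" stamp repo image
  stamp=$(date +"%Y%m%d")
  # Image repository names must be lowercase, container names need not be.
  repo=$(printf '%s' "${container}" | tr '[:upper:]' '[:lower:]')
  image="backup/${repo}:${stamp}"   # hypothetical tag naming scheme

  "${DOCKER}" stop "${container}" || return 1
  "${DOCKER}" commit "${container}" "${image}" || return 1
  "${DOCKER}" save -o "${target_dir}/${container}_${stamp}.tar" "${image}" || return 1
  # Bring the container back up once the image has been written.
  "${DOCKER}" start "${container}"
}

# Usage (path is an example): backup_container_image portainer /volume3/dockernode-backup
```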
 
The approach seems reasonable if you depend on persistent data that is written into the copy-on-write layer of the container. Typically, depending on that data is a sign of missing volume mappings. Commit creates a new image based on the current container state: the original image layers plus a new image layer holding a copy of the container's copy-on-write layer.

In an ideal world, where persistent data is stored in volumes outside the container, it is sufficient to have the image tag you used to create the container plus the data of the volumes.

Instead of saving the image (image manifest + all image layers referenced in it), you could run a private pull-through container image registry - this way you would always have a copy of the images available and could avoid the `docker save` action all along.
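A minimal sketch of such a pull-through cache, assuming the official `registry:2` image acting as a mirror for Docker Hub. The container name `registry-cache`, port 5000, the `DOCKER` override variable and the function name are placeholders chosen for illustration.

```shell
#!/bin/bash
# Sketch only: run the official registry image as a pull-through cache for Docker Hub.
DOCKER="${DOCKER:-docker}"

# start_pull_through_cache <host dir that keeps the cached layers>
start_pull_through_cache() {
  local data_dir="$1"
  "${DOCKER}" run --detach --name registry-cache --restart unless-stopped \
    --publish 5000:5000 \
    --volume "${data_dir}:/var/lib/registry" \
    --env REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
}

# Usage (path is an example): start_pull_through_cache /volume1/docker/registry
```

The Docker daemon then has to be told to use it through its `registry-mirrors` setting (e.g. `http://<host>:5000`). The cache only holds what has been pulled through it at least once, so it has to be in place before the disaster, and its data directory needs a backup of its own.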

N.B.: If wanted, a list of tags can be exported at once to leverage layer deduplication (where possible). Also, you can pipe the output of docker save to gzip and end up with a much smaller tar.gz instead of a .tar.
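Both points from the N.B. in one sketch: several tags go into a single archive (shared layers are stored once) and the stream is gzipped on the way out. The function names and the `DOCKER` override variable are hypothetical.

```shell
#!/bin/bash
# Sketch only: save several images into one gzip-compressed archive.
DOCKER="${DOCKER:-docker}"

# save_images_gz <output.tar.gz> <image[:tag]> [<image[:tag]> ...]
save_images_gz() {
  local out="$1"
  shift
  # Without -o, docker save writes the tar stream to stdout.
  "${DOCKER}" save "$@" | gzip > "${out}"
}

# load_images_gz <input.tar.gz>
load_images_gz() {
  # docker load reads gzip-compressed archives directly.
  "${DOCKER}" load -i "$1"
}

# Usage (tags are examples): save_images_gz images.tar.gz influxdb:1.8 portainer/portainer-ce:latest
```

Keep in mind that the exit status of the pipeline is the one from gzip, so a failing docker save goes unnoticed unless the calling script enables bash's `set -o pipefail`.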


If you are looking for an approach to back up volumes (!= the binds that Synology uses as volumes), then this script might be helpful. It is a clean and generic way to archive the content of whatever type of named Docker volume:

Bash:
#!/bin/bash -eu
# Archive one named Docker volume, or all of them, into ./<volume>_<date>.tar.gz
backup_date=$(date +"%Y%m%d")
docker_volumes=$(docker volume ls --format '{{.Name}}')
volume_found=false

if [ $# -eq 0 ]; then
  echo "please provide a volume name or all as argument "
  for volume in ${docker_volumes}; do
    echo "${volume}"
  done
  exit 1
fi

for volume in ${docker_volumes}; do
  if [ "${1}_x" == "${volume}_x" ] || [ "${1}_x" == "all_x" ]; then
    volume_found=true
    # A throwaway alpine container mounts the volume and tars its content into ${PWD}.
    # No --tty/--interactive here: a scheduled task has no terminal attached.
    docker run --rm --volume "${volume}:/source" --volume "${PWD}:/backup" alpine tar czvf "/backup/${volume}_${backup_date}.tar.gz" -C /source .
  fi
done

if [ "${volume_found}" == "false" ]; then
  echo "volume does not exist: ${1}"
  echo "existing volumes: "
  for volume in ${docker_volumes}; do
    volume=${volume#*/}
    echo "${volume}"
  done
  exit 1
fi

You just pass the volume name if you want a specific volume to be archived, or you use all to archive all volumes. Of course the containers using the volume need to be stopped before running the script.
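The restore direction could look roughly like this: the same throwaway alpine container, but unpacking an archive into the volume. A sketch under assumptions: the function name `restore_volume` and the `DOCKER` override variable are made up for illustration, and the archive is expected to sit in the current directory, as the backup script leaves it.

```shell
#!/bin/bash
# Sketch only: unpack a <volume>_<date>.tar.gz created by the script above into a volume.
DOCKER="${DOCKER:-docker}"

# restore_volume <volume name> <archive file name in the current directory>
restore_volume() {
  local volume="$1" archive="$2"
  # Creates the volume if it is missing; an existing volume is left as it is.
  "${DOCKER}" volume create "${volume}" > /dev/null || return 1
  "${DOCKER}" run --rm \
    --volume "${volume}:/target" \
    --volume "${PWD}:/backup:ro" \
    alpine tar xzf "/backup/${archive}" -C /target
}

# Usage (names are examples): restore_volume influxdb_data influxdb_data_20211214.tar.gz
```

Files already present in the volume are overwritten by the archive's copies but not removed, so restoring into a fresh, empty volume is the cleaner option.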

If you are looking for bind mounts instead, you could leverage the output of docker container inspect <container name> --format '{{json .Mounts }}' | jq to fetch the details you need for your actions. It should return details for binds and "real" volumes.
 
this is the missing part in my research! Thx

Btw: re Migration (one of the options)
I opened a ticket in Portainer Git for a bug in Portainer CE:
- you can Migrate a chosen stack between two hosts (Docker nodes) managed by the same Portainer. Works as expected, both between two NASes of the same vendor and between different ones (tested from Syno to TN Scale, great).
- you can't Duplicate some of them. The Duplicate button is greyed out and there is an error message: "this container name is already used by another container running in this environment: <container name>". Which is a bug.
Now testing in Portainer to find the reason.
 
A consideration:
1. stop all running containers (at the scheduled time)
Bash:
docker kill $(docker ps -q)
It takes about a minute (depending on the number of running containers).

2. entire Docker node backup (you can drop the verbose option for a scheduled task):
Bash:
sudo rsync -aXv --ignore-existing '/volume1/@docker/' '/volume3/dockernode-backup/rsync/'
- preserves all necessary details
- in my case within the same NAS, but onto a different volume
- yes, we can discuss why to also include the 'mess' like this:
[attached screenshot: 1639487065569.png]

3. Start of the stopped containers (based on restart policy)

4. snapshot of '/volume3/dockernode-backup/rsync/' to a different volume within the same NAS (an advantage of multi-volume NASes)

5. Hyper Backup of '/volume3/dockernode-backup/rsync/' to (another) 'backup NAS' host

Pros:
- in the case of an HDD block error you can replace the damaged file(s) ASAP
- same in the case of trouble with a RAID rebuild
- the snapshot is insurance for the time consumed by step no. 5
- there is an entire node backup
- useful for really big disasters within the original volume (volume1/@docker)

Cons:
- the initial sync takes time
- no way to get the elapsed time or a dashboard of finished rsync runs

You can check the rsync log >>>> rsync.error .... in case of trouble
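One possible way around the missing time/status overview is a small wrapper around the rsync from step 2 that writes rsync's own log plus a one-line summary per run. A sketch only: the function name, the `RSYNC` override variable and the `.summary` file are assumptions; the rsync flags are the ones from step 2 without the verbose option.

```shell
#!/bin/bash
# Sketch only: run the node backup rsync and record exit code and duration.
RSYNC="${RSYNC:-rsync}"

# sync_docker_root <source dir> <target dir> <log file>
sync_docker_root() {
  local src="$1" dst="$2" log="$3" started finished rc=0
  started=$(date +%s)
  # --log-file makes rsync write a per-file log of what it did.
  "${RSYNC}" -aX --ignore-existing --log-file="${log}" "${src%/}/" "${dst%/}/" || rc=$?
  finished=$(date +%s)
  # One summary line per run: easy to grep or to feed into a dashboard.
  echo "$(date +"%F %T") rsync exit=${rc} duration=$((finished - started))s" >> "${log}.summary"
  return "${rc}"
}

# Usage (paths from step 2, log path is an example):
# sync_docker_root /volume1/@docker /volume3/dockernode-backup/rsync /volume3/dockernode-backup/rsync.log
```

A non-zero return value makes the Task Scheduler job fail visibly, so it can send a notification.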

PS: all of the tasks are automated/triggered by the Synology Task Scheduler.

Your point?

The next stage is hardening the automated backup of container DBs, like this one (an example for MySQL):
Bash:
docker exec CONTAINER /usr/bin/mysqldump -u root --password=root DATABASE > backup$(date +"%Y%m%d_%H%M%S").sql
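A slightly hardened variant of that one-liner, as a sketch. It assumes the container was created from the official MySQL image with `MYSQL_ROOT_PASSWORD` set, so the password is read inside the container instead of being written into the scheduled task. The function name and the `DOCKER` override variable are made up; `--single-transaction` only gives a consistent dump for InnoDB tables.

```shell
#!/bin/bash
# Sketch only: dump one database from a running MySQL container into a dated .sql.gz file.
DOCKER="${DOCKER:-docker}"

# dump_mysql <container> <database> <target dir>; prints the resulting file name
dump_mysql() {
  local container="$1" database="$2" target_dir="$3" out
  out="${target_dir}/${database}_$(date +"%Y%m%d_%H%M%S").sql"
  # "$1" inside the quoted script is the database name passed after the second 'sh'.
  if ! "${DOCKER}" exec "${container}" \
      sh -c 'exec mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --single-transaction "$1"' \
      sh "${database}" > "${out}"; then
    rm -f "${out}"   # do not keep a partial dump
    return 1
  fi
  gzip "${out}" && echo "${out}.gz"
}

# Usage (names are examples): dump_mysql CONTAINER DATABASE /volume3/dockernode-backup
```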
 
It kind of boils down to the implementation specifics.

to 1) Preparation: stop containers

Necessary step for a consistent backup. Though, please avoid stopping containers with `docker kill`; it really does what it says: it kills the containers without any grace period. A `docker stop` waits for the grace period and only then kills the containers that exceeded it. I would not risk inconsistency or corruption to save a few seconds... It is like pulling the power plug on your NAS to shut it down :)
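A sketch of what the graceful variant could look like in a scheduled task: remember which containers were running, stop them with a grace period, and start exactly those again after the backup. The function names, the ID file and the `DOCKER` override variable are assumptions; the 30 seconds is just an example value (the `docker stop` default is 10).

```shell
#!/bin/bash
# Sketch only: graceful stop before the backup, restart of the same containers afterwards.
DOCKER="${DOCKER:-docker}"

# stop_running <id file> [grace period in seconds]
stop_running() {
  local id_file="$1" grace="${2:-30}" ids
  # Remember the IDs of everything that is running right now.
  "${DOCKER}" ps -q > "${id_file}" || return 1
  ids=$(cat "${id_file}")
  [ -n "${ids}" ] || return 0
  # Intentionally unquoted: one argument per container ID.
  "${DOCKER}" stop -t "${grace}" ${ids}
}

# start_stopped <id file>
start_stopped() {
  local ids
  ids=$(cat "$1")
  [ -n "${ids}" ] || return 0
  "${DOCKER}" start ${ids}
}

# Usage (file path is an example):
# stop_running /tmp/running.ids 30 && <run the backup> ; start_stopped /tmp/running.ids
```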

to 2) Perform backup

The backup strategy depends on your deployment/provisioning/operation strategy. Your selected strategy will allow an easy backup but might be harder to restore.

I would stay away from backing up Docker's data root folder if possible. The only situation I have encountered so far where it is needed is when you have a swarm cluster and cannot afford to lose the cluster configuration. I strongly recommend relying on compose files and retaining images in a private registry instead. This approach requires a fully scripted deployment, but makes restorations way simpler, as you only depend on your scripts and the restoration of the volumes' content. Ofc the deployment scripts have to be stored in git and backed up as well.

I do miss the backup of your binds/volumes. But I assume you perform that backup, as you need it in all scenarios.

N.B.: If you are interested I can share the code base of my makefile-based deployment, but only if you are really interested, as it takes time on my part to anonymize it for you. My deployment works like a watering can: it has a Makefile in the deployment root folder, which triggers a makefile in a subfolder, which either performs the action or triggers another makefile in a subfolder, and so on. This allows you to structure the project as needed and encapsulate dependencies in directory structures. The beauty is that you can mix docker-compose and all kinds of other binary commands (kubectl, helm, ...) as you please. Drawback: you need to compile make for Synology (been there, done that, not so hard).

Steps 3, 4 and 5 make sense in every scenario.

The main question boils down to: "do you really want to persist the metadata configuration of Docker itself?" I feel the solution is more error-prone and will require a "full restore" of a cluster and its state -> kind of feels like a backup monolith?
 
thx OEK

When someone thinks 'stop' and writes 'kill' :rolleyes:

Pull the disk out of the NAS. Connect it to suitable software, find the damaged sector, read which files are affected (incl. path), and mark the sector as a bad block. Manually replace the damaged file(s) within the damaged sector, then put the disk back into the NAS, scrub, and done. Fast and useful for the case when the disk is damaged in only a few blocks. In other cases you gain time to purchase a new disk and do a RAID rebuild when necessary (with serious FS damage it is better to switch off the NAS until the new disk arrives).
OFC, open to innovations :sneaky:

Back to the purpose of the backup, i.e. the predefined incidents it should cover (everyone has a different repair scenario):

1. To get a useful backup for restoring the entire Docker node, or just a single container, in case of fatal damage.
A. It is necessary to define what data I can recreate from a new installation of the Docker package; that needs no backup. There is just a single reason to back up all Docker root data: the simplicity of the backup task plus the small amount of such data.
As written in my initial post, the whole /volume1/docker/ environment needs to be covered by a sufficient backup scheme (up to your needs). No need to explain here.
[attached screenshot: 1639487065569.png]

Following a scan of the /volume1/@docker root directory I have this evaluation:
22GB in 675k files within 106k folders
AVG rsync speed 82MB/s between two volumes of the same NAS (which is really nice performance for such a bunch of small files).
The main part of the data is located in:
- the btrfs/aufs directory (depending on the filesystem used for Docker): 86% of the data by volume and 97% by file count
- 3% in containers
- volumes take up just peanuts, because 98% of the containers' data is mounted (and then backed up by different methods).
So when I create a backup of the entire @docker root directory, it can't kill my backup environment. And it will not affect the backup speed.


2. To guard against a btrfs write hole event, made worse by btrfs's Achilles' heel, bad block relocation (during scrubbing). There is a simple workaround (for skilled users).


3. The same for the ext4 FS as mentioned in point no. 2, but easier from the repair point of view.


For this case I have (rsynced) snapshots of DSM /var/log outside the NAS to get all possible logs and understand the reason for the problems, including predefined PowerQueries for ASAP data interpretation:
[attached screenshot: 1639556258938.png]
 
