#Doing backups
Until recently, I hadn't really done backups of my files. So far I've been lucky: nothing has been lost. The SSD in my current laptop is behaving just fine. Yet the laptop is soon turning 7 years old. Just like insurance, at some point in life you figure out you probably should have it. For me, that point is now. Doing backups, that is. I still don't have life insurance.
For MacBook people, this is a non-problem: Apple's Time Machine backs up your files for you. I'm sure Windows has similar offerings for the Windows people. Linux does not. I'm still waiting for Linus Torvalds to call me asking to back up my files. Until then, I'll manage backups on my own. It shouldn't be too hard. Essentially, I want to:
- Run backups at a regular cadence, e.g. daily.
- Avoid data loss. The backup destination shouldn't be bricked.
- Ensure nobody else has access to the backed-up files.
#Running regularly
Fortunately, running a job regularly is easy on Linux. Cron is one option; systemd timers are another. I ended up with the latter, simply because it's convenient to see the status of the last run and its logs. I enabled the timer by running:

```shell
systemctl --user enable --now backup.timer
```
`backup.timer`:

```ini
[Unit]
Description=Backup (daily)

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
RandomizedDelaySec=600

[Install]
WantedBy=timers.target
```
`backup.service`:

```ini
[Unit]
Description=Backup (all jobs)

[Service]
Type=oneshot
ExecStart=%h/xdg_config/backup/backup.mjs
```
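The "status and logs" convenience that made me pick timers over cron looks like this in practice (standard systemd commands; the unit names match the files above):

```
systemctl --user list-timers backup.timer   # next and last trigger time
systemctl --user status backup.service      # result of the most recent run
journalctl --user -u backup.service -n 50   # logs from the backup script
```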
This runs a script called `backup.mjs` every night. The script loops through a folder called `jobs/`. For example, I have `jobs/ha.conf` that configures how I want to back up my Home Assistant data, and `jobs/ha.exclude` that contains a list of patterns I want to ignore, like a `.gitignore`.
`jobs/ha.conf`:

```ini
SOURCE_DIR="/home/mikael/src/service"
KEEP_DAILY=7
KEEP_WEEKLY=4
KEEP_MONTHLY=6
```
`jobs/ha.exclude`:

```
home-assistant.log.*
.ha_run.lock
__pycache__/
# ... etc
```
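The driver script itself could be quite small. Here is a sketch of what a `backup.mjs` like this might look like; the `parseConf` helper, the restic invocation, and the environment-variable setup are my assumptions, not the author's actual script:

```javascript
#!/usr/bin/env node
// Sketch of a backup.mjs driver. Assumptions: restic is on PATH, and
// RESTIC_REPOSITORY / RESTIC_PASSWORD_FILE are set in the environment.
import { execFileSync } from "node:child_process";
import { readdirSync, readFileSync, existsSync } from "node:fs";
import { join, basename } from "node:path";

// Parse simple KEY=value / KEY="value" lines, like jobs/ha.conf.
function parseConf(text) {
  const conf = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^(\w+)=("?)(.*)\2$/);
    if (m) conf[m[1]] = m[3];
  }
  return conf;
}

function runJob(jobsDir, confFile) {
  const name = basename(confFile, ".conf");
  const conf = parseConf(readFileSync(join(jobsDir, confFile), "utf8"));
  // Back up the source directory, honoring the job's exclude file if present.
  const backupArgs = ["backup", conf.SOURCE_DIR, "--tag", name];
  const excludeFile = join(jobsDir, `${name}.exclude`);
  if (existsSync(excludeFile)) backupArgs.push(`--exclude-file=${excludeFile}`);
  execFileSync("restic", backupArgs, { stdio: "inherit" });
  // Apply the per-job retention policy, then prune unreferenced data.
  execFileSync("restic", [
    "forget", "--prune", "--tag", name,
    "--keep-daily", conf.KEEP_DAILY,
    "--keep-weekly", conf.KEEP_WEEKLY,
    "--keep-monthly", conf.KEEP_MONTHLY,
  ], { stdio: "inherit" });
}

// Guarded so importing this file (e.g. in a test) doesn't start a backup.
if (process.env.RUN_BACKUP === "1") {
  const jobsDir = new URL("jobs/", import.meta.url).pathname;
  for (const f of readdirSync(jobsDir).filter((f) => f.endsWith(".conf"))) {
    runJob(jobsDir, f);
  }
}
```

Tagging each snapshot with the job name keeps the retention policies independent: `restic forget --tag ha` only touches Home Assistant snapshots.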
#Destination unknown
There's a lot of flexibility in where backups go. At home I have one laptop and one Dell Wyse thin client running 24/7. I could write my backup files to both computers. If one gets bricked, the other is probably still alive, unless they both burn down in a house fire1. I could add more redundancy by writing to an external USB drive connected to one of the machines.
Alternatively, I could write backups to a cloud vendor. There are lots of choices: Google Cloud Storage, AWS S3, Cloudflare R2, or any other big tech company. In that case, I can be pretty confident that the backups won't be lost. For example, Google Cloud Storage promises at least 99.999999999% annual durability, which is just insane.
In the end, many options work, and I ended up using Google Cloud Storage. It is cheap enough and convenient to set up. Assuming 1GB2 of backed-up data with Coldline storage, the annual price is 0.50 NOK. In other words, less than 10% of what it costs to buy a plastic bag at the grocery store in Norway.
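The arithmetic checks out, at least roughly. Assuming Coldline's list price of about $0.004 per GB-month and an exchange rate around 10.4 NOK per USD (both of which vary over time and by region):

```javascript
// Rough annual cost of storing 1 GB in GCS Coldline (assumed list price;
// ignores minimum storage duration, operation, and egress charges).
const usdPerGBMonth = 0.004; // assumed Coldline storage price
const nokPerUsd = 10.4;      // assumed exchange rate
const annualNok = 1 * usdPerGBMonth * 12 * nokPerUsd;
console.log(annualNok.toFixed(2)); // prints "0.50"
```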
#Storing securely
In principle, I could use rsync or its cloud counterpart, rclone, as the backup tool. It verifies file integrity, only uploads files that have changed, and supports a large variety of cloud vendors. However, the solution I ended up using is restic, an open-source tool developed by some smart German guys. In addition to the above, we get:
- Encryption by default: all data is encrypted. Even if someone gains access to the backup directory, they still need the key to decrypt its contents.
- Snapshots and retention policies: instead of a single backup, we can keep several backups over time. These are called snapshots. For example, to keep snapshots from the last 7 days, plus one per week for the last 4 weeks, you could run `restic forget -r foo --prune --keep-daily 7 --keep-weekly 4`. Setting up different retention policies for different files is easy. For example, I keep snapshots several months back for my Home Assistant instance, while photos only have a few weeks of backup.
- Open source and good developer experience: the documentation and tooling are pretty good. Gotta appreciate that. There's a single binary with a good CLI and a nice API.
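Concretely, a backup-plus-retention cycle against a Google Cloud Storage repository might look like this (the bucket name and password path are made up; restic's GCS backend reads credentials from `GOOGLE_APPLICATION_CREDENTIALS`):

```
export RESTIC_REPOSITORY=gs:my-backup-bucket:/
export RESTIC_PASSWORD_FILE=~/.config/backup/password
restic init                                   # once, to create the repository
restic backup /home/mikael/src/service --exclude-file=jobs/ha.exclude
restic forget --prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6
```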
Everything is encrypted by default. The master password is written in a file, and the file is backed up in a password manager. I chose Bitwarden. They have the encryption key; Google has the encrypted files. Should be pretty safe.
#Device synchronization
The good thing about running a backup solution on your own is that it's pretty straightforward to extend. Another piece of software I've been using is Syncthing. It lets me synchronize folders between devices. I cannot recommend it enough. I synchronize personal notes, meeting notes, and scripts between several devices. I can attend a meeting with my work laptop and, immediately after, see the notes on my desktop work computer. The configuration is managed through a web interface on `localhost:8384`. Synchronization does not happen via a central server; it's peer-to-peer, which is pretty awesome!
I synchronize meeting notes and scripts from my work computer(s) to my private laptop, and from there I run backups to Google Cloud Storage. Likewise, I synchronize photos from my phone back to the laptop. My Home Assistant configuration is available on my laptop, even though it runs on my home server. Everything synchronizes with everything else, like a Mexican standoff.
It's still early days. Hopefully I won't actually need the backups for a few more years3. Hopefully I can celebrate the 10-year birthday of my SSD, especially considering that a replacement is really expensive now. But restic for backups and Syncthing to "gather" files seems like a great combination.