How I blew up my backup server (Valve pls fix)
It all started with me getting a Steam Deck.
Background
After getting familiar with the Steam Deck and how the Proton compatibility layer works, I decided to write a backup script that would back up everything in the home folder, excluding the Steam games themselves due to the sheer girth of modern games (how the hell has GTA V ballooned up to 100+GB???).
Among the folders that I backed up was compatdata, a folder that contains files that Proton needs to make Windows games run. If you browse the folder, the contents look like a mini-Windows installation, and among those files you can also find your save games. It made sense to me to back up this folder, and I was quite happy to know that no matter what happens, my game saves would be safe as long as I made a copy of this folder.
Fast forward a few weeks: I’m making some changes to my self-hosting infrastructure and decide to redeploy them to a backup server I have. Nothing fancy, just Ansible roles that make sure the backup server has some configuration present and that the backup folder has the right permissions.
The step looks something like this:
```yaml
- name: Fix permissions
  ansible.builtin.file:
    name: /path/to/backups
    state: directory
    owner: user
    group: user
    recurse: yes
```
Everything was okay until I saw this.
I tried to recover the backup server over the network, but gave up as soon as I found that I couldn’t log in as root or use sudo. Instead, I decided to get physical access to the server.
One reinstall and Ansible run later, the server is okay again.
Investigation
How did we end up here though?
The backups from the Steam Deck are made to my home server using rsync. The Steam Deck is just a Linux machine, after all, and rsync made the most sense to me because it lets me make a backup that preserves all the permissions and links. Should I ever screw something up, I can run the same script in reverse and have everything working as it used to. I would hate setting up all my games and customizations again.
The script looked something like this.
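```shell
# -a: archive mode (recursive; keeps permissions, times, owner, group and symlinks)
# -A: keep ACLs, -X: keep extended attributes, -z: compress, -v: verbose
# --delete-before: remove files on the server that no longer exist on the Deck, before transferring
rsync -aAXzv --delete-before /home/deck/ backupuser@myserver:/path/to/steamdeck/ \
  --exclude .local/share/Steam/steamapps/common \
  --exclude .local/share/Steam/steamapps/downloading
sudo shutdown now
```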
Free tech tip: you can create a desktop entry for any script, add it to Steam, and run it within the Steam Deck UI when you’re done playing. You’ll have a backup and the system will shut down automatically once it’s done!
Nothing suspicious about the script, right?
There’s one problem with it: rsync also sends over links. Links come in various types (symlinks, hard links) and act as pointers to another file or folder, and the -a flag includes -l, which copies symlinks as symlinks rather than as the files they point to. Turns out that Proton (or Wine) loves using symlinks. Most usages I saw were pointing towards common distributions of Proton, which makes perfect sense since it saves disk space.
There are also links that point to the root folder /. The use case for these also makes sense: present the game with a drive like Z:, point it to / on the Linux side, and now the user can easily navigate their whole file system within the context of the game they’re running. Maybe you want to install the game or its add-ons into a different folder, or maybe you want to load a save game that you have somewhere on your Steam Deck.
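Before running anything recursive on a copied Proton prefix, it can help to know whether such a root link is in there. Here is a minimal shell sketch, assuming a `find` that supports `-lname` (GNU findutils does); the function name `find_root_links` is hypothetical, and the `compatdata` path in the usage comment is an assumption based on the Steam paths excluded in the backup script.

```shell
# Sketch: list symlinks under a directory whose target is exactly "/",
# like the Z: drive link described above.
# -lname matches the link's stored target text, so a relative link that
# merely resolves to / (e.g. "../../..") is not reported.
find_root_links() {
  local dir="$1"
  # -type l: only symlinks; find does not follow them by default.
  find "$dir" -type l -lname '/'
}

# Example (assumed location of the Proton prefixes on the Steam Deck):
# find_root_links /home/deck/.local/share/Steam/steamapps/compatdata
```

If it prints anything, a recursive operation that follows links from that folder will end up walking the entire filesystem.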
Just one problem with this: what happens when you have this symlink on another machine, such as my backup server, and you use Ansible to set the permissions for a folder containing this symlink, recursively? And what if the follow setting of Ansible’s file module has been on by default since Ansible 2.5?
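The effect of following links is easy to reproduce without touching a real system. This is a small shell sketch, not Ansible itself: plain `find` and `find -L` stand in for a recursive walk with `follow` off and on, and a temporary directory plays the role of `/`. The directory layout (`pfx/dosdevices/z:`) is hypothetical and only mimics the Z: drive link.

```shell
# Sketch: why following symlinks during a recursive walk is risky.
# find (no -L) and find -L stand in for a recursive walk with Ansible's
# follow off and on; a temporary "outside" directory stands in for /.
set -eu

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Hypothetical layout: a backup tree holding a Z:-style drive link that
# points outside the tree, plus a file that lives outside the tree.
mkdir -p "$demo_dir/backup/pfx/dosdevices" "$demo_dir/outside/etc"
touch "$demo_dir/outside/etc/passwd"
ln -s "$demo_dir/outside" "$demo_dir/backup/pfx/dosdevices/z:"

# Without -L, find lists the link itself but does not descend into it.
physical=$(find "$demo_dir/backup")
# With -L, find follows the link and walks the outside directory too.
logical=$(find -L "$demo_dir/backup")

echo "$logical"
```

The link shows up in both walks, but only the `-L` walk reaches `outside/etc/passwd` through it. Point the link at `/` instead and a follow-enabled walk covers the whole filesystem.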
Turns out the answer is that you’re going to mess up the file ownership and permissions on the whole machine, and most things stop working at that point. I could still log in as one user on that box, but I could not do anything that would help recover the state of the machine.
The fix (?)
There are many options for avoiding this problem, or at least working around it. I’m not sure what the perfect fix looks like. In case you know one, let me know.
One thing I added to my backup script was --no-links, which instructs rsync not to copy symlinks. That option has to appear after the first batch of arguments (rsync -aAXzv), because otherwise it will be overridden by the -a parameter: rsync processes options from left to right, and -a turns -l back on.
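A sketch of the backup command with `--no-links` in place, reusing the paths from the script above. Wrapping it in a function (`backup_deck`, a hypothetical name) and the `RSYNC` override (also hypothetical, e.g. `RSYNC=echo` to print the arguments instead of running rsync) are additions for illustration, not part of the original script.

```shell
# Sketch of the backup command with --no-links. Option order matters:
# rsync reads options left to right, so --no-links must come after -a
# (which implies -l) to take effect.
# RSYNC is a hypothetical override, e.g. RSYNC=echo to print the arguments.
backup_deck() {
  local dest="$1"  # e.g. backupuser@myserver:/path/to/steamdeck/
  "${RSYNC:-rsync}" -aAXzv --no-links --delete-before /home/deck/ "$dest" \
    --exclude .local/share/Steam/steamapps/common \
    --exclude .local/share/Steam/steamapps/downloading
}

# Usage (the shutdown step from the original script stays the same):
# backup_deck backupuser@myserver:/path/to/steamdeck/
# sudo shutdown now
```

With `-v`, rsync typically reports each symlink it leaves out as a “skipping non-regular file” line, which is a quick way to see how many links the backup no longer carries.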
I also updated my Ansible setup to avoid setting the permissions for the Steam Deck backup folder as a precaution.
The major downside of my tweaks is that I don’t really have a “full” backup of my Steam Deck anymore. I do have backups of my game saves, but recovering from the backup will be a bit of a hassle, because I’ll have to find the game saves and copy them manually into the new compatdata folders, since the backed-up ones are missing all the symlinks to Proton/Wine-managed dependencies.
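Alternatively, I could use a backup tool like restic, which should preserve symlinks. Since restic stores backups in its own repository format rather than as a plain directory tree, there would be no live symlinks on the backup server for Ansible to follow. I’ll just need to also test recovering from that backup method.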
Either way I can’t really blame Valve, Wine or Proton developers for my own fuck-up. They used symlinks in a way that allowed them to save disk space and give the user easier access to their files in-game. It’s just unfortunate that I learned about this setup the hard way.
Completely unrelated to my issue, but I can’t help but remember that one person whose machine got wiped by Steam.
You can reach me via e-mail or LinkedIn.