I finally bothered to create an automated backup workflow for my critical data.

I’ve played fast and loose with my data for a while now, never really concerned with the possibility of losing something. My data was stored in Google Drive for a long time and then moved to OneDrive when I returned to school for a post-graduate degree; the Microsoft Office suite integrated better with tools like Grammarly and Zotero.

I’d dabbled in setting up a NAS for years, but it stopped being a serious project after I wrecked my wife’s final project for her B.A. degree. Several years ago, I ran an OpenMediaVault 3 instance on a Raspberry Pi 3 B+ with some external drives, and I had her laptop (and all the other computers) using folder redirection to store everything on the NAS. But I screwed something up, and she lost all her files.

After that incident, I shifted my focus away from running a NAS and relied almost entirely on third-party services to handle data storage. I mainly used two services: OneDrive for general storage and Google Photos for photos and videos.

Over the last year, I’ve become increasingly concerned with the resiliency of my data. Storing all of my photos and videos in one location, especially one I didn’t control, presented a significant risk to my data. Google is a monolithic corporation, and trying to get ahold of an actual human is nearly impossible. Google could ban my account for no reason, and I would have almost no recourse but to plead my case to every customer service agent I could and hope a higher-up would listen.

I had roughly 200,000 photos and videos totaling 1.7 TB of data in Google Photos. Those photos dated back to 1992 and, most importantly, included every photo and video of my kids since they were born ~10 years ago.

Needless to say, these photos and videos were essential data that I wanted to protect at all costs. In addition to photos and videos, I had various important documents spread out among different services. Important court documents, my daughter’s adoption paperwork, tax documents, my military paperwork, school work, certifications, etc. These files also represented essential data I wanted to secure.

Requirements

Like all of the projects I undertake, it’s essential to understand the requirements that define success. This project also had an extra dynamic I needed to account for: user flow. Most of my projects are relatively seamless to the end users (my wife and kids); I do the rearchitecting in the background and leave the things they directly interact with alone. Normally they don’t have any direct inputs to the system; with this project, I needed to account for their inputs seamlessly.

To put it simply, in the past, when I ran my own NAS and used folder redirection, my wife was frequently irritated that whenever she took her laptop away from the house to Starbucks, she couldn’t access her files. I tried to fix that with a VPN, but she would complain that the scripts were breaking her computer (they weren’t, but that’s a discussion for later). So, I needed a seamless system for her while simultaneously backing up everything.

User Stories

I’m taking a page from Agile methodology and creating user personas and stories for this requirement.

User Story 1

First up, I’ll make a persona of my wife. She’s a hard worker who isn’t tech-savvy. She mostly does light web browsing and some office work. Her primary tools are the Microsoft Office suite, Adobe Acrobat, and some Clipchamp video editing. She wants to log in to her laptop from anywhere, open a Word document, and write. She does not want to figure out how to save files to a shared drive, use a VPN, or download and save files from a remote website. As a user, she wants to be able to open her laptop wherever she is, work on some files, and have them securely stored without any effort.

User Story 2

As a user who values my photos as critical data, I never want to lose them. I want a seamless system that can back up all my photos while keeping them accessible. I want those photos to survive a system failure, phone loss, and account ban.

Requirements

Based on those stories and personas, I created a few requirements:

  • There should be no additional software installed on the system.
  • Users should not have to manually configure anything, such as mounting a shared drive.
  • Users should not have to enable or disable anything like a VPN.
  • Users should not have to learn new habits like saving files in a particular location (such as a shared drive).

Tools

To meet those requirements, I must also look at what tools I have available.

  • OneDrive Family Accounts (1TB per person)
    • Direct integration with Windows file system
  • A Synology RS812 with 4x4TB HDDs in a Synology Hybrid RAID.
    • The RS812 is an older Synology model that won’t get updated to DSM 7+.
      • This means that I can’t use tools like Synology Drive, Synology Photos, or others.
  • Unlimited Google Photos storage thanks to a T-Mobile promotion
    • This also includes 2 TB of family storage (a shared 2 TB pool)
  • My VM farm (discussed in various other posts)

Solution

This diagram shows the workflow I created to meet these requirements; let’s walk through it.

Starting with the computers on the left: these are the devices my family and I use daily for work and school. OneDrive is installed by default on each of them, and I have it configured to automatically back up the Documents, Pictures, Videos, Music, and Desktop folders. From there, Synology Cloud Sync copies all OneDrive files to shared folders in the SHR-1 pool. Cloud Sync then encrypts those files and backs them up to Google Drive. Additionally, if documents are stored in specific folders I’ve marked as critical, the RS812 uses Hyper Backup to back them up to a Backblaze B2 bucket. Finally, I have a 3 TB WD Passport that stores additional offline copies of those files.
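Cloud Sync and Hyper Backup are configured through DSM’s GUI, but the same flow can be sketched from the command line with rclone as an illustrative stand-in (this is not what the NAS actually runs, and the remote names and paths here are assumptions):

```shell
# Illustrative sketch only -- "gdrive-crypt" is an rclone crypt remote
# layered over Google Drive, "b2" is a Backblaze B2 remote, and the
# /volume1 paths are assumptions, not the actual share names.

# Mirror the synced OneDrive folders to an encrypted Google Drive remote
rclone sync /volume1/onedrive gdrive-crypt:backup

# Push only the folders marked as critical to a B2 bucket
rclone sync /volume1/critical b2:my-critical-bucket/critical
```

The crypt remote handles the "encrypt before upload" step, which mirrors what Cloud Sync’s client-side encryption option does in this setup.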

From a user’s perspective, all of this happens in the background. As soon as they save a file, the process kicks in: copying, updating, and backing it up. While this works for files, it does not meet the second user story of storing photos while keeping them accessible.

For that story, I use my VM farm to run an Immich server. Immich is about the closest we have to a proper Google Photos alternative. In Immich’s env file, I can specify the upload directory where Immich stores photos. I created a shared directory on the Synology, mounted it on the Immich server, and set it as the upload directory in the env file. Finally, I configured the phones in my house to upload images to Immich when on Wi-Fi. Those images then get backed up to a different Backblaze B2 bucket, where they sit until I hopefully never have to restore them.
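A minimal sketch of that wiring, assuming an NFS share on the NAS and a local mount point (the hostname and both paths are illustrative, not the real names):

```shell
# On the Immich VM: mount the Synology share
# ("nas.local" and both paths are assumptions)
sudo mount -t nfs nas.local:/volume1/photos /mnt/immich-upload

# In Immich's .env file, point the upload directory at the mounted share:
#   UPLOAD_LOCATION=/mnt/immich-upload
```

With that in place, everything phones upload to Immich lands on the Synology pool and rides along with the rest of the backup pipeline.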

That covers new photos, but what about the existing pictures in Google Photos and the creations Google Photos generates? I set up a recurring Google Takeout export that transfers my photos to my OneDrive account, where another Cloud Sync task copies that directory into the directory where Immich stores pictures, so they get deduplicated.

Between these processes, I effectively back up my critical data while following the 3-2-1 strategy: three copies of the data, on at least two different types of storage, with at least one copy offsite. The primary copy lives on each computer’s drive. OneDrive, the Synology, Google Drive, and B2 all represent alternate copies, with OneDrive, Google Drive, and B2 meeting the “offsite” requirement. Additionally, the process is seamless for the end user, who doesn’t have to do anything except open Word and start typing.

Is it overly convoluted? Probably. Do I feel better knowing that pictures of my kids are safe? Absolutely.
