The Google Drive S3 Round-About

FILE 0xA4·THE GOOGLE DRIVE S3 ROUND-ABOUT

May 25, 2026 · ~14 min read · homelab, backup, rclone, FastAPI

I have a Synology NAS with about 27 TB of data on it — family photos, my Plex library, code, archived business files, and a growing pile of audio masters. The library is going to roughly double in the next year. Off-site backup has been an open problem for me for the better part of a year, and the path I ended up on is weird enough that I want to write it down end-to-end, with enough detail that you could rebuild it.

The short version: HyperBackup talks S3 to a 350-line Python proxy I wrote, which fans every write across five Google Drive accounts using rclone. It looks like AWS to the NAS and like five separate humans to Google. It costs me $0 in storage and can push up to ~3.75 TB/day of new data into the cloud (five accounts at Drive's 750 GB/day cap; I plan around 3.5 TB/day to leave headroom).

Here's how I got there.

The problem

I have a 1 Gbps symmetric fiber line and 27 TB to ship. At line rate that's roughly 2.5 days. In practice I never came close.
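The line-rate figure is easy to check. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 10^12 bytes) and zero protocol overhead; the function name is only illustrative:

```python
def transfer_days(terabytes: float, gbps: float) -> float:
    """Days needed to move `terabytes` at a steady `gbps`, ignoring overhead."""
    bits = terabytes * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (gbps * 1e9)        # line rate in bits per second
    return seconds / 86_400              # seconds per day


print(f"{transfer_days(27, 1.0):.1f} days")  # 2.5 days
```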

I tried, in order:

Synology Cloud Sync to Google Drive. Got it pointed at a dedicated backup@ Google account on my Workspace tenant. After about three weeks it was sitting on ~400,000 files perpetually "syncing." Throughput hovered between 1–3 MB/s. Restarting Cloud Sync would briefly burst, then settle back to that floor.
HyperBackup to Google Drive. Same target, different tool. Same outcome. HyperBackup's chunking is smarter than Cloud Sync's, but it still ground to a halt around the 800 GB mark and never recovered.
HyperBackup to Backblaze B2. Worked perfectly. Also would have cost me about $162/month at 27 TB (at $6/TB/month), and roughly twice that once the library doubles. Not interested.
Restic + Backrest + Google Drive via rclone. Better than HyperBackup direct, but still capped around 2–3 MB/s sustained on the same Drive backend.

I kept blaming my network. It turned out to be Google's API.

The actual bottleneck

Once I started tailing logs on the Synology and on a Proxmox host running rclone with --rc, the picture cleared up. All three Drive-backed setups (Cloud Sync, HyperBackup direct, and restic via rclone) were doing the same thing: uploading millions of tiny chunks one at a time against the Drive API. Each chunk is one or two HTTP requests. Each request counts against the per-user-per-100-seconds API quota.

The relevant Drive quotas, none of which Google publishes prominently:

~750 GB/day of writes per user
~10,000 API requests per 100 seconds per user (so ~100/sec, but bursty)
A separate, lower per-IP quota that shows up if you do anything truly aggressive

The throttle I was hitting was the requests-per-100-seconds one. With chunks running 4–16 MB and a steady stream of metadata calls (list, mkdir, stat…), my effective ceiling sat right around 2–3 MB/s sustained — exactly what I was seeing. Pushing harder triggered 403s with userRateLimitExceeded or rateLimitExceeded, and the tools would back off into the floor.
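The "back off into the floor" behavior is what truncated exponential backoff looks like from the outside. A minimal sketch of that retry pattern, not taken from any of the tools above: `RateLimited` is a hypothetical exception standing in for a Drive 403 with reason userRateLimitExceeded or rateLimitExceeded, and the delay parameters are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a Drive 403 carrying a rate-limit reason."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 64.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait each time (plus jitter)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, 1))  # jitter avoids lockstep retries
    raise AssertionError("unreachable")
```

With every worker sleeping 1, 2, 4, 8 seconds and so on after each 403, sustained throughput converges on whatever the quota allows, which matches the flat 2–3 MB/s floor described above.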

There's a second multiplier I missed for weeks: rclone with its default config shares a global OAuth client (202264815644.apps.googleusercontent.com) with every other rclone user on the planet. That global client gets its own rate-limit pool, which is permanently saturated. The fix is to register your own OAuth client in a GCP project and pass it via --drive-client-id / --drive-client-secret. This alone roughly 5x'd my sustained rate, and is non-optional for anything serious.
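One way to hand rclone your own client without putting the secret on the command line is through its environment-variable form of the same flags (rclone maps `--drive-client-id` to `RCLONE_DRIVE_CLIENT_ID`, and likewise for the secret). A small sketch; the helper name and the placeholder credentials are hypothetical:

```python
import os
from typing import Dict, Mapping, Optional


def rclone_env(
    client_id: str,
    client_secret: str,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the Drive OAuth client set.

    These variables are the env-var equivalents of --drive-client-id and
    --drive-client-secret.
    """
    env = dict(os.environ if base is None else base)
    env["RCLONE_DRIVE_CLIENT_ID"] = client_id
    env["RCLONE_DRIVE_CLIENT_SECRET"] = client_secret
    return env


# Usage (not executed here; "gdrive:" is a placeholder remote name):
#   subprocess.run(["rclone", "lsd", "gdrive:"], env=rclone_env(cid, secret))
```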

Two design choices that fall out

Once I understood the throttle, two things became obvious:

Bundle aggressively before upload. One 25 GB blob hits the API a few hundred times during a resumable upload. The same 25 GB as 6,250 × 4 MB files hits the API thousands of times. Bundling small files into bigger objects is the difference between "works" and "doesn't."
Use multiple accounts in parallel. Each Drive user gets its own 750 GB/day budget. Five users = 3.75 TB/day. Ten = 7.5 TB/day. The quota is per user, not per app, not per IP. Round-robin across N accounts and you scale linearly until you saturate your uplink.
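To put rough numbers on the bundling argument, here is a back-of-the-envelope model. It assumes each file costs one request to open a resumable session plus one request per 64 MiB chunk, and it ignores metadata calls entirely, so treat the results as lower bounds rather than measurements.

```python
import math

GB = 1000**3   # decimal units, matching the 750 GB/day figure
MB = 1000**2
MIB = 1024**2


def upload_requests(size_bytes: int, chunk_bytes: int = 64 * MIB) -> int:
    """Lower bound on upload requests for one file: session open + chunks."""
    return 1 + math.ceil(size_bytes / chunk_bytes)


one_blob = upload_requests(25 * GB)             # a single 25 GB object
small_files = 6250 * upload_requests(4 * MB)    # the same bytes as 4 MB files

print(one_blob, small_files)  # 374 12500
```

Under these assumptions the bundled upload needs about 33 times fewer requests for the same bytes, before counting any list, mkdir, or stat traffic.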

HyperBackup understands point 1 natively — it produces ~50 MB pack files by default. It does not understand point 2 at all; it knows about one Google account.

So either I patch HyperBackup (no), or I put something in front of Google Drive that looks like a single bucket to HyperBackup and silently fans the writes across N accounts.

The dead end: writing a restic clone

My first attempt at "something in front of Google Drive" was a custom backup engine. I called it Cass Vault, started it on May 18, and killed it on May 20. About 2,000 lines of Python — FastAPI service, SQLite schema for accounts/chunks/packs/snapshots/jobs, an APScheduler-driven worker that walked the source tree, content-addressed chunks via blake3, packed them into 25 GB pack files with zstd-1 compression, encrypted with age, and uploaded via rclone to a pool of Drive accounts.

It worked. It was also a restic reimplementation, and I was writing the bug list as fast as I was writing the features. SQLite contention under five concurrent jobs. OOM on the LXC because three jobs × eight packs × 64 MB buffers blew through the container's memory limit. Edge cases around resumable uploads that I'd already shipped in production code at $work and didn't want to debug again at home.

The unlock came when I realized I was solving the wrong problem. HyperBackup already does the bundle-and-content-address-and-encrypt dance. It does it well. It produces files that look like Pegasus_1/Pool/4/12/327.index.2 — binary blobs, content-addressed, 5–50 MB each, that it knows how to talk S3 to. I didn't need a new backup engine. I needed a fake S3 endpoint.

The new shape: S3 in, Drive out

The replacement is called cass-s3. It's a single FastAPI process that speaks the subset of the S3 API that HyperBackup actually uses, persists metadata in Postgres, stages object bodies on local disk, and uploads each finished object to one of five Google Drive accounts via rclone. To HyperBackup it's just an S3 bucket; to Google it's five unrelated humans doing modest backups.

          Synology (HyperBackup)
                 |
                 |  S3 PUT/GET/etc., HTTPS
                 v
       https://vault.cwfrazier.com  (nginx, Let's Encrypt)
                 |
                 v
        cass-s3  (FastAPI on CT 200, :9000)
        +--------+--------+
        |                 |
   Postgres          local disk (staging)
   (object index)         |
                          v
                  rclone (one process per account)
                  +----+----+----+----+----+
                  |    |    |    |    |
              backup@1 ... backup@5  (Google Drive)

The components, with enough detail to rebuild:

Hardware / hosts

Synology DSM on a Pegasus chassis (192.168.1.2). Source data lives in /volume1/*. HyperBackup is the only thing on it that knows about cass-s3.
Proxmox host (192.168.1.14). Runs the cass-s3 service in LXC CT 200, bound to the host's IP on port 9000. Also runs nginx + Let's Encrypt for the HTTPS frontdoor.
1 Gbps symmetric fiber. Real-world peak so far is ~600 Mbps sustained outbound across five accounts.

The S3 surface

HyperBackup uses very little of the S3 API. The full set cass-s3 had to implement to keep HyperBackup happy:

ListBuckets, HeadBucket, CreateBucket, GetBucketLocation
ListObjectsV2 (with prefix + delimiter)
HeadObject, GetObject (including Range)
PutObject (single-shot)
CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload
DeleteObject (HyperBackup uses this when pruning old generations)
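Of the calls listed above, ListObjectsV2 with prefix and delimiter is the one with non-obvious semantics: keys that contain the delimiter after the prefix are rolled up into common prefixes instead of being returned as objects. A sketch of just that grouping logic over an in-memory key list (pagination, MaxKeys, and the XML envelope are left out; every key except the first is made up for illustration):

```python
from typing import Iterable, List, Tuple


def list_objects_v2(
    keys: Iterable[str], prefix: str = "", delimiter: str = ""
) -> Tuple[List[str], List[str]]:
    """Return (contents, common_prefixes) the way S3 groups them."""
    contents: List[str] = []
    common = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything under the next "directory" into one prefix.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common)


keys = [
    "Pegasus_1/Pool/4/12/327.index.2",
    "Pegasus_1/Pool/4/13/1.bucket",   # hypothetical key
    "Pegasus_1/Config/task.db",       # hypothetical key
    "Pegasus_1/info",                 # hypothetical key
]
print(list_objects_v2(keys, prefix="Pegasus_1/", delimiter="/"))
```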

That's it. No ACLs, no versioning, no policies, no tagging, no SSE-KMS. AWS Signature V4 verification is implemented but loose — the access key/secret are static and live in cass-s3's config and in HyperBackup. There is no IAM.
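For reference, the core of Signature V4 verification with a static secret is small. The sketch below shows the standard signing-key derivation and a constant-time comparison; building the canonical request and string-to-sign is omitted, and the function names are illustrative rather than cass-s3's.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key; `date` is YYYYMMDD from the credential scope."""
    k_date = _hmac(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def signature_matches(
    secret: str, date: str, region: str, string_to_sign: str, claimed_hex: str
) -> bool:
    """Recompute the signature and compare it in constant time."""
    key = signing_key(secret, date, region)
    expected = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claimed_hex)
```

A "loose" verifier might stop here and skip checks such as clock skew or signed-header completeness; which checks to relax is a judgment call for a single-tenant endpoint.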

Virtual-host-style addressing (e.g. https://pegasus-prod.vault.cwfrazier.com/) is supported via a Route 53 wildcard *.vault.cwfrazier.com pointed at the proxy and a DNS-01 wildcard cert from Let's Encrypt. Both path-style and virtual-host-style work; HyperBackup happens to use path-style.
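Supporting both addressing styles mostly comes down to where the bucket name is read from. A minimal sketch of that resolution, assuming the base domain from this post; the function name and the sample key are hypothetical:

```python
from typing import Tuple


def resolve_bucket(
    host: str, path: str, base_domain: str = "vault.cwfrazier.com"
) -> Tuple[str, str]:
    """Return (bucket, key) for either virtual-host or path-style requests."""
    host = host.split(":", 1)[0].lower()      # drop any :port suffix
    suffix = "." + base_domain
    if host.endswith(suffix):
        # Virtual-host style: bucket is the leftmost label(s) of the Host header.
        return host[: -len(suffix)], path.lstrip("/")
    # Path style: bucket is the first path segment.
    bucket, _, key = path.lstrip("/").partition("/")
    return bucket, key


print(resolve_bucket("pegasus-prod.vault.cwfrazier.com", "/Pegasus_1/info"))
print(resolve_bucket("vault.cwfrazier.com", "/pegasus-prod/Pegasus_1/info"))
```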

The metadata model

One Postgres table per concept, all in one DB called cass_s3:

buckets — name, created_at, region (always "us-east-1" because HyperBackup doesn't care).
objects — (bucket, key) primary key, size, etag, content_type, account_id (which Drive account holds the final part), drive_file_id, uploaded_at, deleted_at (soft delete for the prune window).
multipart_uploads — upload_id, bucket, key, started_at, completed_at, account_id (which account this upload's parts are staging toward).
parts — (upload_id, part_number) PK, etag, size, local_path while staging.
accounts — (id, email, oauth_tokens, daily_bytes_uploaded, daily_window_started_at, enabled). The scheduler reads this to decide where to send the next upload.
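For readers who think in types rather than tables, the rows above might map onto Python records roughly like this. The field names follow the column lists; the Python types, and the choice to store oauth_tokens as a string, are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ObjectRow:
    bucket: str                 # (bucket, key) is the primary key
    key: str
    size: int
    etag: str
    content_type: str
    account_id: int             # Drive account holding the uploaded blob
    drive_file_id: str
    uploaded_at: datetime
    deleted_at: Optional[datetime] = None   # soft delete for the prune window


@dataclass
class MultipartUploadRow:
    upload_id: str
    bucket: str
    key: str
    started_at: datetime
    account_id: int             # chosen once, at CreateMultipartUpload
    completed_at: Optional[datetime] = None


@dataclass
class PartRow:
    upload_id: str              # (upload_id, part_number) is the primary key
    part_number: int
    etag: str
    size: int
    local_path: str             # only meaningful while staging


@dataclass
class AccountRow:
    id: int
    email: str
    oauth_tokens: str           # assumed serialized; storage format not specified
    daily_bytes_uploaded: int
    daily_window_started_at: datetime
    enabled: bool = True
```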

One important design choice: a multipart upload picks its account at CreateMultipartUpload time and sticks with it until CompleteMultipartUpload. You can't fan parts of the same object across accounts, because the only way to assemble them on the Drive side is to upload the final concatenated blob to one account.

The upload path

For a single-shot PutObject:

Stream the body to a temp file under /opt/cass-s3/data/parts/<uuid>.bin. (Originally this was on the LXC's root filesystem. That bit me — see the incident below.)
Pick the next available account from accounts — least-recently-used among accounts whose daily-bytes counter is below 700 GB. Drive's hard cap is 750 GB but I keep a 50 GB safety margin.
Shell out to rclone copyto /opt/cass-s3/data/parts/<uuid>.bin gdrive-<account>:cass-s3/<bucket>/<sha256-of-key>.bin --drive-chunk-size=64M --transfers=1 --tpslimit=50.
Read back the Drive file ID from rclone's JSON output and write the row into objects.
Delete the staging file.
Return 200 with the etag (MD5 of the body) to HyperBackup.
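Steps 2 and 3 can be sketched in a few lines. This is an illustration of the selection rule and the rclone invocation described above, not the actual cass-s3 code: the `Account` record and its `last_used_at` field are hypothetical (the accounts table as listed has no such column), and the command is only built, not run.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Sequence

DAILY_LIMIT_BYTES = 700 * 1000**3   # 750 GB cap minus the 50 GB safety margin


@dataclass
class Account:
    name: str
    daily_bytes_uploaded: int
    last_used_at: float             # hypothetical: epoch seconds of last upload
    enabled: bool = True


def pick_account(accounts: Sequence[Account]) -> Optional[Account]:
    """Least-recently-used account that is enabled and under the daily budget."""
    eligible = [
        a for a in accounts
        if a.enabled and a.daily_bytes_uploaded < DAILY_LIMIT_BYTES
    ]
    return min(eligible, key=lambda a: a.last_used_at) if eligible else None


def copyto_cmd(staging_path: str, account: str, bucket: str, key: str) -> List[str]:
    """Build the rclone copyto command; the Drive name is sha256 of the S3 key."""
    name = hashlib.sha256(key.encode("utf-8")).hexdigest()
    dest = f"gdrive-{account}:cass-s3/{bucket}/{name}.bin"
    return [
        "rclone", "copyto", staging_path, dest,
        "--drive-chunk-size=64M", "--transfers=1", "--tpslimit=50",
    ]
```

Hashing the key sidesteps any question of which characters Drive accepts in file names, at the cost of making the Drive folder unreadable without the Postgres index.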

For multipart:

CreateMultipartUpload allocates an upload_id (UUID), picks an account, returns the ID.
Each UploadPart streams to /opt/cass-s3/data/parts/<upload_id>/<part_number>.bin and records the row.
CompleteMultipartUpload concatenates the parts in order to a single file (cheap: it's a streaming cat on the same disk), rclones that single file to the chosen account, writes the objects row, deletes the staging directory.
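The "streaming cat" in CompleteMultipartUpload is worth spelling out because of one easy mistake: part files must be ordered numerically, not lexically, or part 10 lands before part 2. A small sketch under the `<part_number>.bin` naming above; the output file name and the demo data are made up.

```python
import shutil
import tempfile
from pathlib import Path


def concat_parts(parts_dir: Path, out_path: Path) -> int:
    """Concatenate <part_number>.bin files in numeric order; return total size.

    `out_path` should live outside `parts_dir` or use a different suffix,
    so the glob below does not pick it up.
    """
    parts = sorted(parts_dir.glob("*.bin"), key=lambda p: int(p.stem))
    with out_path.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out, length=1024 * 1024)  # 1 MiB buffer
    return out_path.stat().st_size


# Tiny demonstration with three fake parts written out of order.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    for number, body in ((2, b"B"), (10, b"C"), (1, b"A")):
        (staging / f"{number}.bin").write_bytes(body)
    total = concat_parts(staging, staging / "object.tmp")
    joined = (staging / "object.tmp").read_bytes()

print(total, joined)  # 3 b'ABC'
```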

Two things matter about this:

Concatenation, not stitching. S3 multipart lets the client concatenate parts on the server side. Google Drive has no equivalent. So I concatenate locally and upload one blob. This means a multipart upload temporarily needs 2 × object_size on disk: the parts plus the concatenated file. For HyperBackup, which produces ~50 MB pack files, that's nothing. For a giant ad-hoc aws s3 cp of a 50 GB file, it matters — budget your staging volume accordingly.

One account per object, not per part. If the upload to the chosen account fails, the entire CompleteMultipartUpload fails and HyperBackup retries from CreateMultipartUpload. There is no recover-by-switching-accounts. This was a deliberate trade for simplicity.

The download path

Symmetric. GetObject looks up objects, finds the account and Drive file ID, streams via rclone cat back to the HTTP response. Range requests use rclone cat --offset --count. There is no caching layer — reads are rare (HyperBackup mostly only restores during disaster recovery), so cold-cache latency is fine.
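Translating an HTTP Range header into `rclone cat --offset/--count` arguments is mostly off-by-one care, since Range ends are inclusive and `--count` is a length. A sketch that handles the three single-range forms and rejects multi-range requests; the function name is illustrative.

```python
from typing import List


def range_to_rclone_args(range_header: str, size: int) -> List[str]:
    """Map 'bytes=a-b', 'bytes=a-' or 'bytes=-n' to rclone cat flags."""
    unit, sep, spec = range_header.partition("=")
    if unit.strip() != "bytes" or not sep or "," in spec:
        raise ValueError("only a single bytes range is supported")
    start_s, dash, end_s = spec.strip().partition("-")
    if not dash:
        raise ValueError("malformed range")
    if start_s == "":
        # Suffix form: the last n bytes of the object.
        n = int(end_s)
        if n <= 0:
            raise ValueError("unsatisfiable range")
        start, end = max(0, size - n), size - 1
    else:
        start = int(start_s)
        end = min(int(end_s), size - 1) if end_s else size - 1
    if start > end or start >= size:
        raise ValueError("unsatisfiable range")
    return ["--offset", str(start), "--count", str(end - start + 1)]


print(range_to_rclone_args("bytes=0-99", 1000))  # ['--offset', '0', '--count', '100']
```

A real handler would also turn the ValueError into a 416 response and set Content-Range on success.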

Account onboarding

Each Drive account gets its own OAuth grant, but they all share one OAuth client — a Web OAuth client I created in a dedicated GCP project (applied-pipe-496717-a8) with the redirect URI https://vault.cwfrazier.com/api/oauth/callback and the drive + email scopes. The shared client matters because it dodges the global rclone client's rate-limit pool entirely.

The dashboard at https://vault.cwfrazier.com/_dashboard has an "Add account" button. The flow:

Click the button. Browser goes to /api/oauth/start, which redirects to Google's consent screen with access_type=offline and prompt=consent (to force a refresh token).
User picks a Google account and consents.
Google redirects back to /api/oauth/callback?code=....
cass-s3 exchanges the code for a refresh token, fetches the user's email, and writes the row into accounts.
A new [gdrive-<email>] section gets written to /root/.config/rclone/rclone.conf with that account's refresh token and the shared client ID/secret.
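The first and last steps of that flow can be sketched without any HTTP framework. The consent URL uses Google's standard OAuth 2.0 authorization endpoint; the rclone section mirrors the layout rclone itself writes for Drive remotes. The function names are hypothetical, and the empty access token with a zero expiry is an assumption intended to make rclone refresh on first use.

```python
import configparser
import io
import json
from urllib.parse import urlencode

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
REDIRECT_URI = "https://vault.cwfrazier.com/api/oauth/callback"
SCOPES = ["https://www.googleapis.com/auth/drive", "email"]


def consent_url(client_id: str, state: str) -> str:
    """URL that /api/oauth/start would redirect the browser to."""
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,
        "response_type": "code",
        "scope": " ".join(SCOPES),
        "access_type": "offline",   # ask for a refresh token
        "prompt": "consent",        # force it even on repeat grants
        "state": state,             # CSRF token, checked in the callback
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


def rclone_section(email: str, client_id: str, client_secret: str,
                   refresh_token: str) -> str:
    """Render one [gdrive-<email>] remote as rclone.conf text."""
    token = {
        "access_token": "",                    # assumption: refreshed on first use
        "token_type": "Bearer",
        "refresh_token": refresh_token,
        "expiry": "0001-01-01T00:00:00Z",
    }
    conf = configparser.ConfigParser(interpolation=None)
    conf[f"gdrive-{email}"] = {
        "type": "drive",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "drive",
        "token": json.dumps(token),
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()
```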

I currently have five accounts authed. Adding accounts 6–10 is a UI click. Past about 10, my uplink saturates before I can take advantage of more quota.

The dashboard

A small Svelte SPA at vault.cwfrazier.com/_dashboard that polls the API. Browser requests (Accept: text/html) get redirected to the dashboard automatically; S3 clients get their normal XML responses. Panels:

Pool quota meter (100 TiB cap, current used/free)
Per-account daily bytes uploaded + reset countdown
In-flight uploads (count + bytes)
Recent objects and recent multipart uploads
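The browser-versus-S3-client split mentioned above hinges on the Accept header. A minimal sketch of that check, assuming a request counts as a browser only when it lists text/html explicitly (a bare `*/*` does not); the function name is hypothetical.

```python
from typing import Optional


def wants_dashboard(accept_header: Optional[str]) -> bool:
    """True when the client explicitly accepts text/html."""
    if not accept_header:
        return False
    media_types = [
        part.split(";", 1)[0].strip().lower()   # drop q-values and params
        for part in accept_header.split(",")
    ]
    return "text/html" in media_types


print(wants_dashboard("text/html,application/xhtml+xml,*/*;q=0.8"))  # True
print(wants_dashboard("*/*"))                                        # False
```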

HyperBackup config

The Synology end is straightforward once cass-s3 is up. Create a new HyperBackup task, type "S3 Storage," then:

S3 Server: Custom Server URL
Server address: vault.cwfrazier.com (HTTPS, port 443)
Signature version: v4
Request style: Path
Region: us-east-1 (ignored by cass-s3, but HyperBackup wants something)
Access Key / Secret Key: from cass-s3 config — in my case, access key SeEd-PGotHTwJLpLMDykpQ; secret lives in the cass-s3 .env and Bitwarden.
Bucket name: anything not in use, e.g. pegasus-prod. HyperBackup will CreateBucket it for you on first run.
Client-side encryption: on. Even though Drive and the transit layer are encrypted, I don't want Google to have the plaintext keys, and HyperBackup's encryption is cheap.

Tune the task's "Maximum number of concurrent backup tasks" up — I run four. Anything higher and the Synology's CPU becomes the bottleneck before the network does.

An incident worth flagging

May 21, 01:14 CDT. The root filesystem of CT 200 filled to 100%. 864 multipart parts × 64 MB each = 54 GB of staging files sitting on a 94 GB root volume. Postgres crashed, all six in-flight rclone jobs stalled at 0 B/s, and HyperBackup started retrying from scratch.

Fix: I symlinked /opt/cass-s3/data/parts/ to a dedicated 500 GB LVM volume and added a janitor that rms any staging file older than 24 hours (defensive — should never happen if the upload path is healthy). Lesson: staging volume sizing matters. The right formula is (max_concurrent_uploads × max_object_size × 2) with a generous safety margin. For HyperBackup's 50 MB pack files this is laughable; for ad-hoc large objects it's not.
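Both halves of that fix are short enough to sketch: the sizing formula, and a janitor that removes staging files older than 24 hours. The 1.5× safety factor and the function names are assumptions; the janitor uses file modification time as its notion of age.

```python
import time
from pathlib import Path
from typing import List, Optional


def staging_bytes_needed(
    max_concurrent_uploads: int, max_object_size: int, safety_factor: float = 1.5
) -> int:
    """(concurrent uploads x object size x 2) plus a safety margin."""
    return int(max_concurrent_uploads * max_object_size * 2 * safety_factor)


def sweep_stale(
    root: Path, max_age_s: float = 24 * 3600, now: Optional[float] = None
) -> List[Path]:
    """Delete files under `root` whose mtime is older than `max_age_s`."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for path in list(root.rglob("*")):      # snapshot before deleting
        if path.is_file() and now - path.stat().st_mtime > max_age_s:
            path.unlink()
            removed.append(path)
    return removed


MB, GB = 1000**2, 1000**3
print(staging_bytes_needed(4, 50 * MB) / GB)   # 0.6
print(staging_bytes_needed(4, 50 * GB) / GB)   # 600.0
```

The two printed cases show the gap described above: four concurrent 50 MB pack files need well under a gigabyte of staging, while four concurrent 50 GB objects need hundreds.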

Capacity math

Each Drive account gets ~750 GB/day of writes. I leave 50 GB of headroom, so I plan for 700 GB/day/account. With five accounts that's 3.5 TB/day of new data. The initial seed of 27 TB works out to 27 / 3.5 ≈ 8 days at that cap, which is roughly what I observed (it ran a bit slower while I was tuning).

Free tier per Google account is 15 GB. Workspace Business Standard accounts on my tenant get a pooled 2 TB by default, but in practice the cap that matters is the daily-write throttle, not the storage. I've stayed below 1.5 TB per account so far; if I bump up against pool limits I'll buy storage add-ons rather than restructure the design.

Accounts	Daily write ceiling	Seed time for 50 TB
1	0.7 TB/day	~71 days
5 (today)	3.5 TB/day	~14 days
10	7.0 TB/day	~7 days (uplink-bound at ~600 Mbps)
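The table rows follow directly from the planning figure of 0.7 TB/day per account. A small sketch that reproduces them, ignoring the uplink limit noted in the last row:

```python
def seed_days(total_tb: float, accounts: int, per_account_tb_per_day: float = 0.7) -> float:
    """Days to upload `total_tb` when every account runs at its daily budget."""
    return total_tb / (accounts * per_account_tb_per_day)


for n in (1, 5, 10):
    print(f"{n:>2} accounts: {n * 0.7:.1f} TB/day, ~{seed_days(50, n):.0f} days for 50 TB")
```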

What I'd do differently

Skip the restic clone entirely. I lost two days writing Cass Vault before throwing it away. The instinct was right (Google Drive needs a pre-bundling layer), but HyperBackup is that layer for free.
Postgres from day one. The first cut used SQLite. It worked, but the moment I had 3+ concurrent uploads it started throwing SQLITE_BUSY even with WAL mode and a 30s busy_timeout. Switching to Postgres took an afternoon and made the concurrency story trivial.
Stage to a dedicated volume. The disk-full incident was 100% avoidable.
Custom OAuth client on day one. Using rclone's default global client wasted weeks of debugging where I assumed the problem was somewhere on my side.

Costs

Storage: $0. The five Drive accounts are all on my existing Workspace tenant. The "extra" accounts are real users I would have provisioned anyway.

Compute: cass-s3 runs on hardware I already had (Proxmox host, ~50 MB RAM, single-digit % CPU at peak). nginx and Let's Encrypt are free. Postgres is shared with the rest of the homelab.

For comparison, Backblaze B2 would run 27 TB × $6/TB/month = $162/month, and AWS Glacier Deep Archive roughly $27/month for storage plus a $700+ retrieval bill the one time I'd ever need it. The Drive round-about pays for itself the day it's deployed.

What's next

Multi-region. Right now everything goes through one Proxmox host. If that host is down, backups stall. The clean fix is a second cass-s3 instance, a shared Postgres, and a load balancer.
Lifecycle. HyperBackup prunes old generations by calling DeleteObject, but the deleted Drive files currently go to that account's trash and need a periodic empty-trash sweep. Trivial cron, just haven't written it yet.
Egress. If I ever do a full restore, I'll hit Drive's read quota (10 TB/day/user). With 5 accounts that's a comfortable 50 TB/day, but it's worth knowing the number exists.
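The empty-trash sweep from the Lifecycle item could be as small as one `rclone cleanup` per remote, which for Drive remotes empties the trash. A sketch meant to be called from cron; the remote names are placeholders and the `run` parameter exists only so the function can be exercised without rclone installed.

```python
import subprocess
from typing import Callable, Iterable, List


def empty_trash(
    remotes: Iterable[str], run: Callable[..., object] = subprocess.run
) -> List[List[str]]:
    """Run `rclone cleanup <remote>:` for each remote; return the commands used."""
    commands: List[List[str]] = []
    for remote in remotes:
        cmd = ["rclone", "cleanup", f"{remote}:"]
        run(cmd, check=True)        # raise if rclone exits non-zero
        commands.append(cmd)
    return commands


# Usage (not executed here; remote names are hypothetical):
#   empty_trash(["gdrive-backup1@example.com", "gdrive-backup2@example.com"])
```

Note that emptying the trash makes pruned objects unrecoverable on the Drive side, so the sweep interval effectively sets the undo window for an accidental prune.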

If you want to do this yourself, the moving parts are all small and the architecture diagram up top is the whole thing. The hardest bit was admitting that the answer was "stop trying to write a backup tool, write a fake S3 endpoint" — everything else followed.