1347ffadf5
09-storage.md:
- Replaced the "Upload flow" section. The previous text described the
multipart-via-API path that was removed in b7f8329. Now documents
the three-step direct-to-B2 flow (presign → POST to B2 → attach
via upload_ids[]) with an ASCII diagram and a server-side
enforcement-points table.
- Replaced the "Future: signed URLs" placeholder (since presigned
URLs are now the present, not the future).
- Added "Lifecycle and retention" subsections covering the
pending_uploads cleanup cron (worker, 30 * * * *), the B2 bucket
lifecycle as backstop (uploads/ prefix, 7-day hide + 1-day delete),
and the still-open user-deletion cascade gap.
14-deployment-process.md:
- Added a "One-time B2 bucket lifecycle (manual)" section explaining
why the rule can't live in the deploy script (B2's S3 lifecycle
API is partial), the exact rule to apply via the Backblaze
console, and a verification command.
docs/deployment/README.md:
- Updated the chapter 9 description to mention presigned-URL uploads.
README.md (root):
- Added a paragraph under "Object storage" pointing to the new
upload architecture and the relevant deployment-book chapters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
333 lines
12 KiB
Markdown
# 09 — Object Storage (Backblaze B2)

## Summary

User-uploaded files (photos, documents, task completion attachments) go
to Backblaze B2 via its S3-compatible API. The Go API uses `minio-go/v7`
as the client. This works around a Swarm-era problem where named volumes
are per-node: uploads on node A were invisible to replicas on B and C.
With k3s we could use a shared PVC instead, but B2 is cheaper, offsite,
and already set up.

## Why Backblaze B2

### Decision matrix

| Option | Price per TB stored | Egress | Pros | Cons |
|---|---|---|---|---|
| **Backblaze B2** | **$6/mo** | $0.01/GB, free via CF | Cheap, hard spending caps, S3-compatible | US-West/East regions only (not EU) |
| AWS S3 Standard | $23/mo | $0.09/GB | Most ubiquitous | Expensive |
| Cloudflare R2 | $15/mo | Free (!) | Zero egress, CF-native | Newer, fewer features |
| DigitalOcean Spaces | $5/mo for 250GB + $0.01/GB | Free 1TB, $0.01/GB after | Simple | Less reliable than AWS |
| Local PVC on k3s | $0 | $0 | Already in cluster | Per-node, no HA, no offsite |

B2 won because:
1. **Hard spending cap**: to our knowledge, uncommon among cloud storage
   providers. No surprise AWS-style bill.
2. **Cheapest at rest**: roughly 4× cheaper than S3 Standard ($6 vs $23 per TB).
3. **Free egress through Cloudflare**: we already use CF; when we
   eventually serve upload URLs through CF, egress is free.
4. **Mature S3-compatible API**: minio-go talks to it natively.

Rejected:
- **R2** was the close second. Zero egress is attractive. It was rejected
  primarily for inertia (B2 was already set up in the MyCrib era). A future
  migration to R2 would be reasonable.
- **Local PVC** doesn't work for our setup because we want uploads
  durable and accessible from any node/replica.

## Configuration

Bucket: `honeyDueProd` (mixed case; B2 allows this, and minio-go handles it
via path-style addressing; see §path-style below).

Region: `us-east-005` (B2's US East region, closer to our
Neon DB in AWS us-east-1 than the West Coast options).

Endpoint: `s3.us-east-005.backblazeb2.com`

### Environment variables

From ConfigMap:

| Var | Value |
|---|---|
| `B2_ENDPOINT` | `s3.us-east-005.backblazeb2.com` |
| `B2_BUCKET_NAME` | `honeyDueProd` |
| `B2_REGION` | `us-east-005` |
| `B2_USE_SSL` | `true` (but see §vestigial var below) |

From Secret:

| Var | Value |
|---|---|
| `B2_KEY_ID` | App key ID (B2-specific identifier) |
| `B2_APP_KEY` | App key secret |

### App key scope

The B2 app key is **bucket-scoped**, not account-scoped. It can only
read/write the `honeyDueProd` bucket. It cannot:
- List other buckets
- Delete the bucket
- Create new buckets
- Touch account settings

This is the B2 equivalent of an IAM role with least privilege. If the
key leaks, the damage is limited to the `honeyDueProd` bucket.

## The minio-go client

The Go app uses `github.com/minio/minio-go/v7`, a Go SDK compatible
with any S3-flavored API. The relevant code is in
`internal/services/storage_backend_s3.go`:

```go
client, err := minio.New(endpoint, &minio.Options{
	Creds:  credentials.NewStaticV4(keyID, appKey, ""), // empty session token
	Secure: useSSL,                                     // HTTPS when true
	Region: region,
})
```

### Path-style vs virtual-hosted addressing

S3's URL scheme has two flavors:

- **Virtual-hosted**: `https://mybucket.s3.amazonaws.com/mykey`
- **Path-style**: `https://s3.amazonaws.com/mybucket/mykey`

With virtual-hosted style, the bucket name becomes part of the hostname, so
it must be DNS-compatible, which rules out uppercase letters. `honeyDueProd`
fails this.

With path-style, the bucket name is just a URL path segment, so any valid
bucket name works.

minio-go auto-detects: for AWS S3 it prefers virtual-hosted; for
non-AWS endpoints (like B2) it defaults to path-style. So
`honeyDueProd` with capital letters works transparently.
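
To make the difference concrete, here is a minimal sketch of how the two URL styles are built and why a mixed-case bucket only fits one of them. It is illustrative only and is not minio-go's lookup logic: `isDNSCompatible` and `objectURL` are hypothetical helpers, the DNS check is a simplified approximation of the S3 naming rules, and the object key is a made-up example.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel is a simplified DNS-compatibility check: 3-63 characters,
// lowercase letters, digits, dots and hyphens, alphanumeric at both ends.
var dnsLabel = regexp.MustCompile(`^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$`)

// isDNSCompatible reports whether the bucket name could be used as a hostname label.
func isDNSCompatible(bucket string) bool {
	return dnsLabel.MatchString(bucket)
}

// objectURL builds a path-style URL when pathStyle is true, otherwise a
// virtual-hosted URL. Virtual-hosted style fails for non-DNS-compatible names.
func objectURL(endpoint, bucket, key string, pathStyle bool) (string, error) {
	if pathStyle {
		return fmt.Sprintf("https://%s/%s/%s", endpoint, bucket, key), nil
	}
	if !isDNSCompatible(bucket) {
		return "", fmt.Errorf("bucket %q is not DNS-compatible; use path-style", bucket)
	}
	return fmt.Sprintf("https://%s.%s/%s", bucket, endpoint, key), nil
}

func main() {
	const endpoint = "s3.us-east-005.backblazeb2.com"
	const key = "users/42/profile/example.jpg" // hypothetical key

	u, err := objectURL(endpoint, "honeyDueProd", key, true)
	fmt.Println(u, err)

	_, err = objectURL(endpoint, "honeyDueProd", key, false)
	fmt.Println(err)
}
```

The path-style call succeeds, while the virtual-hosted call returns an error because the bucket name contains uppercase letters.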

## The `B2_USE_SSL` vestigial variable

`prod.env` has `B2_USE_SSL=true`. But the Go app's
`internal/config/config.go:295` reads the env var
`STORAGE_USE_SSL`, not `B2_USE_SSL`:

```go
S3UseSSL: viper.GetString("STORAGE_USE_SSL") == "" || viper.GetBool("STORAGE_USE_SSL"),
```

The original config used `B2_USE_SSL` in `prod.env` and
`STORAGE_USE_SSL` in the code. The two names don't match.

**Net effect**: The app reads `STORAGE_USE_SSL`, which is unset, so the
first operand (`GetString(...) == ""`) is `true` and the whole expression
evaluates to `true`. SSL is always on, whatever value `B2_USE_SSL` has.

This is a dormant bug. Anyone setting `B2_USE_SSL=false` expecting to
disable TLS would be surprised that it stays on. Fortunately that's the
right default for production B2 (which only accepts HTTPS anyway).

**TODO**: Rename `STORAGE_USE_SSL` → `B2_USE_SSL` in the Go code to
match the config. Documented in Chapter 19 §Vestigial config.
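
The evaluation above can be reproduced without viper. The sketch below is an approximation under stated assumptions: `useSSL` is a hypothetical helper that takes a plain lookup function instead of viper, and it uses `strconv.ParseBool` to mimic how `GetBool` treats string values (unparsable input counts as `false`).

```go
package main

import (
	"fmt"
	"strconv"
)

// useSSL mirrors the shape of the config.go expression:
// empty or unset means true, otherwise the parsed boolean value.
func useSSL(getenv func(string) string) bool {
	raw := getenv("STORAGE_USE_SSL")
	if raw == "" {
		return true // unset or empty: TLS stays on
	}
	on, err := strconv.ParseBool(raw)
	return err == nil && on // unparsable values count as false
}

func main() {
	// What an operator might set, expecting TLS to be disabled.
	env := map[string]string{"B2_USE_SSL": "false"}
	lookup := func(name string) string { return env[name] }

	fmt.Println(useSSL(lookup)) // B2_USE_SSL is never consulted

	// Only the variable the code actually reads changes the result.
	env["STORAGE_USE_SSL"] = "false"
	fmt.Println(useSSL(lookup))
}
```

The first call reports TLS on even though `B2_USE_SSL=false` is set; only `STORAGE_USE_SSL` has any effect.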

## What we store there

Today (limited rollout):
- User profile photos
- Task completion photos
- Document uploads (PDFs, images attached to records)

File keys follow a hierarchy like:
```
users/<user_id>/profile/<uuid>.jpg
residences/<residence_id>/documents/<uuid>.pdf
tasks/<task_id>/completions/<uuid>.jpg
```

Max file size is **10 MiB** per upload (`STORAGE_MAX_FILE_SIZE=10485760`).
Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
`application/pdf` (`STORAGE_ALLOWED_TYPES`).

## Access control

### Upload flow (current — direct-to-B2 with presigned POST)

Image and document uploads go **directly from the client to B2**. The
api server only signs a short-lived POST policy; the bytes never
traverse our cluster. This is a common architecture for media-heavy apps
and keeps the api from becoming a proxy bottleneck.

1. Client `POST /api/uploads/presign` with `{category, content_type, content_length}`.
2. api validates auth, per-user quota (10 concurrent in-flight,
   50/hour rate limit), allowed MIME type, and the 10 MiB cap. On success it
   creates a `pending_uploads` row, signs a B2 POST policy with a
   `content-length-range` condition bound to the claimed length ±256
   bytes, and returns `{id, upload_url, fields, key, expires_at}`.
3. Client multipart-POSTs the bytes directly to B2 using the returned
   fields. **B2 enforces the size cap at the protocol level**, so clients
   can't bypass it by lying about Content-Length.
4. Client POSTs to the entity-creation endpoint (`/api/task-completions/`,
   `/api/documents/`) with `upload_ids: [id]`. The service `HEAD`s each
   B2 object, verifies that its size matches `expected_bytes`, sets
   `pending_uploads.claimed_at`, and writes the `task_completion_image`
   / `document_image` row referencing the upload.

The signed policy is valid for 15 minutes; presigns are not reusable.

The B2 bucket stays **private**: only the api ever holds the key
material. Clients can't list or GET objects directly.

```
┌──────────┐   1) presign        ┌────────┐
│  client  │ ──────────────────► │  api   │
│          │ ◄────────────────── │        │  POST policy + key
│          │                     └────────┘
│          │                       row in
│          │                       pending_uploads
│          │                       (claimed_at NULL)
│          │   2) POST bytes     ┌────────┐
│          │ ──────────────────► │   B2   │  enforces policy
│          │ ◄────────────────── │        │
│          │                     └────────┘
│          │   3) attach         ┌────────┐
│          │ ──────────────────► │  api   │  HEAD B2 object,
│          │   upload_ids: [id]  │        │  mark claimed_at,
│          │                     └────────┘  insert image row
└──────────┘
```
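
For the client side of the "POST bytes" step, a minimal sketch of assembling the multipart form is shown below. It assumes B2 follows the usual S3 POST semantics, where the file part must come last because form fields after it are ignored. `buildPresignedPostBody` and `partNames` are hypothetical helpers, and the field values are placeholders; real ones come from the presign response.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
)

// buildPresignedPostBody writes every policy field first and the file part last.
// It returns the encoded body and the Content-Type header value to send with it.
func buildPresignedPostBody(fields map[string]string, filename string, data []byte) ([]byte, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	for name, value := range fields {
		if err := w.WriteField(name, value); err != nil {
			return nil, "", err
		}
	}
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write(data); err != nil {
		return nil, "", err
	}
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return buf.Bytes(), w.FormDataContentType(), nil
}

// partNames re-parses a multipart body and returns its form field names in order.
func partNames(body []byte, contentType string) ([]string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	r := multipart.NewReader(bytes.NewReader(body), params["boundary"])
	var names []string
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, p.FormName())
	}
}

func main() {
	fields := map[string]string{ // placeholder values
		"key":    "uploads/example.jpg",
		"policy": "<base64 policy>",
	}
	body, contentType, err := buildPresignedPostBody(fields, "example.jpg", []byte("fake image bytes"))
	if err != nil {
		panic(err)
	}
	names, err := partNames(body, contentType)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body) > 0, names[len(names)-1])
}
```

The body and content type would then be sent to `upload_url` with an ordinary HTTP POST; the sketch stops short of the network call so that it runs offline.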

Server-side enforcement summary:

| Check | Where | Reject if |
|---|---|---|
| Auth | api middleware | unauthenticated |
| MIME allowlist | `upload_service.go:allowedContentTypes` | not in list for category |
| Size cap (10 MiB) | api before signing + B2 policy | content_length > 10 MiB |
| Concurrency cap (10) | `CountUnclaimedActiveForUser` | already 10 unclaimed in-flight |
| Rate limit (50/hr) | Redis sliding window `upload:presign:<uid>:<bucket>` | 51st presign in the same hour |
| Size at upload time | B2 (signed policy) | bytes outside content-length-range |
| Ownership at attach | `FindUnclaimedForUser` | upload_id belongs to a different user |
| Bytes match claim | `s3.Stat()` + bytes comparison | actual size differs from expected by more than 256 bytes |
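
The size-related checks can be sketched as plain functions. This is not the code in `upload_service.go`: all names are hypothetical, the allowlist is flattened (the real one is per category), and clamping the signed range to the 10 MiB cap is an assumption about how the ±256 byte window interacts with the cap.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxUploadBytes int64 = 10 << 20 // 10 MiB, i.e. STORAGE_MAX_FILE_SIZE=10485760
	lengthSlack    int64 = 256      // tolerance around the claimed length
)

// allowedTypes is a flat stand-in for the per-category allowlist.
var allowedTypes = map[string]bool{
	"image/jpeg": true, "image/png": true, "image/gif": true,
	"image/webp": true, "application/pdf": true,
}

var (
	errContentType = errors.New("content type not allowed")
	errContentSize = errors.New("content length out of bounds")
)

// lengthRange returns the byte range to sign into the content-length-range condition.
func lengthRange(claimed int64) (lo, hi int64) {
	lo, hi = claimed-lengthSlack, claimed+lengthSlack
	if lo < 1 {
		lo = 1
	}
	if hi > maxUploadBytes {
		hi = maxUploadBytes // assumption: never sign past the cap
	}
	return lo, hi
}

// validatePresign runs the MIME and size checks that precede signing.
func validatePresign(contentType string, contentLength int64) (lo, hi int64, err error) {
	if !allowedTypes[contentType] {
		return 0, 0, errContentType
	}
	if contentLength < 1 || contentLength > maxUploadBytes {
		return 0, 0, errContentSize
	}
	lo, hi = lengthRange(contentLength)
	return lo, hi, nil
}

// sizeMatches is the attach-time check: stored size within the tolerance of the claim.
func sizeMatches(expected, actual int64) bool {
	diff := actual - expected
	if diff < 0 {
		diff = -diff
	}
	return diff <= lengthSlack
}

func main() {
	lo, hi, err := validatePresign("image/png", 1000)
	fmt.Println(lo, hi, err)

	_, _, err = validatePresign("text/html", 1000)
	fmt.Println(err)

	fmt.Println(sizeMatches(1000, 1300))
}
```

Checking the size twice (once in the signed policy, once at attach) means a client that skips or tampers with the B2 step still cannot attach an object whose size disagrees with its claim.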

### Download flow (current)

1. Client requests `/api/media/<key>`
2. Go API checks the user can access this key
3. Go API fetches from B2 and streams back to the client

This proxies every download through the api. For high-traffic media
that's inefficient (the api becomes an egress bottleneck); it could be
replaced with presigned GET URLs on the same bucket. That is not yet
shipped; download volume is low enough that the proxy is fine for now.

## Lifecycle and retention

### Orphan cleanup (`pending_uploads`)

Every presign creates a row in `pending_uploads` with
`expires_at = now + 15 min`. If the client never finishes the upload, or
finishes but never calls the attach endpoint, the row stays unclaimed. An
hourly cron in the worker reaps them:

- **`maintenance:upload_cleanup`**: cron `30 * * * *`. Selects
  unclaimed rows past `expires_at`, deletes the corresponding B2
  object, then deletes the row. Up to 500 per tick; the next tick picks up
  any overflow. Worker logs include a `reaped` count.

The worker constructs a `StorageService` at startup; if storage init
fails (e.g. `B2_KEY_ID` / `B2_APP_KEY` not wired into the worker
deployment), the cleanup handler logs a warning and no-ops. See
`deploy-k3s/manifests/worker/deployment.yaml`: both B2 secrets are
required envs on this pod.
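
The reaping rule can be sketched in a few lines. This is an in-memory illustration rather than the worker's actual handler: `pendingUpload`, `objectDeleter`, `reap`, and `fakeStore` are hypothetical names, and skipping a row when the object delete fails (so the next tick retries it) is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// pendingUpload is a simplified view of a pending_uploads row.
type pendingUpload struct {
	ID        int64
	Key       string
	ExpiresAt time.Time
	ClaimedAt *time.Time // nil while unclaimed
}

// objectDeleter is a hypothetical stand-in for the storage client.
type objectDeleter interface {
	Delete(key string) error
}

const batchLimit = 500 // rows handled per tick

// reap deletes the B2 object for each unclaimed, expired row and returns
// the IDs of rows that are now safe to delete from the table.
func reap(rows []pendingUpload, now time.Time, store objectDeleter) []int64 {
	var reaped []int64
	for _, row := range rows {
		if len(reaped) == batchLimit {
			break // overflow waits for the next tick
		}
		if row.ClaimedAt != nil || !row.ExpiresAt.Before(now) {
			continue // claimed or not yet expired
		}
		if err := store.Delete(row.Key); err != nil {
			continue // keep the row so a later tick retries
		}
		reaped = append(reaped, row.ID)
	}
	return reaped
}

// fakeStore records deleted keys instead of talking to B2.
type fakeStore struct{ deleted []string }

func (f *fakeStore) Delete(key string) error {
	f.deleted = append(f.deleted, key)
	return nil
}

func main() {
	now := time.Now()
	claimedAt := now
	rows := []pendingUpload{
		{ID: 1, Key: "uploads/a", ExpiresAt: now.Add(-time.Hour)},
		{ID: 2, Key: "uploads/b", ExpiresAt: now.Add(time.Minute)},
		{ID: 3, Key: "uploads/c", ExpiresAt: now.Add(-time.Hour), ClaimedAt: &claimedAt},
	}
	store := &fakeStore{}
	fmt.Println(reap(rows, now, store), store.deleted)
}
```

Only the first row qualifies: the second has not expired and the third has been claimed.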

### Bucket lifecycle (backstop)

A B2 lifecycle rule on the `uploads/` prefix is the safety net if the
worker is offline for an extended period:

- Hide objects 7 days after upload.
- Delete 1 day after hidden.

This is configured manually via the Backblaze console (B2's S3
lifecycle API isn't fully implemented). See
`deploy-k3s/manifests/b2-lifecycle.md` for the exact rule and the
`b2 bucket get-info` verification command.
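
For orientation, the rule described above would look roughly like this in the JSON shape used by B2's native lifecycle rules. This is a sketch: the field names follow the B2 native API as we understand it, the Go type and function names are hypothetical, and `b2-lifecycle.md` remains the authoritative definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycleRule models one B2 lifecycle rule (field names assumed
// to match the native B2 API).
type lifecycleRule struct {
	FileNamePrefix            string `json:"fileNamePrefix"`
	DaysFromUploadingToHiding int    `json:"daysFromUploadingToHiding"`
	DaysFromHidingToDeleting  int    `json:"daysFromHidingToDeleting"`
}

// uploadsBackstopRule returns the 7-day hide + 1-day delete rule for uploads/.
func uploadsBackstopRule() lifecycleRule {
	return lifecycleRule{
		FileNamePrefix:            "uploads/",
		DaysFromUploadingToHiding: 7,
		DaysFromHidingToDeleting:  1,
	}
}

// ruleJSON encodes the rule as a one-element rule list.
func ruleJSON(r lifecycleRule) (string, error) {
	b, err := json.Marshal([]lifecycleRule{r})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := ruleJSON(uploadsBackstopRule())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Comparing this shape against the lifecycle rules reported by `b2 bucket get-info` is one way to confirm the console change took effect.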

### User-deletion cascade

When a user deletes their account, the app deletes their `task_*` /
`document` rows. The associated B2 objects survive. This is the same
compliance gap as before and is not yet automated. Two approaches:

- Walk the image rows on user delete and `RemoveObject` each (simple,
  synchronous, slow for users with many uploads).
- Tag objects with a `user_id` metadata header at upload time, then
  use a B2 lifecycle rule scoped to a deleted-users prefix.

The first approach is the next item in the upload roadmap.
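
A minimal sketch of the walk-and-remove approach follows. It is not shipped code: `objectRemover`, `removeUserObjects`, and `memoryStore` are hypothetical, and continuing past individual failures (collecting them for a retry) is one possible design rather than a decided one.

```go
package main

import (
	"errors"
	"fmt"
)

// objectRemover is a hypothetical stand-in for the storage client's remove call.
type objectRemover interface {
	Remove(key string) error
}

// removeUserObjects attempts every key, returning the keys that failed
// together with a joined error (nil when everything was removed).
func removeUserObjects(store objectRemover, keys []string) ([]string, error) {
	var failed []string
	var errs []error
	for _, key := range keys {
		if err := store.Remove(key); err != nil {
			failed = append(failed, key)
			errs = append(errs, fmt.Errorf("remove %s: %w", key, err))
		}
	}
	return failed, errors.Join(errs...)
}

// memoryStore is a fake backend; unlike real S3-style deletes it reports
// missing keys as errors so the failure path can be demonstrated.
type memoryStore struct{ objects map[string]bool }

func (m *memoryStore) Remove(key string) error {
	if !m.objects[key] {
		return errors.New("no such object")
	}
	delete(m.objects, key)
	return nil
}

func main() {
	store := &memoryStore{objects: map[string]bool{
		"tasks/1/completions/a.jpg": true,
		"users/42/profile/b.jpg":    true,
	}}
	keys := []string{"tasks/1/completions/a.jpg", "users/42/profile/b.jpg", "missing.jpg"}
	failed, err := removeUserObjects(store, keys)
	fmt.Println(len(store.objects), failed, err)
}
```

Returning the failed keys lets the caller queue them for a retry instead of aborting the account deletion halfway.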

## Backup of B2

We have no backup of B2 objects. B2 itself replicates within the region,
but:
- Accidental deletion via our app = data gone
- B2 itself being compromised = data gone

B2 offers **Object Lock** (WORM: write once, read many), which prevents
deletion for a retention period. It is not enabled; revisit if/when user
data sensitivity justifies it.

## Cost projection

Current usage is **small**: an estimated <50 GB stored.

```
50 GB × $0.006/GB = $0.30/mo storage
1 GB/mo egress (mostly uncached media served via api) → $0.01 (first
3× of stored amount is free anyway, so effectively $0)
```

Total B2 cost: **< $1/mo**. The hard spending cap is set to $20/mo in the B2
console; if we ever breach that, something's wrong and we want to
know immediately.

At 100k users each uploading ~10 MB average:
- 1 TB stored = $6/mo
- Egress depends on access patterns; with signed URLs served through CF
  the egress could still be ~free
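
The arithmetic above can be written as a small function, which makes it easy to plug in other volumes. The rates are the ones quoted in this chapter ($0.006/GB-month storage, $0.01/GB egress, free egress up to 3× the stored amount); `monthlyCost` is a hypothetical helper and ignores transaction fees.

```go
package main

import "fmt"

const (
	storagePerGB       = 0.006 // USD per GB-month ($6/TB)
	egressPerGB        = 0.01  // USD per GB beyond the free allowance
	freeEgressMultiple = 3     // free egress up to 3× stored volume
)

// monthlyCost estimates the monthly B2 bill for a stored volume and egress volume.
func monthlyCost(storedGB, egressGB float64) float64 {
	billableEgress := egressGB - freeEgressMultiple*storedGB
	if billableEgress < 0 {
		billableEgress = 0
	}
	return storedGB*storagePerGB + billableEgress*egressPerGB
}

func main() {
	// Today: ~50 GB stored, ~1 GB/mo egress.
	fmt.Printf("%.2f\n", monthlyCost(50, 1))

	// 100k users × ~10 MB each ≈ 1000 GB stored, egress assumed free via CF.
	fmt.Printf("%.2f\n", monthlyCost(100_000*10/1000.0, 0))
}
```

Under these assumptions the two scenarios come to $0.30 and $6.00 per month, matching the figures above.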

## Operator cheat sheet

```bash
# List bucket contents (requires mc or aws CLI configured with B2 creds)
mc alias set b2 https://s3.us-east-005.backblazeb2.com <KEY_ID> <APP_KEY>
mc ls b2/honeyDueProd/

# Count objects
mc ls --recursive b2/honeyDueProd/ | wc -l

# Download an object
mc cp b2/honeyDueProd/<key> ./

# Check B2 console for usage graphs:
# https://secure.backblaze.com/b2_buckets.htm
```

From inside a Go api pod:
```bash
# Check the in-cluster client config
kubectl exec -n honeydue deploy/api -- env | grep B2_
```

## References

- [Backblaze B2 docs][b2-docs]
- [B2 S3-compatible API][b2-s3]
- [minio-go/v7][minio-go]
- [S3 path-style vs virtual-hosted][s3-style]

[b2-docs]: https://www.backblaze.com/docs/
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
[minio-go]: https://github.com/minio/minio-go
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html