honeyDueAPI/docs/deployment/09-storage.md
Trey t 6f303dbbaa
2026-04-24 07:20:54 -05:00

# 09 — Object Storage (Backblaze B2)
## Summary
User-uploaded files (photos, documents, task completion attachments) go
to Backblaze B2 via its S3-compatible API. The Go API uses `minio-go/v7`
as the client. This works around a Swarm-era problem where named volumes
are per-node — uploads on node A were invisible to replicas on B and C.
With k3s we could use a shared PVC instead, but B2 is cheaper, offsite,
and already set up.
## Why Backblaze B2
### Decision matrix
| Option | Price per TB stored | Egress | Pros | Cons |
|---|---|---|---|---|
| **Backblaze B2** | **$6/mo** | $0.01/GB, free via CF | Cheap, hard spending caps, S3-compatible | Few regions compared with AWS; region is fixed per account |
| AWS S3 Standard | $23/mo | $0.09/GB | Most ubiquitous | Expensive |
| Cloudflare R2 | $15/mo | Free (!) | Zero egress, CF-native | Newer, fewer features |
| DigitalOcean Spaces | $5/mo for 250GB + $0.01/GB | Free 1TB, $0.01/GB after | Simple | Less reliable than AWS |
| Local PVC on k3s | $0 | $0 | Already in cluster | Per-node, no HA, no offsite |

B2 won because:
1. **Hard spending cap**: rare among object stores. No surprise AWS-style bill.
2. **Cheapest at rest**: roughly 3-4× cheaper than S3 ($6 vs. $23 per TB per month).
3. **Free egress through Cloudflare**: we already use CF; once we
serve upload URLs through CF, egress is free.
4. **Mature S3-compatible API**: minio-go talks to it natively.
Rejected:
- **R2** was the close second. Zero egress is amazing. Rejected
primarily for inertia (B2 already set up in the MyCrib era). A future
migration to R2 would be reasonable.
- **Local PVC** doesn't work for our setup because we want uploads
durable and accessible from any node/replica.
## Configuration
Bucket: `honeyDueProd` (mixed case; B2 allows this, and minio-go handles
it via path-style addressing; see §path-style below).

Region: `us-east-005` (B2's US East region, closer to our Neon DB in
AWS us-east-1 than the West Coast options).

Endpoint: `s3.us-east-005.backblazeb2.com`
### Environment variables
From ConfigMap:
| Var | Value |
|---|---|
| `B2_ENDPOINT` | `s3.us-east-005.backblazeb2.com` |
| `B2_BUCKET_NAME` | `honeyDueProd` |
| `B2_REGION` | `us-east-005` |
| `B2_USE_SSL` | `true` (but see §vestigial var below) |

From Secret:
| Var | Value |
|---|---|
| `B2_KEY_ID` | App key ID (B2-specific identifier) |
| `B2_APP_KEY` | App key secret |
### App key scope
The B2 app key is **bucket-scoped**, not account-scoped. It can only
read and write objects in the `honeyDueProd` bucket. It cannot:
- List other buckets
- Delete the bucket
- Create new buckets
- Touch account settings
This is the B2 equivalent of an IAM role with least privilege. If the
key leaks, the damage is limited to the `honeyDueProd` bucket.
## The minio-go client
The Go app uses `github.com/minio/minio-go/v7` — a Go SDK compatible
with any S3-flavored API. Relevant code at
`internal/services/storage_backend_s3.go`:
```go
// Build the S3-compatible client for the B2 endpoint.
client, err := minio.New(endpoint, &minio.Options{
	Creds:  credentials.NewStaticV4(keyID, appKey, ""), // empty session token
	Secure: useSSL,                                      // HTTPS when true
	Region: region,
})
```
### Path-style vs virtual-hosted addressing
S3's URL scheme has two flavors:
- **Virtual-hosted**: `https://mybucket.s3.amazonaws.com/mykey`
- **Path-style**: `https://s3.amazonaws.com/mybucket/mykey`
With virtual-hosted style, the bucket name becomes part of the hostname,
so it must be DNS-compatible: essentially lowercase letters, digits, and
hyphens. `honeyDueProd` fails this because of its uppercase letters.
With path-style, the bucket name is just a URL path segment — any valid
string works.
minio-go auto-detects: for AWS S3 it prefers virtual-hosted; for
non-AWS endpoints (like B2) it defaults to path-style. So
`honeyDueProd` with capital letters works transparently.
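
To make the difference concrete, here is a minimal sketch of how the two URL forms are assembled and why only path-style accepts our bucket name. `objectURL` and the simplified `dnsBucket` pattern are hypothetical helpers written for this chapter; they are not part of minio-go or our codebase, and real S3 naming rules have more cases than this regex covers.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsBucket is a simplified check for DNS-safe bucket names: 3-63 characters,
// lowercase letters, digits, and hyphens, starting and ending with a letter
// or digit. (Assumption: real S3 rules are stricter in a few corner cases.)
var dnsBucket = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$`)

// objectURL (hypothetical) builds an object URL in the requested style.
// Virtual-hosted style puts the bucket in the hostname, so it is rejected
// for names that are not DNS-compatible.
func objectURL(endpoint, bucket, key string, pathStyle bool) (string, error) {
	if pathStyle {
		return fmt.Sprintf("https://%s/%s/%s", endpoint, bucket, key), nil
	}
	if !dnsBucket.MatchString(bucket) {
		return "", fmt.Errorf("bucket %q is not DNS-compatible; use path-style", bucket)
	}
	return fmt.Sprintf("https://%s.%s/%s", bucket, endpoint, key), nil
}

func main() {
	const endpoint = "s3.us-east-005.backblazeb2.com"

	u, err := objectURL(endpoint, "honeyDueProd", "users/42/profile/a.jpg", true)
	fmt.Println(u, err)

	_, err = objectURL(endpoint, "honeyDueProd", "users/42/profile/a.jpg", false)
	fmt.Println(err)
}
```

If we ever rename the bucket to an all-lowercase name, both styles would work and the auto-detection would stop mattering.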
## The `B2_USE_SSL` vestigial variable
`prod.env` has `B2_USE_SSL=true`. But the Go app's
`internal/config/config.go:295` reads the env var
`STORAGE_USE_SSL`, not `B2_USE_SSL`:
```go
S3UseSSL: viper.GetString("STORAGE_USE_SSL") == "" || viper.GetBool("STORAGE_USE_SSL"),
```
Whoever wrote the original config used `B2_USE_SSL` in `prod.env` and
`STORAGE_USE_SSL` in the code. They don't match.
**Net effect**: The app reads `STORAGE_USE_SSL`, which is unset, so the
first clause (`GetString(...) == ""`) is `true` and the whole expression
evaluates to `true`. SSL is always on, whether `B2_USE_SSL` is `false`,
`true`, or anything else.
This is a dormant bug. Anyone setting `B2_USE_SSL=false` expecting to
disable TLS would be surprised it stays on. Fortunately that's the
right default for production B2 (which only accepts HTTPS anyway).
**TODO**: Rename `STORAGE_USE_SSL` to `B2_USE_SSL` in the Go code to
match the config. Documented in Chapter 19 §Vestigial config.
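
The mismatch is easy to reproduce in isolation. The sketch below mirrors the shape of the config expression using only the standard library; `useSSL` and the `env` map are stand-ins invented for illustration, and `strconv.ParseBool` is used here as an approximation of `viper.GetBool`.

```go
package main

import (
	"fmt"
	"strconv"
)

// useSSL mirrors the config expression: TLS is on when STORAGE_USE_SSL is
// empty/unset, or when it parses as true. env stands in for the process
// environment (an assumption of this sketch, not the app's real plumbing).
func useSSL(env map[string]string) bool {
	raw := env["STORAGE_USE_SSL"]
	if raw == "" {
		return true // the `== ""` clause wins; nothing else is consulted
	}
	b, err := strconv.ParseBool(raw)
	return err == nil && b
}

func main() {
	// B2_USE_SSL is never read, so setting it to false changes nothing.
	fmt.Println(useSSL(map[string]string{"B2_USE_SSL": "false"}))

	// Only the variable the code actually reads can turn TLS off.
	fmt.Println(useSSL(map[string]string{"STORAGE_USE_SSL": "false"}))
}
```

The first call reports TLS on and the second reports it off, which is the surprise described above.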
## What we store there
Today (limited rollout):
- User profile photos
- Task completion photos
- Document uploads (PDFs, images attached to records)
File keys follow a hierarchy like:
```
users/<user_id>/profile/<uuid>.jpg
residences/<residence_id>/documents/<uuid>.pdf
tasks/<task_id>/completions/<uuid>.jpg
```
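
As an illustration of that layout, a key builder might look like the sketch below. `objectKey` and `newUUID` are hypothetical helpers, and the integer owner ID is an assumption; the real code may use a different ID type or a UUID library.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID returns a random version 4 UUID string built from crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// objectKey (hypothetical) builds "<collection>/<ownerID>/<kind>/<id>.<ext>",
// matching the hierarchy shown above.
func objectKey(collection string, ownerID int64, kind, id, ext string) string {
	return fmt.Sprintf("%s/%d/%s/%s.%s", collection, ownerID, kind, id, ext)
}

func main() {
	id, err := newUUID()
	if err != nil {
		panic(err)
	}
	fmt.Println(objectKey("users", 42, "profile", id, "jpg"))
}
```

Using a random UUID as the file name means two uploads for the same owner never collide and user-supplied file names never reach the key.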
Max file size is **10 MB** per upload (`STORAGE_MAX_FILE_SIZE=10485760`).
Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
`application/pdf` (`STORAGE_ALLOWED_TYPES`).
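
A minimal sketch of how those two limits could be enforced before anything is sent to B2. `validateUpload` is a hypothetical function; it sniffs the content type with the standard library's `http.DetectContentType`, which is an assumption on our part, since the real handler may trust the request's `Content-Type` header instead.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// maxFileSize matches STORAGE_MAX_FILE_SIZE from the text.
const maxFileSize = 10485760

// allowedTypes matches STORAGE_ALLOWED_TYPES from the text.
var allowedTypes = map[string]bool{
	"image/jpeg":      true,
	"image/png":       true,
	"image/gif":       true,
	"image/webp":      true,
	"application/pdf": true,
}

var (
	errTooLarge = errors.New("file exceeds maximum size")
	errBadType  = errors.New("file type not allowed")
)

// validateUpload (hypothetical) rejects oversized payloads and any payload
// whose sniffed content type is not on the allowlist.
func validateUpload(data []byte) (string, error) {
	if len(data) > maxFileSize {
		return "", errTooLarge
	}
	ct := http.DetectContentType(data) // inspects at most the first 512 bytes
	if !allowedTypes[ct] {
		return ct, errBadType
	}
	return ct, nil
}

func main() {
	fmt.Println(validateUpload([]byte("\x89PNG\r\n\x1a\n")))
	fmt.Println(validateUpload([]byte("just some text")))
}
```

Sniffing the bytes rather than trusting the header guards against a client labelling an arbitrary file as `image/png`.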
## Access control
### Upload flow
1. Client POSTs to `/api/upload/`
2. Go API validates the user is authenticated and authorized for the
target resource
3. Go API streams the upload to B2 via minio-go's `PutObject`
4. B2 acknowledges the write (the object key is the one the API passed to `PutObject`)
5. Go API stores the key in Postgres
6. Returns the key to the client
The B2 bucket is **private**. Clients can't GET directly; they always
go through the Go API.
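
The upload steps can be sketched against a small storage interface. Everything here is illustrative: `ObjectStore`, `memStore`, `upload`, and `demoUpload` are hypothetical names, and the in-memory store stands in for B2 so the example runs without credentials.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
)

// ObjectStore is a hypothetical abstraction over the storage backend.
type ObjectStore interface {
	Put(ctx context.Context, key string, body io.Reader, size int64) error
}

// memStore is an in-memory stand-in for B2.
type memStore struct{ objects map[string][]byte }

func (m *memStore) Put(_ context.Context, key string, body io.Reader, _ int64) error {
	data, err := io.ReadAll(body)
	if err != nil {
		return err
	}
	m.objects[key] = data
	return nil
}

var errForbidden = errors.New("not authorized for target resource")

// upload follows the numbered steps: authorize, stream to the store,
// record the key, and return it to the caller.
func upload(ctx context.Context, store ObjectStore, authorized bool, key string,
	body io.Reader, size int64, recordKey func(string) error) (string, error) {
	if !authorized { // step 2
		return "", errForbidden
	}
	if err := store.Put(ctx, key, body, size); err != nil { // steps 3-4
		return "", fmt.Errorf("store object: %w", err)
	}
	if err := recordKey(key); err != nil { // step 5
		return "", fmt.Errorf("record key: %w", err)
	}
	return key, nil // step 6
}

// demoUpload runs one upload and reports the returned key, how many keys
// were recorded, and any error.
func demoUpload(authorized bool) (string, int, error) {
	store := &memStore{objects: map[string][]byte{}}
	var recorded []string
	record := func(k string) error { recorded = append(recorded, k); return nil }
	data := []byte("fake image bytes")
	key, err := upload(context.Background(), store, authorized,
		"tasks/7/completions/example.jpg", bytes.NewReader(data), int64(len(data)), record)
	return key, len(recorded), err
}

func main() {
	fmt.Println(demoUpload(true))
	fmt.Println(demoUpload(false))
}
```

Note the ordering: the key is recorded only after the object store accepts the write, so a failed upload never leaves a dangling database row. The reverse failure (object stored, database write fails) would leave an orphaned object, which this sketch does not clean up.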
### Download flow (current)
1. Client requests `/api/media/<key>`
2. Go API checks the user can access this key
3. Go API fetches from B2 and streams back to the client
This proxies every download through the api. For high-traffic media
that's inefficient (api becomes an egress bottleneck).
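
To show where the bottleneck comes from, here is a rough sketch of a proxying handler. `mediaHandler`, `fetchFunc`, and `get` are hypothetical; the fetch function stands in for the B2 read, and the permission check is reduced to a callback.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fetchFunc (hypothetical) opens an object by key; in production this would
// be a read from B2.
type fetchFunc func(key string) (io.ReadCloser, error)

// mediaHandler checks access, fetches the object, and streams it back.
func mediaHandler(canRead func(r *http.Request, key string) bool, fetch fetchFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/api/media/")
		if !canRead(r, key) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		obj, err := fetch(key)
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		defer obj.Close()
		// Every byte of every download passes through this copy.
		_, _ = io.Copy(w, obj)
	})
}

// get issues one request against the handler using a fake object source and
// returns the status code and body.
func get(allowed bool, path string) (int, string) {
	h := mediaHandler(
		func(*http.Request, string) bool { return allowed },
		func(key string) (io.ReadCloser, error) {
			return io.NopCloser(strings.NewReader("bytes of " + key)), nil
		},
	)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(get(true, "/api/media/users/1/profile/a.jpg"))
	fmt.Println(get(false, "/api/media/users/1/profile/a.jpg"))
}
```

The `io.Copy` call is the cost: the API pod holds a connection open and relays the full object for each request.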
### Future: signed URLs
We could generate time-limited signed URLs for B2 objects:
```go
url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
```
This returns a URL that the client can GET directly from B2. It is
scoped to a single object and valid for one hour, which saves API
bandwidth and latency.
Not yet implemented. TODO (Chapter 20).
## Lifecycle and retention
We have **no lifecycle rules** set on the bucket. Objects live forever
unless the app deletes them.
When a user deletes their account, the app should delete their B2
objects. This is currently not automated — a compliance gap for any
"right to be forgotten" request.
**TODO** (Chapter 20): Either:
- Implement explicit cleanup in the user deletion handler, or
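- Add a B2 lifecycle rule (the idea: tag objects with the user ID and
have a rule delete them once the user is soft-deleted). B2 lifecycle
rules are keyed on file-name prefix and age, so this would need
per-user key prefixes rather than tags.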
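
For the first option, explicit cleanup could look roughly like the sketch below. It assumes the deletion handler can load the user's object keys from Postgres (keys live under `users/`, `residences/`, and `tasks/`, so a single prefix delete would not cover them). `ObjectDeleter`, `fakeStore`, `deleteUserObjects`, and `demoCleanup` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ObjectDeleter is a hypothetical abstraction over the storage backend.
type ObjectDeleter interface {
	Delete(ctx context.Context, key string) error
}

// fakeStore is an in-memory stand-in for B2. Unlike a typical S3 delete, it
// returns an error for missing keys so the failure path is visible.
type fakeStore struct{ objects map[string]bool }

func (f *fakeStore) Delete(_ context.Context, key string) error {
	if !f.objects[key] {
		return fmt.Errorf("object %q not found", key)
	}
	delete(f.objects, key)
	return nil
}

// deleteUserObjects removes every key it can and reports all failures
// together, so one bad key does not stop the rest of the cleanup.
func deleteUserObjects(ctx context.Context, store ObjectDeleter, keys []string) (int, error) {
	var errs []error
	deleted := 0
	for _, k := range keys {
		if err := store.Delete(ctx, k); err != nil {
			errs = append(errs, err)
			continue
		}
		deleted++
	}
	return deleted, errors.Join(errs...)
}

// demoCleanup deletes two existing keys and one missing key, returning the
// number deleted, the number of objects left, and the combined error.
func demoCleanup() (int, int, error) {
	store := &fakeStore{objects: map[string]bool{
		"users/42/profile/a.jpg":    true,
		"tasks/7/completions/b.jpg": true,
	}}
	keys := []string{
		"users/42/profile/a.jpg",
		"tasks/7/completions/b.jpg",
		"users/42/profile/missing.jpg",
	}
	deleted, err := deleteUserObjects(context.Background(), store, keys)
	return deleted, len(store.objects), err
}

func main() {
	fmt.Println(demoCleanup())
}
```

Collecting errors instead of returning on the first one matters for a compliance-driven delete: the handler can retry or alert on the keys that failed while still removing everything else.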
## Backup of B2
We have no backup of B2 objects. B2 itself replicates within the region,
but:
- Accidental deletion via our app = data gone
- B2 itself being compromised = data gone
B2 offers **Object Lock** (WORM — write once read many) which prevents
deletion for a retention period. Not enabled; revisit if/when user data
sensitivity justifies it.
## Cost projection
Current usage is **small** — estimated <50 GB stored.
```
50 GB × $0.006/GB = $0.30/mo storage
1 GB/mo egress (mostly uncached media served via api) → $0.01 (first
3× of stored amount is free anyway, so effectively $0)
```
Total B2 cost: **< $1/mo**. Hard spending cap set to $20/mo in B2
console — if we ever breach that, something's wrong and we want to
know immediately.
At 100k users each uploading ~10 MB average:
- 1 TB stored = $6/mo
- Egress depends on access patterns; with signed URLs served through CF
the egress could still be ~free
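
The arithmetic above can be captured in a few lines. This is a back-of-the-envelope sketch using only the rates quoted in this chapter ($0.006/GB stored, $0.01/GB egress, first 3× of stored volume free); `monthlyCost` is a hypothetical helper and ignores any other B2 line items.

```go
package main

import "fmt"

const (
	storagePerGB       = 0.006 // USD per GB per month ($6/TB)
	egressPerGB        = 0.01  // USD per GB beyond the free allowance
	freeEgressMultiple = 3.0   // egress up to 3x stored volume is free
)

// monthlyCost estimates the monthly B2 bill in USD for a given stored
// volume and egress volume, both in GB.
func monthlyCost(storedGB, egressGB float64) float64 {
	billable := egressGB - freeEgressMultiple*storedGB
	if billable < 0 {
		billable = 0
	}
	return storedGB*storagePerGB + billable*egressPerGB
}

func main() {
	fmt.Printf("today:      $%.2f/mo\n", monthlyCost(50, 1))
	fmt.Printf("100k users: $%.2f/mo\n", monthlyCost(1000, 500))
}
```

With these inputs the estimate is $0.30/mo today and $6.00/mo at 1 TB, since egress in both cases stays inside the free allowance. The 500 GB egress figure for the second case is an arbitrary illustration, not a measurement.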
## Operator cheat sheet
```bash
# List bucket contents (requires mc or aws CLI configured with B2 creds)
mc alias set b2 https://s3.us-east-005.backblazeb2.com <KEY_ID> <APP_KEY>
mc ls b2/honeyDueProd/
# Count objects
mc ls --recursive b2/honeyDueProd/ | wc -l
# Download an object
mc cp b2/honeyDueProd/<key> ./
# Check B2 console for usage graphs:
# https://secure.backblaze.com/b2_buckets.htm
```
From inside a Go api pod:
```bash
# Check the in-cluster client config
kubectl exec -n honeydue deploy/api -- env | grep B2_
```
## References
- [Backblaze B2 docs][b2-docs]
- [B2 S3-compatible API][b2-s3]
- [minio-go/v7][minio-go]
- [S3 path-style vs virtual-hosted][s3-style]

[b2-docs]: https://www.backblaze.com/docs/
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
[minio-go]: https://github.com/minio/minio-go
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html