diff --git a/README.md b/README.md
index 74ac734..91132a6 100644
--- a/README.md
+++ b/README.md
@@ -184,6 +184,15 @@ needed for local dev. For the complete production env var reference
 Leave all four `B2_*` empty in dev to fall back to a local `/app/uploads`
 volume.
 
+**Upload architecture (since `b7f8329`)**: Image and document uploads go
+**directly from the client to B2** via a presigned POST policy issued by
+`POST /api/uploads/presign`. Bytes never traverse the api server. B2
+enforces the 10 MB per-object cap via the signed policy. The worker reaps
+orphaned upload sessions hourly via the `maintenance:upload_cleanup`
+cron. See [`docs/deployment/09-storage.md`](./docs/deployment/09-storage.md)
+for the full flow, and [`docs/deployment/14-deployment-process.md`](./docs/deployment/14-deployment-process.md#one-time-b2-bucket-lifecycle-manual)
+for the one-time bucket lifecycle setup.
+
 ### Worker schedules (UTC hours)
 
 | Variable | Description | Default |
diff --git a/docs/deployment/09-storage.md b/docs/deployment/09-storage.md
index c7cf26b..bdff3ec 100644
--- a/docs/deployment/09-storage.md
+++ b/docs/deployment/09-storage.md
@@ -150,18 +150,64 @@ Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
 
 ## Access control
 
-### Upload flow
+### Upload flow (current: direct-to-B2 with presigned POST)
 
-1. Client POSTs to `/api/upload/`
-2. Go API validates the user is authenticated and authorized for the
-   target resource
-3. Go API streams the upload to B2 via minio-go's `PutObject`
-4. B2 returns a key
-5. Go API stores the key in Postgres
-6. Returns the key to the client
+Image and document uploads go **directly from the client to B2**. The
+api server only signs a short-lived POST policy; the bytes never
+traverse our cluster. This is the WhatsApp / Slack architecture and
+sidesteps the api as a proxy bottleneck.
 
-The B2 bucket is **private**. Clients can't GET directly; they always
-go through the Go API.
+1. Client POSTs to `/api/uploads/presign` with `{category, content_type, content_length}`.
+2. The api validates auth, per-user quota (10 concurrent in-flight,
+   50/hour rate limit), the MIME allowlist, and the 10 MB cap. On success it
+   creates a `pending_uploads` row, signs a B2 POST policy with a
+   `content-length-range` condition bound to the claimed length ±256
+   bytes, and returns `{id, upload_url, fields, key, expires_at}`.
+3. Client multipart-POSTs the bytes directly to B2 using the returned
+   fields. **B2 enforces the size cap at the protocol level**; clients
+   can't bypass it by lying about `Content-Length`.
+4. Client POSTs to the entity-creation endpoint (`/api/task-completions/`,
+   `/api/documents/`) with `upload_ids: [id]`. The service `HEAD`s each
+   B2 object, verifies its size matches `expected_bytes` (±256), sets
+   `pending_uploads.claimed_at`, and writes the `task_completion_image`
+   / `document_image` row referencing the upload.
+
+The signed URL is valid for 15 minutes; presigns are not reusable.
+
+The B2 bucket stays **private**: only the api ever holds the key
+material. Clients can't list or GET objects directly; downloads go through the api.
+
+```
+┌──────────┐     1) presign      ┌────────┐
+│  client  │ ──────────────────► │  api   │
+│          │ ◄────────────────── │        │  POST policy + key
+│          │                     └────────┘
+│          │                       row in
+│          │                       pending_uploads
+│          │                       (claimed_at NULL)
+│          │     2) POST bytes   ┌────────┐
+│          │ ──────────────────► │   B2   │  enforces policy
+│          │ ◄────────────────── │        │
+│          │                     └────────┘
+│          │       3) attach     ┌────────┐
+│          │ ──────────────────► │  api   │  HEAD B2 object,
+│          │   upload_ids: [id]  │        │  mark claimed_at,
+│          │                     └────────┘  insert image row
+└──────────┘
+```
+
+Server-side enforcement summary:
+
+| Check | Where | Reject if |
+|---|---|---|
+| Auth | api middleware | unauthenticated |
+| Mime allowlist | `upload_service.go:allowedContentTypes` | not in list for category |
+| Size cap (10 MB) | api before signing + B2 policy | content_length > 10 MiB |
+| Concurrency cap (10) | `CountUnclaimedActiveForUser` | already 10 unclaimed in-flight |
+| Rate limit (50/hr) | Redis sliding window `upload:presign::` | 51st presign in the same hour |
+| Size at upload time | B2 (signed policy) | bytes outside content-length-range |
+| Ownership at attach | `FindUnclaimedForUser` | upload_id belongs to a different user |
+| Bytes match claim | `s3.Stat()` + bytes comparison | actual size differs from expected ±256 |
 
 ### Download flow (current)
 
@@ -170,34 +216,55 @@ go through the Go API.
 3. Go API fetches from B2 and streams back to the client
 
 This proxies every download through the api. For high-traffic media
-that's inefficient (api becomes an egress bottleneck).
-
-### Future: signed URLs
-
-We could generate time-limited signed URLs for B2 objects:
-
-```go
-url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
-```
-
-Returns a URL the client can GET directly from B2, scoped to a specific
-object, valid for 1h. Saves api bandwidth and latency.
-
-Not yet implemented. TODO (Chapter 20).
+that's inefficient (the api becomes an egress bottleneck) and could be
+replaced with presigned GET URLs on the same bucket. Not yet shipped;
+download volume is low enough that the proxy is fine for now.
 
 ## Lifecycle and retention
 
-We have **no lifecycle rules** set on the bucket. Objects live forever
-unless the app deletes them.
+### Orphan cleanup (`pending_uploads`)
 
-When a user deletes their account, the app should delete their B2
-objects. This is currently not automated — a compliance gap for any
-"right to be forgotten" request.
+Every presign creates a row in `pending_uploads` with `expires_at =
+now + 15 min`. If the client never finishes the upload, or finishes
+but never calls the attach endpoint, the row stays unclaimed. An
+hourly cron in the worker reaps them:
 
-**TODO** (Chapter 20): Either:
-- Implement explicit cleanup in the user deletion handler, or
-- Add B2 lifecycle rule tied to object metadata (tag objects with
-  user ID; rule deletes tagged objects when user is soft-deleted)
+- **`maintenance:upload_cleanup`**: cron `30 * * * *`. Selects
+  unclaimed rows past `expires_at`, deletes the corresponding B2
+  object, then deletes the row. Up to 500 per tick; the next tick picks
+  up any overflow. Worker logs include the `reaped` count.
+
+The worker constructs a `StorageService` at startup; if storage init
+fails (e.g. `B2_KEY_ID` / `B2_APP_KEY` not wired into the worker
+deployment), the cleanup handler logs a warning and no-ops. See
+`deploy-k3s/manifests/worker/deployment.yaml`; both B2 secrets are
+required envs on this pod.
+
+### Bucket lifecycle (backstop)
+
+A B2 lifecycle rule on the `uploads/` prefix is the safety net if the
+worker is offline for an extended period:
+
+- Hide objects 7 days after upload.
+- Delete 1 day after hidden.
+
+This is configured manually via the Backblaze console (B2's S3
+lifecycle API isn't fully implemented). See
+`deploy-k3s/manifests/b2-lifecycle.md` for the exact rule and the
+`b2 bucket get-info` verification command.
+
+### User-deletion cascade
+
+When a user deletes their account, the app deletes their `task_*` /
+`document` rows. The associated B2 objects survive: the same compliance
+gap as before, not yet automated. Two approaches:
+
+- Walk the image rows on user delete and `RemoveObject` each (simple,
+  synchronous, slow for users with many uploads).
+- Tag objects with a `user_id` metadata header at upload time, then
+  use a B2 lifecycle rule scoped to a deleted-users prefix.
+
+The first approach is the next item in the upload roadmap.
 
 ## Backup of B2
 
diff --git a/docs/deployment/14-deployment-process.md b/docs/deployment/14-deployment-process.md
index c32444f..10b2de5 100644
--- a/docs/deployment/14-deployment-process.md
+++ b/docs/deployment/14-deployment-process.md
@@ -247,6 +247,38 @@ kubectl patch secret honeydue-secrets -n honeydue \
 kubectl rollout restart -n honeydue deployment/api deployment/worker
 ```
 
+## One-time B2 bucket lifecycle (manual)
+
+The `pending_uploads` cleanup cron (`30 * * * *` on the worker) handles
+the common case of reaping orphaned uploads. The B2 bucket lifecycle
+rule on the `uploads/` prefix is the **backstop** if the worker is
+offline for >24 hours. It's configured once via the Backblaze web
+console; B2's S3 lifecycle API isn't fully implemented, so this can't
+be in the deploy script.
+
+One-time setup:
+
+1. Open https://secure.backblaze.com/b2_buckets.htm → bucket
+   `honeyDueProd` → **Lifecycle Settings** → **Custom**
+2. Add rule:
+   - File name prefix: `uploads/`
+   - Hide files older than: **7 days**
+   - Delete hidden files older than: **1 day**
+
+With this rule, an orphaned object lives at most 8 days after upload
+(7 to hide, 1 to delete). The worker normally reaps within an hour, so
+the rule should almost never trigger.
+
+Verify:
+
+```bash
+# Requires the b2 CLI: brew install b2-tools
+b2 bucket get-info honeyDueProd | jq '.lifecycleRules'
+```
+
+See `deploy-k3s/manifests/b2-lifecycle.md` for the canonical rule
+definition and a curl-based fallback if the b2 CLI isn't available.
+
 ## Manifest changes
 
 When you add/modify a deployment YAML:
diff --git a/docs/deployment/README.md b/docs/deployment/README.md
index 69dc7c1..433884e 100644
--- a/docs/deployment/README.md
+++ b/docs/deployment/README.md
@@ -40,7 +40,7 @@ they do, and how to operate them.
 - [07 — Services](./07-services.md) — api, admin, worker, redis per-service deep dive
 - [08 — Database](./08-database.md) — Neon Postgres, advisory-lock migrations
-- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go client details
+- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go, presigned-URL direct uploads
 - [10 — Secrets & Config](./10-secrets-config.md) — ConfigMap, Secret, env mapping
 - [11 — Registry](./11-registry.md) — Gitea container registry, multi-arch builds