docs: presigned-URL upload flow + B2 lifecycle setup

09-storage.md:
  - Replaced the "Upload flow" section. The previous text described the
    multipart-via-API path that was removed in b7f8329. Now documents
    the three-step direct-to-B2 flow (presign → POST to B2 → attach
    via upload_ids[]) with an ASCII diagram and a server-side
    enforcement-points table.
  - Replaced the "Future: signed URLs" placeholder (since presigned
    URLs are now the present, not the future).
  - Added "Lifecycle and retention" subsections covering the
    pending_uploads cleanup cron (worker, 30 * * * *), the B2 bucket
    lifecycle as backstop (uploads/ prefix, 7-day hide + 1-day delete),
    and the still-open user-deletion cascade gap.

14-deployment-process.md:
  - Added a "One-time B2 bucket lifecycle (manual)" section explaining
    why the rule can't live in the deploy script (B2's S3 lifecycle
    API is partial), the exact rule to apply via the Backblaze
    console, and a verification command.

docs/deployment/README.md:
  - Updated the chapter 9 description to mention presigned-URL uploads.

README.md (root):
  - Added a paragraph under "Object storage" pointing to the new
    upload architecture and the relevant deployment-book chapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-05-01 17:44:08 -07:00
parent 14026251b7
commit 1347ffadf5
4 changed files with 142 additions and 34 deletions
README.md (+9)
@@ -184,6 +184,15 @@ needed for local dev. For the complete production env var reference
Leave all four `B2_*` empty in dev to fall back to a local `/app/uploads` volume.
**Upload architecture (since `b7f8329`)**: Image and document uploads go
**directly from the client to B2** via a presigned POST policy issued by
`POST /api/uploads/presign`. Bytes never traverse the api server. B2
enforces a 10 MB per-object cap at the protocol level. The worker reaps
orphaned upload sessions hourly via the `maintenance:upload_cleanup`
cron. See [`docs/deployment/09-storage.md`](./docs/deployment/09-storage.md)
for the full flow, and [`docs/deployment/14-deployment-process.md`](./docs/deployment/14-deployment-process.md#one-time-b2-bucket-lifecycle-manual)
for the one-time bucket lifecycle setup.
### Worker schedules (UTC hours)
| Variable | Description | Default |
docs/deployment/09-storage.md (+100 -33)
@@ -150,18 +150,64 @@ Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
## Access control

Removed:

### Upload flow

1. Client POSTs to `/api/upload/`
2. Go API validates the user is authenticated and authorized for the
target resource
3. Go API streams the upload to B2 via minio-go's `PutObject`
4. B2 returns a key
5. Go API stores the key in Postgres
6. Returns the key to the client

The B2 bucket is **private**. Clients can't GET directly; they always
go through the Go API.

Added:

### Upload flow (current: direct-to-B2 with presigned POST)

Image and document uploads go **directly from the client to B2**. The
api server only signs a short-lived POST policy; the bytes never
traverse our cluster. This is the WhatsApp / Slack architecture and
sidesteps the api as a proxy bottleneck.

1. Client `POST /api/uploads/presign` with `{category, content_type, content_length}`.
2. api validates auth, per-user quota (10 concurrent in-flight,
50/hour rate limit), allowed mime, and the 10 MB cap. On success it
creates a `pending_uploads` row, signs a B2 POST policy with a
`content-length-range` condition bound to the claimed length ±256
bytes, and returns `{id, upload_url, fields, key, expires_at}`.
3. Client multipart-POSTs the bytes directly to B2 using the returned
fields. **B2 enforces the size cap at the protocol level** — clients
can't bypass it by lying about Content-Length.
4. Client POSTs to the entity-creation endpoint (`/api/task-completions/`,
`/api/documents/`) with `upload_ids: [id]`. The service `HEAD`s each
B2 object, verifies that its size matches `expected_bytes`, sets
`pending_uploads.claimed_at`, and writes the `task_completion_image`
/ `document_image` row referencing the upload.

The signed URL is valid for 15 minutes; presigns are not reusable.
The B2 bucket stays **private**: only the api ever holds the key
material. Clients can't list or GET directly without a presign.
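
The presign-time checks in step 2 can be sketched roughly as below. This is a minimal illustration, not the code in `upload_service.go`: the category names, the `application/pdf` entry, the reading of the 10 MB cap as 10 MiB, and the clamping of the range to the cap are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxUploadBytes int64 = 10 << 20 // assumption: the 10 MB cap read as 10 MiB
	lengthSlack    int64 = 256      // tolerance around the claimed length
)

// allowedTypes is a hypothetical per-category allowlist; the real one
// lives in upload_service.go and may differ.
var allowedTypes = map[string]map[string]bool{
	"image":    {"image/jpeg": true, "image/png": true, "image/gif": true, "image/webp": true},
	"document": {"application/pdf": true},
}

// PresignRequest mirrors the {category, content_type, content_length} body.
type PresignRequest struct {
	Category      string
	ContentType   string
	ContentLength int64
}

var (
	ErrBadType = errors.New("content type not allowed for category")
	ErrBadSize = errors.New("content length out of bounds")
)

// lengthRange returns the [min, max] pair for a content-length-range
// condition: claimed length plus or minus the slack, clamped to [1, cap].
func lengthRange(claimed int64) (int64, int64) {
	lo, hi := claimed-lengthSlack, claimed+lengthSlack
	if lo < 1 {
		lo = 1
	}
	if hi > maxUploadBytes {
		hi = maxUploadBytes
	}
	return lo, hi
}

// validate rejects disallowed types and sizes before anything is signed.
func validate(req PresignRequest) (lo, hi int64, err error) {
	if !allowedTypes[req.Category][req.ContentType] {
		return 0, 0, ErrBadType
	}
	if req.ContentLength <= 0 || req.ContentLength > maxUploadBytes {
		return 0, 0, ErrBadSize
	}
	lo, hi = lengthRange(req.ContentLength)
	return lo, hi, nil
}

func main() {
	req := PresignRequest{Category: "image", ContentType: "image/png", ContentLength: 4096}
	lo, hi, err := validate(req)
	fmt.Println(lo, hi, err) // 3840 4352 <nil>
}
```

The range returned here is what would go into the policy's `content-length-range` condition, so the storage side can reject a body whose size differs from the claim even if the client misreports it.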
```
┌──────────┐ 1) presign ┌────────┐
│ client │ ──────────────────► │ api │
│ │ ◄────────────────── │ │ POST policy + key
│ │ └────────┘
│ │ row in
│ │ pending_uploads
│ │ (claimed_at NULL)
│ │ 2) POST bytes ┌────────┐
│ │ ──────────────────► │ B2 │ enforces policy
│ │ ◄────────────────── │ │
│ │ └────────┘
│ │ 3) attach ┌────────┐
│ │ ──────────────────► │ api │ HEAD B2 object,
│ │ upload_ids: [id] │ │ mark claimed_at,
│ │ └────────┘ insert image row
└──────────┘
```
Server-side enforcement summary:
| Check | Where | Reject if |
|---|---|---|
| Auth | api middleware | unauthenticated |
| Mime allowlist | `upload_service.go:allowedContentTypes` | not in list for category |
| Size cap (10 MB) | api before signing + B2 policy | content_length > 10 MiB |
| Concurrency cap (10) | `CountUnclaimedActiveForUser` | already 10 unclaimed in-flight |
| Rate limit (50/hr) | Redis sliding window `upload:presign:<uid>:<bucket>` | 51st presign in the same hour |
| Size at upload time | B2 (signed policy) | bytes outside content-length-range |
| Ownership at attach | `FindUnclaimedForUser` | upload_id belongs to a different user |
| Bytes match claim | `s3.Stat()` + bytes comparison | actual size differs from expected by more than 256 bytes |
### Download flow (current)
@@ -170,34 +216,55 @@ go through the Go API.
3. Go API fetches from B2 and streams back to the client

Removed:

This proxies every download through the api. For high-traffic media
that's inefficient (api becomes an egress bottleneck).

### Future: signed URLs

We could generate time-limited signed URLs for B2 objects:

```go
url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
```

Returns a URL the client can GET directly from B2, scoped to a specific
object, valid for 1h. Saves api bandwidth and latency.

Not yet implemented. TODO (Chapter 20).

Added:

This proxies every download through the api. For high-traffic media
that's inefficient (api becomes an egress bottleneck); it could be
replaced with presigned GET URLs on the same bucket. Not yet shipped;
download volume is low enough that the proxy is fine for now.
## Lifecycle and retention

Removed:

We have **no lifecycle rules** set on the bucket. Objects live forever
unless the app deletes them.

When a user deletes their account, the app should delete their B2
objects. This is currently not automated, a compliance gap for any
"right to be forgotten" request.

**TODO** (Chapter 20): Either:
- Implement explicit cleanup in the user deletion handler, or
- Add B2 lifecycle rule tied to object metadata (tag objects with
user ID; rule deletes tagged objects when user is soft-deleted)

Added:

### Orphan cleanup (`pending_uploads`)

Every presign creates a row in `pending_uploads` with `expires_at =
now + 15 min`. If the client never finishes the upload, or finishes
but never calls the attach endpoint, the row stays unclaimed. An
hourly cron in the worker reaps them:

- **`maintenance:upload_cleanup`**: cron `30 * * * *`. Selects
unclaimed rows past `expires_at`, deletes the corresponding B2
object, deletes the row. Up to 500 per tick; the next tick picks up
any overflow. Worker logs include `reaped` count.

The worker constructs a `StorageService` at startup; if storage init
fails (e.g. `B2_KEY_ID` / `B2_APP_KEY` not wired into the worker
deployment), the cleanup handler logs a warning and no-ops. See
`deploy-k3s/manifests/worker/deployment.yaml` — both B2 secrets are
required envs on this pod.
### Bucket lifecycle (backstop)
A B2 lifecycle rule on the `uploads/` prefix is the safety net if the
worker is offline for an extended period:
- Hide objects 7 days after upload.
- Delete 1 day after hidden.
This is configured manually via the Backblaze console (B2's S3
lifecycle API isn't fully implemented). See
`deploy-k3s/manifests/b2-lifecycle.md` for the exact rule and the
`b2 bucket get-info` verification command.
### User-deletion cascade
When a user deletes their account, the app deletes their `task_*` /
`document` rows. The associated B2 objects survive — same compliance
gap as before, not yet automated. Two approaches:
- Walk the image rows on user delete and `RemoveObject` each (simple,
synchronous, slow for users with many uploads).
- Tag objects with a `user_id` metadata header at upload time, then
use a B2 lifecycle rule scoped to a deleted-users prefix.
Option 1 is the next item in the upload roadmap.
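
A rough shape for Option 1, assuming the object keys have already been collected from the user's image rows. `ObjectRemover`, `fakeStore`, and `removeUserObjects` are hypothetical names for illustration; collecting failed keys for a later retry is a design assumption, not something the roadmap specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectRemover is a hypothetical one-method view of the storage client.
type ObjectRemover interface {
	RemoveObject(key string) error
}

// removeUserObjects deletes each key in turn and reports which ones
// failed, so the caller can retry or log them instead of aborting.
func removeUserObjects(store ObjectRemover, keys []string) (removed int, failed []string) {
	for _, k := range keys {
		if err := store.RemoveObject(k); err != nil {
			failed = append(failed, k)
			continue
		}
		removed++
	}
	return removed, failed
}

// fakeStore simulates a storage backend where some keys fail to delete.
type fakeStore struct{ broken map[string]bool }

func (f fakeStore) RemoveObject(key string) error {
	if f.broken[key] {
		return errors.New("simulated storage error")
	}
	return nil
}

func main() {
	store := fakeStore{broken: map[string]bool{"uploads/2": true}}
	removed, failed := removeUserObjects(store, []string{"uploads/1", "uploads/2", "uploads/3"})
	fmt.Println(removed, failed) // 2 [uploads/2]
}
```

Because the loop is synchronous, a user with many uploads makes the delete request slow, which is the trade-off the first bullet already notes.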
## Backup of B2
docs/deployment/14-deployment-process.md (+32)
@@ -247,6 +247,38 @@ kubectl patch secret honeydue-secrets -n honeydue \
kubectl rollout restart -n honeydue deployment/api deployment/worker
``` ```
## One-time B2 bucket lifecycle (manual)
The `pending_uploads` cleanup cron (`30 * * * *` on the worker) handles
the common case of reaping orphaned uploads. The B2 bucket lifecycle
rule on the `uploads/` prefix is the **backstop** if the worker is
offline for >24 hours. It's configured once via the Backblaze web
console — B2's S3 lifecycle API isn't fully implemented, so this can't
be in the deploy script.
One-time setup:
1. Open https://secure.backblaze.com/b2_buckets.htm → bucket
`honeyDueProd` → **Lifecycle Settings** → **Custom**
2. Add rule:
- File name prefix: `uploads/`
- Hide files older than: **7 days**
- Delete hidden files older than: **1 day**
Maximum lifetime of an orphaned object under this rule: 8 days from
upload (7 days until hidden, then 1 day until deleted). The worker
normally reaps within an hour, so the rule should almost never trigger.
Verify:
```bash
# Requires the b2 CLI: brew install b2-tools
b2 bucket get-info honeyDueProd | jq '.lifecycleRules'
```
See `deploy-k3s/manifests/b2-lifecycle.md` for the canonical rule
definition and a curl-based fallback if the b2 CLI isn't available.
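
If the verification needs to run in a script rather than by eye, the JSON printed by the command above could be checked with something like the sketch below. It assumes the rules are printed in the B2 native API shape (`fileNamePrefix`, `daysFromUploadingToHiding`, `daysFromHidingToDeleting`); the function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LifecycleRule models one entry of the lifecycleRules array, assuming
// the B2 native API field names. Pointers distinguish null from 0.
type LifecycleRule struct {
	FileNamePrefix            string `json:"fileNamePrefix"`
	DaysFromUploadingToHiding *int   `json:"daysFromUploadingToHiding"`
	DaysFromHidingToDeleting  *int   `json:"daysFromHidingToDeleting"`
}

// hasBackstopRule reports whether the rules include the uploads/ rule
// with a 7-day hide and a 1-day delete.
func hasBackstopRule(raw []byte) (bool, error) {
	var rules []LifecycleRule
	if err := json.Unmarshal(raw, &rules); err != nil {
		return false, err
	}
	for _, r := range rules {
		if r.FileNamePrefix == "uploads/" &&
			r.DaysFromUploadingToHiding != nil && *r.DaysFromUploadingToHiding == 7 &&
			r.DaysFromHidingToDeleting != nil && *r.DaysFromHidingToDeleting == 1 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	raw := []byte(`[{"fileNamePrefix":"uploads/","daysFromUploadingToHiding":7,"daysFromHidingToDeleting":1}]`)
	ok, err := hasBackstopRule(raw)
	fmt.Println(ok, err) // true <nil>
}
```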
## Manifest changes
When you add/modify a deployment YAML:
docs/deployment/README.md (+1 -1)
@@ -40,7 +40,7 @@ they do, and how to operate them.
- [07 - Services](./07-services.md): api, admin, worker, redis per-service deep dive
- [08 - Database](./08-database.md): Neon Postgres, advisory-lock migrations

Removed:

- [09 - Storage](./09-storage.md): Backblaze B2, minio-go client details

Added:

- [09 - Storage](./09-storage.md): Backblaze B2, minio-go, presigned-URL direct uploads

- [10 - Secrets & Config](./10-secrets-config.md): ConfigMap, Secret, env mapping
- [11 - Registry](./11-registry.md): Gitea container registry, multi-arch builds