docs: presigned-URL upload flow + B2 lifecycle setup

09-storage.md:
  - Replaced the "Upload flow" section. The previous text described the
    multipart-via-API path that was removed in b7f8329. Now documents
    the three-step direct-to-B2 flow (presign → POST to B2 → attach
    via upload_ids[]) with an ASCII diagram and a server-side
    enforcement-points table.
  - Replaced the "Future: signed URLs" placeholder (since presigned
    URLs are now the present, not the future).
  - Added "Lifecycle and retention" subsections covering the
    pending_uploads cleanup cron (worker, 30 * * * *), the B2 bucket
    lifecycle as backstop (uploads/ prefix, 7-day hide + 1-day delete),
    and the still-open user-deletion cascade gap.

14-deployment-process.md:
  - Added a "One-time B2 bucket lifecycle (manual)" section explaining
    why the rule can't live in the deploy script (B2's S3 lifecycle
    API is partial), the exact rule to apply via the Backblaze
    console, and a verification command.

docs/deployment/README.md:
  - Updated the chapter 9 description to mention presigned-URL uploads.

README.md (root):
  - Added a paragraph under "Object storage" pointing to the new
    upload architecture and the relevant deployment-book chapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-05-01 17:44:08 -07:00
parent 14026251b7
commit 1347ffadf5
4 changed files with 142 additions and 34 deletions
+9
@@ -184,6 +184,15 @@ needed for local dev. For the complete production env var reference
Leave all four `B2_*` empty in dev to fall back to a local `/app/uploads` volume.
**Upload architecture (since `b7f8329`)**: Image and document uploads go
**directly from the client to B2** via a presigned POST policy issued by
`POST /api/uploads/presign`. Bytes never traverse the api server. B2
enforces a 10 MB per-object cap at the protocol level. The worker reaps
orphaned upload sessions hourly via the `maintenance:upload_cleanup`
cron. See [`docs/deployment/09-storage.md`](./docs/deployment/09-storage.md)
for the full flow, and [`docs/deployment/14-deployment-process.md`](./docs/deployment/14-deployment-process.md#one-time-b2-bucket-lifecycle-manual)
for the one-time bucket lifecycle setup.
### Worker schedules (UTC hours)
| Variable | Description | Default |
+100 -33
@@ -150,18 +150,64 @@ Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
## Access control
### Upload flow
### Upload flow (current — direct-to-B2 with presigned POST)
1. Client POSTs to `/api/upload/`
2. Go API validates the user is authenticated and authorized for the
target resource
3. Go API streams the upload to B2 via minio-go's `PutObject`
4. B2 returns a key
5. Go API stores the key in Postgres
6. Returns the key to the client
Image and document uploads go **directly from the client to B2**. The
api server only signs a short-lived POST policy; the bytes never
traverse our cluster. This is the same direct-upload pattern that apps
such as WhatsApp and Slack use, and it keeps the api from becoming a
proxy bottleneck.
The B2 bucket is **private**. Clients can't GET directly; they always
go through the Go API.
1. Client `POST /api/uploads/presign` with `{category, content_type, content_length}`.
2. api validates auth, per-user quota (10 concurrent in-flight,
50/hour rate limit), allowed mime, and the 10 MB cap. On success it
creates a `pending_uploads` row, signs a B2 POST policy with a
`content-length-range` condition bound to the claimed length ±256
bytes, and returns `{id, upload_url, fields, key, expires_at}`.
3. Client multipart-POSTs the bytes directly to B2 using the returned
fields. **B2 enforces the size cap at the protocol level** — clients
can't bypass it by lying about Content-Length.
4. Client POSTs to the entity-creation endpoint (`/api/task-completions/`,
`/api/documents/`) with `upload_ids: [id]`. The service `HEAD`s each
B2 object, verifies that its size matches `expected_bytes` (within the
±256-byte tolerance), sets `pending_uploads.claimed_at`, and writes the
`task_completion_image` / `document_image` row referencing the upload.
The signed URL is valid for 15 minutes; presigns are not reusable.
The B2 bucket stays **private** — only the api ever holds the key
material. Clients can't list or GET directly without a presign.
```
┌──────────┐ 1) presign ┌────────┐
│ client │ ──────────────────► │ api │
│ │ ◄────────────────── │ │ POST policy + key
│ │ └────────┘
│ │ row in
│ │ pending_uploads
│ │ (claimed_at NULL)
│ │ 2) POST bytes ┌────────┐
│ │ ──────────────────► │ B2 │ enforces policy
│ │ ◄────────────────── │ │
│ │ └────────┘
│ │ 3) attach ┌────────┐
│ │ ──────────────────► │ api │ HEAD B2 object,
│ │ upload_ids: [id] │ │ mark claimed_at,
│ │ └────────┘ insert image row
└──────────┘
```
Server-side enforcement summary:
| Check | Where | Reject if |
|---|---|---|
| Auth | api middleware | unauthenticated |
| Mime allowlist | `upload_service.go:allowedContentTypes` | not in list for category |
| Size cap (10 MB) | api before signing + B2 policy | content_length > 10 MB |
| Concurrency cap (10) | `CountUnclaimedActiveForUser` | already 10 unclaimed in-flight |
| Rate limit (50/hr) | Redis sliding window `upload:presign:<uid>:<bucket>` | 51st presign in the same hour |
| Size at upload time | B2 (signed policy) | bytes outside content-length-range |
| Ownership at attach | `FindUnclaimedForUser` | upload_id belongs to a different user |
| Bytes match claim | `s3.Stat()` + bytes comparison | actual size differs from expected ±256 |
### Download flow (current)
@@ -170,34 +216,55 @@ go through the Go API.
3. Go API fetches from B2 and streams back to the client
This proxies every download through the api. For high-traffic media
that's inefficient (api becomes an egress bottleneck).
### Future: signed URLs
We could generate time-limited signed URLs for B2 objects:
```go
url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
```
Returns a URL the client can GET directly from B2, scoped to a specific
object, valid for 1h. Saves api bandwidth and latency.
Not yet implemented. TODO (Chapter 20).
that's inefficient (api becomes an egress bottleneck) — could be
replaced with presigned GET URLs on the same bucket. Not yet shipped;
download volume is low enough that the proxy is fine for now.
## Lifecycle and retention
We have **no lifecycle rules** set on the bucket. Objects live forever
unless the app deletes them.
### Orphan cleanup (`pending_uploads`)
When a user deletes their account, the app should delete their B2
objects. This is currently not automated — a compliance gap for any
"right to be forgotten" request.
Every presign creates a row in `pending_uploads` with `expires_at =
now + 15 min`. If the client never finishes the upload, or finishes
but never calls the attach endpoint, the row stays unclaimed. An
hourly cron in the worker reaps them:
**TODO** (Chapter 20): Either:
- Implement explicit cleanup in the user deletion handler, or
- Add B2 lifecycle rule tied to object metadata (tag objects with
user ID; rule deletes tagged objects when user is soft-deleted)
- **`maintenance:upload_cleanup`** — cron `30 * * * *`. Selects
unclaimed rows past `expires_at`, deletes the corresponding B2
object, deletes the row. Up to 500 per tick; the next tick picks up
any overflow. Worker logs include `reaped` count.
The worker constructs a `StorageService` at startup; if storage init
fails (e.g. `B2_KEY_ID` / `B2_APP_KEY` not wired into the worker
deployment), the cleanup handler logs a warning and no-ops. See
`deploy-k3s/manifests/worker/deployment.yaml` — both B2 secrets are
required envs on this pod.
### Bucket lifecycle (backstop)
A B2 lifecycle rule on the `uploads/` prefix is the safety net if the
worker is offline for an extended period:
- Hide objects 7 days after upload.
- Delete 1 day after hidden.
This is configured manually via the Backblaze console (B2's S3
lifecycle API isn't fully implemented). See
`deploy-k3s/manifests/b2-lifecycle.md` for the exact rule and the
`b2 bucket get-info` verification command.
### User-deletion cascade
When a user deletes their account, the app deletes their `task_*` /
`document` rows. The associated B2 objects survive — same compliance
gap as before, not yet automated. Two approaches:
- Walk the image rows on user delete and `RemoveObject` each (simple,
synchronous, slow for users with many uploads).
- Tag objects with a `user_id` metadata header at upload time, then
use a B2 lifecycle rule scoped to a deleted-users prefix.
The first approach (walking the image rows) is the next item in the
upload roadmap.
## Backup of B2
+32
@@ -247,6 +247,38 @@ kubectl patch secret honeydue-secrets -n honeydue \
kubectl rollout restart -n honeydue deployment/api deployment/worker
```
## One-time B2 bucket lifecycle (manual)
The `pending_uploads` cleanup cron (`30 * * * *` on the worker) handles
the common case of reaping orphaned uploads. The B2 bucket lifecycle
rule on the `uploads/` prefix is the **backstop** if the worker is
offline for >24 hours. It's configured once via the Backblaze web
console — B2's S3 lifecycle API isn't fully implemented, so this can't
be in the deploy script.
One-time setup:
1. Open https://secure.backblaze.com/b2_buckets.htm → bucket
`honeyDueProd` → **Lifecycle Settings** → **Custom**
2. Add rule:
- File name prefix: `uploads/`
- Hide files older than: **7 days**
- Delete hidden files older than: **1 day**
Under this rule an orphaned object lives at most about 8 days after
upload: 7 days until it is hidden, plus 1 day until the hidden version
is deleted. The worker normally reaps within an hour, so the rule should
almost never trigger.
Verify:
```bash
# Requires the b2 CLI: brew install b2-tools
b2 bucket get-info honeyDueProd | jq '.lifecycleRules'
```
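If the check should be scripted rather than read by eye, the JSON printed by the command above could be fed to a small validator. This sketch assumes the output is an array of rules using the B2 native field names `fileNamePrefix`, `daysFromUploadingToHiding`, and `daysFromHidingToDeleting`; `hasUploadsBackstop` is a hypothetical helper, not something that exists in the repo.

```go
package main

import "encoding/json"

// lifecycleRule holds the fields of a B2 lifecycle rule that matter
// here. Pointers distinguish "unset/null" from an explicit number.
type lifecycleRule struct {
	FileNamePrefix            string `json:"fileNamePrefix"`
	DaysFromUploadingToHiding *int   `json:"daysFromUploadingToHiding"`
	DaysFromHidingToDeleting  *int   `json:"daysFromHidingToDeleting"`
}

// hasUploadsBackstop reports whether rulesJSON contains the
// 7-day-hide / 1-day-delete rule on the uploads/ prefix.
func hasUploadsBackstop(rulesJSON []byte) (bool, error) {
	var rules []lifecycleRule
	if err := json.Unmarshal(rulesJSON, &rules); err != nil {
		return false, err
	}
	for _, r := range rules {
		if r.FileNamePrefix != "uploads/" {
			continue
		}
		if r.DaysFromUploadingToHiding != nil && *r.DaysFromUploadingToHiding == 7 &&
			r.DaysFromHidingToDeleting != nil && *r.DaysFromHidingToDeleting == 1 {
			return true, nil
		}
	}
	return false, nil
}
```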
See `deploy-k3s/manifests/b2-lifecycle.md` for the canonical rule
definition and a curl-based fallback if the b2 CLI isn't available.
## Manifest changes
When you add/modify a deployment YAML:
+1 -1
@@ -40,7 +40,7 @@ they do, and how to operate them.
- [07 — Services](./07-services.md) — api, admin, worker, redis per-service deep dive
- [08 — Database](./08-database.md) — Neon Postgres, advisory-lock migrations
- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go client details
- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go, presigned-URL direct uploads
- [10 — Secrets & Config](./10-secrets-config.md) — ConfigMap, Secret, env mapping
- [11 — Registry](./11-registry.md) — Gitea container registry, multi-arch builds