docs: presigned-URL upload flow + B2 lifecycle setup

09-storage.md:
- Replaced the "Upload flow" section. The previous text described the
  multipart-via-API path that was removed in b7f8329. Now documents
  the three-step direct-to-B2 flow (presign → POST to B2 → attach
  via upload_ids[]) with an ASCII diagram and a server-side
  enforcement-points table.
- Replaced the "Future: signed URLs" placeholder (since presigned
  URLs are now the present, not the future).
- Added "Lifecycle and retention" subsections covering the
  pending_uploads cleanup cron (worker, 30 * * * *), the B2 bucket
  lifecycle as backstop (uploads/ prefix, 7-day hide + 1-day delete),
  and the still-open user-deletion cascade gap.

14-deployment-process.md:
- Added a "One-time B2 bucket lifecycle (manual)" section explaining
  why the rule can't live in the deploy script (B2's S3 lifecycle
  API is partial), the exact rule to apply via the Backblaze
  console, and a verification command.

docs/deployment/README.md:
- Updated the chapter 9 description to mention presigned-URL uploads.

README.md (root):
- Added a paragraph under "Object storage" pointing to the new
  upload architecture and the relevant deployment-book chapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -184,6 +184,15 @@ needed for local dev. For the complete production env var reference

Leave all four `B2_*` empty in dev to fall back to a local `/app/uploads` volume.

**Upload architecture (since `b7f8329`)**: Image and document uploads go
**directly from the client to B2** via a presigned POST policy issued by
`POST /api/uploads/presign`. Bytes never traverse the api server. B2
enforces a 10 MB per-object cap at the protocol level. The worker reaps
orphaned upload sessions hourly via the `maintenance:upload_cleanup`
cron. See [`docs/deployment/09-storage.md`](./docs/deployment/09-storage.md)
for the full flow, and [`docs/deployment/14-deployment-process.md`](./docs/deployment/14-deployment-process.md#one-time-b2-bucket-lifecycle-manual)
for the one-time bucket lifecycle setup.

### Worker schedules (UTC hours)

| Variable | Description | Default |
+100
-33
@@ -150,18 +150,64 @@ Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,

## Access control

### Upload flow
### Upload flow (current: direct-to-B2 with presigned POST)

1. Client POSTs to `/api/upload/`
2. Go API validates the user is authenticated and authorized for the
   target resource
3. Go API streams the upload to B2 via minio-go's `PutObject`
4. B2 returns a key
5. Go API stores the key in Postgres
6. Returns the key to the client

Image and document uploads go **directly from the client to B2**. The
api server only signs a short-lived POST policy; the bytes never
traverse our cluster. This mirrors the WhatsApp / Slack architecture and
removes the api as a proxy bottleneck.

The B2 bucket is **private**. Clients can't GET directly; they always
go through the Go API.

1. Client sends `POST /api/uploads/presign` with `{category, content_type, content_length}`.
2. api validates auth, per-user quota (10 concurrent in-flight,
   50/hour rate limit), allowed MIME type, and the 10 MB cap. On success it
   creates a `pending_uploads` row, signs a B2 POST policy with a
   `content-length-range` condition bound to the claimed length ±256
   bytes, and returns `{id, upload_url, fields, key, expires_at}`.
3. Client multipart-POSTs the bytes directly to B2 using the returned
   fields. **B2 enforces the size cap at the protocol level**, so clients
   can't bypass it by lying about `Content-Length`.
4. Client POSTs to the entity-creation endpoint (`/api/task-completions/`,
   `/api/documents/`) with `upload_ids: [id]`. The service `HEAD`s each
   B2 object, verifies that its size matches `expected_bytes`, sets
   `pending_uploads.claimed_at`, and writes the `task_completion_image`
   / `document_image` row referencing the upload.
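The checks the api performs in step 2 before signing can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the service's actual code: `PresignRequest`, `validatePresign`, `sizeRange`, the category names, and the allowlist contents are hypothetical, the "10 MB" cap is assumed to mean 10 MiB, and clamping the upper bound to the cap is an assumption. Only the ±256-byte tolerance and the cap itself come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxUploadBytes = 10 << 20 // assumption: the "10 MB" cap read as 10 MiB
	sizeTolerance  = 256      // policy range is the claimed length ±256 bytes
)

// allowedContentTypes is a placeholder allowlist; the real per-category
// lists live in the service and are not reproduced here.
var allowedContentTypes = map[string]map[string]bool{
	"image":    {"image/jpeg": true, "image/png": true, "image/gif": true, "image/webp": true},
	"document": {"application/pdf": true}, // hypothetical entry
}

// PresignRequest mirrors the {category, content_type, content_length} body.
type PresignRequest struct {
	Category      string
	ContentType   string
	ContentLength int64
}

var (
	ErrBadType = errors.New("content type not allowed for category")
	ErrBadSize = errors.New("content length out of bounds")
)

// validatePresign applies the MIME allowlist and the size cap before signing.
func validatePresign(r PresignRequest) error {
	if !allowedContentTypes[r.Category][r.ContentType] {
		return ErrBadType
	}
	if r.ContentLength <= 0 || r.ContentLength > maxUploadBytes {
		return ErrBadSize
	}
	return nil
}

// sizeRange returns the bounds for the content-length-range condition,
// clamped so the range never goes below 1 byte or above the cap.
func sizeRange(claimed int64) (lo, hi int64) {
	lo = claimed - sizeTolerance
	if lo < 1 {
		lo = 1
	}
	hi = claimed + sizeTolerance
	if hi > maxUploadBytes {
		hi = maxUploadBytes
	}
	return lo, hi
}

func main() {
	req := PresignRequest{Category: "image", ContentType: "image/png", ContentLength: 2 << 20}
	if err := validatePresign(req); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	lo, hi := sizeRange(req.ContentLength)
	fmt.Printf("content-length-range: %d..%d\n", lo, hi)
}
```

Quota and rate-limit checks are left out of the sketch because they depend on the database and Redis.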

The signed URL is valid for 15 minutes; presigns are not reusable.

The B2 bucket stays **private**: only the api ever holds the key
material. Clients can't list or GET directly without a presign.

```
┌──────────┐     1) presign      ┌────────┐
│  client  │ ──────────────────► │  api   │
│          │ ◄────────────────── │        │  POST policy + key
│          │                     └────────┘
│          │                       row in
│          │                       pending_uploads
│          │                       (claimed_at NULL)
│          │    2) POST bytes    ┌────────┐
│          │ ──────────────────► │   B2   │  enforces policy
│          │ ◄────────────────── │        │
│          │                     └────────┘
│          │      3) attach      ┌────────┐
│          │ ──────────────────► │  api   │  HEAD B2 object,
│          │  upload_ids: [id]   │        │  mark claimed_at,
│          │                     └────────┘  insert image row
└──────────┘
```
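For step 2 of the diagram, a client written in Go could assemble the multipart body along these lines. This is a sketch under assumptions: the field names in `main` are placeholders for whatever `fields` the presign response returns, `buildUploadBody` and `partNames` are hypothetical helpers, and it assumes B2 follows the S3 POST-object convention that the `file` part comes after all policy fields.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
)

// buildUploadBody writes the signed policy fields first and the file part
// last, returning the body and the Content-Type header to send with it.
func buildUploadBody(fields map[string]string, filename string, data []byte) (*bytes.Buffer, string, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	for name, value := range fields {
		if err := w.WriteField(name, value); err != nil {
			return nil, "", err
		}
	}
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write(data); err != nil {
		return nil, "", err
	}
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}

// partNames parses a multipart body and returns its form names in order,
// which makes the "file part is last" property easy to check.
func partNames(body []byte, contentType string) ([]string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	r := multipart.NewReader(bytes.NewReader(body), params["boundary"])
	var names []string
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, p.FormName())
	}
}

func main() {
	// Placeholder values; real ones come from the presign response.
	fields := map[string]string{"key": "uploads/example", "policy": "base64-policy"}
	body, ctype, err := buildUploadBody(fields, "photo.png", []byte("fake image bytes"))
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	names, err := partNames(body.Bytes(), ctype)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("parts in order:", names)
}
```

The resulting body and content type can then be sent with `http.Post(uploadURL, ctype, body)`, where `uploadURL` is the `upload_url` from the presign response.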

Server-side enforcement summary:

| Check | Where | Reject if |
|---|---|---|
| Auth | api middleware | unauthenticated |
| Mime allowlist | `upload_service.go:allowedContentTypes` | not in list for category |
| Size cap (10 MB) | api before signing + B2 policy | content_length > 10 MiB |
| Concurrency cap (10) | `CountUnclaimedActiveForUser` | already 10 unclaimed in-flight |
| Rate limit (50/hr) | Redis sliding window `upload:presign:<uid>:<bucket>` | 51st presign in the same hour |
| Size at upload time | B2 (signed policy) | bytes outside content-length-range |
| Ownership at attach | `FindUnclaimedForUser` | upload_id belongs to a different user |
| Bytes match claim | `s3.Stat()` + bytes comparison | actual size differs from expected ±256 |
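The last two rows (ownership and bytes-match at attach time) might look something like the sketch below. It is illustrative only: `PendingUpload`, `verifyAttach`, and the error values are hypothetical names, and `actualBytes` stands in for the size that the `HEAD` / `s3.Stat()` call would report, so the example needs no storage client.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const sizeTolerance = 256 // allowed difference between claimed and stored size

// PendingUpload is a hypothetical in-memory view of a pending_uploads row.
type PendingUpload struct {
	ID            string
	UserID        string
	Key           string
	ExpectedBytes int64
	ClaimedAt     *time.Time // nil while the upload is unclaimed
}

var (
	ErrNotOwner       = errors.New("upload belongs to a different user")
	ErrAlreadyClaimed = errors.New("upload already claimed")
	ErrSizeMismatch   = errors.New("stored size differs from claimed size")
)

// verifyAttach checks ownership, claim state, and size, then returns a copy
// of the row with ClaimedAt set. The caller would persist that change.
func verifyAttach(u PendingUpload, userID string, actualBytes int64) (PendingUpload, error) {
	if u.UserID != userID {
		return u, ErrNotOwner
	}
	if u.ClaimedAt != nil {
		return u, ErrAlreadyClaimed
	}
	diff := actualBytes - u.ExpectedBytes
	if diff < 0 {
		diff = -diff
	}
	if diff > sizeTolerance {
		return u, ErrSizeMismatch
	}
	now := time.Now()
	u.ClaimedAt = &now
	return u, nil
}

func main() {
	row := PendingUpload{ID: "u1", UserID: "alice", Key: "uploads/u1", ExpectedBytes: 1000}
	claimed, err := verifyAttach(row, "alice", 1100)
	if err != nil {
		fmt.Println("attach rejected:", err)
		return
	}
	fmt.Println("claimed:", claimed.ID, "at", claimed.ClaimedAt.Format(time.RFC3339))
}
```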

### Download flow (current)

@@ -170,34 +216,55 @@ go through the Go API.
3. Go API fetches from B2 and streams back to the client

This proxies every download through the api. For high-traffic media
that's inefficient (api becomes an egress bottleneck).

### Future: signed URLs

We could generate time-limited signed URLs for B2 objects:

```go
url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
```

Returns a URL the client can GET directly from B2, scoped to a specific
object, valid for 1h. Saves api bandwidth and latency.

Not yet implemented. TODO (Chapter 20).
that's inefficient (api becomes an egress bottleneck); the proxy could be
replaced with presigned GET URLs on the same bucket. Not yet shipped;
download volume is low enough that the proxy is fine for now.

## Lifecycle and retention

We have **no lifecycle rules** set on the bucket. Objects live forever
unless the app deletes them.
### Orphan cleanup (`pending_uploads`)

When a user deletes their account, the app should delete their B2
objects. This is currently not automated, which is a compliance gap for any
"right to be forgotten" request.

Every presign creates a row in `pending_uploads` with
`expires_at = now + 15 min`. If the client never finishes the upload, or finishes
but never calls the attach endpoint, the row stays unclaimed. An
hourly cron in the worker reaps them:

**TODO** (Chapter 20): Either:
- Implement explicit cleanup in the user deletion handler, or
- Add B2 lifecycle rule tied to object metadata (tag objects with
  user ID; rule deletes tagged objects when user is soft-deleted)
- **`maintenance:upload_cleanup`**: cron `30 * * * *`. Selects
  unclaimed rows past `expires_at`, deletes the corresponding B2
  object, then deletes the row. Up to 500 per tick; the next tick picks up
  any overflow. Worker logs include the `reaped` count.
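One tick of the cleanup described above can be sketched as follows. Everything here is a stand-in: `pendingRow`, `objectDeleter`, `memBucket`, and `reapExpired` are hypothetical names, the rows live in a map rather than Postgres, timestamps are plain Unix seconds, and the single-argument `RemoveObject` is a simplification rather than the minio-go signature. Only the order of operations (delete object, then row) and the 500-per-tick limit come from the text.

```go
package main

import "fmt"

const batchLimit = 500 // rows handled per tick; overflow waits for the next tick

// pendingRow is a simplified pending_uploads row (times in Unix seconds).
type pendingRow struct {
	Key       string
	ExpiresAt int64
	Claimed   bool
}

// objectDeleter is a hypothetical, simplified storage interface.
type objectDeleter interface {
	RemoveObject(key string) error
}

// memBucket is an in-memory bucket used only for this illustration.
type memBucket map[string]bool

func (b memBucket) RemoveObject(key string) error {
	delete(b, key)
	return nil
}

// reapExpired deletes the object and then the row for every unclaimed row
// whose expiry is in the past, stopping once limit rows have been reaped.
func reapExpired(rows map[string]pendingRow, bucket objectDeleter, now int64, limit int) (int, error) {
	reaped := 0
	for id, row := range rows {
		if reaped >= limit {
			break
		}
		if row.Claimed || row.ExpiresAt >= now {
			continue
		}
		if err := bucket.RemoveObject(row.Key); err != nil {
			return reaped, err
		}
		delete(rows, id) // deleting during range is safe in Go
		reaped++
	}
	return reaped, nil
}

func main() {
	rows := map[string]pendingRow{
		"a": {Key: "uploads/a", ExpiresAt: 100},
		"b": {Key: "uploads/b", ExpiresAt: 100, Claimed: true},
		"c": {Key: "uploads/c", ExpiresAt: 900},
	}
	bucket := memBucket{"uploads/a": true, "uploads/b": true, "uploads/c": true}
	n, err := reapExpired(rows, bucket, 500, batchLimit)
	fmt.Println("reaped:", n, "err:", err)
}
```

Deleting the object before the row means a failed object delete leaves the row in place, so the next tick can retry it.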

The worker constructs a `StorageService` at startup; if storage init
fails (e.g. `B2_KEY_ID` / `B2_APP_KEY` not wired into the worker
deployment), the cleanup handler logs a warning and no-ops. See
`deploy-k3s/manifests/worker/deployment.yaml`: both B2 secrets are
required env vars on this pod.

### Bucket lifecycle (backstop)

A B2 lifecycle rule on the `uploads/` prefix is the safety net if the
worker is offline for an extended period:

- Hide objects 7 days after upload.
- Delete them 1 day after they are hidden.

This is configured manually via the Backblaze console (B2's S3
lifecycle API isn't fully implemented). See
`deploy-k3s/manifests/b2-lifecycle.md` for the exact rule and the
`b2 bucket get-info` verification command.
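If the verification ever needs to be scripted, a check along these lines could work. It is a sketch, not part of the repo: `hasBackstopRule` is a hypothetical helper, and it assumes its input is the JSON array of lifecycle rules from the bucket info, with the field names B2's native API uses for lifecycle rules (`fileNamePrefix`, `daysFromUploadingToHiding`, `daysFromHidingToDeleting`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycleRule holds the fields of one B2 lifecycle rule that matter here.
// The day counts are pointers because B2 reports unset values as null.
type lifecycleRule struct {
	FileNamePrefix            string `json:"fileNamePrefix"`
	DaysFromUploadingToHiding *int   `json:"daysFromUploadingToHiding"`
	DaysFromHidingToDeleting  *int   `json:"daysFromHidingToDeleting"`
}

// hasBackstopRule reports whether the rules include the uploads/ backstop:
// hide after 7 days, delete 1 day after hiding.
func hasBackstopRule(lifecycleJSON []byte) (bool, error) {
	var rules []lifecycleRule
	if err := json.Unmarshal(lifecycleJSON, &rules); err != nil {
		return false, err
	}
	for _, r := range rules {
		if r.FileNamePrefix != "uploads/" {
			continue
		}
		if r.DaysFromUploadingToHiding == nil || r.DaysFromHidingToDeleting == nil {
			continue
		}
		if *r.DaysFromUploadingToHiding == 7 && *r.DaysFromHidingToDeleting == 1 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Example input shaped like a lifecycle rule array.
	input := []byte(`[{"daysFromHidingToDeleting":1,"daysFromUploadingToHiding":7,"fileNamePrefix":"uploads/"}]`)
	ok, err := hasBackstopRule(input)
	fmt.Println("backstop rule present:", ok, "err:", err)
}
```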

### User-deletion cascade

When a user deletes their account, the app deletes their `task_*` /
`document` rows. The associated B2 objects survive: the same compliance
gap as before, not yet automated. Two approaches:

- Walk the image rows on user delete and `RemoveObject` each (simple,
  synchronous, slow for users with many uploads).
- Tag objects with a `user_id` metadata header at upload time, then
  use a B2 lifecycle rule scoped to a deleted-users prefix.

Option 1 is the next item in the upload roadmap.
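Since Option 1 is not implemented yet, the following is only a sketch of what it might look like. `objectRemover`, `purgeUserObjects`, and `fakeStore` are hypothetical; the one-argument `RemoveObject` is a simplified stand-in for the storage client, and collecting failed keys for a later retry is a design assumption rather than a decided behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// objectRemover is a simplified, hypothetical view of the storage client.
type objectRemover interface {
	RemoveObject(key string) error
}

// purgeUserObjects removes every key that belonged to a deleted user. It
// keeps going after a failure and returns the keys that could not be removed.
func purgeUserObjects(store objectRemover, keys []string) (removed int, failed []string) {
	for _, k := range keys {
		if err := store.RemoveObject(k); err != nil {
			failed = append(failed, k)
			continue
		}
		removed++
	}
	return removed, failed
}

// fakeStore is an in-memory store that can simulate one failing key.
type fakeStore struct {
	objects map[string]bool
	failOn  string
}

func (s *fakeStore) RemoveObject(key string) error {
	if key == s.failOn {
		return errors.New("simulated storage error")
	}
	delete(s.objects, key)
	return nil
}

func main() {
	store := &fakeStore{
		objects: map[string]bool{"uploads/a": true, "uploads/b": true},
	}
	// In the real handler the keys would come from the user's image rows.
	removed, failed := purgeUserObjects(store, []string{"uploads/a", "uploads/b"})
	fmt.Println("removed:", removed, "failed:", failed)
}
```

Returning the failed keys instead of aborting on the first error lets the account deletion proceed while leaving a list that a later job could retry.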

## Backup of B2

@@ -247,6 +247,38 @@ kubectl patch secret honeydue-secrets -n honeydue \
kubectl rollout restart -n honeydue deployment/api deployment/worker
```

## One-time B2 bucket lifecycle (manual)

The `pending_uploads` cleanup cron (`30 * * * *` on the worker) handles
the common case of reaping orphaned uploads. The B2 bucket lifecycle
rule on the `uploads/` prefix is the **backstop** if the worker is
offline for >24 hours. It's configured once via the Backblaze web
console; B2's S3 lifecycle API isn't fully implemented, so this can't
be in the deploy script.

One-time setup:

1. Open https://secure.backblaze.com/b2_buckets.htm → bucket
   `honeyDueProd` → **Lifecycle Settings** → **Custom**
2. Add rule:
   - File name prefix: `uploads/`
   - Hide files older than: **7 days**
   - Delete hidden files older than: **1 day**

Maximum lifetime of an orphaned object under this rule: 8 days from
upload (7 days until hidden, plus 1 day until deleted). The worker
normally reaps within an hour, so the rule should almost never trigger.

Verify:

```bash
# Requires the b2 CLI: brew install b2-tools
b2 bucket get-info honeyDueProd | jq '.lifecycleRules'
```

See `deploy-k3s/manifests/b2-lifecycle.md` for the canonical rule
definition and a curl-based fallback if the b2 CLI isn't available.

## Manifest changes

When you add/modify a deployment YAML:

@@ -40,7 +40,7 @@ they do, and how to operate them.
- [07 — Services](./07-services.md) — api, admin, worker, redis per-service deep dive
- [08 — Database](./08-database.md) — Neon Postgres, advisory-lock migrations
- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go client details
- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go, presigned-URL direct uploads
- [10 — Secrets & Config](./10-secrets-config.md) — ConfigMap, Secret, env mapping
- [11 — Registry](./11-registry.md) — Gitea container registry, multi-arch builds