29c9014a33
Replaces the multipart-via-API path for image uploads with a three-step
direct-to-storage flow:
1. Client POSTs /api/uploads/presign with content_length + content_type;
server validates size (10 MB cap), mime allow-list per category, rate
limit (50/hour/user via Redis sliding window), and concurrent unclaimed
cap (10 in-flight per user). On success it persists a pending_uploads
row, signs an S3 POST policy with content-length-range bound to the
claimed length ±256 bytes, and returns the URL+fields.
2. Client POSTs the bytes directly to B2 using the signed policy. B2
enforces size, content-type, and key match before accepting.
3. Client passes upload_ids[] to /api/task-completions/ or /api/documents/.
Service HEADs each B2 object, verifies its size matches expected_bytes
within the slack, sets pending_uploads.claimed_at, and creates the
associated TaskCompletionImage / DocumentImage rows.
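The numeric checks in steps 1 and 3 can be sketched in Go as below. This is a minimal sketch under stated assumptions, not the service's code:
- "10 MB" is read as 10 MiB.
- Image MIME types are allowed in every category, with pdf added only for document_file (reading "+ pdf" as an addition).
- The claim-time slack reuses the same 256 bytes.
- All function and variable names are hypothetical.

The ["content-length-range", min, max] condition is the standard S3 POST policy form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Values from the policy in this commit; reading "10 MB" as 10 MiB is an assumption.
const (
	maxUploadBytes   int64 = 10 << 20
	lengthSlackBytes int64 = 256 // content-length-range = claimed length ± 256
)

// imageTypes is a hypothetical MIME encoding of jpeg/png/heic/heif/webp.
var imageTypes = map[string]bool{
	"image/jpeg": true, "image/png": true, "image/heic": true,
	"image/heif": true, "image/webp": true,
}

// allowedContentType applies the per-category allow-list (assumed reading):
// image types for every category, pdf additionally for document_file.
func allowedContentType(category, contentType string) bool {
	if imageTypes[contentType] {
		return true
	}
	return category == "document_file" && contentType == "application/pdf"
}

// validatePresign mirrors the step-1 size cap and MIME allow-list checks.
func validatePresign(category, contentType string, contentLength int64) error {
	if contentLength <= 0 || contentLength > maxUploadBytes {
		return fmt.Errorf("content_length %d outside (0, %d]", contentLength, maxUploadBytes)
	}
	if !allowedContentType(category, contentType) {
		return fmt.Errorf("content_type %q not allowed for category %q", contentType, category)
	}
	return nil
}

// contentLengthCondition builds the POST policy condition around the
// claimed length, flooring the lower bound at zero.
func contentLengthCondition(claimed int64) []any {
	lo := claimed - lengthSlackBytes
	if lo < 0 {
		lo = 0
	}
	return []any{"content-length-range", lo, claimed + lengthSlackBytes}
}

// policyJSON shows how the condition serializes inside the signed policy.
func policyJSON(claimed int64) string {
	b, _ := json.Marshal(contentLengthCondition(claimed))
	return string(b)
}

// sizeWithinSlack is the step-3 check on the size reported by a HEAD of the
// B2 object; reusing the 256-byte slack here is an assumption.
func sizeWithinSlack(expectedBytes, actualBytes int64) bool {
	d := actualBytes - expectedBytes
	if d < 0 {
		d = -d
	}
	return d <= lengthSlackBytes
}
```

At the 10 MB boundary the policy's upper bound is the cap plus 256 bytes. Clamping it to maxUploadBytes would keep the cap strict.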
Bytes never traverse our API server. The 1 MB Echo BodyLimit middleware
that was rejecting all task-completion image uploads becomes irrelevant
for this path. The existing multipart endpoints stay functional alongside
it, so the new path can be soak-tested before the legacy one is removed.
Cleanup:
- cmd/worker registers a new hourly cron (TypeUploadCleanup, "30 * * * *")
that reaps pending_uploads where claimed_at IS NULL AND expires_at < NOW().
Reaps both the B2 object and the row.
- B2 bucket lifecycle rule on `uploads/` prefix (7 days hide → 1 day delete)
documented in deploy-k3s/manifests/b2-lifecycle.md as a backstop.
Schema:
- migrations/000002_pending_uploads.sql adds the table + partial index for
cleanup + nullable pending_upload_id FKs on task_taskcompletionimage and
task_documentimage.
Policy (single tier, no free/pro split):
- 10 MB cap per upload
- 50 presigns/hour/user
- 10 concurrent unclaimed uploads/user
- allow-list: jpeg/png/heic/heif/webp for image categories;
+ pdf for document_file
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# B2 bucket lifecycle — `uploads/` prefix

The `pending_uploads` cleanup worker (cron `30 * * * *`, see
`internal/worker/jobs/handler.go::HandleUploadCleanup`) reaps unclaimed
upload sessions every hour, deleting both the row and the corresponding B2
object. This bucket-level lifecycle rule is a **backstop**: it catches B2
objects that outlive their row (e.g. the worker crashed mid-loop, a B2
delete errored, or someone tampered with the DB by hand).

## Rule

Apply via the Backblaze web console: **Bucket → `honeyDueProd` → Lifecycle Settings → Custom**

```json
[
  {
    "fileNamePrefix": "uploads/",
    "daysFromUploadingToHiding": 7,
    "daysFromHidingToDeleting": 1
  }
]
```

Effect: any object under the `uploads/` prefix is hidden 7 days after
upload, then permanently deleted 1 day after that. Total maximum lifetime
of an orphaned object: about 8 days (lifecycle rules are evaluated once per
day, so actual deletion can lag slightly).

This rule does NOT affect:

- `images/`, `documents/`, `completions/`: legacy multipart-uploaded
  objects, whose lifetimes are tied to the existing `task_completion_image` /
  `document_image` / `document.file_url` references.

## Why a backstop, not the primary mechanism

The application worker is the primary mechanism because:

1. It can delete the **DB row** alongside the B2 object; lifecycle alone
   would leave dangling `pending_uploads` rows.
2. It runs hourly, versus lifecycle's once-per-day evaluation, giving a much
   tighter recovery window for the common case.
3. It produces logs and metrics, making the orphan rate observable.

## Verification

After applying:

```bash
b2 bucket get honeyDueProd | jq '.lifecycleRules'
```

Should show the rule above. If you don't have the B2 CLI:
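
```bash
AUTH=$(curl -s -u "$B2_KEY_ID:$B2_APP_KEY" https://api.backblazeb2.com/b2api/v3/b2_authorize_account)
# b2_list_buckets returns lifecycleRules; in v3 the apiUrl lives under apiInfo.storageApi
curl -s -H "Authorization: $(jq -r .authorizationToken <<<"$AUTH")" \
  -d "{\"accountId\":\"$(jq -r .accountId <<<"$AUTH")\",\"bucketName\":\"honeyDueProd\"}" \
  "$(jq -r .apiInfo.storageApi.apiUrl <<<"$AUTH")/b2api/v3/b2_list_buckets" | jq '.buckets[0].lifecycleRules'
```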
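To turn the manual check into an automated one (for example in a deploy smoke test), the JSON returned by B2's `b2_list_buckets` call can be checked with a small Go helper. It reads each bucket's `bucketName` and `lifecycleRules`. `hasUploadsBackstop` and the struct names are hypothetical; the rule field names match the rule JSON above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycleRule uses the same field names as the rule JSON above.
type lifecycleRule struct {
	FileNamePrefix            string `json:"fileNamePrefix"`
	DaysFromUploadingToHiding int    `json:"daysFromUploadingToHiding"`
	DaysFromHidingToDeleting  int    `json:"daysFromHidingToDeleting"`
}

// listBucketsResponse covers only the parts of the b2_list_buckets response read here.
type listBucketsResponse struct {
	Buckets []struct {
		BucketName     string          `json:"bucketName"`
		LifecycleRules []lifecycleRule `json:"lifecycleRules"`
	} `json:"buckets"`
}

// hasUploadsBackstop reports whether the named bucket carries the uploads/
// rule with 7 days to hiding and 1 day from hiding to deleting.
func hasUploadsBackstop(body []byte, bucket string) (bool, error) {
	var resp listBucketsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, fmt.Errorf("decode b2_list_buckets response: %w", err)
	}
	for _, b := range resp.Buckets {
		if b.BucketName != bucket {
			continue
		}
		for _, r := range b.LifecycleRules {
			if r.FileNamePrefix == "uploads/" &&
				r.DaysFromUploadingToHiding == 7 &&
				r.DaysFromHidingToDeleting == 1 {
				return true, nil
			}
		}
	}
	return false, nil
}
```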