feat(uploads): direct-to-B2 presigned uploads with content-length-range policy

Adds a three-step direct-to-storage flow for image uploads, intended to
replace the multipart-via-API path:

  1. Client POSTs /api/uploads/presign with content_length + content_type;
     server validates size (10 MB cap), mime allow-list per category, rate
     limit (50/hour/user via Redis sliding window), and concurrent unclaimed
     cap (10 in-flight per user). On success it persists a pending_uploads
     row, signs an S3 POST policy with content-length-range bound to the
     claimed length ±256 bytes, and returns the URL+fields.
  2. Client POSTs the bytes directly to B2 using the signed policy. B2
     enforces size, content-type, and key match before accepting.
  3. Client passes upload_ids[] to /api/task-completions/ or /api/documents/.
     Service HEADs each B2 object, verifies size matches expected_bytes
     within slack, marks pending_uploads claimed_at, and creates the
     associated TaskCompletionImage / DocumentImage rows.
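
A minimal sketch of how the size bound in steps 1 and 3 might be computed,
assuming the ±256-byte slack applies symmetrically and the lower bound is
clamped at zero. The names lengthRange, policyConditions, and sizeMatches,
the bucket name, and the key are hypothetical and not taken from this
commit. A real policy document also carries an expiration and is signed;
both are omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sizeSlack is the ±256-byte tolerance mentioned in the commit message.
const sizeSlack int64 = 256

// lengthRange returns the inclusive bounds for a content-length-range
// condition. Clamping the lower bound at zero is an assumption.
func lengthRange(claimed int64) (lo, hi int64) {
	lo = claimed - sizeSlack
	if lo < 0 {
		lo = 0
	}
	return lo, claimed + sizeSlack
}

// policyConditions builds the "conditions" array of an S3 POST policy:
// exact-match conditions for bucket, key, and Content-Type, plus the
// size range derived from the claimed length.
func policyConditions(bucket, key, contentType string, claimed int64) []any {
	lo, hi := lengthRange(claimed)
	return []any{
		map[string]string{"bucket": bucket},
		map[string]string{"key": key},
		map[string]string{"Content-Type": contentType},
		[]any{"content-length-range", lo, hi},
	}
}

// sizeMatches is the step-3 style check: the size reported by the HEAD
// request must fall within the same slack around expected_bytes.
func sizeMatches(expected, actual int64) bool {
	lo, hi := lengthRange(expected)
	return actual >= lo && actual <= hi
}

func main() {
	// Hypothetical bucket and key, for illustration only.
	conds := policyConditions("example-bucket", "uploads/42/photo.jpg", "image/jpeg", 1_000_000)
	doc, err := json.Marshal(map[string]any{"conditions": conds})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc))
	fmt.Println(sizeMatches(1_000_000, 1_000_200), sizeMatches(1_000_000, 1_000_300))
}
```

Deriving both the signed range and the post-upload check from one helper
keeps the two bounds from drifting apart.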

Bytes never traverse our API server, so the 1 MB Echo BodyLimit
middleware that was rejecting all task-completion image uploads no longer
applies to this path. The existing multipart endpoints stay functional
alongside it while the new path soaks, ahead of legacy removal.

Cleanup:
  - cmd/worker registers a new hourly cron (TypeUploadCleanup, "30 * * * *")
    that reaps pending_uploads where claimed_at IS NULL AND expires_at < NOW().
    Reaps both the B2 object and the row.
  - B2 bucket lifecycle rule on `uploads/` prefix (7 days hide → 1 day delete)
    documented in deploy-k3s/manifests/b2-lifecycle.md as a backstop.

Schema:
  - migrations/000002_pending_uploads.sql adds the table + partial index for
    cleanup + nullable pending_upload_id FKs on task_taskcompletionimage and
    task_documentimage.

Policy (single tier, no free/pro split):
  - 10 MB cap per upload
  - 50 presigns/hour/user
  - 10 concurrent unclaimed uploads/user
  - allow-list: jpeg/png/heic/heif/webp for image categories;
    the same plus pdf for document_file

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-05-01 14:36:42 -07:00
parent 9bee436e86
commit 29c9014a33
20 changed files with 1032 additions and 9 deletions
+2
@@ -91,6 +91,8 @@ type DocumentImage struct {
DocumentID uint `gorm:"column:document_id;index;not null" json:"document_id"`
ImageURL string `gorm:"column:image_url;size:500;not null" json:"image_url"`
Caption string `gorm:"column:caption;size:255" json:"caption"`
// PendingUploadID — see TaskCompletionImage.PendingUploadID.
PendingUploadID *uint `gorm:"column:pending_upload_id" json:"pending_upload_id,omitempty"`
}
// TableName returns the table name for GORM
+53
@@ -0,0 +1,53 @@
package models

import "time"

// UploadCategory enumerates the kinds of objects that can be uploaded via the
// presigned-URL flow. Each category has its own size cap and mime-type
// allow-list enforced at the service layer.
type UploadCategory string

const (
	UploadCategoryCompletion    UploadCategory = "completion"
	UploadCategoryDocumentImage UploadCategory = "document_image"
	UploadCategoryDocumentFile  UploadCategory = "document_file"
)

// PendingUpload is a short-lived upload session created when the client asks
// for a presigned POST policy. The row tracks the intent so the server can
// validate quota / rate-limit / size up front, then attach the resulting B2
// object to a task_completion_image or document_image once the upload lands.
//
// Lifecycle:
//
//	created → upload to B2 → attach via /api/task-completions/ or /api/documents/
//	   ↑                                    │
//	   └─ if not claimed before expires_at, the cleanup worker (see
//	      internal/worker/jobs) deletes the B2 object and the row.
type PendingUpload struct {
	ID            uint           `gorm:"primaryKey" json:"id"`
	UserID        uint           `gorm:"column:user_id;not null;index:idx_pending_uploads_user_created,priority:1" json:"user_id"`
	Category      UploadCategory `gorm:"column:category;size:32;not null" json:"category"`
	B2Key         string         `gorm:"column:b2_key;size:255;uniqueIndex" json:"b2_key"`
	ContentType   string         `gorm:"column:content_type;size:127;not null" json:"content_type"`
	ExpectedBytes int64          `gorm:"column:expected_bytes;not null" json:"expected_bytes"`
	ActualBytes   *int64         `gorm:"column:actual_bytes" json:"actual_bytes,omitempty"`
	ClaimedAt     *time.Time     `gorm:"column:claimed_at" json:"claimed_at,omitempty"`
	CreatedAt     time.Time      `gorm:"column:created_at;autoCreateTime;index:idx_pending_uploads_user_created,priority:2,sort:desc" json:"created_at"`
	ExpiresAt     time.Time      `gorm:"column:expires_at;not null" json:"expires_at"`
}

// TableName matches the goose migration.
func (PendingUpload) TableName() string {
	return "pending_uploads"
}

// IsClaimed reports whether the upload has been linked to a real entity.
func (p *PendingUpload) IsClaimed() bool {
	return p.ClaimedAt != nil
}

// IsExpired reports whether the upload session has passed its TTL.
func (p *PendingUpload) IsExpired(now time.Time) bool {
	return now.After(p.ExpiresAt)
}
+4
@@ -215,6 +215,10 @@ type TaskCompletionImage struct {
CompletionID uint `gorm:"column:completion_id;index;not null" json:"completion_id"`
ImageURL string `gorm:"column:image_url;size:500;not null" json:"image_url"`
Caption string `gorm:"column:caption;size:255" json:"caption"`
// PendingUploadID links to the pending_uploads row that produced this
// image when uploaded via the presigned-URL flow. Nullable: legacy rows
// uploaded through the multipart path don't have one.
PendingUploadID *uint `gorm:"column:pending_upload_id" json:"pending_upload_id,omitempty"`
}
// TableName returns the table name for GORM