Harden prod deploy: versioned secrets, healthchecks, migration lock, dry-run

Swarm stack
- Resource limits on all services, stop_grace_period 60s on api/worker/admin
- Dozzle bound to manager loopback only (ssh -L required for access)
- Worker health server on :6060, admin /api/health endpoint
- Redis 200M LRU cap, B2/S3 env vars wired through to api service

Deploy script
- DRY_RUN=1 prints plan + exits
- Auto-rollback on failed healthcheck, docker logout at end
- Versioned-secret pruning keeps last SECRET_KEEP_VERSIONS (default 3)
- PUSH_LATEST_TAG default flipped to false
- B2 all-or-none validation before deploy

Code
- cmd/api takes pg_advisory_lock on a dedicated connection before
  AutoMigrate, serialising boot-time migrations across replicas
- cmd/worker exposes an HTTP /health endpoint with graceful shutdown

Docs
- deploy/DEPLOYING.md: step-by-step walkthrough for a real deploy
- deploy/shit_deploy_cant_do.md: manual prerequisites + recurring ops
- deploy/README.md updated with storage toggle, worker-replica caveat,
  multi-arch recipe, connection-pool tuning, renumbered sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trey t
2026-04-14 15:22:43 -05:00
parent ca818e8478
commit 33eee812b6
11 changed files with 908 additions and 30 deletions

View File

@@ -65,8 +65,10 @@ func main() {
log.Error().Err(dbErr).Msg("Failed to connect to database - API will start but database operations will fail")
} else {
defer database.Close()
// Run database migrations only if connected
if err := database.Migrate(); err != nil {
// Run database migrations only if connected.
// MigrateWithLock serialises parallel replica starts via a Postgres
// advisory lock so concurrent AutoMigrate calls don't race on DDL.
if err := database.MigrateWithLock(); err != nil {
log.Error().Err(err).Msg("Failed to run database migrations")
}
}