Next.js bakes NEXT_PUBLIC_* vars into the client JS bundle at build
time, not at runtime. The admin image was being built with
admin/.env.local containing NEXT_PUBLIC_API_URL=http://localhost:8000,
which hardcoded localhost into the browser bundle. The runtime configMap
value had no effect on the already-compiled JS, so the prod admin login
page sent its API requests to localhost and failed with CORS errors.
Fix:
- Dockerfile: admin-builder stage accepts ARG NEXT_PUBLIC_API_URL and
  strips any committed .env.local/.env.development.local before
  npm run build.
- .dockerignore: explicitly exclude admin/.env.* (root-level .env.*
  pattern doesn't match nested paths), so a local dev .env.local can
  never sneak into the build context again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swarm stack
- Resource limits on all services, stop_grace_period 60s on api/worker/admin
- Dozzle bound to manager loopback only (ssh -L required for access)
- Worker health server on :6060, admin /api/health endpoint
- Redis capped at 200M with LRU eviction, B2/S3 env vars wired through
  to api service
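
For illustration only, a minimal Go sketch of a worker-style health
server on :6060 that shuts down gracefully on SIGTERM. The helper name
newHealthServer, the "ok" response body, and the 10s shutdown timeout
are assumptions, not taken from cmd/worker.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

// newHealthServer builds an HTTP server that answers GET /health with 200.
// The name and response body are hypothetical, chosen for this sketch.
func newHealthServer(addr string) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok\n"))
	})
	return &http.Server{
		Addr:              addr,
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,
	}
}

func main() {
	// ctx is cancelled when the process receives SIGINT or SIGTERM,
	// which is what Swarm sends at the start of stop_grace_period.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	srv := newHealthServer(":6060")
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("health server: %v", err)
		}
	}()

	<-ctx.Done()

	// Give in-flight health probes a bounded time to finish (assumed 10s).
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("health server shutdown: %v", err)
	}
}
```

The shutdown timeout should stay below the 60s stop_grace_period so the
process exits before Swarm escalates to SIGKILL.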
Deploy script
- DRY_RUN=1 prints the plan and exits
- Auto-rollback on failed healthcheck, docker logout at end
- Versioned-secret pruning keeps last SECRET_KEEP_VERSIONS (default 3)
- PUSH_LATEST_TAG default flipped to false
- B2 all-or-none validation before deploy
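
The keep-last-N selection behind the secret pruning can be sketched in
Go as below. This only illustrates the selection logic; the deploy
script itself is not written this way, and the integer version scheme
and the name versionsToPrune are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// versionsToPrune returns the versions that fall outside the newest
// `keep` entries, newest-first. Higher numbers are assumed to be newer.
func versionsToPrune(versions []int, keep int) []int {
	if keep < 0 {
		keep = 0
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]int(nil), versions...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	if len(sorted) <= keep {
		return nil
	}
	return sorted[keep:]
}

func main() {
	// With SECRET_KEEP_VERSIONS=3 and versions 1..5, versions 2 and 1 go.
	fmt.Println(versionsToPrune([]int{1, 2, 3, 4, 5}, 3)) // prints [2 1]
}
```

Swarm refuses to remove a secret that a running service still
references, so the versions selected here are only candidates for
removal.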
Code
- cmd/api takes pg_advisory_lock on a dedicated connection before
  AutoMigrate, serialising boot-time migrations across replicas
- cmd/worker exposes an HTTP /health endpoint with graceful shutdown
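
A minimal sketch of the advisory-lock pattern around migrations,
assuming database/sql underneath. The lock key, the execer interface,
withAdvisoryLock, and the recording stub are all hypothetical names for
this example and are not the code in cmd/api. The stub exists only so
the statement order can be seen without a database.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// migrationLockKey is a hypothetical, application-chosen lock ID.
const migrationLockKey int64 = 727001

// execer is the subset of *sql.Conn used here; *sql.Conn satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// withAdvisoryLock takes a session-level advisory lock, runs migrate,
// and releases the lock on the same connection afterwards.
func withAdvisoryLock(ctx context.Context, conn execer, key int64, migrate func() error) (err error) {
	if _, err = conn.ExecContext(ctx, "SELECT pg_advisory_lock($1)", key); err != nil {
		return fmt.Errorf("acquire advisory lock: %w", err)
	}
	defer func() {
		// Use a fresh context so the unlock still runs if ctx was cancelled.
		_, uerr := conn.ExecContext(context.Background(), "SELECT pg_advisory_unlock($1)", key)
		if uerr != nil && err == nil {
			err = fmt.Errorf("release advisory lock: %w", uerr)
		}
	}()
	return migrate()
}

// recordingConn is a stub that records statements instead of running them.
type recordingConn struct{ queries []string }

func (r *recordingConn) ExecContext(_ context.Context, query string, _ ...any) (sql.Result, error) {
	r.queries = append(r.queries, query)
	return nil, nil
}

// dryRun returns the statement order produced by withAdvisoryLock.
func dryRun() ([]string, error) {
	conn := &recordingConn{}
	err := withAdvisoryLock(context.Background(), conn, migrationLockKey, func() error {
		conn.queries = append(conn.queries, "-- AutoMigrate runs here")
		return nil
	})
	return conn.queries, err
}

func main() {
	steps, err := dryRun()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range steps {
		fmt.Println(s)
	}
}
```

Against a real database the conn argument would come from
(*sql.DB).Conn(ctx). The dedicated connection matters because
pg_advisory_lock is held per session, so acquiring and releasing through
a pool could land on different sessions.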
Docs
- deploy/DEPLOYING.md: step-by-step walkthrough for a real deploy
- deploy/shit_deploy_cant_do.md: manual prerequisites + recurring ops
- deploy/README.md updated with storage toggle, worker-replica caveat,
  multi-arch recipe, connection-pool tuning, renumbered sections
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>