deployment: extend api startup probe budget for direct-endpoint migrations

The migration-pooler fix (commit 30966c6) routes AutoMigrate through
Neon's direct compute endpoint to keep the session-scoped advisory lock
alive. That swap means each DDL pays a fresh transatlantic RTT instead
of riding warm pooler connections, so AutoMigrate's runtime climbs from
~90s to 4-6 min on the first pod of a cold boot. With the previous 240s
grace the startup probe was killing pods mid-migration.

Bumping to 120 × 5s = 600s grace. Subsequent pods inherit the schema
and finish their migrate-no-op in seconds, so this only matters for the
single first-pod migration window after a deploy.
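
The arithmetic behind the bump, as a rough Go sketch. The function names are hypothetical and not part of the repo; the ~110ms RTT figure comes from the manifest comment changed in this commit, and the round-trip count is only a back-of-envelope inference from the 4-6 min runtime. It also ignores any initialDelaySeconds on the probe.

```go
package main

import "fmt"

// probeBudgetSeconds is the failure window a startup probe allows:
// failureThreshold consecutive failures, one probe every periodSeconds.
func probeBudgetSeconds(failureThreshold, periodSeconds int) int {
	return failureThreshold * periodSeconds
}

// impliedRoundTrips estimates how many serial round trips fit into a
// migration lasting totalSeconds when each trip costs rttMillis.
// Integer division, so the result is rounded down.
func impliedRoundTrips(totalSeconds, rttMillis int) int {
	return totalSeconds * 1000 / rttMillis
}

func main() {
	const worstCaseMigrate = 6 * 60 // upper end of the observed 4-6 min

	oldBudget := probeBudgetSeconds(48, 5)
	newBudget := probeBudgetSeconds(120, 5)

	fmt.Println("old budget (s):", oldBudget, "covers worst case:", oldBudget >= worstCaseMigrate)
	fmt.Println("new budget (s):", newBudget, "covers worst case:", newBudget >= worstCaseMigrate)
	fmt.Println("implied serial round trips:",
		impliedRoundTrips(4*60, 110), "to", impliedRoundTrips(6*60, 110))
}
```

The old 240s budget only matches the low end of the observed range, which is why pods were being killed mid-migration; 600s leaves headroom above the 6 min upper end.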

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-04-26 22:05:58 -05:00
parent 30966c6f5e
commit a94744061e
+8 -5
@@ -122,11 +122,14 @@ spec:
 path: /api/health/
 port: 8000
 # MigrateWithLock in cmd/api/main.go runs pg_advisory_lock on
-# every startup. On a cold boot with 3 replicas, the first does
-# AutoMigrate (~90s) and the others wait on the lock, so real
-# startup runs 90–240s. 48 × 5s = 240s grace absorbs it without
-# healthcheck killing a still-starting replica.
-failureThreshold: 48
+# every startup against Neon's *direct* (non-pooler) endpoint,
+# because session-scoped locks don't survive PgBouncer
+# transaction-mode. AutoMigrate over a transatlantic direct
+# link runs many DDLs serially × ~110ms RTT each ≈ 4–6 min on
+# the first pod; subsequent pods see no-op migrate after
+# acquiring the same lock. 120 × 5s = 600s grace absorbs it
+# without the healthcheck killing a still-migrating replica.
+failureThreshold: 120
 periodSeconds: 5
 readinessProbe:
 httpGet:
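
A small Go model of the lock serialization described in the manifest comment may help when sanity-checking the budget for more than one replica. It is a sketch under assumptions: all replicas start together, the 3-replica count is taken from the previous comment, and the 5s no-op duration is a placeholder for "seconds", not a measured value. The function names are hypothetical.

```go
package main

import "fmt"

// readyTimes returns, per replica, the seconds from boot until its
// migrate step completes, assuming all replicas start at once and
// serialize on a single advisory lock. The first lock holder pays the
// full migration; each later holder waits for its predecessors and
// then runs a no-op pass.
func readyTimes(replicas, migrateSeconds, noopSeconds int) []int {
	times := make([]int, 0, replicas)
	for i := 0; i < replicas; i++ {
		times = append(times, migrateSeconds+i*noopSeconds)
	}
	return times
}

// allWithin reports whether every replica finishes inside the budget.
func allWithin(times []int, budgetSeconds int) bool {
	for _, t := range times {
		if t > budgetSeconds {
			return false
		}
	}
	return true
}

func main() {
	const noopSeconds = 5 // assumed placeholder for a "seconds"-long no-op migrate

	times := readyTimes(3, 6*60, noopSeconds)
	fmt.Println("ready times (s):", times)
	fmt.Println("within 600s budget:", allWithin(times, 120*5))
	fmt.Println("within 240s budget:", allWithin(times, 48*5))
}
```

Under these assumptions the waiting replicas add only a few seconds on top of the first pod's migration, so the first-pod window is what the budget has to cover.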