deployment: extend api startup probe budget for direct-endpoint migrations

The migration-pooler fix (commit 30966c6) routes AutoMigrate through
Neon's direct compute endpoint to keep the session-scoped advisory lock
alive. That swap means each DDL pays a fresh transatlantic RTT instead
of riding warm pooler connections, so AutoMigrate's runtime climbs from
~90s to 4-6 min on the first pod of a cold boot. With the previous 240s
grace (48 × 5s), the startup probe was killing pods mid-migration.

Bump failureThreshold from 48 to 120, giving 120 × 5s = 600s of grace.
Subsequent pods, once they acquire the lock, find the schema in place
and finish a no-op migrate in seconds, so this only matters for the
single first-pod migration window after a deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -122,11 +122,14 @@ spec:
 path: /api/health/
 port: 8000
 # MigrateWithLock in cmd/api/main.go runs pg_advisory_lock on
-# every startup. On a cold boot with 3 replicas, the first does
-# AutoMigrate (~90s) and the others wait on the lock, so real
-# startup runs 90–240s. 48 × 5s = 240s grace absorbs it without
-# healthcheck killing a still-starting replica.
-failureThreshold: 48
+# every startup against Neon's *direct* (non-pooler) endpoint,
+# because session-scoped locks don't survive PgBouncer
+# transaction-mode. AutoMigrate over a transatlantic direct
+# link runs many DDLs serially × ~110ms RTT each ≈ 4–6 min on
+# the first pod; subsequent pods see no-op migrate after
+# acquiring the same lock. 120 × 5s = 600s grace absorbs it
+# without the healthcheck killing a still-migrating replica.
+failureThreshold: 120
 periodSeconds: 5
 readinessProbe:
 httpGet: