# 08 - Database (Neon Postgres)

## Summary

Authoritative user data lives in a Neon-managed Postgres database in AWS us-east-1. Connections use TLS (`DB_SSLMODE=require`). Schema is managed via GORM AutoMigrate inside the api binary, coordinated across replicas by a Postgres advisory lock to prevent concurrent migration attempts.

## Why Neon

### Decision matrix

At deploy time we considered:

| Option | Setup effort | Monthly cost | Backup/PITR | Scale ceiling | Notes |
|---|---|---|---|---|---|
| **Neon Launch** | Zero (managed) | $5-15 | Included | Large | **Picked** |
| Postgres on a Hetzner VPS | High | $8 (VPS) | Manual | Medium | More ops |
| AWS RDS | Medium | $30+ | Included | Huge | Overkill, expensive |
| Supabase Free | Zero | $0 | Limited | Small | Free tier has quota limits |
| CNPG on our k3s | High (Helm) | $0 (using cluster) | Self-rolled | Medium | Operational burden |

Neon Launch won on:

- **Serverless**: scales compute to zero when idle (cheap)
- **Branch databases**: we can create dev/staging branches from prod in seconds
- **Connection pooling built-in**: PgBouncer on the hostname suffix `-pooler`
- **Point-in-time recovery** included (paid tier)
- **Pay-as-you-go** with a $5 minimum, which fits a bootstrapped app

### Connection details

| Field | Value |
|---|---|
| Hostname | `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
| Port | 5432 |
| Username | `neondb_owner` |
| Database | `honeyDue` (case-sensitive!) |
| TLS mode | `require` (enforced by Neon; encrypts traffic but does not verify the server certificate, which would need `verify-full`) |
| Branch | production (a Neon concept: an isolated DB within the project) |

### The database name is case-sensitive

Postgres folds unquoted identifiers to lowercase. Neon's UI created the database as `"honeyDue"` (quoted, camelCase preserved), and the database name in a connection string is matched literally. In `prod.env` / ConfigMap we must use exactly `POSTGRES_DB=honeyDue`; lowercase `honeydue` fails with `database "honeydue" does not exist`. This bit us during the initial Swarm deploy (Chapter 19 §Neon DB name).

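
To make the case-sensitivity point concrete, here is a minimal sketch of assembling a connection URL from the fields in the table above. `buildDSN` is a hypothetical helper, not taken from the honeyDue codebase, and the password is a placeholder. The only point it illustrates is that the database name is passed through unchanged rather than normalized.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN is a hypothetical helper (an assumption for illustration).
// It assembles a postgres:// URL and deliberately leaves dbName untouched,
// because "honeyDue" and "honeydue" are different databases to Postgres.
func buildDSN(host string, port int, user, password, dbName, sslMode string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     fmt.Sprintf("%s:%d", host, port),
		Path:     "/" + dbName,
		RawQuery: url.Values{"sslmode": {sslMode}}.Encode(),
	}
	return u.String()
}

func main() {
	dsn := buildDSN(
		"ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech",
		5432,
		"neondb_owner",
		"placeholder-password", // placeholder, not a real credential
		"honeyDue",             // exact case, as created in the Neon UI
		"require",
	)
	fmt.Println(dsn)
}
```

Any normalization step between the ConfigMap and a call like this (for example lowercasing config values) would reproduce the `database "honeydue" does not exist` failure.
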
## Connection pooling

### Why it matters

Postgres is memory-hungry per connection (~5-10 MB each). 3 api replicas × `DB_MAX_OPEN_CONNS=25` = up to 75 direct Postgres connections; the worker's 25 brings the total to 100. Neon's free tier caps at 100 concurrent connections; paid tiers allow many more.

### PgBouncer on Neon

Neon provides a built-in PgBouncer at the `-pooler` subdomain. Our hostname already includes `-pooler` handling in the route, so connections go through PgBouncer transparently.

Modes PgBouncer supports:

- **session**: one server connection held per client session (transparent)
- **transaction**: server connection released after each transaction (high-throughput)
- **statement**: per-statement (most aggressive; breaks many features)

Neon's pooler runs in **transaction mode**. This is compatible with GORM out of the box (we don't use session-level features like prepared statements or session variables).

### Connection pool settings

In `prod.env`:

```
DB_MAX_OPEN_CONNS=25
DB_MAX_IDLE_CONNS=10
DB_MAX_LIFETIME=600s
```

These are the Go `database/sql` pool settings (GORM uses `database/sql` underneath):

- **MaxOpenConns: 25**: at most 25 concurrent connections per replica
- **MaxIdleConns: 10**: keep up to 10 warm connections ready to reuse
- **MaxLifetime: 600s**: recycle connections after 10 min (prevents stale state in long-lived connections, good for Neon's idle timeout)

### Worst-case connection count

(3 api + 1 worker) replicas × 25 conns = 100 peak. That is right at the Neon free tier's ceiling, with zero margin.

**This is a real risk**: a spike that saturates the pool on all replicas simultaneously would exhaust Neon's limit. Mitigations to consider:

- Drop `DB_MAX_OPEN_CONNS` to 15 → 60 peak. Safe on free tier.
- Upgrade to Neon Scale plan (1000+ connections).
- Rely on Neon's PgBouncer to multiplex: the raw backend connections to Postgres-proper are pooled, not our TCP connections to Neon.

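
The worst-case arithmetic above is simple enough to keep as a small check next to the pool configuration. The sketch below is illustrative only: `poolSettings` and `peakConnections` are hypothetical names, and the replica counts are the ones quoted in this chapter.

```go
package main

import (
	"fmt"
	"time"
)

// poolSettings mirrors the three values from prod.env.
// Hypothetical type, not taken from the honeyDue codebase.
type poolSettings struct {
	MaxOpenConns int
	MaxIdleConns int
	MaxLifetime  time.Duration
}

// peakConnections returns the worst-case number of client connections if
// every replica saturates its pool at the same moment.
func peakConnections(apiReplicas, workerReplicas int, p poolSettings) int {
	return (apiReplicas + workerReplicas) * p.MaxOpenConns
}

func main() {
	// DB_MAX_LIFETIME=600s is a valid Go duration string (10 minutes).
	lifetime, err := time.ParseDuration("600s")
	if err != nil {
		panic(err)
	}

	current := poolSettings{MaxOpenConns: 25, MaxIdleConns: 10, MaxLifetime: lifetime}
	reduced := current
	reduced.MaxOpenConns = 15 // the first mitigation listed above

	fmt.Println("current peak:", peakConnections(3, 1, current))
	fmt.Println("reduced peak:", peakConnections(3, 1, reduced))
}
```

With the values from this chapter it reports 100 and 60. This counts connections from our pods to Neon; how many backend connections PgBouncer actually opens behind that is not modeled here.
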
Currently we trust Neon's pooler to handle the multiplexing and run with the default 25/10. If we hit connection errors in prod, adjust.

## Schema management

### GORM AutoMigrate

On startup, the Go API's `cmd/api/main.go` calls `database.MigrateWithLock()`, which:

1. Opens a dedicated Postgres connection
2. Runs `SELECT pg_advisory_lock(1751412071)`, acquiring a session-level advisory lock on a hardcoded key
3. Calls `db.AutoMigrate(&models.*{})` for every GORM model
4. Runs `SELECT pg_advisory_unlock(...)` via a deferred function
5. Closes the connection

The advisory lock serializes migrations across replicas: when 3 api pods start simultaneously, one acquires the lock and migrates; the others block on the lock. Once the first finishes (≤2s for an already-migrated schema, up to 90s on first cold boot), the next acquires the lock and sees the schema is current (no-op migrate).

### Why an advisory lock

Without it, concurrent `CREATE TABLE IF NOT EXISTS ...` statements from multiple replicas would race. Postgres usually handles that, but GORM's AutoMigrate also alters tables (adds columns, indexes), which can deadlock under concurrency. Serializing migrations behind an advisory lock is a well-established pattern (Rails migrations take the same approach on Postgres).

### The lock key

`1751412071` is a hardcoded integer in `internal/database/database.go`. It is arbitrary but unique: as long as nothing else in the Postgres instance uses the same advisory lock key, there are no conflicts.

### First-boot behavior

On a **fresh database** (new Neon project), the first api pod runs through every model's `CREATE TABLE` statement. This is ~50 tables for honeyDue and takes ~90 seconds.

On a **warm database** (tables already exist), AutoMigrate is fast, typically under 2 seconds. It still runs (GORM checks every model against the schema) but finds no work to do.

### Where this bit us

With 3 api pods starting simultaneously and migrations taking 90s first time, the lock queue for the last replica is ~180s.
We needed a startupProbe grace of 240s to cover this without false restart loops. See Chapter 7 §startupProbe and Chapter 19 §MigrateWithLock.

### Downside: no schema versioning

AutoMigrate can only *add*: new tables, new columns, new indexes. It won't drop columns, rename them, or change types destructively. For those we'd need raw SQL migrations (a tool like `golang-migrate` or `dbmate`).

Today we accept that schema changes are additive-only. When we need destructive changes, we will hand-write them.

## What's in the database

Major tables (see `honeyDueAPI-go/internal/models/`):

| Table | Purpose |
|---|---|
| `auth_user` | Users (Django legacy name kept for compatibility) |
| `user_userprofile` | Profile data |
| `authtoken_token` | API auth tokens |
| `residence_residence` | Properties users manage |
| `task_task` | Maintenance tasks |
| `task_taskcompletion` | Task completion history |
| `contractor_contractor` | Contractor contacts |
| `documents_document` | Document records (files in B2) |
| `notification_notification` | In-app notifications |
| `subscription_usersubscription` | IAP subscriptions |
| `admin_users` | Next.js admin panel users |

See `honeyDueAPI-go/docs/TASK_LOGIC_ARCHITECTURE.md` for the task logic model details.

## Backup and recovery

### Neon's built-in

Neon Launch includes **point-in-time recovery** within the last 24h (longer on Scale plan). To restore:

1. Go to Neon console → project → Backups
2. Create a branch from a timestamp
3. Point the app at the new branch (change `DB_HOST` in our ConfigMap)

Done. No tape-wrangling.

### What we don't have

- Off-site backup. If Neon itself is compromised, we have no independent copy of the data. A nightly `pg_dump` to Backblaze B2 would close this gap. **TODO** (Chapter 20).
- Tested DR drills. We've never actually restored from a Neon backup into a new branch and pointed the app at it. This should be routine, but it hasn't been exercised.

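
As a starting point for the off-site backup TODO, here is a rough sketch of how a nightly job might assemble a `pg_dump` invocation. Nothing like this exists in the repo today: `pgDumpArgs`, the output filename, and the choice of the custom archive format are all assumptions for illustration, and the upload to B2 is left out entirely. The sketch prints the command instead of running it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"time"
)

// pgDumpArgs builds the argument list for a custom-format dump.
// Hypothetical helper; the flags are standard pg_dump long options.
func pgDumpArgs(host string, port int, user, dbName, outFile string) []string {
	return []string{
		"--host=" + host,
		"--port=" + strconv.Itoa(port),
		"--username=" + user,
		"--dbname=" + dbName, // case-sensitive: honeyDue, not honeydue
		"--format=custom",    // archive format restorable with pg_restore
		"--file=" + outFile,
	}
}

func main() {
	// Assumed naming scheme: one dump file per UTC day.
	outFile := fmt.Sprintf("honeyDue-%s.dump", time.Now().UTC().Format("2006-01-02"))
	args := pgDumpArgs(os.Getenv("DB_HOST"), 5432, "neondb_owner", "honeyDue", outFile)

	cmd := exec.Command("pg_dump", args...)
	// pg_dump (via libpq) reads the password and TLS mode from the environment.
	cmd.Env = append(os.Environ(),
		"PGPASSWORD="+os.Getenv("POSTGRES_PASSWORD"),
		"PGSSLMODE=require",
	)

	// Print rather than execute, so the sketch has no side effects.
	// A real job would call cmd.Run() and then upload outFile off-site.
	fmt.Println(cmd.String())
}
```

A dump like this is only useful once restoring it has been rehearsed, which ties into the untested-DR-drill gap above.
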
## Migrations from old MyCrib/Casera data

honeyDue originally ran on a Django codebase (MyCrib / Casera-era). The schema inherits Django's naming (`app_model` table names, `_id` suffix foreign keys). The Go app's GORM models have `TableName()` methods that preserve this:

```go
// TableName pins the model to the legacy Django table name.
func (Task) TableName() string { return "task_task" }
```

This isn't ideal (GORM's default `tasks` would be cleaner), but changing it would require a migration that renames every table, which is more risk than value.

## Neon regions

Neon's default region for new projects is `aws-us-east-1` (Virginia). Our DB is there. Latency from Nuremberg to us-east-1 is **~90-120ms round trip**.

This is the slowest hop in our data flow. Every api request that needs a DB query (most of them) pays this latency at least once.

**When this matters**: When we start seeing ~200ms+ response times from complex endpoints, DB latency is likely the dominant cost. Options:

- Migrate Neon to `aws-eu-central-1` (Frankfurt), which shaves ~90ms off
- Add Redis caching for hot reads (Chapter 7)
- Read replicas (Neon supports them on paid tiers)

## Environment variables the app reads

From ConfigMap:

| Var | Value / purpose |
|---|---|
| `DB_HOST` | Neon pooler hostname |
| `DB_PORT` | 5432 |
| `POSTGRES_USER` | `neondb_owner` |
| `POSTGRES_DB` | `honeyDue` |
| `DB_SSLMODE` | `require` |
| `DB_MAX_OPEN_CONNS` | 25 |
| `DB_MAX_IDLE_CONNS` | 10 |
| `DB_MAX_LIFETIME` | `600s` |

From Secret (`honeydue-secrets`):

| Var | Purpose |
|---|---|
| `POSTGRES_PASSWORD` | Neon DB password |

## Operator cheat sheet

```bash
# Connect to Neon from workstation (requires psql + the password)
PGPASSWORD="" psql -h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
  -U neondb_owner -d honeyDue

# From a pod (lets you debug against the actual in-cluster network path)
kubectl exec -n honeydue -it deploy/api -- sh
# inside the pod (no psql by default, but wget + JSON API works)
wget -qO- http://127.0.0.1:8000/api/health/

# See current migration state (no direct CLI, but the api logs show it)
kubectl logs -n honeydue deploy/api | grep -i migration

# See active connections (run against Neon)
PGPASSWORD="" psql -h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
  -U neondb_owner -d honeyDue <<'SQL'
SELECT count(*), usename, state, application_name
FROM pg_stat_activity
GROUP BY usename, state, application_name;
SQL
```

## References

- [Neon docs][neon-docs]
- [Neon pricing][neon-pricing]
- [Postgres advisory locks][pg-locks]
- [GORM AutoMigrate][gorm-automigrate]
- [honeyDue task architecture][task-arch] (repo-local)

[neon-docs]: https://neon.com/docs/introduction
[neon-pricing]: https://neon.com/pricing
[pg-locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
[gorm-automigrate]: https://gorm.io/docs/migration.html
[task-arch]: ../../docs/TASK_LOGIC_ARCHITECTURE.md