honeyDueAPI/deploy-k3s/scripts/_config.sh
Trey t 88fb1751c7
Cut /api/tasks/ p99 from ~2500ms toward ~150-300ms
Stack of optimizations against the same Hetzner→Neon transatlantic link.
The trace revealed every visible ms was network/proxy overhead — DB
execution itself is sub-millisecond per query (verified via EXPLAIN
ANALYZE: index scans on every hot path).

Connection layer:
- DB_HOST → Neon pooler endpoint (-pooler suffix). PgBouncer
  transaction-mode keeps backend Postgres connections warm so we no
  longer pay the ~110ms Postgres-startup RTT on cold queries.
- GORM pool tuned: MaxIdleConns 10→20, MaxLifetime 600s→1800s,
  MaxIdleTime added (default 0 = never close idle).
- Eager pool warm-up at boot via parallel pings — first user request
  no longer pays the ~440ms TCP+TLS+startup handshake.
- Redis maxmemory-policy noeviction → allkeys-lru. Cache writes will
  evict cold keys instead of erroring at the 256MB limit.
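A minimal sketch of the parallel warm-up idea, in Python rather than the service's Go, assuming a hypothetical `open_and_ping` callable that opens one connection, pings it, and returns it. It only illustrates the shape of the technique (pay the handshakes concurrently at boot instead of serially on the first requests) and is not the code in this commit.

```python
from concurrent.futures import ThreadPoolExecutor


def warm_pool(open_and_ping, size):
    """Run `size` open-and-ping calls concurrently and return their results.

    `open_and_ping` is a hypothetical callable; a real pool would keep the
    returned connections around as idle connections.
    """
    if size <= 0:
        return []
    with ThreadPoolExecutor(max_workers=size) as executor:
        futures = [executor.submit(open_and_ping) for _ in range(size)]
        # result() re-raises any connection error so boot can fail loudly.
        return [future.result() for future in futures]
```

Because the calls run in parallel, the wall-clock cost at boot is roughly one handshake rather than `size` handshakes.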

Auth layer:
- TokenCacheTTL 5min → 1 hour (Redis token cache).
- UserCacheTTL 30s → 5min (in-memory User cache, per pod).
- UserCache gains a 5,000-entry LRU cap so a flood of unique users
  can't blow up pod RSS. ~5MB worst-case per pod.
- Token + user lookup collapsed from 2 GORM Preload queries into a
  single INNER JOIN. Saves 1 RTT per cold-cache request.
- Auth middleware's m.db.* now use db.WithContext(ctx) so the SQL
  spans nest under the parent HTTP request in Jaeger.
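A minimal sketch of a TTL cache with an LRU entry cap, in Python rather than the service's Go. The class and parameter names are hypothetical; only the 5,000-entry cap and 5-minute TTL come from the notes above. The ~5MB worst case implies roughly 1KB per cached entry.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Hypothetical stand-in for a capped, per-process user cache."""

    def __init__(self, max_entries=5000, ttl_seconds=300.0, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls through to the database.
            del self._items[key]
            return None
        # Mark as most recently used.
        self._items.move_to_end(key)
        return value

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        # Evict least recently used entries once the cap is exceeded.
        while len(self._items) > self._max_entries:
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)
```

The cap bounds memory regardless of how many distinct users arrive; the TTL bounds how stale a cached user can be.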

Service layer:
- TaskService.ListTasks: replaced two-step
  FindResidenceIDsByUser → GetKanbanDataForMultipleResidences
  with a single GetKanbanDataForUser that uses a Postgres subquery
  for residence-access. One round-trip instead of two.
- New CacheService residence-IDs cache: "residence_ids_user:<id>"
  with 5-min TTL. Wired into Task/Residence/Contractor/Document
  services for the four hot read paths that need this list.
- Cache invalidation on every relevant mutation: CreateResidence,
  DeleteResidence, JoinWithCode, RemoveUser. DeleteResidence
  invalidates every member of the residence, not just the owner.
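A minimal sketch of the invalidation rule, in Python rather than the service's Go, with an in-memory dict standing in for Redis and the TTL omitted. The key format is the one named above; the class, the `delete_residence` helper, and the `members_by_residence` mapping are hypothetical.

```python
class ResidenceIDsCache:
    """Hypothetical in-memory stand-in for the residence-IDs cache."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(user_id):
        return f"residence_ids_user:{user_id}"

    def get(self, user_id):
        return self._store.get(self.key(user_id))

    def set(self, user_id, residence_ids):
        self._store[self.key(user_id)] = list(residence_ids)

    def invalidate(self, user_ids):
        for user_id in user_ids:
            self._store.pop(self.key(user_id), None)


def delete_residence(cache, members_by_residence, residence_id):
    """Remove a residence and drop the cached ID list of every member."""
    members = members_by_residence.pop(residence_id, set())
    # Every member's cached list contains the deleted residence, so
    # invalidating only the owner would leave stale entries behind.
    cache.invalidate(members)
    return members
```

The next read for any affected user misses the cache and rebuilds the list from the database.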

What this stacks up to (Hetzner→Neon, before US migration):
  Path                                 Before        After (target)
  Cache-warm authed read               ~800ms        ~100-200ms
  Cache-cold authed read (1st in 1hr)  ~2500ms       ~500-700ms
  First request after deploy           ~2500ms       ~700-900ms

The endgame US-region migration on top of this gets us to ~30-50ms
warm-cache, but we're shippable at ~150ms warm right now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:13:50 -05:00

228 lines
7.2 KiB
Bash
Executable File

#!/usr/bin/env bash
# Shared config helper — sourced by all deploy scripts.
# Provides cfg() to read values from config.yaml.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DEPLOY_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${DEPLOY_DIR}/config.yaml"
if [[ ! -f "${CONFIG_FILE}" ]]; then
  if [[ -f "${CONFIG_FILE}.example" ]]; then
    echo "[error] config.yaml not found. Run: cp config.yaml.example config.yaml" >&2
  else
    echo "[error] config.yaml not found." >&2
  fi
  exit 1
fi
# cfg "dotted.key.path" — reads a value from config.yaml
# Examples: cfg database.host, cfg nodes.0.ip, cfg features.push_enabled
cfg() {
  python3 -c "
import yaml, json, sys
with open(sys.argv[1]) as f:
    c = yaml.safe_load(f)
keys = sys.argv[2].split('.')
v = c
for k in keys:
    if isinstance(v, list):
        v = v[int(k)]
    else:
        v = v[k]
if isinstance(v, bool):
    print(str(v).lower())
elif isinstance(v, (dict, list)):
    print(json.dumps(v))
else:
    print('' if v is None else v)
" "${CONFIG_FILE}" "$1" 2>/dev/null
}
# cfg_require "key" "label" — reads value and dies if empty
cfg_require() {
  local val
  val="$(cfg "$1")"
  if [[ -z "${val}" ]]; then
    echo "[error] Missing required config: $1 ($2)" >&2
    exit 1
  fi
  printf '%s' "${val}"
}
# node_count — returns number of nodes
node_count() {
  python3 -c "
import yaml
with open('${CONFIG_FILE}') as f:
    c = yaml.safe_load(f)
print(len(c.get('nodes', [])))
"
}
# nodes_with_role "role" — returns node names with a given role
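nodes_with_role() {
  # Pass the path and role as arguments (as cfg does) so a quote in either
  # value cannot break out of the Python source.
  python3 -c "
import sys, yaml
with open(sys.argv[1]) as f:
    c = yaml.safe_load(f)
for n in c.get('nodes', []):
    if sys.argv[2] in n.get('roles', []):
        print(n['name'])
" "${CONFIG_FILE}" "$1"
}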
# generate_env — writes the flat env file the app expects to stdout
generate_env() {
  python3 -c "
import yaml
with open('${CONFIG_FILE}') as f:
    c = yaml.safe_load(f)
d = c['domains']
db = c['database']
em = c['email']
ps = c['push']
st = c['storage']
wk = c['worker']
ft = c['features']
aa = c.get('apple_auth', {})
ga = c.get('google_auth', {})
rd = c.get('redis', {})
def b(v):
    return str(v).lower() if isinstance(v, bool) else str(v)
def val(v):
    return '' if v is None else str(v)
lines = [
    # API
    'DEBUG=false',
    f\"ALLOWED_HOSTS={d['api']},{d['base']}\",
    f\"CORS_ALLOWED_ORIGINS=https://{d['base']},https://{d['admin']}\",
    'TIMEZONE=UTC',
    f\"BASE_URL=https://{d['base']}\",
    'PORT=8000',
    # Admin
    f\"NEXT_PUBLIC_API_URL=https://{d['api']}\",
    f\"ADMIN_PANEL_URL=https://{d['admin']}\",
    # Web (app.myhoneydue.com): server-side proxy target; browser never sees this
    f\"API_URL=https://{d['api']}/api\",
    # Database
    f\"DB_HOST={val(db['host'])}\",
    f\"DB_PORT={db['port']}\",
    f\"POSTGRES_USER={val(db['user'])}\",
    f\"POSTGRES_DB={db['name']}\",
    f\"DB_SSLMODE={db['sslmode']}\",
    f\"DB_MAX_OPEN_CONNS={db['max_open_conns']}\",
    f\"DB_MAX_IDLE_CONNS={db['max_idle_conns']}\",
    f\"DB_MAX_LIFETIME={db['max_lifetime']}\",
    f\"DB_MAX_IDLE_TIME={db.get('max_idle_time', '0s')}\",
    # Redis (in-namespace DNS short form; password injected if configured.
    # The short form works because /etc/resolv.conf in pods searches honeydue.svc.cluster.local)
    f\"REDIS_URL=redis://{':%s@' % val(rd.get('password')) if rd.get('password') else ''}redis:6379/0\",
    'REDIS_DB=0',
    # Email
    f\"EMAIL_HOST={em['host']}\",
    f\"EMAIL_PORT={em['port']}\",
    f\"EMAIL_USE_TLS={b(em['use_tls'])}\",
    f\"EMAIL_HOST_USER={val(em['user'])}\",
    f\"DEFAULT_FROM_EMAIL={val(em['from'])}\",
    # Push
    'APNS_AUTH_KEY_PATH=/secrets/apns/apns_auth_key.p8',
    f\"APNS_AUTH_KEY_ID={val(ps['apns_key_id'])}\",
    f\"APNS_TEAM_ID={val(ps['apns_team_id'])}\",
    f\"APNS_TOPIC={ps['apns_topic']}\",
    f\"APNS_USE_SANDBOX={b(ps['apns_use_sandbox'])}\",
    f\"APNS_PRODUCTION={b(ps['apns_production'])}\",
    # Worker
    f\"TASK_REMINDER_HOUR={wk['task_reminder_hour']}\",
    f\"OVERDUE_REMINDER_HOUR={wk['overdue_reminder_hour']}\",
    f\"DAILY_DIGEST_HOUR={wk['daily_digest_hour']}\",
    # B2 Storage
    # B2_KEY_ID and B2_APP_KEY are intentionally NOT emitted into the
    # ConfigMap: they're credentials and belong in honeydue-secrets
    # (set by 02-setup-secrets.sh). Wire them into the api/worker
    # deployments via envFrom: secretRef when B2 uploads need to be
    # active. Leaving them in cleartext here would leak via
    # \"kubectl get cm\".
    f\"B2_BUCKET_NAME={val(st['b2_bucket'])}\",
    f\"B2_ENDPOINT={val(st['b2_endpoint'])}\",
    f\"B2_REGION={val(st.get('b2_region'))}\",
    f\"B2_USE_SSL={b(st.get('b2_use_ssl', True))}\",
    f\"STORAGE_MAX_FILE_SIZE={st['max_file_size']}\",
    f\"STORAGE_ALLOWED_TYPES={st['allowed_types']}\",
    f\"STORAGE_UPLOAD_DIR={val(st.get('upload_dir', '/app/uploads'))}\",
    f\"STORAGE_BASE_URL={val(st.get('base_url', '/uploads'))}\",
    f\"STATIC_DIR={val(st.get('static_dir', '/app/static'))}\",
    # Features
    f\"FEATURE_PUSH_ENABLED={b(ft['push_enabled'])}\",
    f\"FEATURE_EMAIL_ENABLED={b(ft['email_enabled'])}\",
    f\"FEATURE_WEBHOOKS_ENABLED={b(ft['webhooks_enabled'])}\",
    f\"FEATURE_ONBOARDING_EMAILS_ENABLED={b(ft['onboarding_emails_enabled'])}\",
    f\"FEATURE_PDF_REPORTS_ENABLED={b(ft['pdf_reports_enabled'])}\",
    f\"FEATURE_WORKER_ENABLED={b(ft['worker_enabled'])}\",
    # Apple auth/IAP
    f\"APPLE_CLIENT_ID={val(aa.get('client_id'))}\",
    f\"APPLE_TEAM_ID={val(aa.get('team_id'))}\",
    f\"APPLE_IAP_KEY_ID={val(aa.get('iap_key_id'))}\",
    f\"APPLE_IAP_ISSUER_ID={val(aa.get('iap_issuer_id'))}\",
    f\"APPLE_IAP_BUNDLE_ID={val(aa.get('iap_bundle_id'))}\",
    f\"APPLE_IAP_KEY_PATH={val(aa.get('iap_key_path'))}\",
    f\"APPLE_IAP_SANDBOX={b(aa.get('iap_sandbox', False))}\",
    # Google auth/IAP
    f\"GOOGLE_CLIENT_ID={val(ga.get('client_id'))}\",
    f\"GOOGLE_ANDROID_CLIENT_ID={val(ga.get('android_client_id'))}\",
    f\"GOOGLE_IOS_CLIENT_ID={val(ga.get('ios_client_id'))}\",
    f\"GOOGLE_IAP_PACKAGE_NAME={val(ga.get('iap_package_name'))}\",
    f\"GOOGLE_IAP_SERVICE_ACCOUNT_PATH={val(ga.get('iap_service_account_path'))}\",
]
print('\n'.join(lines))
"
}
# generate_cluster_config — writes hetzner-k3s YAML to stdout
generate_cluster_config() {
  python3 -c "
import yaml
with open('${CONFIG_FILE}') as f:
    c = yaml.safe_load(f)
cl = c['cluster']
config = {
    'cluster_name': 'honeydue',
    'kubeconfig_path': './kubeconfig',
    'k3s_version': cl['k3s_version'],
    'networking': {
        'ssh': {
            'port': 22,
            'use_agent': False,
            'public_key_path': cl['ssh_public_key'],
            'private_key_path': cl['ssh_private_key'],
        },
        'allowed_networks': {
            'ssh': ['0.0.0.0/0'],
            'api': ['0.0.0.0/0'],
        },
    },
    'api_server_hostname': '',
    'schedule_workloads_on_masters': True,
    'masters_pool': {
        'instance_type': cl['instance_type'],
        'instance_count': len(c.get('nodes', [])),
        'location': cl['location'],
        'image': 'ubuntu-24.04',
    },
    'additional_packages': ['open-iscsi'],
    'post_create_commands': ['sudo systemctl enable --now iscsid'],
    'k3s_config_file': 'secrets-encryption: true\n',
}
print(yaml.dump(config, default_flow_style=False, sort_keys=False))
"
}