Files
honeyDueAPI/deploy-k3s/manifests/network-policies.yaml
T
Trey t 6de90acef7
Backend CI / Test (push) Has been cancelled
Backend CI / Contract Tests (push) Has been cancelled
Backend CI / Lint (push) Has been cancelled
Backend CI / Secret Scanning (push) Has been cancelled
Backend CI / Build (push) Has been cancelled
feat(kratos): deploy Ory Kratos to production (Apple-only OIDC)
Auth was structurally broken — the api's Kratos middleware was pointing
at http://kratos:4433 but Kratos wasn't deployed. The only thing keeping
users logged in was a 5-min Redis cache; once it expired the middleware
called Whoami → no DNS → 401 → forced relogin with no path back.

This commit deploys Kratos for real:

Manifests:
  - kratos.yaml + migrate-job.yaml: pin oryd/kratos:v26.2.0@sha256:92eedc...
    (CalVer current stable as of 2026-06-03)
  - configmap.yaml: drop Google OIDC provider (not in scope); fill the
    Apple provider with real Services ID / Team ID / Key ID — Apple now
    sits at providers[0]
  - kratos.yaml: drop the Google-secret env binding; rebind APPLE_PRIVATE_KEY
    to PROVIDERS_0_APPLE_PRIVATE_KEY (shifted from index 1)
  - network-policies.yaml: add a kratos egress rule to allow-egress-from-api.
    Without this, even with kratos running, the api gets "connection refused"
    on http://kratos:4433 (post-DNAT NetworkPolicy enforcement — runbook §9.2).

Operator prerequisites that were completed alongside this commit:
  - Neon kratos database created (separate from honeyDue, owner neondb_owner)
  - Cloudflare DNS for auth.myhoneydue.com (3 A records, proxied)
  - kratos: block added to config.yaml (gitignored): DSN to the Neon DIRECT
    endpoint, cookie + cipher secrets generated, Fastmail SMTPS URI,
    .p8 contents inline

Out of scope intentionally:
  - Google sign-in (additive; can append providers[] later)
  - Migrating existing auth_user rows onto Kratos identities — pre-prod;
    existing users will need to sign in fresh, which creates a new Kratos
    identity and a new local user row (per migration plan in
    manifests/kratos/README.md).

Verified end-to-end:
  - 338 schema migrations applied successfully
  - 2/2 kratos pods Ready
  - api → kratos:4433/sessions/whoami returns 401 for invalid token (was
    "connection refused" before this commit's NetworkPolicy patch)
  - auth.myhoneydue.com resolves through CF; cloudflare-only middleware
    keeps the origin protected exactly like the other hostnames

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 11:08:09 -05:00

443 lines
11 KiB
YAML

# Network Policies — default-deny with explicit allows
# Apply AFTER namespace and deployments are created.
# Verify: kubectl get networkpolicy -n honeydue
# --- Default deny all ingress and egress ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: honeydue
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
# --- Allow DNS for all pods (required for service discovery) ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-dns
namespace: honeydue
spec:
podSelector: {}
policyTypes:
- Egress
egress:
- to: []
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
---
# --- API: allow ingress from Traefik (kube-system namespace) ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-to-api
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: api
policyTypes:
- Ingress
ingress:
# Traefik runs as DaemonSet with hostNetwork=true, so traffic from it
# arrives with the NODE IP as source (not a pod IP). The node pod CIDR
# 10.42.0.0/16 covers any intra-cluster caller; the three node IPs
# cover Traefik on hostNetwork.
- from:
- ipBlock:
cidr: 178.105.32.198/32 # ubuntu-8gb-nbg1-1
- ipBlock:
cidr: 178.104.247.152/32 # ubuntu-8gb-nbg1-2
- ipBlock:
cidr: 178.104.249.189/32 # ubuntu-8gb-nbg1-3
- ipBlock:
cidr: 10.42.0.0/16 # cluster pod CIDR
ports:
- protocol: TCP
port: 8000
---
# --- Admin: allow ingress from Traefik (kube-system namespace) ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-to-admin
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: admin
policyTypes:
- Ingress
ingress:
# Traefik runs as DaemonSet with hostNetwork=true — see allow-ingress-to-api
# for the rationale. Same ipBlock list.
- from:
- ipBlock:
cidr: 178.105.32.198/32
- ipBlock:
cidr: 178.104.247.152/32
- ipBlock:
cidr: 178.104.249.189/32
- ipBlock:
cidr: 10.42.0.0/16
ports:
- protocol: TCP
port: 3000
---
# --- Redis: allow ingress ONLY from api + worker pods ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-to-redis
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: redis
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app.kubernetes.io/name: api
- podSelector:
matchLabels:
app.kubernetes.io/name: worker
ports:
- protocol: TCP
port: 6379
---
# --- API: allow egress to Redis, external services (Neon DB, APNs, FCM, B2, SMTP) ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-from-api
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: api
policyTypes:
- Egress
egress:
# Redis (in-cluster)
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: redis
ports:
- protocol: TCP
port: 6379
# Kratos (in-cluster). The auth middleware validates every session via
# http://kratos:4433/sessions/whoami; the AuthService also uses :4434
# for account deletion (DELETE /admin/identities/{id}). k3s evaluates
# egress rules AFTER kube-proxy DNAT (runbook §9.2), so this podSelector
# rule covers Service ClusterIP traffic correctly.
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: kratos
ports:
- protocol: TCP
port: 4433
- protocol: TCP
port: 4434
# External services: Neon DB (5432), SMTP (587), HTTPS (443 — APNs, FCM, B2, PostHog)
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
ports:
- protocol: TCP
port: 5432
- protocol: TCP
port: 587
- protocol: TCP
port: 443
---
# --- Worker: allow egress to Redis, external services ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-from-worker
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: worker
policyTypes:
- Egress
egress:
# Redis (in-cluster)
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: redis
ports:
- protocol: TCP
port: 6379
# External services: Neon DB (5432), SMTP (587), HTTPS (443 — APNs, FCM, B2)
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
ports:
- protocol: TCP
port: 5432
- protocol: TCP
port: 587
- protocol: TCP
port: 443
---
# --- Admin: allow egress to API (internal) for SSR ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-from-admin
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: admin
policyTypes:
- Egress
egress:
# API service (in-cluster, for server-side API calls)
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: api
ports:
- protocol: TCP
port: 8000
---
# --- Web: allow ingress from Traefik (kube-system namespace) ---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-to-web
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: web
policyTypes:
- Ingress
ingress:
# Traefik runs as DaemonSet with hostNetwork=true — see allow-ingress-to-api
# for the rationale. Same ipBlock list.
- from:
- ipBlock:
cidr: 178.105.32.198/32
- ipBlock:
cidr: 178.104.247.152/32
- ipBlock:
cidr: 178.104.249.189/32
- ipBlock:
cidr: 10.42.0.0/16
ports:
- protocol: TCP
port: 3000
---
# --- Web: allow egress for the Next.js server-side proxy routes ---
# Browser → app.myhoneydue.com → web pod (Node.js) → api.myhoneydue.com
# The web pod resolves api.myhoneydue.com via public DNS and hits
# Cloudflare (143.). We don't know which CF IP yet at policy time, so
# allow HTTPS to public ipBlock (except private CIDRs).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-from-web
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: web
policyTypes:
- Egress
egress:
# HTTPS to public (api.myhoneydue.com via CF, PostHog, any other remote)
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
ports:
- protocol: TCP
port: 443
---
# vmagent egress.
#
# IMPORTANT (gotcha): k3s's built-in NetworkPolicy controller appears to
# evaluate egress rules AFTER kube-proxy's DNAT, not before (contrary to
# the k8s spec). So traffic from a pod to the kubernetes Service
# (ClusterIP 10.43.0.1:443) is policy-checked as dst=<node_public_ip>:6443.
# That's why we need an explicit rule for :6443 to public IPs, even though
# we already allow :443 to the cluster service CIDR.
#
# Without the :6443 rule, vmagent's k8s service discovery silently fails
# and zero pods get scraped. See deploy-k3s/RUNBOOK.md ("vmagent SD broken").
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-from-vmagent
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: vmagent
policyTypes:
- Egress
egress:
# DNS (cluster-internal)
- to:
- namespaceSelector: {}
ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
# k8s API server via ClusterIP (pre-DNAT view)
- to:
- ipBlock:
cidr: 10.43.0.0/16
ports:
- port: 443
protocol: TCP
# k8s API server post-DNAT (real path k3s NetPol enforcer sees) — REQUIRED
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.42.0.0/16
ports:
- port: 6443
protocol: TCP
# Scrape api Pods on :8000
- to:
- ipBlock:
cidr: 10.42.0.0/16
ports:
- port: 8000
protocol: TCP
# Scrape kube-state-metrics Pod on :8080 (pod CIDR)
- to:
- ipBlock:
cidr: 10.42.0.0/16
ports:
- port: 8080
protocol: TCP
# HTTPS to public (remote-write to obs.88oakapps.com via Cloudflare)
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.42.0.0/16
- 10.43.0.0/16
ports:
- port: 443
protocol: TCP
---
# Allow vmagent → api ingress on :8000 so api pods accept scrapes.
# api Pods are otherwise locked down by default-deny-all + allow-ingress-to-api
# (which only allows Traefik). This adds vmagent specifically.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-vmagent-to-api
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: api
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app.kubernetes.io/name: vmagent
ports:
- port: 8000
protocol: TCP
---
# alloy-logs egress — Grafana Alloy discovers honeydue pods via the k8s API
# and pushes their logs to Loki at obs.88oakapps.com. Same k3s NetworkPolicy
# DNAT gotcha as vmagent: API-server traffic is policy-checked as
# dst=<node_public_ip>:6443, so an explicit :6443 rule is required.
# Alloy reads log FILES from a hostPath, so it needs no ingress and no
# egress to pod :8000/:8080 — only DNS, the API server, and obs HTTPS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-from-alloy-logs
namespace: honeydue
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: alloy-logs
policyTypes:
- Egress
egress:
# DNS (cluster-internal)
- to:
- namespaceSelector: {}
ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
# k8s API server via ClusterIP (pre-DNAT view)
- to:
- ipBlock:
cidr: 10.43.0.0/16
ports:
- port: 443
protocol: TCP
# k8s API server post-DNAT (real path k3s NetPol enforcer sees) — REQUIRED
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.42.0.0/16
ports:
- port: 6443
protocol: TCP
# HTTPS to public (log push to obs.88oakapps.com via Cloudflare)
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.42.0.0/16
- 10.43.0.0/16
ports:
- port: 443
protocol: TCP