admin/honeyDueAPI

Trey t 6f303dbbaa

Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-24 07:20:54 -05:00


13 — Cloudflare

Summary

Cloudflare sits in front of every public request. It provides DNS (authoritative nameservers for myhoneydue.com), TLS termination at the edge, DDoS mitigation, caching, and the round-robin fan-out across our three node IPs. We use the Free plan. TLS mode is "Flexible" (HTTP between CF and origin). This chapter documents every Cloudflare setting that matters.

DNS

Zone

myhoneydue.com, managed by Cloudflare. Authoritative nameservers:

carol.ns.cloudflare.com
ishaan.ns.cloudflare.com

Records that matter

Type	Name	Content	Proxy	Notes
A	`api`	178.104.247.152	🟠 Proxied	hetzner1
A	`api`	178.105.32.198	🟠 Proxied	hetzner2
A	`api`	178.104.249.189	🟠 Proxied	hetzner3
A	`admin`	178.104.247.152	🟠 Proxied	same 3 IPs
A	`admin`	178.105.32.198	🟠 Proxied
A	`admin`	178.104.249.189	🟠 Proxied
A	`@`	178.104.247.152	🟠 Proxied	same 3 IPs
A	`@`	178.105.32.198	🟠 Proxied
A	`@`	178.104.249.189	🟠 Proxied

Three A records per name → Cloudflare selects one origin per connection. With proxying on (orange cloud), the client never sees these IPs; it sees a Cloudflare edge IP. CF internally picks which of the three origin IPs to connect to; if the connection to one fails, CF retries the next.

TXT records for email (Fastmail sending domain): SPF, DKIM, DMARC. Not our immediate concern; configured by the Fastmail custom-domain setup.

Why three A records per name, not one

With one record pointing at hetzner1:

Only hetzner1 sees traffic
If hetzner1 is unreachable, everything breaks until we change DNS

With three records:

CF chooses one origin per connection
If one node's port :80 stops responding, CF tries the others
Node upgrades can be done one at a time with no user impact

This is poor man's load balancing. A Hetzner Load Balancer or Cloudflare Load Balancer (paid) would be more sophisticated, with active health checks and sub-second automatic failover. Our DNS approach is "good enough" for the traffic volume.

Cloudflare's origin health checks

On the Free plan, CF doesn't actively probe origins. It reacts to real connection failures: if an origin returns 5xx repeatedly or a connection times out, CF marks it unhealthy for that edge POP for some time.

Upgrading to Cloudflare Load Balancing ($5/mo add-on) would enable active health checks — explicit probes independent of traffic. Useful when you want sub-second failover.
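Since nothing probes the origins for us on the Free plan, a manual sweep can stand in when a node is suspected of being down. The following is a minimal sketch, assuming the three node IPs from the DNS table and the /api/health/ path used in the cheat sheet; the function names origin_health_url and probe_origins are hypothetical.

```shell
# Build the plain-HTTP health URL for one origin IP (Flexible mode: port 80).
origin_health_url() {
  printf 'http://%s/api/health/\n' "$1"
}

# Probe each origin directly, bypassing Cloudflare.
# $1 = Host header to send, remaining args = origin IPs.
# Prints "<ip> <http_status>" per origin; "000" means no HTTP response at all.
probe_origins() (
  host="$1"; shift
  for ip in "$@"; do
    code="$(curl -sS -o /dev/null -m 5 -w '%{http_code}' \
      -H "Host: ${host}" "$(origin_health_url "$ip")" 2>/dev/null)" || code="000"
    printf '%s %s\n' "$ip" "${code:-000}"
  done
)
```

Usage: probe_origins api.myhoneydue.com 178.104.247.152 178.105.32.198 178.104.249.189. Any line that does not end in 200 points at a node worth inspecting.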

TLS

Mode: Flexible

CF Dashboard → SSL/TLS → Overview → Flexible.

What this means:

User ↔ Cloudflare: TLS (HTTPS)
Cloudflare ↔ Origin: plaintext HTTP (port 80)

Why we chose it:

No origin cert required on the Hetzner nodes
Zero Traefik cert-management complexity
Fine for a site where CF terminates all user-facing TLS

Downsides:

An attacker with network access between CF and Hetzner could read traffic. Realistically, nobody is positioned between CF's POPs and Hetzner's Nuremberg DC, but the traffic is still plaintext on the wire.
MitM risk if DNS gets hijacked and traffic is routed through an unintended origin.

Future: Full (strict)

The next step up is Full (strict): CF verifies origin's TLS cert and connects over HTTPS. Cloudflare provides free Origin CA certificates for this: they're issued by a CF-internal CA that only CF's own edge accepts. An attacker without a CF-signed cert can't impersonate our origin.

Path to enable:

Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
Download as PEM

Create k8s Secret cloudflare-origin-cert:

kubectl create secret tls cloudflare-origin-cert -n honeydue \
  --cert=origin.crt --key=origin.key

Add tls: block to our Ingress:

spec:
  tls:
    - hosts: [api.myhoneydue.com]
      secretName: cloudflare-origin-cert

Switch CF SSL mode to Full (strict)

Trade-off: the Origin CA certificate stored in cloudflare-origin-cert does expire, but the default validity is 15 years, so maintenance is low. TODO (Chapter 20).

Edge certificate

CF provides a free edge certificate for *.myhoneydue.com and myhoneydue.com. Auto-renewed by Cloudflare. We don't touch it.

Always Use HTTPS

SSL/TLS → Edge Certificates → Always Use HTTPS: On (default).

Redirects any HTTP → HTTPS at the CF edge. Clients that hit http://api.myhoneydue.com/* get 301'd to https://.... Origin never sees the HTTP request.

HSTS

Not currently enabled. HSTS (HTTP Strict Transport Security) sends a header telling browsers "always use HTTPS for this domain." Once a browser has seen the header with a long max-age, it enforces HTTPS for that domain until the max-age runs out, and we cannot revoke it from our side. If we later misconfigure TLS, browsers that have cached the policy refuse to connect at all.

Enabling HSTS is a TODO but requires confidence in our TLS stability. Not tonight.

DDoS mitigation

CF's Free plan includes basic DDoS protection:

Volumetric attacks absorbed at the edge
Obvious bot patterns blocked (known-bad user agents, headless browsers doing suspicious things)

Under a large attack, CF might:

Insert a "checking your browser" JavaScript challenge (the ~5-second "Cloudflare is checking your browser" page)
Rate-limit by IP

Under a sustained, sophisticated attack we might need:

CF Pro plan ($20/mo) for more rule customization
Enterprise plan for negotiated protection
Extra measures like Cloudflare Magic Transit

So far, not needed.

Caching

Default CF caching:

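Static assets (CSS, JS, images) cached by file extension
HTML pages not cached by default; caching them requires a Cache Rule (or a "Cache Everything" Page Rule)
JSON API responses not cached by default, for the same reason (no cacheable file extension)

Our Go API serves JSON from extensionless paths and doesn't set Cache-Control: public on any endpoint, so CF treats every response as uncacheable. Every API call reaches origin.

If we wanted to cache certain endpoints (e.g., public lookup tables), the origin header alone is not enough: we would also need a Cache Rule that makes those paths eligible for caching. The origin side would be:

c.Response().Header().Set("Cache-Control", "public, max-age=300")

With the rule in place, CF would cache those responses for 5 minutes.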
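Whether Cloudflare actually cached a given response is visible in the CF-Cache-Status response header (HIT, MISS, DYNAMIC, and so on; DYNAMIC means the response was not considered eligible for caching). A small sketch for pulling that value out of curl -I output; the function name cf_cache_status is hypothetical.

```shell
# Read raw HTTP response headers on stdin and print the CF-Cache-Status value.
# Prints nothing if the header is absent (for example, when the request
# did not go through Cloudflare at all).
cf_cache_status() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-cache-status" { print toupper($2); exit }'
}
```

Usage: curl -sS -I https://api.myhoneydue.com/api/health/ | cf_cache_status. For our API today the expected answer is DYNAMIC; after any caching change, a second request to the same URL should report HIT if the change worked.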

Firewall rules at CF

CF Dashboard → Security → WAF. On Free tier:

Managed rules: a small free ruleset that blocks "obvious-attack" patterns
Custom rules: limited (5 on Free, 20 on Pro)

We have no custom rules defined currently. The managed ruleset covers:

SQL injection attempts in query strings
Known-vulnerable bot User-Agents
XSS attempts in common parameters

Rate limiting

CF Free: 10,000 requests per 10 minutes per IP for free rules (we haven't configured any). The API itself should have rate limits for sensitive endpoints; we don't rely on CF for that.

What CF does NOT do for us

Authenticate users — our app does
Authorize requests — our app does
Encrypt pod-to-pod traffic — nothing Cloudflare can help with
Back up origin data — CF caches but doesn't store copies persistently

Turnstile / bot management

Not enabled. If we start seeing account-creation spam, Cloudflare Turnstile (free) would be a good addition — a CAPTCHA replacement that doesn't require user interaction for most traffic.

Origin IP protection

CF proxying (orange cloud) is the primary protection of our origin IPs. When proxying is on:

DNS queries return CF edge IPs, never origin
HTTP/HTTPS traffic goes through CF

However, our origin IPs can leak via:

Email sending (if the app ever sent email directly from the origin IP) — we use Fastmail so this isn't an issue
Outbound connections (our pods connect out to Neon, B2, Fastmail from the nodes' public IPs; those IPs appear in external logs)
Historical DNS records (services like SecurityTrails log historical DNS; if we ever had unproxied A records, attackers can look them up)

If origin IPs leak, attackers can bypass CF's protection by connecting directly to node IPs. Current mitigation:

UFW allows inbound traffic only on :80/:443 (from any source)
Our app has no ports bound to the public IP

Future (Chapter 20): UFW rule to allow :80/:443 only from CF IP ranges. Prevents direct-connect bypass entirely.
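The future UFW restriction could be generated rather than typed by hand. The sketch below only prints ufw commands for review and executes nothing; the function name cf_ufw_rules is hypothetical, and the rule form (tcp, ports 80 and 443) is an assumption about how we would want to scope it.

```shell
# Read CIDR ranges (one per line) on stdin and print one ufw command per range.
# Blank lines and "#" comments are skipped. Nothing is applied to the firewall.
cf_ufw_rules() (
  while IFS= read -r cidr || [ -n "$cidr" ]; do
    cidr="${cidr%%#*}"                                   # drop trailing comment
    cidr="$(printf '%s' "$cidr" | tr -d '[:space:]')"    # drop stray whitespace
    [ -z "$cidr" ] && continue
    printf 'ufw allow proto tcp from %s to any port 80,443\n' "$cidr"
  done
)
```

Usage: curl -sS https://www.cloudflare.com/ips-v4 | cf_ufw_rules, then review the output before running it on a node. The existing allow-from-anywhere rules for :80/:443 would also have to be removed, since UFW allow rules are additive and the restriction has no effect while they remain.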

Cloudflare IP ranges (used in Traefik trustedIPs)

From cloudflare.com/ips:

IPv4 ranges:

173.245.48.0/20
103.21.244.0/22
103.22.200.0/22
103.31.4.0/22
141.101.64.0/18
108.162.192.0/18
190.93.240.0/20
188.114.96.0/20
197.234.240.0/22
198.41.128.0/17
162.158.0.0/15
104.16.0.0/13
104.24.0.0/14
172.64.0.0/13
131.0.72.0/22

IPv6 ranges:

2400:cb00::/32
2606:4700::/32
2803:f800::/32
2405:b500::/32
2405:8100::/32
2a06:98c0::/29
2c0f:f248::/32

These are used in two places:

Traefik forwardedHeaders.trustedIPs — we already have this configured (Chapter 6)
UFW allow 80/tcp from <cf-range> — NOT configured (TODO)

CF occasionally adds new ranges. If a future CF range isn't in our list, we'd either trust unknown IPs (if lax) or reject legitimate CF traffic (if strict). The canonical source is the public API:

curl -sS https://www.cloudflare.com/ips-v4
curl -sS https://www.cloudflare.com/ips-v6
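To notice when Cloudflare's published list drifts from ours, the two lists can be compared mechanically. A minimal sketch, assuming our ranges are kept in a plain file with one CIDR per line (the file name our-cf-ranges.txt in the usage note and the function name cf_range_drift are hypothetical):

```shell
# Compare two range lists (files with one CIDR per line).
# $1 = our list, $2 = Cloudflare's published list.
# Prints "+<cidr>" for ranges Cloudflare publishes that we lack,
# and "-<cidr>" for ranges we list that Cloudflare no longer publishes.
# Prints nothing when the lists match.
cf_range_drift() (
  tmpdir="$(mktemp -d)" || exit 1
  sort -u "$1" > "$tmpdir/ours"
  sort -u "$2" > "$tmpdir/theirs"
  comm -13 "$tmpdir/ours" "$tmpdir/theirs" | sed 's/^/+/'
  comm -23 "$tmpdir/ours" "$tmpdir/theirs" | sed 's/^/-/'
  rm -rf "$tmpdir"
)
```

Usage: curl -sS https://www.cloudflare.com/ips-v4 > /tmp/cf-v4.txt, then cf_range_drift our-cf-ranges.txt /tmp/cf-v4.txt. Any "+" line is a range to add to Traefik trustedIPs (and to UFW, once that rule exists).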

API token for programmatic changes

If we automate DNS changes (e.g., adding a new subdomain on deploy), we'd need a CF API token with Zone:DNS:Edit scope for the myhoneydue.com zone.

Currently not automated; DNS is managed in the CF dashboard by hand.
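If this is ever automated, creating a proxied A record is a single call to the DNS records endpoint of the Cloudflare v4 API. The sketch below is not something we run today; the function names are hypothetical, the token is assumed to be in $CF_TOKEN as in the cheat sheet, and the payload builder does no JSON escaping, which is acceptable only because hostnames and IPv4 addresses contain no characters that need it.

```shell
# Build the JSON body for a proxied A record.
# $1 = record name, $2 = IPv4 address. "ttl": 1 means "automatic" in the CF API.
cf_a_record_payload() {
  printf '{"type":"A","name":"%s","content":"%s","ttl":1,"proxied":true}\n' "$1" "$2"
}

# Create the record. $1 = zone id, $2 = record name, $3 = IPv4 address.
# Requires a token with Zone:DNS:Edit scope in $CF_TOKEN.
cf_create_a_record() {
  curl -sS -X POST \
    -H "Authorization: Bearer ${CF_TOKEN}" \
    -H "Content-Type: application/json" \
    "https://api.cloudflare.com/client/v4/zones/$1/dns_records" \
    -d "$(cf_a_record_payload "$2" "$3")"
}
```

To mirror the three-records-per-name layout, the call would be repeated once per node IP for each new name.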

Cost

$0/mo. Free plan covers everything we use. Paid plans add features we don't need yet:

Feature	Free	Pro ($20)	Business ($200)
DNS + proxying	✓	✓	✓
Basic DDoS	✓	✓	✓
SSL (edge + Flexible + Full + Full strict)	✓	✓	✓
WAF managed rules	✓ (limited)	✓ (more)	✓ (all)
Custom firewall rules	5	20	100
Page Rules	3	20	50
Image Resizing	no	no	✓
Load Balancing	no	$5/mo add-on	✓

We'd consider Pro ($20/mo) if:

We needed a custom WAF rule beyond the 5-rule limit
We wanted Image Resizing for user-uploaded photos

Neither is needed today.

Operator cheat sheet

# Query current CF-served DNS
dig +short @1.1.1.1 api.myhoneydue.com    # returns CF edge IPs when proxied

# Query our origin directly (bypass CF)
curl -sS -H "Host: api.myhoneydue.com" http://178.104.247.152/api/health/

# Check CF headers (confirm you're going through CF)
curl -sS -I https://api.myhoneydue.com/api/health/ | grep -i cf-

# Purge CF cache (requires API token)
curl -X POST \
  -H "Authorization: Bearer $CF_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.cloudflare.com/client/v4/zones/<zone_id>/purge_cache" \
  -d '{"purge_everything":true}'
