Files
honeyDueAPI/deploy-k3s/README.md
Trey t 34553f3bec Add K3s dev deployment setup for single-node VPS
Mirrors the prod deploy-k3s/ setup but runs all services in-cluster
on a single node: PostgreSQL (replaces Neon), MinIO S3-compatible
storage (replaces B2), Redis, API, worker, and admin.

Includes fully automated setup scripts (00-init through 04-verify),
server hardening (SSH, fail2ban, ufw), Let's Encrypt TLS via Traefik,
network policies, RBAC, and security contexts matching prod.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 21:30:39 -05:00

392 lines
12 KiB
Markdown

# honeyDue — K3s Production Deployment
Production Kubernetes deployment for honeyDue on Hetzner Cloud using K3s.
**Architecture**: 3-node HA K3s cluster (CX33), Neon Postgres, Redis (in-cluster), Backblaze B2 (uploads), Cloudflare CDN/TLS.
**Domains**: `api.myhoneydue.com`, `admin.myhoneydue.com`
---
## Quick Start
```bash
cd honeyDueAPI-go/deploy-k3s
# 1. Fill in the single config file
cp config.yaml.example config.yaml
# Edit config.yaml — fill in ALL empty values
# 2. Create secret files
# See secrets/README.md for the full list
echo "your-neon-password" > secrets/postgres_password.txt
openssl rand -base64 48 > secrets/secret_key.txt
echo "your-smtp-password" > secrets/email_host_password.txt
echo "your-fcm-key" > secrets/fcm_server_key.txt
cp /path/to/AuthKey.p8 secrets/apns_auth_key.p8
cp /path/to/origin.pem secrets/cloudflare-origin.crt
cp /path/to/origin-key.pem secrets/cloudflare-origin.key
# 3. Provision → Secrets → Deploy
./scripts/01-provision-cluster.sh
./scripts/02-setup-secrets.sh
./scripts/03-deploy.sh
# 4. Set up Hetzner LB + Cloudflare DNS (see sections below)
# 5. Verify
./scripts/04-verify.sh
curl https://api.myhoneydue.com/api/health/
```
That's it. Everything reads from `config.yaml` + `secrets/`.
---
## Table of Contents
1. [Prerequisites](#1-prerequisites)
2. [Configuration](#2-configuration)
3. [Provision Cluster](#3-provision-cluster)
4. [Create Secrets](#4-create-secrets)
5. [Deploy](#5-deploy)
6. [Configure Load Balancer & DNS](#6-configure-load-balancer--dns)
7. [Verify](#7-verify)
8. [Monitoring & Logs](#8-monitoring--logs)
9. [Scaling](#9-scaling)
10. [Rollback](#10-rollback)
11. [Backup & DR](#11-backup--dr)
12. [Security Checklist](#12-security-checklist)
13. [Troubleshooting](#13-troubleshooting)
---
## 1. Prerequisites
| Tool | Install | Purpose |
|------|---------|---------|
| `hetzner-k3s` | `gem install hetzner-k3s` | Cluster provisioning |
| `kubectl` | https://kubernetes.io/docs/tasks/tools/ | Cluster management |
| `helm` | https://helm.sh/docs/intro/install/ | Optional: Prometheus/Grafana |
| `stern` | `brew install stern` | Multi-pod log tailing |
| `docker` | https://docs.docker.com/get-docker/ | Image building |
| `python3` | Pre-installed on macOS | Config parsing |
| `htpasswd` | `brew install httpd` or `apt install apache2-utils` | Admin basic auth secret |
Verify:
```bash
hetzner-k3s version && kubectl version --client && docker version && python3 --version
```
## 2. Configuration
There are two things to fill in:
### config.yaml — all string configuration
```bash
cp config.yaml.example config.yaml
```
Open `config.yaml` and fill in every empty `""` value:
| Section | What to fill in |
|---------|----------------|
| `cluster.hcloud_token` | Hetzner API token (Read/Write) — generate at console.hetzner.cloud |
| `registry.*` | GHCR credentials (same as Docker Swarm setup) |
| `database.host`, `database.user` | Neon PostgreSQL connection info |
| `email.user` | Fastmail email address |
| `push.apns_key_id`, `push.apns_team_id` | Apple Push Notification identifiers |
| `storage.b2_*` | Backblaze B2 bucket and credentials |
| `redis.password` | Strong password for Redis authentication (required for production) |
| `admin.basic_auth_user` | HTTP basic auth username for admin panel |
| `admin.basic_auth_password` | HTTP basic auth password for admin panel |
Everything else has sensible defaults. `config.yaml` is gitignored.
### secrets/ — file-based secrets
These are binary or multi-line files that can't go in YAML:
| File | Source |
|------|--------|
| `secrets/postgres_password.txt` | Your Neon database password |
| `secrets/secret_key.txt` | `openssl rand -base64 48` (min 32 chars) |
| `secrets/email_host_password.txt` | Fastmail app password |
| `secrets/fcm_server_key.txt` | Firebase console → Project Settings → Cloud Messaging |
| `secrets/apns_auth_key.p8` | Apple Developer → Keys → APNs key |
| `secrets/cloudflare-origin.crt` | Cloudflare → SSL/TLS → Origin Server → Create Certificate |
| `secrets/cloudflare-origin.key` | (saved with the certificate above) |
## 3. Provision Cluster
```bash
export KUBECONFIG=$(pwd)/kubeconfig
./scripts/01-provision-cluster.sh
```
This script:
1. Reads cluster config from `config.yaml`
2. Generates `cluster-config.yaml` for hetzner-k3s
3. Provisions 3x CX33 nodes with HA etcd (5-10 minutes)
4. Writes node IPs back into `config.yaml`
5. Labels the Redis node
After provisioning:
```bash
kubectl get nodes
```
## 4. Create Secrets
```bash
./scripts/02-setup-secrets.sh
```
This reads `config.yaml` for registry credentials and creates all Kubernetes Secrets from the `secrets/` files:
- `honeydue-secrets` — DB password, app secret, email password, FCM key, Redis password (if configured)
- `honeydue-apns-key` — APNS .p8 key (mounted as volume in pods)
- `ghcr-credentials` — GHCR image pull credentials
- `cloudflare-origin-cert` — TLS certificate for Ingress
- `admin-basic-auth` — htpasswd secret for admin panel basic auth (if configured)
## 5. Deploy
**Full deploy** (build + push + apply):
```bash
./scripts/03-deploy.sh
```
**Deploy pre-built images** (skip build):
```bash
./scripts/03-deploy.sh --skip-build --tag abc1234
```
The script:
1. Reads registry config from `config.yaml`
2. Builds and pushes 3 Docker images to GHCR
3. Generates a Kubernetes ConfigMap from `config.yaml` (converts to flat env vars)
4. Applies all manifests with image tag substitution
5. Waits for all rollouts to complete
## 6. Configure Load Balancer & DNS
### Hetzner Load Balancer
1. [Hetzner Console](https://console.hetzner.cloud/) → **Load Balancers → Create**
2. Location: **fsn1**, add all 3 nodes as targets
3. Service: TCP 443 → 443, health check on TCP 443
4. Note the LB IP and update `load_balancer_ip` in `config.yaml`
### Cloudflare DNS
1. [Cloudflare Dashboard](https://dash.cloudflare.com/) → `myhoneydue.com`**DNS**
| Type | Name | Content | Proxy |
|------|------|---------|-------|
| A | `api` | `<LB_IP>` | Proxied (orange cloud) |
| A | `admin` | `<LB_IP>` | Proxied (orange cloud) |
2. **SSL/TLS → Overview** → Set mode to **Full (Strict)**
3. If you haven't generated the origin cert yet:
**SSL/TLS → Origin Server → Create Certificate**
- Hostnames: `*.myhoneydue.com`, `myhoneydue.com`
- Validity: 15 years
- Save to `secrets/cloudflare-origin.crt` and `secrets/cloudflare-origin.key`
- Re-run `./scripts/02-setup-secrets.sh`
## 7. Verify
```bash
# Automated cluster health check
./scripts/04-verify.sh
# External health check (after DNS propagation)
curl -v https://api.myhoneydue.com/api/health/
```
Expected: `{"status": "ok"}` with HTTP 200.
## 8. Monitoring & Logs
### Logs with stern
```bash
stern -n honeydue api # All API pod logs
stern -n honeydue worker # All worker logs
stern -n honeydue . # Everything
stern -n honeydue api | grep ERROR # Filter
```
### kubectl logs
```bash
kubectl logs -n honeydue deployment/api -f
kubectl logs -n honeydue <pod-name> --previous # Crashed container
```
### Resource usage
```bash
kubectl top pods -n honeydue
kubectl top nodes
```
### Optional: Prometheus + Grafana
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set grafana.adminPassword=your-password
# Access Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3001:80
# Open http://localhost:3001
```
## 9. Scaling
### Manual
```bash
kubectl scale deployment/api -n honeydue --replicas=5
kubectl scale deployment/worker -n honeydue --replicas=3
```
### HPA (auto-scaling)
API auto-scales 3→6 replicas on CPU > 70% or memory > 80%:
```bash
kubectl get hpa -n honeydue
kubectl describe hpa api -n honeydue
```
### Adding nodes
Edit `config.yaml` to add nodes, then re-run provisioning:
```bash
./scripts/01-provision-cluster.sh
```
## 10. Rollback
```bash
./scripts/rollback.sh
```
Shows rollout history, asks for confirmation, rolls back all deployments to previous revision.
Single deployment rollback:
```bash
kubectl rollout undo deployment/api -n honeydue
```
## 11. Backup & DR
| Component | Strategy | Action Required |
|-----------|----------|-----------------|
| PostgreSQL | Neon PITR (automatic) | None |
| Redis | Reconstructible cache + Asynq queue | None |
| etcd | K3s auto-snapshots (12h, keeps 5) | None |
| B2 Storage | B2 versioning + lifecycle rules | Enable in B2 settings |
| Secrets | Local `secrets/` + `config.yaml` | Keep secure offline backup |
**Disaster recovery**: Re-provision → re-create secrets → re-deploy. Database recovers via Neon PITR.
## 12. Security
See **[SECURITY.md](SECURITY.md)** for the comprehensive hardening guide, incident response playbooks, and full compliance checklist.
### Summary of deployed security controls
| Control | Status | Manifests |
|---------|--------|-----------|
| Pod security contexts (non-root, read-only FS, no caps) | Applied | All `deployment.yaml` |
| Network policies (default-deny + explicit allows) | Applied | `manifests/network-policies.yaml` |
| RBAC (dedicated SAs, no K8s API access) | Applied | `manifests/rbac.yaml` |
| Pod disruption budgets | Applied | `manifests/pod-disruption-budgets.yaml` |
| Redis authentication | Applied (if `redis.password` set) | `redis/deployment.yaml` |
| Cloudflare-only origin lockdown | Applied | `ingress/ingress.yaml` |
| Admin basic auth | Applied (if `admin.*` set) | `ingress/middleware.yaml` |
| Security headers (HSTS, CSP, Permissions-Policy) | Applied | `ingress/middleware.yaml` |
| Secret encryption at rest | K3s config | `--secrets-encryption` |
### Quick checklist
- [ ] Hetzner Firewall: allow only 22, 443, 6443 from your IP
- [ ] SSH: key-only auth (`PasswordAuthentication no`)
- [ ] `redis.password` set in `config.yaml`
- [ ] `admin.basic_auth_user` and `admin.basic_auth_password` set in `config.yaml`
- [ ] `kubeconfig`: `chmod 600 kubeconfig`, never commit
- [ ] `config.yaml`: contains tokens — never commit, keep secure backup
- [ ] Image scanning: `trivy image` or `docker scout cves` before deploy
- [ ] Run `./scripts/04-verify.sh` — includes automated security checks
## 13. Troubleshooting
### ImagePullBackOff
```bash
kubectl describe pod <pod-name> -n honeydue
# Check: image name, GHCR credentials, image exists
```
Fix: verify `registry.*` in config.yaml, re-run `02-setup-secrets.sh`.
### CrashLoopBackOff
```bash
kubectl logs <pod-name> -n honeydue --previous
# Common: missing env vars, DB connection failure, invalid APNS key
```
### Redis connection refused / NOAUTH
```bash
kubectl get pods -n honeydue -l app.kubernetes.io/name=redis
# If redis.password is set, you must authenticate:
kubectl exec -it deploy/redis -n honeydue -- redis-cli -a "$REDIS_PASSWORD" ping
# Without -a: (error) NOAUTH Authentication required.
```
### Health check failures
```bash
kubectl exec -it deploy/api -n honeydue -- curl -v http://localhost:8000/api/health/
kubectl exec -it deploy/api -n honeydue -- env | sort
```
### Pods stuck in Pending
```bash
kubectl describe pod <pod-name> -n honeydue
# For Redis: ensure a node has label honeydue/redis=true
kubectl get nodes --show-labels | grep redis
```
### DNS not resolving
```bash
dig api.myhoneydue.com +short
# Verify LB IP matches what's in config.yaml
```
### Certificate / TLS errors
```bash
kubectl get secret cloudflare-origin-cert -n honeydue
kubectl describe ingress honeydue -n honeydue
curl -vk --resolve api.myhoneydue.com:443:<NODE_IP> https://api.myhoneydue.com/api/health/
```