# honeyDue: K3s Production Deployment

Production Kubernetes deployment for honeyDue on Hetzner Cloud using K3s.

**Architecture**: 3-node HA K3s cluster (CX33), Neon Postgres, Redis (in-cluster), Backblaze B2 (uploads), Cloudflare CDN/TLS.

**Domains**: `api.myhoneydue.com`, `admin.myhoneydue.com`

---

## Quick Start

```bash
cd honeyDueAPI-go/deploy-k3s

# 1. Fill in the single config file
cp config.yaml.example config.yaml
# Edit config.yaml and fill in ALL empty values

# 2. Create secret files
# See secrets/README.md for the full list
echo "your-neon-password" > secrets/postgres_password.txt
openssl rand -base64 48 > secrets/secret_key.txt
echo "your-smtp-password" > secrets/email_host_password.txt
echo "your-fcm-key" > secrets/fcm_server_key.txt
cp /path/to/AuthKey.p8 secrets/apns_auth_key.p8
cp /path/to/origin.pem secrets/cloudflare-origin.crt
cp /path/to/origin-key.pem secrets/cloudflare-origin.key

# 3. Provision → Secrets → Deploy
./scripts/01-provision-cluster.sh
./scripts/02-setup-secrets.sh
./scripts/03-deploy.sh

# 4. Set up Hetzner LB + Cloudflare DNS (see sections below)

# 5. Verify
./scripts/04-verify.sh
curl https://api.myhoneydue.com/api/health/
```

That's it. Everything reads from `config.yaml` + `secrets/`.

---

## Table of Contents

1. [Prerequisites](#1-prerequisites)
2. [Configuration](#2-configuration)
3. [Provision Cluster](#3-provision-cluster)
4. [Create Secrets](#4-create-secrets)
5. [Deploy](#5-deploy)
6. [Configure Load Balancer & DNS](#6-configure-load-balancer--dns)
7. [Verify](#7-verify)
8. [Monitoring & Logs](#8-monitoring--logs)
9. [Scaling](#9-scaling)
10. [Rollback](#10-rollback)
11. [Backup & DR](#11-backup--dr)
12. [Security](#12-security)
13. [Troubleshooting](#13-troubleshooting)

---

## 1. Prerequisites

| Tool | Install | Purpose |
|------|---------|---------|
| `hetzner-k3s` | `gem install hetzner-k3s` | Cluster provisioning |
| `kubectl` | https://kubernetes.io/docs/tasks/tools/ | Cluster management |
| `helm` | https://helm.sh/docs/intro/install/ | Optional: Prometheus/Grafana |
| `stern` | `brew install stern` | Multi-pod log tailing |
| `docker` | https://docs.docker.com/get-docker/ | Image building |
| `python3` | Pre-installed on macOS | Config parsing |
| `htpasswd` | `brew install httpd` or `apt install apache2-utils` | Admin basic auth secret |

Verify:

```bash
hetzner-k3s version && kubectl version --client && docker version && python3 --version
```

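
The one-line check above stops at the first tool that fails. As a minimal sketch, the function below reports every missing tool in one pass; `missing_tools` is a hypothetical helper, not part of the repository's scripts.

```bash
# Print the name of every listed tool that is not on PATH, one per line.
# missing_tools is a hypothetical helper for illustration only.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: check every tool from the table above.
# missing_tools hetzner-k3s kubectl helm stern docker python3 htpasswd
```

An empty result means all tools were found. `helm` is optional, so a report that lists only `helm` can be ignored unless you plan to install Prometheus/Grafana.
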
## 2. Configuration

There are two things to fill in: `config.yaml` and the files in `secrets/`.

### config.yaml: all string configuration

```bash
cp config.yaml.example config.yaml
```

Open `config.yaml` and fill in every empty `""` value:

| Section | What to fill in |
|---------|----------------|
| `cluster.hcloud_token` | Hetzner API token (Read/Write); generate at console.hetzner.cloud |
| `registry.*` | GHCR credentials (same as Docker Swarm setup) |
| `database.host`, `database.user` | Neon PostgreSQL connection info |
| `email.user` | Fastmail email address |
| `push.apns_key_id`, `push.apns_team_id` | Apple Push Notification identifiers |
| `storage.b2_*` | Backblaze B2 bucket and credentials |
| `redis.password` | Strong password for Redis authentication (required for production) |
| `admin.basic_auth_user` | HTTP basic auth username for admin panel |
| `admin.basic_auth_password` | HTTP basic auth password for admin panel |

Everything else has sensible defaults. `config.yaml` is gitignored.

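
A plain text search is usually enough to spot values that are still empty. The sketch below assumes each setting sits on its own `key: ""` line; `empty_values` is a hypothetical helper and does a text match rather than parsing YAML.

```bash
# List (with line numbers) the lines of a YAML file whose value is still "".
# Assumption: one `key: ""` pair per line, optionally followed by a comment.
# This is a text match, not a YAML parser.
empty_values() {
  grep -nE ':[[:space:]]*""[[:space:]]*(#.*)?$' "$1"
}

# Usage: no output (and a non-zero exit status from grep) means nothing is left to fill in.
# empty_values config.yaml
```
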
### secrets/: file-based secrets

These are binary or multi-line files that can't go in YAML:

| File | Source |
|------|--------|
| `secrets/postgres_password.txt` | Your Neon database password |
| `secrets/secret_key.txt` | `openssl rand -base64 48` (min 32 chars) |
| `secrets/email_host_password.txt` | Fastmail app password |
| `secrets/fcm_server_key.txt` | Firebase console → Project Settings → Cloud Messaging |
| `secrets/apns_auth_key.p8` | Apple Developer → Keys → APNs key |
| `secrets/cloudflare-origin.crt` | Cloudflare → SSL/TLS → Origin Server → Create Certificate |
| `secrets/cloudflare-origin.key` | (saved with the certificate above) |

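
Before running the setup scripts, it can help to confirm that every file from the table exists and that the secret key meets the 32-character minimum. This is a sketch only: `check_secret_files` and `secret_key_long_enough` are hypothetical helpers, and `02-setup-secrets.sh` may already perform its own validation.

```bash
# Print a line for every expected secret file that is missing or empty.
# $1 is the secrets directory; the file names come from the table above.
check_secret_files() {
  local dir=$1 name
  for name in postgres_password.txt secret_key.txt email_host_password.txt \
              fcm_server_key.txt apns_auth_key.p8 \
              cloudflare-origin.crt cloudflare-origin.key; do
    [ -s "$dir/$name" ] || printf 'missing or empty: %s\n' "$dir/$name"
  done
}

# Succeed only if the file holds at least 32 bytes, not counting newlines.
secret_key_long_enough() {
  local n
  n=$(tr -d '\n' < "$1" | wc -c | tr -d ' ')
  [ "$n" -ge 32 ]
}

# Usage:
# check_secret_files secrets
# secret_key_long_enough secrets/secret_key.txt || echo "secret_key.txt is too short"
```
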
## 3. Provision Cluster

```bash
export KUBECONFIG="$(pwd)/kubeconfig"
./scripts/01-provision-cluster.sh
```

This script:
1. Reads cluster config from `config.yaml`
2. Generates `cluster-config.yaml` for hetzner-k3s
3. Provisions 3x CX33 nodes with HA etcd (takes 5-10 minutes)
4. Writes node IPs back into `config.yaml`
5. Labels the Redis node

After provisioning:

```bash
kubectl get nodes
```

## 4. Create Secrets

```bash
./scripts/02-setup-secrets.sh
```

This reads `config.yaml` for registry credentials and creates all Kubernetes Secrets from the `secrets/` files:
- `honeydue-secrets`: DB password, app secret, email password, FCM key, Redis password (if configured)
- `honeydue-apns-key`: APNS .p8 key (mounted as volume in pods)
- `ghcr-credentials`: GHCR image pull credentials
- `cloudflare-origin-cert`: TLS certificate for Ingress
- `admin-basic-auth`: htpasswd secret for admin panel basic auth (if configured)

## 5. Deploy

**Full deploy** (build + push + apply):

```bash
./scripts/03-deploy.sh
```

**Deploy pre-built images** (skip build):

```bash
./scripts/03-deploy.sh --skip-build --tag abc1234
```

The script:
1. Reads registry config from `config.yaml`
2. Builds and pushes 3 Docker images to GHCR
3. Generates a Kubernetes ConfigMap from `config.yaml` (converts to flat env vars)
4. Applies all manifests with image tag substitution
5. Waits for all rollouts to complete

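
Image tag substitution (step 4) can be pictured as a text replacement applied to each manifest before it reaches `kubectl apply`. The sketch below is an illustration, not the logic of `03-deploy.sh`: the `IMAGE_TAG_PLACEHOLDER` token, the `render_manifest` helper, and the manifest path in the usage line are all assumptions.

```bash
# Replace a placeholder token in a manifest read from stdin with the given tag.
# IMAGE_TAG_PLACEHOLDER is a hypothetical token; the real script may use another.
# The tag must not contain '|' or '&', which are special in this sed expression.
render_manifest() {
  sed "s|IMAGE_TAG_PLACEHOLDER|$1|g"
}

# Usage (hypothetical manifest path):
# render_manifest abc1234 < manifests/api/deployment.yaml | kubectl apply -f -
```

Rendering to stdout and piping into `kubectl apply -f -` keeps the files in the repository unchanged, so the same manifests can be reused for any tag.
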
## 6. Configure Load Balancer & DNS

### Hetzner Load Balancer

1. [Hetzner Console](https://console.hetzner.cloud/) → **Load Balancers → Create**
2. Location: **fsn1**, add all 3 nodes as targets
3. Service: TCP 443 → 443, health check on TCP 443
4. Note the LB IP and update `load_balancer_ip` in `config.yaml`

### Cloudflare DNS

1. [Cloudflare Dashboard](https://dash.cloudflare.com/) → `myhoneydue.com` → **DNS**

   | Type | Name | Content | Proxy |
   |------|------|---------|-------|
   | A | `api` | `<LB_IP>` | Proxied (orange cloud) |
   | A | `admin` | `<LB_IP>` | Proxied (orange cloud) |

2. **SSL/TLS → Overview** → Set mode to **Full (Strict)**

3. If you haven't generated the origin cert yet:
   **SSL/TLS → Origin Server → Create Certificate**
   - Hostnames: `*.myhoneydue.com`, `myhoneydue.com`
   - Validity: 15 years
   - Save to `secrets/cloudflare-origin.crt` and `secrets/cloudflare-origin.key`
   - Re-run `./scripts/02-setup-secrets.sh`

## 7. Verify

```bash
# Automated cluster health check
./scripts/04-verify.sh

# External health check (after DNS propagation)
curl -v https://api.myhoneydue.com/api/health/
```

Expected: `{"status": "ok"}` with HTTP 200.

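
Because the external check only passes once DNS has propagated, a small retry loop saves re-running `curl` by hand. This is a minimal sketch; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices.

```bash
# Run a command until it succeeds: at most $1 attempts, sleeping $2 seconds between tries.
# Returns 0 on the first success and 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: poll the health endpoint every 10 seconds, up to 30 times.
# curl -f turns HTTP error responses into a non-zero exit status.
# retry 30 10 curl -fsS https://api.myhoneydue.com/api/health/
```
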
## 8. Monitoring & Logs

### Logs with stern

```bash
stern -n honeydue api                 # All API pod logs
stern -n honeydue worker              # All worker logs
stern -n honeydue .                   # Everything
stern -n honeydue api | grep ERROR    # Filter
```

### kubectl logs

```bash
kubectl logs -n honeydue deployment/api -f
kubectl logs -n honeydue <pod-name> --previous  # Crashed container
```

### Resource usage

```bash
kubectl top pods -n honeydue
kubectl top nodes
```

### Optional: Prometheus + Grafana

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=your-password

# Access Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3001:80
# Open http://localhost:3001
```

## 9. Scaling

### Manual

```bash
kubectl scale deployment/api -n honeydue --replicas=5
kubectl scale deployment/worker -n honeydue --replicas=3
```

### HPA (auto-scaling)

The API auto-scales from 3 to 6 replicas when CPU exceeds 70% or memory exceeds 80%:

```bash
kubectl get hpa -n honeydue
kubectl describe hpa api -n honeydue
```

### Adding nodes

Edit `config.yaml` to add nodes, then re-run provisioning:

```bash
./scripts/01-provision-cluster.sh
```

## 10. Rollback

```bash
./scripts/rollback.sh
```

The script shows the rollout history, asks for confirmation, and rolls back all deployments to the previous revision.

Single deployment rollback:

```bash
kubectl rollout undo deployment/api -n honeydue
```

## 11. Backup & DR

| Component | Strategy | Action Required |
|-----------|----------|-----------------|
| PostgreSQL | Neon PITR (automatic) | None |
| Redis | Reconstructible cache + Asynq queue | None |
| etcd | K3s auto-snapshots (12h, keeps 5) | None |
| B2 Storage | B2 versioning + lifecycle rules | Enable in B2 settings |
| Secrets | Local `secrets/` + `config.yaml` | Keep secure offline backup |

**Disaster recovery**: Re-provision → re-create secrets → re-deploy. Database recovers via Neon PITR.

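
The secrets row is the only one that depends entirely on a manual step. One way to take that backup is sketched below; `backup_secrets` is a hypothetical helper, and the archive it writes is not encrypted, so it should go straight onto encrypted offline storage.

```bash
# Archive secrets/ and config.yaml into one tarball readable only by the owner.
# $1 is the deploy directory, $2 the output path; both are supplied by the caller.
# The archive is NOT encrypted: store it on encrypted media.
backup_secrets() {
  ( umask 077 && tar -czf "$2" -C "$1" secrets config.yaml )
}

# Usage: write a dated archive outside the repository.
# backup_secrets . "$HOME/honeydue-secrets-$(date +%Y%m%d).tar.gz"
```
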
## 12. Security

See **[SECURITY.md](SECURITY.md)** for the comprehensive hardening guide, incident response playbooks, and full compliance checklist.

### Summary of deployed security controls

| Control | Status | Manifests |
|---------|--------|-----------|
| Pod security contexts (non-root, read-only FS, no caps) | Applied | All `deployment.yaml` |
| Network policies (default-deny + explicit allows) | Applied | `manifests/network-policies.yaml` |
| RBAC (dedicated SAs, no K8s API access) | Applied | `manifests/rbac.yaml` |
| Pod disruption budgets | Applied | `manifests/pod-disruption-budgets.yaml` |
| Redis authentication | Applied (if `redis.password` set) | `redis/deployment.yaml` |
| Cloudflare-only origin lockdown | Applied | `ingress/ingress.yaml` |
| Admin basic auth | Applied (if `admin.*` set) | `ingress/middleware.yaml` |
| Security headers (HSTS, CSP, Permissions-Policy) | Applied | `ingress/middleware.yaml` |
| Secret encryption at rest | K3s config | `--secrets-encryption` |

### Quick checklist

- [ ] Hetzner Firewall: allow only 22, 443, 6443 from your IP
- [ ] SSH: key-only auth (`PasswordAuthentication no`)
- [ ] `redis.password` set in `config.yaml`
- [ ] `admin.basic_auth_user` and `admin.basic_auth_password` set in `config.yaml`
- [ ] `kubeconfig`: `chmod 600 kubeconfig`, never commit
- [ ] `config.yaml`: contains tokens; never commit, keep a secure backup
- [ ] Image scanning: `trivy image` or `docker scout cves` before deploy
- [ ] Run `./scripts/04-verify.sh` (includes automated security checks)

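
The file-permission item in the checklist can be confirmed from the shell. The function below is a sketch; `is_owner_only` is a hypothetical helper, and `04-verify.sh` may already cover the same check.

```bash
# Succeed only if the file exists and its permission bits are exactly 600.
is_owner_only() {
  [ -n "$(find "$1" -prune -perm 600 2>/dev/null)" ]
}

# Usage: fix the permissions, then confirm them.
# chmod 600 kubeconfig && is_owner_only kubeconfig && echo "kubeconfig permissions OK"
```
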
## 13. Troubleshooting

### ImagePullBackOff

```bash
kubectl describe pod <pod-name> -n honeydue
# Check: image name, GHCR credentials, image exists
```

Fix: verify `registry.*` in `config.yaml`, then re-run `./scripts/02-setup-secrets.sh`.

### CrashLoopBackOff

```bash
kubectl logs <pod-name> -n honeydue --previous
# Common: missing env vars, DB connection failure, invalid APNS key
```

### Redis connection refused / NOAUTH

```bash
kubectl get pods -n honeydue -l app.kubernetes.io/name=redis

# If redis.password is set, you must authenticate
# ($REDIS_PASSWORD is expanded by your local shell, so export it first):
kubectl exec -it deploy/redis -n honeydue -- redis-cli -a "$REDIS_PASSWORD" ping
# Without -a: (error) NOAUTH Authentication required.
```

### Health check failures

```bash
kubectl exec -it deploy/api -n honeydue -- curl -v http://localhost:8000/api/health/
kubectl exec -it deploy/api -n honeydue -- env | sort
```

### Pods stuck in Pending

```bash
kubectl describe pod <pod-name> -n honeydue
# For Redis: ensure a node has label honeydue/redis=true
kubectl get nodes --show-labels | grep redis
```

### DNS not resolving

```bash
dig api.myhoneydue.com +short
# Verify LB IP matches what's in config.yaml
```

### Certificate / TLS errors

```bash
kubectl get secret cloudflare-origin-cert -n honeydue
kubectl describe ingress honeydue -n honeydue
curl -vk --resolve api.myhoneydue.com:443:<NODE_IP> https://api.myhoneydue.com/api/health/
```