From 139a990ebc258bad4ba9a88014224997eddcd21c Mon Sep 17 00:00:00 2001 From: Trey t Date: Wed, 13 May 2026 00:30:11 -0500 Subject: [PATCH] fix(observability): unbreak vmagent SD on fresh deploy + ship kube-state-metrics MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit vmagent's k8s service discovery has been silently broken for 17+ days because k3s's NetworkPolicy controller evaluates egress AFTER kube-proxy's DNAT (contrary to the k8s spec). Pod → ClusterIP 10.43.0.1:443 was DNAT'd to :6443, and the resulting :6443 destination matched none of vmagent's egress rules → TCP RST → "connection refused" on every SD watch attempt. Grafana panels using kube_* or up{} metrics returned empty as a result. Changes: - network-policies.yaml: commit the previously-cluster-only NetPols (allow-egress-from-vmagent, allow-vmagent-to-api) so a fresh deploy produces a working cluster. The vmagent egress rule now includes :6443 to public IPs (the post-DNAT path) and :8080 to the pod CIDR (for scraping kube-state-metrics). - observability/kube-state-metrics.yaml: new manifest. Provides the kube_pod_*, kube_deployment_*, kube_service_* metrics that Grafana panels need to count pods, replicas, etc. Runs in kube-system with cluster-scoped RBAC. - observability/vmagent.yaml: * add kube-state-metrics scrape job to the ConfigMap * add vmagent-kube-system Role+RoleBinding so cross-namespace SD works * replace the misleading liveness probe (was /-/healthy, which lies while SD is broken) with an exec probe that checks /api/v1/targets for at least one healthy target — automatic recovery from future stale-SD incidents - scripts/03-deploy.sh: actually apply network-policies.yaml (was committed but never applied) and apply kube-state-metrics.yaml. - RUNBOOK.md (new): documents the post-DNAT gotcha, the liveness probe trap, bearer-token recovery procedure, drift-detection diff, and a post-redeploy verification checklist. 
- .gitignore: cover kubeconfig.tunnel (created during SSH-tunnelled kubectl sessions) so admin client cert can't be committed by accident. Verified via kubectl --dry-run on all three modified manifests. Co-Authored-By: Claude Opus 4.7 (1M context) --- deploy-k3s/.gitignore | 1 + deploy-k3s/RUNBOOK.md | 262 ++++++++++++++++++ deploy-k3s/manifests/network-policies.yaml | 97 +++++++ .../observability/kube-state-metrics.yaml | 223 +++++++++++++++ .../manifests/observability/vmagent.yaml | 69 ++++- deploy-k3s/scripts/03-deploy.sh | 20 +- 6 files changed, 666 insertions(+), 6 deletions(-) create mode 100644 deploy-k3s/RUNBOOK.md create mode 100644 deploy-k3s/manifests/observability/kube-state-metrics.yaml diff --git a/deploy-k3s/.gitignore b/deploy-k3s/.gitignore index 31d21e4..1eecf1a 100644 --- a/deploy-k3s/.gitignore +++ b/deploy-k3s/.gitignore @@ -3,6 +3,7 @@ config.yaml # Generated files kubeconfig +kubeconfig.* cluster-config.yaml prod.env diff --git a/deploy-k3s/RUNBOOK.md b/deploy-k3s/RUNBOOK.md new file mode 100644 index 0000000..53c4b9b --- /dev/null +++ b/deploy-k3s/RUNBOOK.md @@ -0,0 +1,262 @@ +# k3s Cluster Operations Runbook + +Living document for honeyDue k3s cluster operations. Add entries when you +hit something non-obvious so future-you (or your replacement) doesn't have +to rediscover it. + +--- + +## Deployment + +The canonical deploy path is `deploy-k3s/scripts/03-deploy.sh`. It applies +everything in `deploy-k3s/manifests/` in the right order. + +What it touches (in order): + +1. `namespace.yaml` +2. `network-policies.yaml` — **all** NetPols including the vmagent ones +3. `redis/` +4. `ingress/` +5. `migrate/job.yaml` (with image substitution; blocks on success) +6. `api/deployment.yaml`, `api/service.yaml`, `api/hpa.yaml` (image-subbed) +7. `worker/deployment.yaml` (image-subbed) +8. `admin/deployment.yaml`, `admin/service.yaml` (image-subbed) +9. `web/deployment.yaml`, `web/service.yaml` (image-subbed; optional dir) +10. 
`observability/kube-state-metrics.yaml` +11. `observability/vmagent.yaml` (with `TOKEN_PLACEHOLDER` sed-substituted from `deploy/prod.env`) + +If you add a new manifest, also add a `kubectl apply -f` line to +`03-deploy.sh` — there's no kustomization or `apply -R`. **A manifest +that exists in the repo but isn't applied by the script will silently +not deploy.** + +### Pre-deploy checklist + +- [ ] `deploy/prod.env` exists and contains `OBS_INGEST_TOKEN=...` + (otherwise vmagent gets skipped with a warning) +- [ ] `KUBECONFIG` points at the right cluster +- [ ] The Gitea image registry is reachable from k3s nodes +- [ ] Schema migrations in `migrations/` are tested locally first + (the deploy aborts if `honeydue-migrate` Job fails) + +--- + +## Known gotchas + +### vmagent SD broken on fresh deploy ("0 pods up" in Grafana) + +**Symptoms:** +- Grafana panels using `kube_*` metrics or `up{job=...}` show 0 +- vmagent logs: `dial tcp 10.43.0.1:443: connect: connection refused` + repeating every ~30s +- Direct test from a pod also refused: `kubectl -n honeydue exec deploy/vmagent + -- wget --no-check-certificate -qO- -T 3 https://10.43.0.1:443/livez` + +**Cause:** k3s's built-in NetworkPolicy controller evaluates egress rules +**after** kube-proxy's DNAT, not before (contrary to the k8s spec). +Traffic from a pod to the `kubernetes` Service (ClusterIP `10.43.0.1:443`) +gets DNAT'd to `:6443`, and **then** the policy check +runs. Without an explicit egress rule for `:6443`, the packet is rejected +with a TCP RST → "connection refused". 
+ +The `allow-egress-from-vmagent` NetPol in `network-policies.yaml` includes +both rules: + +```yaml +# Pre-DNAT view (correct per spec; harmless if unused) +- to: + - ipBlock: { cidr: 10.43.0.0/16 } + ports: + - { port: 443, protocol: TCP } +# Post-DNAT path (what k3s NetPol enforcer actually sees) — REQUIRED +- to: + - ipBlock: + cidr: 0.0.0.0/0 + except: [10.42.0.0/16] + ports: + - { port: 6443, protocol: TCP } +``` + +**If this happens on a fresh deploy:** confirm `network-policies.yaml` +was applied: +```bash +kubectl -n honeydue get netpol allow-egress-from-vmagent -o yaml +``` +Look for the port-6443 egress rule. If missing, the apply step in +`03-deploy.sh` was skipped or the file was edited and the rule got +dropped. + +**Counter-evidence that confirms diagnosis:** kube-state-metrics in +`kube-system` works fine, because `kube-system` has no NetPols. So if +ksm is healthy but workloads in `honeydue` can't reach the apiserver +ClusterIP, this gotcha is the cause. + +--- + +### vmagent appears healthy but no data in Grafana + +vmagent's `/-/healthy` endpoint returns 200 as long as the process is +alive and remote-write is functional (TCP-level) — it does **not** +check whether scrapes are succeeding. We saw this fail once: vmagent +was "healthy" for 17 days while having zero healthy targets due to a +broken k8s SD watch. + +The liveness probe in `vmagent.yaml` queries the agent's `/api/v1/targets` +endpoint and fails the pod if no target is in state `up`. After 3 +consecutive failures (~3 min), kubelet recycles the pod and SD restarts +clean. + +**Verify it's working:** `kubectl -n honeydue describe pod -l app.kubernetes.io/name=vmagent` +should show `Liveness: exec [sh -c ...]`. If you ever see vmagent running +for weeks but no metrics in Grafana, the probe was disabled or the exec +command broke. + +--- + +### vmagent's bearer token got blown away after `kubectl apply -f vmagent.yaml` + +The committed `vmagent.yaml` has `bearer_token: TOKEN_PLACEHOLDER`. 
The +real token is sed-substituted at deploy time by `03-deploy.sh`. If you +ever apply `vmagent.yaml` directly: + +```bash +kubectl apply -f deploy-k3s/manifests/observability/vmagent.yaml # WRONG +``` + +the Secret gets overwritten with the literal string `TOKEN_PLACEHOLDER` +and all remote-writes start returning 401 from obs.88oakapps.com. + +**To restore without a full redeploy** (the safe inline path): + +```bash +export KUBECONFIG=... +OBS_TOKEN_B64=$(kubectl -n honeydue get secret honeydue-secrets \ + -o jsonpath='{.data.OBS_INGEST_TOKEN}') +kubectl -n honeydue patch secret vmagent-remote-write --type=json \ + -p="[{\"op\":\"replace\",\"path\":\"/data/bearer_token\",\"value\":\"${OBS_TOKEN_B64}\"}]" +kubectl -n honeydue rollout restart deploy/vmagent +``` + +The OBS token also lives in `honeydue-secrets.OBS_INGEST_TOKEN` because +the api pods use it for traces — same secret, same value. + +**Or just re-run the deploy:** `./deploy-k3s/scripts/03-deploy.sh`. The +sed step handles the substitution correctly. + +--- + +### Node kubeconfig is world-readable + +`/etc/rancher/k3s/k3s.yaml` is mode `0644` per the `--write-kubeconfig-mode=644` +k3s install flag. Any process on the host (including any container that +mounts the host filesystem) can read full cluster-admin credentials. + +This is intentional for the deploy user but worth knowing — any container +escape becomes immediate cluster-admin. Tracked as finding **F4** in +`k3_audit_5_12.md`. + +To tighten (if you ever turn this knob): change to `--write-kubeconfig-mode=600` +in the k3s install command, then re-fetch `deploy-k3s/kubeconfig`. 
+ +--- + +## Common operations + +### Fetch a working kubectl tunnel (if `deploy-k3s/kubeconfig` is missing or stale) + +```bash +ssh -i ~/.ssh/hetzner deploy@hetzner1 'sudo cat /etc/rancher/k3s/k3s.yaml' \ + | sed 's|server: https://127.0.0.1:6443|server: https://178.104.247.152:6443|' \ + > deploy-k3s/kubeconfig +chmod 600 deploy-k3s/kubeconfig +``` + +If the public :6443 is firewalled from your IP (the default — only +Cloudflare ranges are allowed for app traffic; admin is locked down): + +```bash +# SSH tunnel: -f backgrounds it after auth; kill the ssh process when done +ssh -fN -o ExitOnForwardFailure=yes -o ServerAliveInterval=30 \ + -i ~/.ssh/hetzner \ + -L 127.0.0.1:6443:127.0.0.1:6443 \ + deploy@hetzner1 + +# Then use a kubeconfig pointing at localhost +cp deploy-k3s/kubeconfig deploy-k3s/kubeconfig.tunnel +sed -i.bak 's|https://178.104.247.152:6443|https://127.0.0.1:6443|' \ + deploy-k3s/kubeconfig.tunnel +export KUBECONFIG="$(pwd)/deploy-k3s/kubeconfig.tunnel" +``` + +### Restore vmagent after a "0 targets" incident + +```bash +export KUBECONFIG="$(pwd)/deploy-k3s/kubeconfig.tunnel" + +# 1. Confirm the diagnosis +kubectl -n honeydue logs deploy/vmagent --tail=20 | grep -i "connect: connection refused" + +# 2. Check the NetPol has the :6443 rule +kubectl -n honeydue get netpol allow-egress-from-vmagent -o yaml | grep -A 5 6443 + +# 3. If missing, re-apply +kubectl apply -f deploy-k3s/manifests/network-policies.yaml + +# 4. Restart vmagent +kubectl -n honeydue rollout restart deploy/vmagent + +# 5. 
Verify targets after ~60s +kubectl -n honeydue port-forward deploy/vmagent 8429:8429 & sleep 2 # give the port-forward time to bind +curl -s http://localhost:8429/api/v1/targets \ + | python3 -c "import json,sys; d=json.load(sys.stdin); \ + a=d['data']['activeTargets']; \ + print(f'targets={len(a)} up={sum(1 for t in a if t[\"health\"]==\"up\")}')" +``` + +### Verify NetPols match the repo + +If you suspect drift between cluster and repo: + +```bash +diff <(kubectl -n honeydue get netpol -o name | sort) \ + <(grep -E '^\s*name: ' deploy-k3s/manifests/network-policies.yaml \ + | sed 's/.*name: /networkpolicy.networking.k8s.io\//' | sort) +``` + +Empty output = match. Any differences need investigation — either the +cluster has policies that aren't in the repo (someone ran a manual +`kubectl apply`) or the repo has policies that were never applied. + +--- + +## Disaster recovery notes + +### "I have to redeploy the whole stack" + +The deploy path is designed to be re-runnable. From a fresh cluster: + +1. Install k3s on all 3 nodes (use existing `deploy-k3s/scripts/01-install-k3s.sh`) +2. Fetch a kubeconfig (see "Common operations" above) +3. Confirm `deploy/prod.env` has all required secrets: + - `POSTGRES_PASSWORD`, `SECRET_KEY`, `EMAIL_HOST_PASSWORD`, + `FCM_SERVER_KEY`, `B2_KEY_ID`, `B2_APP_KEY`, `OBS_INGEST_TOKEN`, + `OBS_TRACES_URL`, `REDIS_PASSWORD` (optional), `ADMIN_EMAIL`, `ADMIN_PASSWORD` +4. Run `./deploy-k3s/scripts/02-setup-secrets.sh` (creates `honeydue-secrets`) +5. Run `./deploy-k3s/scripts/03-deploy.sh` (deploys everything; sed-injects + the obs token into vmagent at apply time) +6. 
Verify: `kubectl -n honeydue get pods` should show all workloads Running + +### Post-redeploy verification checklist + +- [ ] `kubectl -n honeydue get netpol` shows **12 NetPols** (default-deny + + 6 egress + 5 ingress) +- [ ] `kubectl -n honeydue get netpol allow-egress-from-vmagent -o yaml | grep 6443` + returns the rule (if missing → see "vmagent SD broken" gotcha) +- [ ] `kubectl -n kube-system get pod -l app.kubernetes.io/name=kube-state-metrics` + shows 1 Running pod +- [ ] `kubectl -n honeydue port-forward deploy/vmagent 8429:8429` + curl + `localhost:8429/api/v1/targets` shows 4+ targets, all `up` +- [ ] Grafana panel "pods up" in `honeydue` namespace populates within 60s + +If any of those fail, this runbook entry tells you exactly which gotcha +you hit. diff --git a/deploy-k3s/manifests/network-policies.yaml b/deploy-k3s/manifests/network-policies.yaml index afb9e25..5bb59e1 100644 --- a/deploy-k3s/manifests/network-policies.yaml +++ b/deploy-k3s/manifests/network-policies.yaml @@ -275,3 +275,100 @@ spec: ports: - protocol: TCP port: 443 + +--- +# vmagent egress. +# +# IMPORTANT (gotcha): k3s's built-in NetworkPolicy controller appears to +# evaluate egress rules AFTER kube-proxy's DNAT, not before (contrary to +# the k8s spec). So traffic from a pod to the kubernetes Service +# (ClusterIP 10.43.0.1:443) is policy-checked as dst=:6443. +# That's why we need an explicit rule for :6443 to public IPs, even though +# we already allow :443 to the cluster service CIDR. +# +# Without the :6443 rule, vmagent's k8s service discovery silently fails +# and zero pods get scraped. See deploy-k3s/RUNBOOK.md ("vmagent SD broken"). 
+apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-egress-from-vmagent + namespace: honeydue +spec: + podSelector: + matchLabels: + app.kubernetes.io/name: vmagent + policyTypes: + - Egress + egress: + # DNS (cluster-internal) + - to: + - namespaceSelector: {} + ports: + - port: 53 + protocol: UDP + - port: 53 + protocol: TCP + # k8s API server via ClusterIP (pre-DNAT view) + - to: + - ipBlock: + cidr: 10.43.0.0/16 + ports: + - port: 443 + protocol: TCP + # k8s API server post-DNAT (real path k3s NetPol enforcer sees) — REQUIRED + - to: + - ipBlock: + cidr: 0.0.0.0/0 + except: + - 10.42.0.0/16 + ports: + - port: 6443 + protocol: TCP + # Scrape api Pods on :8000 + - to: + - ipBlock: + cidr: 10.42.0.0/16 + ports: + - port: 8000 + protocol: TCP + # Scrape kube-state-metrics Pod on :8080 (pod CIDR) + - to: + - ipBlock: + cidr: 10.42.0.0/16 + ports: + - port: 8080 + protocol: TCP + # HTTPS to public (remote-write to obs.88oakapps.com via Cloudflare) + - to: + - ipBlock: + cidr: 0.0.0.0/0 + except: + - 10.42.0.0/16 + - 10.43.0.0/16 + ports: + - port: 443 + protocol: TCP + +--- +# Allow vmagent → api ingress on :8000 so api pods accept scrapes. +# api Pods are otherwise locked down by default-deny-all + allow-ingress-to-api +# (which only allows Traefik). This adds vmagent specifically. 
+apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-vmagent-to-api + namespace: honeydue +spec: + podSelector: + matchLabels: + app.kubernetes.io/name: api + policyTypes: + - Ingress + ingress: + - from: + - podSelector: + matchLabels: + app.kubernetes.io/name: vmagent + ports: + - port: 8000 + protocol: TCP diff --git a/deploy-k3s/manifests/observability/kube-state-metrics.yaml b/deploy-k3s/manifests/observability/kube-state-metrics.yaml new file mode 100644 index 0000000..5be090f --- /dev/null +++ b/deploy-k3s/manifests/observability/kube-state-metrics.yaml @@ -0,0 +1,223 @@ +# kube-state-metrics — exposes cluster object state (pods, deployments, +# services, etc.) as Prometheus metrics. vmagent scrapes it via the +# kube-state-metrics job in vmagent-config; Grafana panels that count pods, +# replicas, etc. consume the `kube_*` metrics this produces. +# +# Lives in kube-system because it watches resources cluster-wide. +# RBAC is cluster-scoped (ClusterRole + ClusterRoleBinding). 
+# +# Image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0 +# (latest stable as of authoring; bump when a newer minor is released) + +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: kube-state-metrics + namespace: kube-system + labels: + app.kubernetes.io/name: kube-state-metrics + app.kubernetes.io/part-of: honeydue-observability + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: kube-state-metrics + labels: + app.kubernetes.io/name: kube-state-metrics + app.kubernetes.io/part-of: honeydue-observability +rules: + # Core resources + - apiGroups: [""] + resources: + - configmaps + - secrets + - nodes + - pods + - services + - serviceaccounts + - resourcequotas + - replicationcontrollers + - limitranges + - persistentvolumeclaims + - persistentvolumes + - namespaces + - endpoints + verbs: [list, watch] + # Apps + - apiGroups: ["apps"] + resources: + - statefulsets + - daemonsets + - deployments + - replicasets + verbs: [list, watch] + # Batch + - apiGroups: ["batch"] + resources: + - cronjobs + - jobs + verbs: [list, watch] + # Autoscaling + - apiGroups: ["autoscaling"] + resources: + - horizontalpodautoscalers + verbs: [list, watch] + # Authentication / authorization (used by some ksm collectors) + - apiGroups: ["authentication.k8s.io"] + resources: [tokenreviews] + verbs: [create] + - apiGroups: ["authorization.k8s.io"] + resources: [subjectaccessreviews] + verbs: [create] + # Policy + - apiGroups: ["policy"] + resources: [poddisruptionbudgets] + verbs: [list, watch] + # Certificate signing + - apiGroups: ["certificates.k8s.io"] + resources: [certificatesigningrequests] + verbs: [list, watch] + # Discovery + - apiGroups: ["discovery.k8s.io"] + resources: [endpointslices] + verbs: [list, watch] + # Storage + - apiGroups: ["storage.k8s.io"] + resources: + - storageclasses + - volumeattachments + verbs: [list, watch] + # Admission policy + - apiGroups: ["admissionregistration.k8s.io"] + resources: + - 
mutatingwebhookconfigurations + - validatingwebhookconfigurations + verbs: [list, watch] + # Networking + - apiGroups: ["networking.k8s.io"] + resources: + - networkpolicies + - ingressclasses + - ingresses + verbs: [list, watch] + # Coordination (Lease objects, read-only) + - apiGroups: ["coordination.k8s.io"] + resources: [leases] + verbs: [list, watch] + # RBAC + - apiGroups: ["rbac.authorization.k8s.io"] + resources: + - clusterrolebindings + - clusterroles + - rolebindings + - roles + verbs: [list, watch] + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: kube-state-metrics + labels: + app.kubernetes.io/name: kube-state-metrics + app.kubernetes.io/part-of: honeydue-observability +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: kube-state-metrics +subjects: + - kind: ServiceAccount + name: kube-state-metrics + namespace: kube-system + +--- +apiVersion: v1 +kind: Service +metadata: + name: kube-state-metrics + namespace: kube-system + labels: + app.kubernetes.io/name: kube-state-metrics + app.kubernetes.io/part-of: honeydue-observability +spec: + type: ClusterIP + selector: + app.kubernetes.io/name: kube-state-metrics + ports: + - name: http-metrics + port: 8080 + targetPort: http-metrics + protocol: TCP + - name: telemetry + port: 8081 + targetPort: telemetry + protocol: TCP + +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: kube-state-metrics + namespace: kube-system + labels: + app.kubernetes.io/name: kube-state-metrics + app.kubernetes.io/part-of: honeydue-observability +spec: + replicas: 1 + strategy: + type: Recreate + selector: + matchLabels: + app.kubernetes.io/name: kube-state-metrics + template: + metadata: + labels: + app.kubernetes.io/name: kube-state-metrics + app.kubernetes.io/part-of: honeydue-observability + spec: + serviceAccountName: kube-state-metrics + automountServiceAccountToken: true + securityContext: + runAsNonRoot: true + runAsUser: 65534 + fsGroup: 65534 + 
seccompProfile: + type: RuntimeDefault + containers: + - name: kube-state-metrics + image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 8080 + name: http-metrics + - containerPort: 8081 + name: telemetry + args: + - --port=8080 + - --telemetry-port=8081 + resources: + requests: + cpu: 25m + memory: 64Mi + limits: + cpu: 200m + memory: 256Mi + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ALL] + readOnlyRootFilesystem: true + livenessProbe: + httpGet: + path: /livez + port: http-metrics + initialDelaySeconds: 5 + periodSeconds: 30 + readinessProbe: + httpGet: + path: /readyz + port: http-metrics + initialDelaySeconds: 5 + periodSeconds: 10 diff --git a/deploy-k3s/manifests/observability/vmagent.yaml b/deploy-k3s/manifests/observability/vmagent.yaml index b36d545..e12032d 100644 --- a/deploy-k3s/manifests/observability/vmagent.yaml +++ b/deploy-k3s/manifests/observability/vmagent.yaml @@ -42,6 +42,21 @@ data: - target_label: service replacement: api + # kube-state-metrics — cluster object state (kube_pod_*, kube_deployment_*, + # etc.) needed for Grafana panels that count pods/replicas/etc. + - job_name: kube-state-metrics + kubernetes_sd_configs: + - role: endpoints + namespaces: + names: [kube-system] + relabel_configs: + - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name] + action: keep + regex: kube-state-metrics + - source_labels: [__meta_kubernetes_endpoint_port_name] + action: keep + regex: http-metrics + # honeyDue worker — also exposes /metrics if/when we add it. # Keep this stanza commented until the worker has a /metrics endpoint; # uncommented form drops scrapes silently. @@ -104,6 +119,35 @@ roleRef: name: vmagent apiGroup: rbac.authorization.k8s.io +--- +# Allow vmagent to discover the kube-state-metrics Service/Endpoints in +# kube-system so the kube-state-metrics scrape job can find its target. 
+# Cross-namespace SD needs an explicit RoleBinding here. +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: vmagent-kube-system + namespace: kube-system +rules: + - apiGroups: [""] + resources: [services, endpoints, pods] + verbs: [get, list, watch] + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: vmagent-kube-system + namespace: kube-system +subjects: + - kind: ServiceAccount + name: vmagent + namespace: honeydue +roleRef: + kind: Role + name: vmagent-kube-system + apiGroup: rbac.authorization.k8s.io + --- apiVersion: apps/v1 kind: Deployment @@ -162,12 +206,31 @@ spec: readOnly: true - name: buffer mountPath: /tmp/vmagent - livenessProbe: + # Process startup gate. /-/healthy returns 200 once vmagent has + # parsed config — gives the agent up to 2 min to come up before + # liveness starts evaluating. + startupProbe: httpGet: path: /-/healthy port: http - initialDelaySeconds: 10 - periodSeconds: 30 + initialDelaySeconds: 5 + periodSeconds: 5 + failureThreshold: 24 + # Real liveness check: are scrapes actually succeeding? + # /-/healthy was the old probe and returned 200 for 17 days even + # while vmagent had zero healthy targets (stale k8s SD watch). + # This exec probe queries vmagent's own targets API and fails if + # NO target is in state "up". Three consecutive failures (3 min) + # → kubelet kills the pod → fresh SD watch. + livenessProbe: + exec: + command: + - sh + - -c + - 'n=$(wget -qO- http://localhost:8429/api/v1/targets 2>/dev/null | grep -c ''"health":"up"''); [ "$n" -gt 0 ]' + initialDelaySeconds: 120 + periodSeconds: 60 + failureThreshold: 3 readinessProbe: httpGet: path: /-/healthy diff --git a/deploy-k3s/scripts/03-deploy.sh b/deploy-k3s/scripts/03-deploy.sh index c42b8b7..0169669 100755 --- a/deploy-k3s/scripts/03-deploy.sh +++ b/deploy-k3s/scripts/03-deploy.sh @@ -146,6 +146,14 @@ kubectl create configmap honeydue-config \ log "Applying manifests..." 
kubectl apply -f "${MANIFESTS}/namespace.yaml" + +# NetworkPolicies first — default-deny-all + per-app allow rules. +# These MUST be applied; without them the cluster falls back to default-allow +# (worse posture) AND the vmagent egress rule for :6443 (which fixes a k3s +# post-DNAT enforcement quirk for k8s API discovery) is missing. +# See deploy-k3s/RUNBOOK.md ("vmagent SD broken on fresh deploy"). +kubectl apply -f "${MANIFESTS}/network-policies.yaml" + kubectl apply -f "${MANIFESTS}/redis/" kubectl apply -f "${MANIFESTS}/ingress/" @@ -181,10 +189,16 @@ if [[ -d "${MANIFESTS}/web" ]]; then kubectl apply -f "${MANIFESTS}/web/service.yaml" fi -# Observability — vmagent scrapes api Pods :8000/metrics and remote-writes -# to obs.88oakapps.com. The bearer token comes from deploy/prod.env so it -# stays out of the repo; the manifest holds TOKEN_PLACEHOLDER. +# Observability — vmagent scrapes api Pods :8000/metrics + kube-state-metrics +# :8080/metrics and remote-writes everything to obs.88oakapps.com. The bearer +# token comes from deploy/prod.env so it stays out of the repo; the manifest +# holds TOKEN_PLACEHOLDER. kube-state-metrics provides the kube_* metrics +# Grafana panels need to count pods, deployments, etc. if [[ -d "${MANIFESTS}/observability" ]]; then + # kube-state-metrics — no secrets, plain apply + kubectl apply -f "${MANIFESTS}/observability/kube-state-metrics.yaml" + + # vmagent — needs the bearer-token substitution # prod.env lives at the repo's deploy/ dir (sibling of deploy-k3s/), not # under deploy-k3s/. It's gitignored — operator copies values there once. OBS_TOKEN="$(grep -E '^OBS_INGEST_TOKEN=' "${REPO_DIR}/deploy/prod.env" 2>/dev/null | cut -d= -f2- || true)"