fix(observability): unbreak vmagent SD on fresh deploy + ship kube-state-metrics

vmagent's k8s service discovery has been silently broken for 17+ days
because k3s's NetworkPolicy controller evaluates egress AFTER kube-proxy's
DNAT (contrary to the k8s spec). Pod → ClusterIP 10.43.0.1:443 was
DNAT'd to <node_public_ip>:6443, and the resulting :6443 destination
matched none of vmagent's egress rules → TCP RST → "connection refused"
on every SD watch attempt. Grafana panels using kube_* or up{} metrics
returned empty as a result.

Changes:
- network-policies.yaml: commit the previously-cluster-only NetPols
  (allow-egress-from-vmagent, allow-vmagent-to-api) so a fresh deploy
  produces a working cluster. The vmagent egress rule now includes :6443
  to public IPs (the post-DNAT path) and :8080 to the pod CIDR (for
  scraping kube-state-metrics).
- observability/kube-state-metrics.yaml: new manifest. Provides the
  kube_pod_*, kube_deployment_*, kube_service_* metrics that Grafana
  panels need to count pods, replicas, etc. Runs in kube-system with
  cluster-scoped RBAC.
- observability/vmagent.yaml:
  * add kube-state-metrics scrape job to the ConfigMap
  * add vmagent-kube-system Role+RoleBinding so cross-namespace SD works
  * replace the misleading liveness probe (was /-/healthy, which lies
    while SD is broken) with an exec probe that checks /api/v1/targets
    for at least one healthy target, so the pod recovers automatically
    from future stale-SD incidents
- scripts/03-deploy.sh: actually apply network-policies.yaml (was
  committed but never applied) and apply kube-state-metrics.yaml.
- RUNBOOK.md (new): documents the post-DNAT gotcha, the liveness probe
  trap, bearer-token recovery procedure, drift-detection diff, and a
  post-redeploy verification checklist.
- .gitignore: cover kubeconfig.tunnel (created during SSH-tunnelled
  kubectl sessions) so the admin client cert can't be committed by
  accident.

Verified via kubectl --dry-run on all three modified manifests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
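
Review note: the commit message says the three manifests were checked with kubectl --dry-run but does not record the exact invocation. A minimal sketch of such a check follows, assuming client-side dry-run. The function name dry_run_manifests and the KUBECTL override variable are illustrative and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the commit: dry-run a list of manifests.
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL=${KUBECTL:-kubectl}

# dry_run_manifests FILE...
# Runs a client-side dry-run apply for each file and reports ok/FAIL.
# Returns non-zero if any manifest is rejected.
dry_run_manifests() {
  local f rc=0
  for f in "$@"; do
    if "$KUBECTL" apply --dry-run=client -f "$f" >/dev/null; then
      echo "ok   $f"
    else
      echo "FAIL $f"
      rc=1
    fi
  done
  return "$rc"
}
```

It could be called from the directory that holds the manifests, for instance as dry_run_manifests network-policies.yaml observability/kube-state-metrics.yaml observability/vmagent.yaml. Client-side dry-run only validates the objects locally; --dry-run=server would also exercise admission on the live cluster.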
@@ -275,3 +275,100 @@ spec:
       ports:
         - protocol: TCP
           port: 443
+
+---
+# vmagent egress.
+#
+# IMPORTANT (gotcha): k3s's built-in NetworkPolicy controller appears to
+# evaluate egress rules AFTER kube-proxy's DNAT, not before (contrary to
+# the k8s spec). So traffic from a pod to the kubernetes Service
+# (ClusterIP 10.43.0.1:443) is policy-checked as dst=<node_public_ip>:6443.
+# That's why we need an explicit rule for :6443 to public IPs, even though
+# we already allow :443 to the cluster service CIDR.
+#
+# Without the :6443 rule, vmagent's k8s service discovery silently fails
+# and zero pods get scraped. See deploy-k3s/RUNBOOK.md ("vmagent SD broken").
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-egress-from-vmagent
+  namespace: honeydue
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/name: vmagent
+  policyTypes:
+    - Egress
+  egress:
+    # DNS (cluster-internal)
+    - to:
+        - namespaceSelector: {}
+      ports:
+        - port: 53
+          protocol: UDP
+        - port: 53
+          protocol: TCP
+    # k8s API server via ClusterIP (pre-DNAT view)
+    - to:
+        - ipBlock:
+            cidr: 10.43.0.0/16
+      ports:
+        - port: 443
+          protocol: TCP
+    # k8s API server post-DNAT (real path k3s NetPol enforcer sees) — REQUIRED
+    - to:
+        - ipBlock:
+            cidr: 0.0.0.0/0
+            except:
+              - 10.42.0.0/16
+      ports:
+        - port: 6443
+          protocol: TCP
+    # Scrape api Pods on :8000
+    - to:
+        - ipBlock:
+            cidr: 10.42.0.0/16
+      ports:
+        - port: 8000
+          protocol: TCP
+    # Scrape kube-state-metrics Pod on :8080 (pod CIDR)
+    - to:
+        - ipBlock:
+            cidr: 10.42.0.0/16
+      ports:
+        - port: 8080
+          protocol: TCP
+    # HTTPS to public (remote-write to obs.88oakapps.com via Cloudflare)
+    - to:
+        - ipBlock:
+            cidr: 0.0.0.0/0
+            except:
+              - 10.42.0.0/16
+              - 10.43.0.0/16
+      ports:
+        - port: 443
+          protocol: TCP
+
+---
+# Allow vmagent → api ingress on :8000 so api pods accept scrapes.
+# api Pods are otherwise locked down by default-deny-all + allow-ingress-to-api
+# (which only allows Traefik). This adds vmagent specifically.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-vmagent-to-api
+  namespace: honeydue
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/name: api
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - podSelector:
+            matchLabels:
+              app.kubernetes.io/name: vmagent
+      ports:
+        - port: 8000
+          protocol: TCP
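
Review note: the effect of the :6443 rule can be illustrated offline. The sketch below is a toy model of egress matching, not k3s's actual enforcement code. It checks only two of the rules from allow-egress-from-vmagent, and NODE_IP is a hypothetical placeholder (a documentation-range address) standing in for <node_public_ip>.

```shell
#!/usr/bin/env bash
# Toy model only: mimics ipBlock + port matching for two vmagent egress rules.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr IP CIDR: succeeds if IP falls inside CIDR.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# egress_allowed DST_IP DST_PORT WITH_6443_RULE(yes|no)
egress_allowed() {
  local dst=$1 port=$2 with_6443=$3
  # Rule: :443 to the service CIDR (the pre-DNAT view of the API server).
  if [ "$port" = 443 ] && in_cidr "$dst" 10.43.0.0/16; then
    return 0
  fi
  # Rule added by this commit: :6443 to anything outside the pod CIDR.
  if [ "$with_6443" = yes ] && [ "$port" = 6443 ] \
      && ! in_cidr "$dst" 10.42.0.0/16; then
    return 0
  fi
  return 1
}

NODE_IP=203.0.113.10   # hypothetical stand-in for <node_public_ip>

# Compare the pre-DNAT and post-DNAT destinations, with and without the rule.
for case in "10.43.0.1 443 no" "$NODE_IP 6443 no" "$NODE_IP 6443 yes"; do
  # shellcheck disable=SC2086  # word splitting of the triple is intended
  if egress_allowed $case; then echo "allow: $case"; else echo "deny:  $case"; fi
done
```

If the enforcer really does see the post-DNAT destination, only the third case corresponds to a working service-discovery watch, which matches the failure described in the commit message.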
@@ -0,0 +1,223 @@
+# kube-state-metrics — exposes cluster object state (pods, deployments,
+# services, etc.) as Prometheus metrics. vmagent scrapes it via the
+# kube-state-metrics job defined in vmagent-config; Grafana panels that
+# count pods, replicas, etc. consume the `kube_*` metrics this produces.
+#
+# Lives in kube-system because it watches resources cluster-wide.
+# RBAC is cluster-scoped (ClusterRole + ClusterRoleBinding).
+#
+# Image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+# (latest stable as of authoring; bump when a newer minor is released)
+
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: kube-state-metrics
+  namespace: kube-system
+  labels:
+    app.kubernetes.io/name: kube-state-metrics
+    app.kubernetes.io/part-of: honeydue-observability
+
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: kube-state-metrics
+  labels:
+    app.kubernetes.io/name: kube-state-metrics
+    app.kubernetes.io/part-of: honeydue-observability
+rules:
+  # Core resources
+  - apiGroups: [""]
+    resources:
+      - configmaps
+      - secrets
+      - nodes
+      - pods
+      - services
+      - serviceaccounts
+      - resourcequotas
+      - replicationcontrollers
+      - limitranges
+      - persistentvolumeclaims
+      - persistentvolumes
+      - namespaces
+      - endpoints
+    verbs: [list, watch]
+  # Apps
+  - apiGroups: ["apps"]
+    resources:
+      - statefulsets
+      - daemonsets
+      - deployments
+      - replicasets
+    verbs: [list, watch]
+  # Batch
+  - apiGroups: ["batch"]
+    resources:
+      - cronjobs
+      - jobs
+    verbs: [list, watch]
+  # Autoscaling
+  - apiGroups: ["autoscaling"]
+    resources:
+      - horizontalpodautoscalers
+    verbs: [list, watch]
+  # Authentication / authorization (used by some ksm collectors)
+  - apiGroups: ["authentication.k8s.io"]
+    resources: [tokenreviews]
+    verbs: [create]
+  - apiGroups: ["authorization.k8s.io"]
+    resources: [subjectaccessreviews]
+    verbs: [create]
+  # Policy
+  - apiGroups: ["policy"]
+    resources: [poddisruptionbudgets]
+    verbs: [list, watch]
+  # Certificate signing
+  - apiGroups: ["certificates.k8s.io"]
+    resources: [certificatesigningrequests]
+    verbs: [list, watch]
+  # Discovery
+  - apiGroups: ["discovery.k8s.io"]
+    resources: [endpointslices]
+    verbs: [list, watch]
+  # Storage
+  - apiGroups: ["storage.k8s.io"]
+    resources:
+      - storageclasses
+      - volumeattachments
+    verbs: [list, watch]
+  # Admission policy
+  - apiGroups: ["admissionregistration.k8s.io"]
+    resources:
+      - mutatingwebhookconfigurations
+      - validatingwebhookconfigurations
+    verbs: [list, watch]
+  # Networking
+  - apiGroups: ["networking.k8s.io"]
+    resources:
+      - networkpolicies
+      - ingressclasses
+      - ingresses
+    verbs: [list, watch]
+  # Coordination (leader election)
+  - apiGroups: ["coordination.k8s.io"]
+    resources: [leases]
+    verbs: [list, watch]
+  # RBAC
+  - apiGroups: ["rbac.authorization.k8s.io"]
+    resources:
+      - clusterrolebindings
+      - clusterroles
+      - rolebindings
+      - roles
+    verbs: [list, watch]
+
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: kube-state-metrics
+  labels:
+    app.kubernetes.io/name: kube-state-metrics
+    app.kubernetes.io/part-of: honeydue-observability
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: kube-state-metrics
+subjects:
+  - kind: ServiceAccount
+    name: kube-state-metrics
+    namespace: kube-system
+
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: kube-state-metrics
+  namespace: kube-system
+  labels:
+    app.kubernetes.io/name: kube-state-metrics
+    app.kubernetes.io/part-of: honeydue-observability
+spec:
+  type: ClusterIP
+  selector:
+    app.kubernetes.io/name: kube-state-metrics
+  ports:
+    - name: http-metrics
+      port: 8080
+      targetPort: http-metrics
+      protocol: TCP
+    - name: telemetry
+      port: 8081
+      targetPort: telemetry
+      protocol: TCP
+
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: kube-state-metrics
+  namespace: kube-system
+  labels:
+    app.kubernetes.io/name: kube-state-metrics
+    app.kubernetes.io/part-of: honeydue-observability
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: kube-state-metrics
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: kube-state-metrics
+        app.kubernetes.io/part-of: honeydue-observability
+    spec:
+      serviceAccountName: kube-state-metrics
+      automountServiceAccountToken: true
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 65534
+        fsGroup: 65534
+        seccompProfile:
+          type: RuntimeDefault
+      containers:
+        - name: kube-state-metrics
+          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 8080
+              name: http-metrics
+            - containerPort: 8081
+              name: telemetry
+          args:
+            - --port=8080
+            - --telemetry-port=8081
+          resources:
+            requests:
+              cpu: 25m
+              memory: 64Mi
+            limits:
+              cpu: 200m
+              memory: 256Mi
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: [ALL]
+            readOnlyRootFilesystem: true
+          livenessProbe:
+            httpGet:
+              path: /livez
+              port: http-metrics
+            initialDelaySeconds: 5
+            periodSeconds: 30
+          readinessProbe:
+            httpGet:
+              path: /readyz
+              port: http-metrics
+            initialDelaySeconds: 5
+            periodSeconds: 10
@@ -42,6 +42,21 @@ data:
           - target_label: service
             replacement: api
 
+      # kube-state-metrics — cluster object state (kube_pod_*, kube_deployment_*,
+      # etc.) needed for Grafana panels that count pods/replicas/etc.
+      - job_name: kube-state-metrics
+        kubernetes_sd_configs:
+          - role: endpoints
+            namespaces:
+              names: [kube-system]
+        relabel_configs:
+          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
+            action: keep
+            regex: kube-state-metrics
+          - source_labels: [__meta_kubernetes_endpoint_port_name]
+            action: keep
+            regex: http-metrics
+
       # honeyDue worker — also exposes /metrics if/when we add it.
       # Keep this stanza commented until the worker has a /metrics endpoint;
       # uncommented form drops scrapes silently.
@@ -104,6 +119,35 @@ roleRef:
   name: vmagent
   apiGroup: rbac.authorization.k8s.io
 
+---
+# Allow vmagent to discover the kube-state-metrics Service/Endpoints in
+# kube-system so the kube-state-metrics scrape job can find its target.
+# Cross-namespace SD needs an explicit RoleBinding here.
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: vmagent-kube-system
+  namespace: kube-system
+rules:
+  - apiGroups: [""]
+    resources: [services, endpoints, pods]
+    verbs: [get, list, watch]
+
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: vmagent-kube-system
+  namespace: kube-system
+subjects:
+  - kind: ServiceAccount
+    name: vmagent
+    namespace: honeydue
+roleRef:
+  kind: Role
+  name: vmagent-kube-system
+  apiGroup: rbac.authorization.k8s.io
+
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -162,12 +206,31 @@ spec:
               readOnly: true
             - name: buffer
               mountPath: /tmp/vmagent
-          livenessProbe:
+          # Process startup gate. /-/healthy returns 200 once vmagent has
+          # parsed config — gives the agent up to 2 min to come up before
+          # liveness starts evaluating.
+          startupProbe:
             httpGet:
               path: /-/healthy
               port: http
-            initialDelaySeconds: 10
-            periodSeconds: 30
+            initialDelaySeconds: 5
+            periodSeconds: 5
+            failureThreshold: 24
+          # Real liveness check: are scrapes actually succeeding?
+          # /-/healthy was the old probe and returned 200 for 17 days even
+          # while vmagent had zero healthy targets (stale k8s SD watch).
+          # This exec probe queries vmagent's own targets API and fails if
+          # NO target is in state "up". Three consecutive failures (3 min)
+          # → kubelet kills the pod → fresh SD watch.
+          livenessProbe:
+            exec:
+              command:
+                - sh
+                - -c
+                - 'n=$(wget -qO- http://localhost:8429/api/v1/targets 2>/dev/null | grep -c ''"health":"up"''); [ "$n" -gt 0 ]'
+            initialDelaySeconds: 120
+            periodSeconds: 60
+            failureThreshold: 3
           readinessProbe:
             httpGet:
               path: /-/healthy
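
Review note: the exec probe's pass/fail logic can be tried without a running vmagent. The sketch below wraps the same grep -c check in a function that reads the response body from stdin. The function name has_up_target and both sample payloads are hypothetical; real /api/v1/targets responses carry many more fields, and the check relies on the body containing the compact form "health":"up".

```shell
#!/usr/bin/env bash
# Offline sketch of the liveness check; the wget call is replaced by stdin.

# has_up_target: reads a /api/v1/targets response body on stdin and succeeds
# only if at least one target reports "health":"up".
has_up_target() {
  local n
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps that from aborting callers that run under `set -e`.
  n=$(grep -c '"health":"up"' || true)
  [ "$n" -gt 0 ]
}

# Hypothetical, heavily trimmed payloads for illustration only.
HEALTHY='{"status":"success","data":{"activeTargets":[{"health":"up"}]}}'
STALE='{"status":"success","data":{"activeTargets":[]}}'

if printf '%s' "$HEALTHY" | has_up_target; then
  echo "healthy payload: probe passes"
fi
if ! printf '%s' "$STALE" | has_up_target; then
  echo "stale payload: probe fails, kubelet would restart the pod"
fi
```

An empty body, which is what the probe sees when wget cannot connect, also counts as a failure, so an unreachable targets API leads to a restart as well.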