feat(observability): drop health/metrics probe noise from shipped logs

The api logs every request, so k8s liveness/readiness probes on
/api/health/ and vmagent's /metrics scrapes drowned Loki in 2xx access
logs. Alloy now drops successful probe/scrape access lines at ingest
(loki.process stage.drop). A non-2xx health check, or one logged
above info level, does not match the drop expression and is kept.

Also hardens Alloy's read-offset store. /tmp/alloy moves from an
emptyDir to a hostPath, so a pod restart resumes from the saved offset
instead of re-reading log files from the start, which made Loki
400-reject the now-too-old entries ("entry too far behind") and stalled
shipping. loki.source.file also sets tail_from_end=true, so when no
offset is stored (fresh node, or positions wiped) Alloy starts at the
end of each file rather than re-shipping history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
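
The drop expression can be checked offline before it is deployed. The sketch below is only an illustration: it uses Python's re module rather than the Go regexp engine Alloy runs, assumes the stage matches anywhere in the line (an unanchored search), and the sample log lines are hypothetical, since the api's exact log format is not shown in this commit. It also shows a property of the pattern worth knowing: it depends on key order (level, then path, then status), so a line with the keys reordered is kept.

```python
import re

# Same pattern as the stage.drop expression, with the Alloy string escapes removed.
DROP_RE = re.compile(
    r'"level":"info".*"path":"/(api/health/?|metrics)".*"status":2[0-9][0-9]'
)


def would_drop(line: str) -> bool:
    """Return True if the line matches the drop expression (unanchored search)."""
    return DROP_RE.search(line) is not None


# Hypothetical access-log lines; only the key order and value shapes matter here.
SAMPLES = {
    "health_ok": '{"level":"info","method":"GET","path":"/api/health/","status":200}',
    "health_no_slash": '{"level":"info","path":"/api/health","status":204}',
    "metrics_ok": '{"level":"info","path":"/metrics","status":200}',
    "health_503": '{"level":"info","path":"/api/health/","status":503}',
    "health_warn": '{"level":"warn","path":"/api/health/","status":200}',
    "other_path": '{"level":"info","path":"/api/users/","status":200}',
    "reordered_keys": '{"status":200,"level":"info","path":"/metrics"}',
}

if __name__ == "__main__":
    for name, line in SAMPLES.items():
        print(f"{name}: {'drop' if would_drop(line) else 'keep'}")
```

If the api ever changes the order of its JSON keys or logs the status as a string, the pattern silently stops matching and the probe lines are shipped again.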
@@ -139,15 +139,32 @@ data:
 }
 
 loki.source.file "pod_logs" {
   targets = local.file_match.pod_logs.targets
   forward_to = [loki.process.pod_logs.receiver]
+  // With no stored read offset (fresh node, or positions wiped), start
+  // at the END of each file instead of re-shipping history — otherwise
+  // Loki rejects the now-too-old entries ("entry too far behind") and
+  // shipping stalls. Offsets persist on a hostPath (see volumes), so a
+  // normal pod restart resumes exactly where it left off.
+  tail_from_end = true
 }
 
-// Parse the CRI log format (timestamp / stream / flags / message).
+// Parse the CRI log format (timestamp / stream / flags / message),
+// then drop probe/scrape noise before shipping.
 loki.process "pod_logs" {
   forward_to = [loki.write.obs.receiver]
 
   stage.cri {}
+
+  // Drop successful probe/scrape access logs. k8s liveness/readiness
+  // hits /api/health/ every few seconds and vmagent scrapes /metrics
+  // on a 15s interval — all 2xx, pure noise that drowns real logs.
+  // A non-2xx health check, or one logged above info level, does NOT
+  // match this regex and is kept.
+  stage.drop {
+    expression = "\"level\":\"info\".*\"path\":\"/(api/health/?|metrics)\".*\"status\":2[0-9][0-9]"
+    drop_counter_reason = "probe_access_ok"
+  }
 }
 
 loki.write "obs" {
@@ -252,6 +269,10 @@ spec:
   hostPath:
     path: /var/log/pods
     type: Directory
+# Alloy's positions/WAL store. A hostPath (not emptyDir) so file read
+# offsets survive pod restarts — otherwise every restart re-reads log
+# files from the start and Loki rejects the now-too-old entries.
 - name: tmp
-  emptyDir:
-    sizeLimit: 256Mi
+  hostPath:
+    path: /var/lib/honeydue-alloy-logs
+    type: DirectoryOrCreate
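
The two offset changes interact, and a small model makes the cases explicit. This is a simplified sketch of the behaviour described in the comments above, not Alloy's implementation: the function names and the byte-offset representation are hypothetical. It relies on one Kubernetes fact, that an emptyDir is deleted together with its pod while a hostPath directory stays on the node.

```python
from typing import Optional


def start_offset(saved: Optional[int], file_size: int, tail_from_end: bool) -> int:
    """Pick the byte offset at which tailing of one log file starts."""
    if saved is not None:
        # A stored position always wins: resume exactly where reading stopped.
        return saved
    # No stored position: tail_from_end decides between the end and the start.
    return file_size if tail_from_end else 0


def offset_after_restart(volume: str, saved: int, file_size: int,
                         tail_from_end: bool) -> int:
    """Model a pod being replaced, with positions kept on the given volume type."""
    # An emptyDir is removed with the pod, so the positions file is gone;
    # a hostPath directory survives on the node.
    stored = saved if volume == "hostPath" else None
    return start_offset(stored, file_size, tail_from_end)


if __name__ == "__main__":
    # Before this commit: emptyDir, tail_from_end unset -> re-read from byte 0.
    print(offset_after_restart("emptyDir", 900, 1000, False))
    # After this commit: hostPath keeps the saved offset.
    print(offset_after_restart("hostPath", 900, 1000, True))
    # Fresh node with no stored offset: start at the end of the file.
    print(start_offset(None, 1000, True))
```

In this model tail_from_end only matters when nothing is stored, which is why the hostPath change is the one that fixes ordinary restarts and tail_from_end covers fresh nodes or a wiped positions directory.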