6f303dbbaa
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
temporarily for reference
Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
callback (was causing 'unlock of unlocked mutex' fatal after
Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
+ allowlist fonts.googleapis.com so the marketing landing page CSS
actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
--platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
images runnable on x86_64 Hetzner nodes; fix array expansion under
set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
top-level aliases (the '\${X_SECRET}' form never actually resolved);
dozzle ports: long-form host_ip is rejected by Swarm, switched to
short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
(Next.js serves at root; /admin/ returned 404 and killed pods);
startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
and admin/src/app/api/*, hiding legitimate files)
New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log
Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
- Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
- Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
- Part III Security, Traefik ingress (Ch 5-6)
- Part IV Services, DB, storage, secrets, registry (Ch 7-11)
- Part V Data flow, deploy process, observability, failures, runbook
(Ch 12, 14-17)
- Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
- Appendices: glossary, kubectl cheat sheet, file locations,
consolidated citations
- README.md: Production Deployment section replaced with pointer to
the book; Go version bumped to 1.25
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
466 lines
18 KiB
Markdown
# 03 — Networking

## Summary

The network stack has five layers: the physical/internet layer (Hetzner's
public network), the node layer (Ubuntu with UFW), the Kubernetes overlay
(Flannel VXLAN), the service layer (kube-proxy IPVS + CoreDNS), and the
ingress layer (Traefik). This chapter walks through each layer, explains how
they compose, and traces a single HTTP request from the browser to the Go API
response, showing every hop.

## The five layers

```mermaid
flowchart TB
    subgraph L5[Layer 5 — Ingress]
        Traefik
    end
    subgraph L4[Layer 4 — Service discovery]
        KubeProxy[kube-proxy IPVS]
        CoreDNS
    end
    subgraph L3[Layer 3 — Pod overlay]
        Flannel[Flannel VXLAN<br/>UDP 8472]
    end
    subgraph L2[Layer 2 — Node network]
        UFW
        Kernel[Linux kernel<br/>netfilter/iptables]
    end
    subgraph L1[Layer 1 — Physical]
        Hetzner[Hetzner network<br/>public v4 + v6]
    end

    L5 --> L4 --> L3 --> L2 --> L1
```

### ASCII fallback

```
┌──────────────────────────────────────┐
│  L5  Traefik (host network, :80/:443)│
├──────────────────────────────────────┤
│  L4  kube-proxy (IPVS) + CoreDNS     │
├──────────────────────────────────────┤
│  L3  Flannel VXLAN overlay           │
│      10.42.0.0/16 pod CIDR           │
├──────────────────────────────────────┤
│  L2  Ubuntu + UFW + kernel iptables  │
├──────────────────────────────────────┤
│  L1  Hetzner public IPv4/IPv6        │
└──────────────────────────────────────┘
```

## Layer 1 — Physical network

Each Hetzner CX33 has:
- A **public IPv4** address on the internet
- A **public IPv6** /64 subnet (one address used, the rest unused)
- **20 TB/mo** outbound traffic included; inbound is free
- **~1 Gbps** network bandwidth per node

All inter-node traffic goes over the **public network**. Hetzner Cloud
offers private networking (Cloud Networks), but we didn't attach one, and
adding it now would require pointing Flannel at the new interface (in K3s,
options such as `--flannel-iface` and `--node-ip`). A future improvement:
attach a private network to all three nodes, reconfigure Flannel to use it,
and shrink our public-interface attack surface.

## Layer 2 — Node network

Each node runs Ubuntu 24.04.3 LTS with:

- **Default routing** via the Hetzner-provided gateway
- **UFW** as the iptables frontend (Chapter 4 lists every rule)
- **IP forwarding** enabled (`net.ipv4.ip_forward=1`), required for
  Kubernetes pod routing
- **Bridge netfilter** enabled (`net.bridge.bridge-nf-call-iptables=1`),
  required so iptables can see bridged traffic

K3s enables the latter two automatically; no manual sysctl file is needed
for them.

One additional sysctl we set manually:

```
# /etc/sysctl.d/99-unprivileged-ports.conf
net.ipv4.ip_unprivileged_port_start=0
```

**Why**: Traefik runs as UID 65532 (non-root) in host network mode and has
to bind :80 and :443. Without this sysctl it can't bind privileged ports in
the host namespace, even with `CAP_NET_BIND_SERVICE` added to the container.
Ubuntu 24.04's default is 1024 (so ports below 1024 are "privileged").
Setting it to 0 lets any user bind any port.

**Security implication**: Minimal. Other pods on the node can't
accidentally grab 80/443, because the scheduler won't place two pods that
claim the same host port on one node, and the UFW rules still gate what's
reachable externally. The trade-off is that any unprivileged process running
directly on the node may now bind a port below 1024.

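The rule behind that sysctl can be modelled in a few lines. This is a simplified sketch of the kernel's privileged-port check, not its actual code; `can_bind` and its parameters are hypothetical names, and `has_effective_cap` stands for whether the process really holds `CAP_NET_BIND_SERVICE` in its effective set.

```python
def can_bind(port: int, unprivileged_port_start: int, has_effective_cap: bool) -> bool:
    """Simplified model: may a non-root process bind this TCP/UDP port?"""
    if not 0 < port <= 65535:
        raise ValueError("port out of range")
    # Ports at or above the threshold are open to everyone; ports below it
    # need the capability to be effective for the process.
    return port >= unprivileged_port_start or has_effective_cap


# Default threshold (1024): a non-root Traefik without the capability fails on :80.
default_result = can_bind(80, unprivileged_port_start=1024, has_effective_cap=False)

# With net.ipv4.ip_unprivileged_port_start=0 the same bind succeeds.
tuned_result = can_bind(80, unprivileged_port_start=0, has_effective_cap=False)
```

The model deliberately ignores everything else that can block a bind (port already in use, SELinux/AppArmor policy, and so on).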
## Layer 3 — Pod overlay (Flannel VXLAN)

### What Flannel is

Flannel is a CNI (Container Network Interface) plugin. Its job: give every
pod in the cluster a routable IP address, and make those IPs reachable
from any other pod regardless of which node they're on.

### The pod CIDR

K3s assigns **10.42.0.0/16** as the cluster-wide pod CIDR by default. Each
node gets a /24 slice:

| Node | Pod CIDR |
|---|---|
| ubuntu-8gb-nbg1-1 | 10.42.1.0/24 |
| ubuntu-8gb-nbg1-2 | 10.42.0.0/24 |
| ubuntu-8gb-nbg1-3 | 10.42.2.0/24 |

Each pod gets an IP from its node's slice. So a pod on hetzner2
(`nbg1-1`) might be `10.42.1.6`; a pod on hetzner3 (`nbg1-3`) might be
`10.42.2.10`.

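The slicing can be checked with Python's standard `ipaddress` module. The sketch below copies the node-to-CIDR mapping from the table above; the helper name `node_for_pod_ip` is made up for illustration.

```python
import ipaddress
from typing import Optional

CLUSTER_CIDR = ipaddress.ip_network("10.42.0.0/16")

# Node -> pod CIDR, copied from the table above.
NODE_POD_CIDRS = {
    "ubuntu-8gb-nbg1-1": ipaddress.ip_network("10.42.1.0/24"),
    "ubuntu-8gb-nbg1-2": ipaddress.ip_network("10.42.0.0/24"),
    "ubuntu-8gb-nbg1-3": ipaddress.ip_network("10.42.2.0/24"),
}


def node_for_pod_ip(pod_ip: str) -> Optional[str]:
    """Return the node whose /24 slice contains pod_ip, or None."""
    addr = ipaddress.ip_address(pod_ip)
    for node, cidr in NODE_POD_CIDRS.items():
        if addr in cidr:
            return node
    return None


# A /16 split into /24s gives 256 possible per-node slices.
slices = list(CLUSTER_CIDR.subnets(new_prefix=24))
```

For example, `node_for_pod_ip("10.42.1.6")` returns `ubuntu-8gb-nbg1-1`, and an address outside the pod CIDR such as `10.43.0.10` returns `None`.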
### How VXLAN works

VXLAN ("Virtual Extensible LAN") tunnels Layer-2 frames over UDP. Flannel
wraps every inter-node packet like so:

```
Original pod → pod packet:
┌──────────────────────────────────────────────────┐
│ Ethernet │ IP src=10.42.0.5 → dst=10.42.2.10 │ … │
└──────────────────────────────────────────────────┘

Flannel VXLAN-encapsulates it:
┌──────────────────────────────────────────────────────────────────┐
│ Eth │ IP src=178.104.247.152 → dst=178.104.249.189 │ UDP 8472 │  │
│ VXLAN header │ <original Ethernet+IP+payload> │                  │
└──────────────────────────────────────────────────────────────────┘
```

The outer IP/UDP carries the packet between nodes over Hetzner's public
network. On arrival, the destination node unwraps the VXLAN header and
delivers the inner packet to the target pod.

**UDP port 8472** is the Linux kernel's default VXLAN port, which Flannel
uses; the IANA-assigned VXLAN port is 4789. Port 8472 must be open
node-to-node in UFW (see Chapter 4).

**MTU note**: VXLAN encapsulation adds 50 bytes of overhead (8 VXLAN +
8 UDP + 20 outer IP + 14 inner Ethernet). Hetzner's network uses a standard
1500-byte MTU, so Flannel's overlay MTU is 1450. Mismatches cause silent
packet drops. K3s sets this correctly by default.

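As a rough illustration of the encapsulation and the MTU arithmetic, the sketch below packs the 8-byte VXLAN header laid out in RFC 7348 and recomputes the overlay MTU. The function names are hypothetical, and the VNI value of 1 is an assumption (it is Flannel's documented default, but this chapter does not state which VNI the cluster uses).

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: 8 flag bits, 24 reserved, 24-bit VNI, 8 reserved."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag first."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


header = pack_vxlan_header(1)  # VNI 1 is assumed here

# Per-packet overhead from the MTU note, in bytes.
OVERHEAD = {"vxlan": 8, "udp": 8, "outer_ipv4": 20, "inner_ethernet": 14}
overlay_mtu = 1500 - sum(OVERHEAD.values())
```

The header is only the innermost wrapper; the kernel adds the outer UDP, IP, and Ethernet headers around it before the packet leaves the node.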
### Flannel config

`/var/lib/rancher/k3s/agent/etc/flannel/net-conf.json` on each node:

```json
{
  "Network": "10.42.0.0/16",
  "EnableIPv6": false,
  "EnableIPv4": true,
  "IPv6Network": "::/0",
  "Backend": { "Type": "vxlan" }
}
```

We did not enable IPv6 in the cluster: it adds complexity we don't need at
our scale, and CoreDNS, kube-proxy, and the node controllers all work fine
in v4-only mode.

### No encryption (yet)

Flannel VXLAN traffic over Hetzner's public network is **not encrypted**.
Pod-to-pod traffic between nodes is therefore visible to anyone who can
capture packets on the path. In practice that is nobody between our three
nodes at Hetzner Nuremberg, but it's still plaintext on the wire.

**Mitigation today**: Traffic to external services is encrypted at the
application layer, and the Redis traffic that is not encrypted carries no
credentials:
- api ↔ Neon Postgres: TLS 1.3 (`DB_SSLMODE=require`)
- api/worker ↔ Backblaze B2: HTTPS
- api ↔ Fastmail: STARTTLS
- api ↔ Redis: plaintext, but Redis only holds cache + Asynq queue state,
  no user credentials

Requests that Traefik forwards to a pod on another node also cross the
overlay as plain HTTP; the TODO below would close that gap.

**TODO** (Chapter 20): Switch Flannel to the `wireguard-native` backend. K3s
supports this through the `--flannel-backend` option at install time;
enabling it on an existing cluster requires a config edit and a rolling
restart of K3s on each node.

## Layer 4 — Service discovery

Pods don't talk to each other by IP, because pod IPs are ephemeral and
assigned on pod creation. They use **service names** resolved by DNS.

### CoreDNS

K3s runs **CoreDNS** as the cluster DNS server. A pod in the `honeydue`
namespace resolves `redis` to the Redis Service's ClusterIP:

```
redis                             → 10.43.7.10  (Service ClusterIP)
redis.honeydue                    → 10.43.7.10
redis.honeydue.svc.cluster.local  → 10.43.7.10
```

When an app connects to `redis:6379`:

1. The pod's `/etc/resolv.conf` points to `10.43.0.10` (the CoreDNS
   Service).
2. CoreDNS receives the query, checks its known Services, and returns
   `10.43.7.10`.
3. The pod sends TCP to `10.43.7.10:6379`.
4. kube-proxy (described below) intercepts the connection and routes it to
   the actual pod IP.

### The service CIDR

K3s assigns **10.43.0.0/16** as the service CIDR. ClusterIPs live here.
Currently:

| Service | ClusterIP |
|---|---|
| `api.honeydue` | 10.43.167.83 |
| `admin.honeydue` | 10.43.136.168 |
| `redis.honeydue` | 10.43.7.10 |
| `kubernetes.default` | 10.43.0.1 |
| `kube-dns.kube-system` | 10.43.0.10 |

ClusterIPs are **stable** for the life of the Service: they don't change
when pods come and go.

### kube-proxy (IPVS mode)

`kube-proxy` is the dataplane component that makes Services work. In K3s it
runs embedded in the `k3s` process on every node rather than as a separate
DaemonSet. It watches the Kubernetes API for Service and endpoint changes
and programs the kernel to route traffic.

This chapter assumes kube-proxy runs in **IPVS mode**. That is not the
out-of-the-box setting: kube-proxy defaults to iptables mode on Linux, and
K3s needs `--kube-proxy-arg=proxy-mode=ipvs` to switch. Running
`sudo ipvsadm -Ln` on a node confirms which mode is live. IPVS is a Linux
kernel feature for in-kernel L4 load balancing, essentially
connection-tracking NAT with round-robin or other scheduling.

When a pod dials `10.43.7.10:6379`:

1. The first packet hits the node's kernel
2. IPVS sees the destination is a ClusterIP
3. IPVS picks an endpoint from the Service's endpoint set (e.g.,
   `10.42.0.10:6379` on hetzner1)
4. IPVS rewrites the destination and forwards
5. Flannel tunnels it to the destination node (if remote) or delivers
   locally (if the endpoint is on the same node)

The endpoint is chosen once per TCP connection, not per packet: conntrack
keeps later packets of the same connection on the same endpoint.

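The per-connection behaviour can be pictured with a toy model. The sketch below is not how IPVS is implemented; it only imitates round-robin selection plus a connection-tracking table. The class name and all endpoint addresses except `10.42.2.6:8000` (which appears in the request trace later in this chapter) are made up.

```python
from itertools import cycle


class ServiceBalancer:
    """Toy model of a ClusterIP: round-robin choice, remembered per connection."""

    def __init__(self, endpoints):
        self._next_endpoint = cycle(endpoints)
        self._conntrack = {}  # (src_ip, src_port) -> chosen endpoint

    def route(self, src_ip, src_port):
        key = (src_ip, src_port)
        # Only the first packet of a connection triggers a scheduling decision.
        if key not in self._conntrack:
            self._conntrack[key] = next(self._next_endpoint)
        return self._conntrack[key]


api = ServiceBalancer(["10.42.0.7:8000", "10.42.1.8:8000", "10.42.2.6:8000"])

first_packet = api.route("10.42.1.6", 40000)
later_packet = api.route("10.42.1.6", 40000)      # same connection, same endpoint
new_connection = api.route("10.42.1.6", 40001)    # new source port, next endpoint
```

A real conntrack table also keys on protocol and destination and expires idle entries; both are left out to keep the idea visible.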
### Why IPVS over iptables

The alternative, iptables mode, is kube-proxy's default and the older
design: for every Service it adds a chain of rules, and the rule set grows
linearly with Service count. IPVS uses a hash table and scales to thousands
of Services without performance degradation. At our scale either works, but
IPVS scales better.

### Headless Services

A Service can also be created without a ClusterIP; such Services are
"headless" (`clusterIP: None`). Our setup doesn't currently use any, but the
distinction is worth knowing: a headless Service returns all endpoint IPs
directly via DNS, with no kube-proxy involvement. This is useful for
StatefulSets where clients need to talk to a specific replica.

## Layer 5 — Ingress (Traefik)

External traffic arrives on the node's public :80 or :443. Traefik
handles the first mile of routing. See [Chapter 6](./06-traefik-ingress.md)
for Traefik-specific details; this section just shows how it fits in the
networking stack.

Traefik runs as a **DaemonSet** with `hostNetwork: true`. That means:
- One Traefik pod per node
- Each pod is in the **host's network namespace**, not a pod netns
- Each pod can bind directly to `0.0.0.0:80` and `0.0.0.0:443` on the node

When Cloudflare sends a request to `178.104.247.152:80`:

1. Packet arrives at hetzner1's NIC
2. UFW accepts (80/tcp is open from anywhere)
3. The kernel delivers it to the socket listening on `0.0.0.0:80`
4. Traefik (running in the host namespace) accepts the connection
5. Traefik reads the `Host:` header
6. Traefik matches an Ingress rule (api.myhoneydue.com → api Service)
7. Traefik dials `10.43.167.83:8000` (Service ClusterIP)
8. kube-proxy IPVS rewrites to a live api pod endpoint
9. Flannel VXLAN tunnels if the endpoint is on a remote node
10. The api pod receives the request, processes it, and responds
11. The response flows back along the reverse path

Steps 7 and 8 describe the ClusterIP path. Traefik's Kubernetes providers
can also dial pod IPs from the Service's endpoints directly (their default
behaviour unless native load balancing is enabled), in which case step 8 is
skipped.

Full trace in the [end-to-end section](#end-to-end-request-trace) below.

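The host-matching steps (reading the `Host:` header and matching a rule) amount to a lookup like the one below. This is only a sketch of the idea, not Traefik's router: `INGRESS_RULES` and `match_backend` are hypothetical names, and the admin hostname is a placeholder because this chapter does not give it.

```python
from typing import Optional, Tuple

# Host -> (Service name, port). Only the api rule appears in this chapter;
# the admin hostname below is a placeholder, not the real one.
INGRESS_RULES = {
    "api.myhoneydue.com": ("api", 8000),
    "admin.example.invalid": ("admin", 3000),
}


def match_backend(host_header: str) -> Optional[Tuple[str, int]]:
    """Pick a backend Service from the Host header, or None if nothing matches."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    # (IPv6 literals in the Host header are not handled in this sketch.)
    host = host_header.split(":", 1)[0].strip().lower()
    return INGRESS_RULES.get(host)


backend = match_backend("api.myhoneydue.com")
```

A request whose host matches no rule gets `None` here; a real ingress controller would answer such a request with a 404 or a default backend.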
## IPs we care about

| What | CIDR / IP / port | Used for |
|---|---|---|
| Pod CIDR | 10.42.0.0/16 | All pod IPs cluster-wide |
| Service CIDR | 10.43.0.0/16 | All ClusterIPs |
| Flannel VXLAN | UDP 8472 | Pod-to-pod traffic (inter-node) |
| CoreDNS Service | 10.43.0.10:53 | Cluster DNS |
| Kubernetes Service | 10.43.0.1:443 | Internal kube-apiserver |
| Node IPs | See README | External + flannel source/dst |
| Traefik | host network | Listens on node's :80, :443 |

## End-to-end request trace

A user in Texas hits `https://api.myhoneydue.com/api/tasks/`. Here's every
hop:

```mermaid
sequenceDiagram
    autonumber
    participant U as User (Austin, TX)
    participant CF as Cloudflare edge (DFW POP)
    participant H as hetzner2 (picked by CF)<br/>178.105.32.198
    participant TR as Traefik pod<br/>(hostNetwork)
    participant API as api pod on hetzner3<br/>10.42.2.6:8000
    participant DB as Neon Postgres<br/>(AWS us-east-1)

    U->>CF: HTTPS :443 GET /api/tasks/
    Note over CF: TLS handshake terminates here
    CF->>H: HTTP :80 (with original Host header)
    H->>TR: Accepted by kernel, delivered to Traefik
    Note over TR: Matches Ingress rule<br/>host: api.myhoneydue.com
    TR->>TR: Resolve api.honeydue → 10.43.167.83
    TR->>H: dial 10.43.167.83:8000
    H->>H: kube-proxy IPVS rewrites<br/>dst → 10.42.2.6:8000
    H->>API: Flannel VXLAN encapsulate<br/>UDP 8472 → hetzner3
    Note over API: Pod receives packet
    API->>DB: SELECT … FROM tasks WHERE user_id = …<br/>TLS :5432
    DB-->>API: Result rows
    API-->>TR: HTTP 200 JSON
    TR-->>CF: HTTP 200
    CF-->>U: HTTPS 200
```

### Timing budget for a cache-miss read

| Hop | Typical latency |
|---|---|
| User → CF edge (DFW) | 5–15 ms |
| CF edge → hetzner2 (origin HTTP :80) | 90–120 ms (cross-Atlantic) |
| UFW + kernel accept | <1 ms |
| Traefik accept + route | 1–2 ms |
| kube-proxy + Flannel (same node) | <1 ms |
| kube-proxy + Flannel (remote node, VXLAN) | 1–3 ms |
| Go API request handling | 1–5 ms |
| Neon Postgres query (TLS + SQL) | ~100 ms (round trip to AWS us-east-1) |
| Return path (reverse) | similar |

**Total typical**: ~200–300 ms for a user in North America, dominated by
the two cross-Atlantic legs (CF → origin and api → Neon). Cached responses
at Cloudflare skip the origin hop entirely.

## Inter-node routing concretely

Here's what `ip route` shows on hetzner2 (not run live, reconstructed from
a typical k3s + Flannel VXLAN setup):

```
default via 172.31.1.1 dev eth0                    # Hetzner gateway
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink    # to hetzner1 pods (via VXLAN iface)
10.42.1.0/24 dev cni0                              # local pods on hetzner2
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink    # to hetzner3 pods (via VXLAN iface)
# No route for 10.43.0.0/16: ClusterIPs are intercepted by
# kube-proxy's IPVS/netfilter rules, not routed.
```

The `flannel.1` interface is the VXLAN tunnel endpoint. Traffic written
to it gets encapsulated in UDP 8472 and sent to the peer node's public IP.

Flannel learns about peer nodes via the Kubernetes API (it watches Node
resources). When hetzner3 joins, Flannel on hetzner1 and hetzner2 both
learn its public IP and pod CIDR, update their routes, ARP entries, and
forwarding-database entries, and traffic just works.

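The kernel chooses among these routes by longest-prefix match. A minimal sketch of that lookup, using only the default route and the three pod routes from the table above (`ROUTES` and `lookup` are illustrative names):

```python
import ipaddress

# (destination, device) pairs: the default route and the three pod routes.
ROUTES = [
    ("0.0.0.0/0", "eth0"),
    ("10.42.0.0/24", "flannel.1"),
    ("10.42.1.0/24", "cni0"),
    ("10.42.2.0/24", "flannel.1"),
]


def lookup(dst: str) -> str:
    """Return the outgoing device for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), dev)
        for net, dev in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The most specific (longest) matching prefix wins.
    _, device = max(matches, key=lambda match: match[0].prefixlen)
    return device


local_pod = lookup("10.42.1.6")     # stays on the cni0 bridge
remote_pod = lookup("10.42.2.10")   # goes out through the VXLAN interface
```

Anything that matches no pod route, such as a public address, falls through to the default route on `eth0`.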
## Network performance

### Within a node (pod to pod, same host)

Packets go through the `cni0` bridge and never leave the node. Latency is
sub-millisecond, bounded by kernel + veth performance. Throughput is easily
>10 Gbps.

### Between nodes (pod to pod, different host)

Packets go through Flannel VXLAN. Added overhead: encap/decap in the
kernel (~5–10 μs), plus the actual network hop between Hetzner nodes
(~0.5 ms within the same Hetzner datacenter). Throughput is bounded by
Hetzner's NIC (≈1 Gbps sustained per node).

In practice this is fine for everything we do. The slowest link in our
application is Neon (AWS us-east-1), which is ~100 ms round-trip.

## DNS resolution path

A pod resolves `redis`:

1. The app asks its resolver for `redis` (for example via
   `getaddrinfo("redis")`).
2. The resolver reads `/etc/resolv.conf` and finds nameserver `10.43.0.10`
   plus the namespace search list, so the short name is tried as
   `redis.<ns>.svc.cluster.local`.
3. It sends the query over UDP to `10.43.0.10:53`.
4. The destination is the CoreDNS Service ClusterIP.
5. kube-proxy IPVS load-balances across CoreDNS pods (there's usually 1).
6. The packet arrives at the CoreDNS pod.
7. CoreDNS checks its Kubernetes plugin cache for `redis.<ns>.svc.cluster.local`.
8. It returns `10.43.7.10` (the redis Service ClusterIP) with a low TTL.

CoreDNS is stateless: if it restarts, pods re-query on their next lookup.

**DNS caching in pods**: The Go API uses `net.Resolver`, which does not
cache by default, so each new connection triggers a fresh DNS lookup. That
is safe in Kubernetes, where a Service's ClusterIP stays stable while the
endpoints behind it change, but it means a CoreDNS outage breaks new
connections immediately.

Next.js (admin) uses Node's default resolver, which behaves similarly.

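The short name works because of the search list in the pod's `/etc/resolv.conf`. The sketch below shows the order in which a stub resolver would try candidate names. The search list and `ndots:5` are the values Kubernetes typically writes for a pod in the `honeydue` namespace; they are assumed here, not copied from a live pod, and `candidate_names` is a hypothetical helper.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified, no expansion
    expanded = [f"{name}.{domain}." for domain in search]
    literal = name + "."
    # Names with fewer than `ndots` dots go through the search list first.
    if name.count(".") < ndots:
        return expanded + [literal]
    return [literal] + expanded


# Assumed typical search list for a pod in the honeydue namespace.
SEARCH = ["honeydue.svc.cluster.local", "svc.cluster.local", "cluster.local"]

tries = candidate_names("redis", SEARCH)
```

The first candidate for `redis` is `redis.honeydue.svc.cluster.local.`, which CoreDNS answers directly, so the later candidates are never sent.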
## What breaks if X fails

| Failure | Symptom |
|---|---|
| Flannel on one node stops | Pods on that node can lose reachability to other nodes' pods as peers change; flows already covered by kernel state (routes, conntrack) may keep working |
| CoreDNS pod crashes (only 1) | New connection DNS lookups fail; existing connections continue |
| kube-proxy on one node stops | Existing rules stay in the kernel, so ClusterIPs keep routing to the last known endpoints, but Service and endpoint changes are no longer applied on that node; direct pod IPs still work |
| UFW misconfigured (port 8472 UDP blocked) | Pods on that node can't reach remote pods over overlay |
| Node's NIC fails | Node unreachable; Raft loses it; its pods get rescheduled elsewhere |
| Hetzner datacenter outage | Entire cluster offline |

## Operator cheat sheet

```bash
# See all IPs in the cluster
kubectl get pods -A -o wide           # pod IPs + nodes
kubectl get svc -A                    # Service ClusterIPs

# Test pod-to-pod DNS from inside a pod
kubectl exec -n honeydue deploy/api -- nslookup redis
kubectl exec -n honeydue deploy/api -- getent hosts redis

# Test pod-to-pod TCP connectivity
kubectl exec -n honeydue deploy/api -- nc -zv redis 6379
kubectl exec -n honeydue deploy/api -- wget -q -O- http://admin:3000/

# See the node's iptables/IPVS rules (runs on the node over SSH)
ssh deploy@hetzner1 "sudo ipvsadm -Ln"
ssh deploy@hetzner1 "sudo iptables -L -n -t nat | head -50"

# See the cluster's flannel state
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{" "}{.spec.podCIDR}{"\n"}{end}'
```

## References

- [Kubernetes networking concepts][k8s-net]
- [Flannel VXLAN backend][flannel-vxlan]
- [CoreDNS k8s plugin][coredns-k8s]
- [IPVS mode for kube-proxy][ipvs]
- [VXLAN RFC 7348][vxlan-rfc]

[k8s-net]: https://kubernetes.io/docs/concepts/services-networking/
[flannel-vxlan]: https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#vxlan
[coredns-k8s]: https://coredns.io/plugins/kubernetes/
[ipvs]: https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
[vxlan-rfc]: https://datatracker.ietf.org/doc/html/rfc7348