Two GKE clusters, identical down to the version, zone, and machine type, differ in exactly one thing: how they move a packet. The same LoadBalancer Service, fronting three nginx pods, runs on the legacy dataplane (kube-proxy writing iptables) and on Dataplane V2 (managed Cilium, in eBPF). Below, the two are traced layer by layer, with the legacy dataplane (V1) alongside eBPF (V2) at every step.
Most layers come out identical. The dataplane truly diverges at L4 (DNAT) and L6 (failover); beyond that, eBPF adds two things iptables has no answer for: L7 observability and L8 policy.
V1: kube-proxy + iptables
V2: Cilium eBPF
L1: Service creation & IP allocation
V1
GCE forwarding rule
$gcloud compute forwarding-rules list \
--filter="IPAddress=136.64.249.50"
NAME REGION IP_ADDRESS IP_PROTOCOL TARGET
ac973ff85436744bfa586378db41283e us-central1 136.64.249.50 TCP us-central1/targetPools/ac973ff85436744bfa586378db41283e
The external VIP is a regional GCE forwarding rule pointing at a target pool.
V1
Service & allocated IPs
$kubectl -n netflow-test get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx LoadBalancer 10.24.9.186 136.64.249.50 80:31057/TCP 50m
One Service, three addresses: ClusterIP 10.24.9.186 (internal), external VIP 136.64.249.50, and NodePort 31057 (a port opened on every node).
V1
Service detail
$kubectl -n netflow-test describe svc nginx
Name: nginx
Namespace: netflow-test
Labels: <none>
Annotations: cloud.google.com/neg: {"ingress":true}
networking.gke.io/target-pool: ac973ff85436744bfa586378db41283e
Selector: app=nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.24.9.186
IPs: 10.24.9.186
LoadBalancer Ingress: 136.64.249.50 (VIP)
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 31057/TCP
Endpoints: 10.20.1.13:80,10.20.0.8:80,10.20.0.9:80
Session Affinity: None
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 46m (x4 over 50m) service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 45m (x4 over 49m) service-controller Ensured load balancer
The GKE target-pool annotation and the three backing pod endpoints show up here.
V2
GCE forwarding rule
$gcloud compute forwarding-rules list \
--filter="IPAddress=34.72.78.182"
NAME REGION IP_ADDRESS IP_PROTOCOL TARGET
a433432921d1a44f9b12e84eff8ab128 us-central1 34.72.78.182 TCP us-central1/targetPools/a433432921d1a44f9b12e84eff8ab128
Google's LB frontend is identical regardless of dataplane — the forwarding rule doesn't know or care about eBPF.
V2
Service & allocated IPs
$kubectl -n netflow-test get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx LoadBalancer 10.24.8.39 34.72.78.182 80:30574/TCP 47s
Same three-address shape as V1: ClusterIP 10.24.8.39, external VIP 34.72.78.182, NodePort 30574. Apart from those allocated values, the Kubernetes objects are identical; only the dataplane below differs.
V2
Service detail
$kubectl -n netflow-test describe svc nginx
Name: nginx
Namespace: netflow-test
Labels: <none>
Annotations: cloud.google.com/neg: {"ingress":true}
networking.gke.io/target-pool: a433432921d1a44f9b12e84eff8ab128
Selector: app=nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.24.8.39
IPs: 10.24.8.39
LoadBalancer Ingress: 34.72.78.182 (VIP)
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 30574/TCP
Endpoints: 10.20.1.9:80,10.20.1.10:80,10.20.0.15:80
Session Affinity: None
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 6s (x2 over 47s) service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 1s (x2 over 6s) service-controller Ensured load balancer
The same Service spec as V1; only the allocated IPs, NodePort, and endpoints differ. The divergence lives at L4, not here.
L4: DNAT
A packet addressed to the Service's ClusterIP has to be turned into a packet for a real pod; that rewrite is DNAT. kube-proxy implements it as iptables rules: one chain per Service picks a backend by cumulative probability (1/3, then 1/2 of the rest, then the remainder, so each of the three pods gets an equal share), then jumps to a per-endpoint chain that rewrites the destination to that pod's IP. One Service expands to ~14 rules, and the kernel walks the chains top to bottom for the first packet of every new connection (conntrack replays the decision for the rest), so the cost grows linearly with the number of Services: O(N).
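The cumulative-probability trick is easy to check numerically. This is a minimal Python sketch of the rule semantics only, not kube-proxy's code: `build_svc_chain`, `pick_backend` and `effective_share` are hypothetical names, and the endpoint list is the V1 Endpoints line from the Service detail above.

```python
import random
from collections import Counter
from fractions import Fraction


def build_svc_chain(endpoints):
    """Model a per-Service chain as (match probability, endpoint) rules.

    Rule i matches with probability 1/(n - i), so the last rule always matches
    (kube-proxy emits that final jump with no probability test at all).
    """
    n = len(endpoints)
    return [(Fraction(1, n - i), ep) for i, ep in enumerate(endpoints)]


def pick_backend(chain, rng):
    """Walk the rules top to bottom, as the kernel does for a new connection."""
    for prob, ep in chain:
        if rng.random() < prob:
            return ep
    raise RuntimeError("unreachable: the last rule matches with probability 1")


def effective_share(chain):
    """Exact probability that each rule is the first one to match."""
    not_yet_matched = Fraction(1)
    shares = {}
    for prob, ep in chain:
        shares[ep] = not_yet_matched * prob
        not_yet_matched *= 1 - prob
    return shares


if __name__ == "__main__":
    endpoints = ["10.20.1.13:80", "10.20.0.8:80", "10.20.0.9:80"]
    chain = build_svc_chain(endpoints)
    print([str(p) for p, _ in chain])  # ['1/3', '1/2', '1']
    # Every endpoint's exact share comes out to '1/3'.
    print({ep: str(s) for ep, s in effective_share(chain).items()})
    rng = random.Random(42)
    print(Counter(pick_backend(chain, rng) for _ in range(30_000)))
```

The per-rule probabilities differ, but multiplying each by the chance that no earlier rule matched gives the same 1/3 for every pod; the simulated counts land near 10,000 each.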
V2
eBPF service map
$kubectl -n kube-system exec <anetd-pod> -c cilium-agent \
-- cilium-dbg service list
Same job (turn a ClusterIP packet into a pod packet), but Cilium stores the mapping as rows in an in-kernel eBPF hash map instead of a rule chain. Each frontend (a Service IP:port) points at a list of backend pods; the eBPF program does a single constant-time (O(1)) lookup, however many Services exist, and rewrites the destination. Note that nginx appears three times: ClusterIP, NodePort, and LoadBalancer are three frontends resolving to the same backend set. The whole state is one structured query, versus V1's 100+ lines of chains to read by eye.
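To make the contrast concrete, here is a minimal Python sketch of a frontend-to-backends hash map. It illustrates the idea under stated assumptions and is not Cilium's map layout: the dict stands in for the BPF map, the node IP `10.128.0.2` for the NodePort frontend is hypothetical (a real node has a NodePort frontend per node address), and choosing a backend with `flow_hash % len(backends)` is a simplification of Cilium's actual backend selection.

```python
# Stand-in for the eBPF service map: (frontend IP, port) -> (type, backends).
BACKENDS = ("10.20.1.9:80", "10.20.1.10:80", "10.20.0.15:80")  # V2 Endpoints

SERVICE_MAP = {
    ("10.24.8.39", 80): ("ClusterIP", BACKENDS),
    ("34.72.78.182", 80): ("LoadBalancer", BACKENDS),
    ("10.128.0.2", 30574): ("NodePort", BACKENDS),  # hypothetical node IP
}


def translate(dst_ip, dst_port, flow_hash):
    """Return the pod address to DNAT to, or None if dst is not a Service."""
    entry = SERVICE_MAP.get((dst_ip, dst_port))  # one hash lookup, any map size
    if entry is None:
        return None  # not a Service frontend: leave the packet alone
    _kind, backends = entry
    return backends[flow_hash % len(backends)]


if __name__ == "__main__":
    for (ip, port), (kind, backends) in SERVICE_MAP.items():
        print(f"{kind:<12} {ip}:{port} -> {len(backends)} backends")
    print(translate("10.24.8.39", 80, flow_hash=7))  # 7 % 3 == 1 -> 10.20.1.10:80
```

Adding thousands more Services grows the dict but not the cost of `SERVICE_MAP.get`, which is the O(1) versus O(N) contrast with the iptables walk.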
L6: Failover
externalTrafficPolicy decides what a node does with external traffic that lands on it. 'Cluster' (the default) lets the receiving node forward to a pod on any node, including one on another node, so every request succeeds as long as some pod is up. The cost is a possible extra hop, plus the client's IP being replaced by the node's (SNAT). On V1, all ten requests return 200 at ~0.1s.
'Local' mode forwards only to a pod on the receiving node — preserving the real client IP (no SNAT) — and relies on a health check to steer around nodes with no local pod. With a pod on every node (symmetric topology), that trade-off is invisible: this looks exactly like Cluster mode. It only bites when the topology is uneven (next).
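The two policies boil down to a per-node decision, sketched minimally in Python below. Assumptions: two hypothetical nodes (`node-a`, `node-b`) with hypothetical IPs, the single-pod topology used later in this section, and a Cluster branch that follows kube-proxy's iptables behavior of masquerading all Cluster-policy external traffic. Only the destination and the source IP the pod sees are modeled; health checks and how each dataplane reacts to "nowhere to go" are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Outcome:
    pod: Optional[str]          # None: this node has nowhere to send the request
    source_seen: Optional[str]  # source IP as observed by the pod
    extra_hop: bool             # True if the pod lives on a different node


def handle_external(node: str, client_ip: str, policy: str,
                    pods_by_node: Dict[str, List[str]],
                    node_ip: Dict[str, str], flow_hash: int = 0) -> Outcome:
    if policy == "Local":
        local = pods_by_node.get(node, [])
        if not local:
            # No local endpoint: the LB health check is meant to steer away.
            return Outcome(None, None, False)
        # Delivered locally with no SNAT, so the real client IP survives.
        return Outcome(local[flow_hash % len(local)], client_ip, False)
    if policy == "Cluster":
        candidates = [(n, p) for n, pods in pods_by_node.items() for p in pods]
        if not candidates:
            return Outcome(None, None, False)
        pod_node, pod = candidates[flow_hash % len(candidates)]
        # SNAT to the receiving node's IP so replies return through it.
        return Outcome(pod, node_ip[node], pod_node != node)
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")


if __name__ == "__main__":
    pods = {"node-a": ["nginx-0"], "node-b": []}            # scaled to one pod
    ips = {"node-a": "10.128.0.2", "node-b": "10.128.0.3"}  # hypothetical
    for policy in ("Cluster", "Local"):
        for node in ("node-a", "node-b"):
            print(policy, node, handle_external(node, "203.0.113.7", policy, pods, ips))
```

With a pod on every node the `Local` branch always finds a local pod, which is why the two modes look the same until the topology turns asymmetric.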
V1
Local mode — failover window
$for i in $(seq 20); do curl -s -o /dev/null -w \
'%{http_code} %{time_total}\n' http://136.64.249.50; \
done # ETP=Local, 1 pod
With the load balancer still sending to the pod-less node, V1 (iptables) drops those packets into a black hole, and every request that lands there waits out the full 6-second client timeout. Nothing recovers inside this 20-request window: the target-pool health check takes ~130 seconds to remove the empty node. Slow, silent failure.
V1
Local mode — after convergence
$for i in $(seq 10); do curl -s -o /dev/null -w \
'%{http_code} %{time_total}\n' http://136.64.249.50; \
done # ~130s after scale-to-1
Re-probing ~130 seconds later confirms V1 does eventually recover: once the target-pool health check finally drops the pod-less node, requests settle to steady 200s. The behavior isn't broken — just slow to converge, which is the inherent cost of health-check-driven, iptables-era failover.
Deployment: nginx replicas=1
Pod: nginx-bb6b8c496-j6s5g on node gke-authlab-gke-primary-c7a308a6-bnmh
Service externalTrafficPolicy: Local
Nodes:
node/gke-authlab-gke-primary-c7a308a6-bnmh
node/gke-authlab-gke-primary-c7a308a6-wvbv
To expose Local mode's trade-off, scale to a single pod so one of the two backend nodes now has no local endpoint. The external load balancer keeps sending to both nodes until its health check notices and removes the empty one — and how fast that happens is precisely where the two dataplanes part ways.
V2
Cluster mode (default)
$for i in $(seq 10); do curl -s -o /dev/null -w \
'%{http_code} %{time_total}\n' http://34.72.78.182; done \
# ETP=Cluster
Cluster mode behaves identically on eBPF at steady state — any node forwards to any pod, every request succeeds at ~0.1s. The externalTrafficPolicy contract is the same; the dataplane difference only surfaces during failover, which we force below.
Same story on eBPF: with a pod on every node, Local mode is indistinguishable from Cluster. Whether the dataplane is iptables or eBPF makes no difference while the topology is symmetric — the divergence appears only once we break it.
V2
Local mode — failover window
$for i in $(seq 20); do curl -s -o /dev/null -w \
'%{http_code} %{time_total}\n' http://34.72.78.182; done \
# ETP=Local, 1 pod
The same scenario on eBPF behaves very differently. Requests to the pod-less node fail in ~50 milliseconds, not 6 seconds: instead of black-holing the packet, the eBPF program actively rejects the connection when there is no local backend. Traffic then converges to steady success within this same short window. Fast fail plus quick convergence, versus V1's slow timeout plus ~130 s convergence.
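A back-of-the-envelope model of the failover window puts numbers on the gap. It is a sketch under explicit assumptions: the target pool splits requests evenly between the two nodes (`share_to_podless=0.5`), a healthy request takes ~0.1 s, and a request that hits the pod-less node costs the ~6 s timeout on V1 or the ~50 ms rejection on V2 reported in this section. Convergence is ignored, so for V2, which recovers inside the window, the result is an upper bound.

```python
def window_cost(n_requests, share_to_podless, t_ok, t_podless):
    """Expected failures and wall-clock seconds for a serial curl loop."""
    failures = n_requests * share_to_podless
    seconds = failures * t_podless + (n_requests - failures) * t_ok
    return failures, seconds


if __name__ == "__main__":
    for name, t_podless in (("V1 black hole", 6.0), ("V2 reject", 0.05)):
        failures, seconds = window_cost(20, 0.5, t_ok=0.1, t_podless=t_podless)
        print(f"{name}: ~{failures:.0f} failed requests, ~{seconds:.1f} s for 20 requests")
    # V1 black hole: ~10 failed requests, ~61.0 s for 20 requests
    # V2 reject: ~10 failed requests, ~1.5 s for 20 requests
```

Under this model both dataplanes fail the same requests; what differs is how long each failure costs the client.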
Deployment: nginx replicas=1
Pod: nginx-bb6b8c496-w8kkn on node gke-authlab-gke-primary-a62dd7d5-91t0
Service externalTrafficPolicy: Local
Nodes:
node/gke-authlab-gke-primary-a62dd7d5-91t0
node/gke-authlab-gke-primary-a62dd7d5-q8b5
Same setup on V2: one pod, two backend nodes, so one node has no local endpoint. Identical asymmetry — the interesting part is how differently each dataplane handles the requests that land on the pod-less node while the health check catches up.
L7: Flow observability
iptables has no native flow observability. To trace a flow on the legacy dataplane you reconstruct it by hand — conntrack -L, tcpdump — with no identity, verdict, direction, or L7 context. The eBPF datapath emits all of that for free.
This is what eBPF unlocks that iptables cannot: a record of every flow at little extra cost, because the same in-kernel programs that route the packet also emit the record. Each line names the source and destination by IDENTITY (world = external, remote-node = another node, plus the nginx endpoint), a VERDICT (FORWARDED), the direction, and TCP flags.
The relay aggregates flows from the Cilium agent on every node, over mTLS. Status shows it healthy with both nodes connected and ~54 flows/second streaming — the observability pipeline is live before we look at any actual traffic.
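Conceptually, each agent produces a time-ordered stream of flow records and the relay merges them into one. The Python sketch below assumes a simplified record whose fields mirror what each Hubble line is described as carrying (identities, verdict, direction, TCP flags); it is not Hubble's actual schema, and the node names and timestamps are hypothetical.

```python
import heapq
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    time: float      # seconds; each node's stream is sorted by this
    node: str        # node whose agent observed the flow
    src: str         # source identity, e.g. "world" or "remote-node"
    dst: str         # destination identity
    verdict: str     # e.g. "FORWARDED" or "DROPPED"
    direction: str   # simplified to "ingress" / "egress"
    tcp_flags: str


def relay(*node_streams):
    """Merge per-node streams (each already time-sorted) into one ordered stream."""
    return list(heapq.merge(*node_streams, key=lambda f: f.time))


if __name__ == "__main__":
    node_a = [
        Flow(0.10, "node-a", "world", "netflow-test/nginx", "FORWARDED", "ingress", "SYN"),
        Flow(0.30, "node-a", "netflow-test/nginx", "world", "FORWARDED", "egress", "SYN-ACK"),
    ]
    node_b = [
        Flow(0.20, "node-b", "remote-node", "netflow-test/nginx", "FORWARDED", "ingress", "SYN"),
    ]
    merged = relay(node_a, node_b)
    for f in merged:
        print(f"{f.time:.2f} {f.node} {f.src} -> {f.dst} {f.verdict} {f.direction} {f.tcp_flags}")
    print(Counter(f.verdict for f in merged))
```

Because every record already carries identities and a verdict, the merged stream can be filtered or counted directly, with no conntrack or tcpdump reconstruction.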
V2
Managed Hubble Relay
$kubectl -n gke-managed-dpv2-observability get pods,svc
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hubble-relay-56b84459c4-cqzqj 3/3 Running 0 13m 10.20.1.13 gke-authlab-gke-primary-a62dd7d5-q8b5 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hubble-relay ClusterIP 10.24.9.240 <none> 443/TCP 13m k8s-app=hubble-relay
Hubble is Cilium's flow-observability layer. On GKE you don't install it — enabling one flag makes Google deploy the Hubble Relay into a managed namespace for you. That's the 'managed' half of managed Cilium: the capability is there, but you flip a switch instead of running the components yourself (and get a leaner surface than a self-managed cluster would).
L8: Network policy
Our V1 cluster ran with the network-policy addon OFF — the legacy dataplane enforced nothing, so any pod could reach any pod. Turning it on means the Calico addon: iptables rules matched on pod IP/CIDR, with no notion of workload identity. The eBPF dataplane instead enforces by a numeric identity derived from labels.
V2
The policy (the intent)
$kubectl -n netflow-test get networkpolicy \
nginx-allow-frontend -o yaml
Network policy is a firewall for pod-to-pod traffic. This is a plain Kubernetes NetworkPolicy — note there is no CiliumNetworkPolicy on managed GKE — declaring: pods labeled app=nginx accept ingress only from pods labeled role=frontend, and everything else is denied by default. Next we watch the eBPF dataplane actually enforce it.
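The policy's ingress semantics fit in a few lines. This Python sketch assumes a single namespace and `matchLabels`-style selectors only (no `matchExpressions`, `namespaceSelector`, ports, or egress rules); the `POLICIES` structure is a hypothetical simplification of the NetworkPolicy object, not its YAML schema.

```python
from typing import Dict, List

Labels = Dict[str, str]

# Simplified nginx-allow-frontend: which pods it selects, which peers may enter.
POLICIES: List[dict] = [
    {"pod_selector": {"app": "nginx"}, "ingress_from": [{"role": "frontend"}]},
]


def selects(selector: Labels, labels: Labels) -> bool:
    """matchLabels semantics: every selector pair must be present; {} matches all."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(src: Labels, dst: Labels, policies=POLICIES) -> bool:
    applying = [p for p in policies if selects(p["pod_selector"], dst)]
    if not applying:
        return True  # pods no policy selects keep the default allow-all
    # Once a pod is selected, ingress is denied unless some policy allows the peer.
    return any(selects(peer, src) for p in applying for peer in p["ingress_from"])


if __name__ == "__main__":
    nginx = {"app": "nginx"}
    print(ingress_allowed({"role": "frontend"}, nginx))  # True
    print(ingress_allowed({"role": "other"}, nginx))     # False
```

The default deny is implicit: it comes from nginx being selected at all, not from any rule naming role=other.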
V2
Cilium identities, not IPs
$kubectl -n netflow-test get ciliumendpoints -o \
custom-columns=POD:..,IDENTITY:..,IP:..
POD IDENTITY IP
client-allowed 51014 10.20.0.18
client-blocked 6385 10.20.1.14
nginx-bb6b8c496-2gd6m 18565 10.20.1.12
nginx-bb6b8c496-vc6pj 18565 10.20.1.11
nginx-bb6b8c496-w8kkn 18565 10.20.0.15
The idea that makes eBPF policy scale: Cilium doesn't reason about pod IPs, it assigns each workload a numeric IDENTITY derived from its labels. All three nginx pods share one identity (18565) because their labels match; the two clients get their own. Enforcement is then identity-to-identity, so it survives pods being recreated with new IPs — the churn that would endlessly rewrite an IP-based iptables ruleset.
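The identity idea, and why it survives churn, in a minimal Python sketch. The allocator is hypothetical: it hands out numbers from an arbitrary counter, whereas Cilium allocates identities cluster-wide through its own machinery, so the numbers will not match 18565/51014/6385. The recreated pod's new IP is hypothetical too.

```python
from typing import Dict, FrozenSet, Set, Tuple


class IdentityAllocator:
    """Same label set -> same numeric identity; a new label set -> a new number."""

    def __init__(self, first: int = 1000):  # arbitrary starting value
        self._next = first
        self._by_labels: Dict[FrozenSet[Tuple[str, str]], int] = {}

    def identity_for(self, labels: Dict[str, str]) -> int:
        key = frozenset(labels.items())  # order-independent label set
        if key not in self._by_labels:
            self._by_labels[key] = self._next
            self._next += 1
        return self._by_labels[key]


def ip_allowlist(pods: Dict[str, Tuple[Dict[str, str], str]]) -> Set[str]:
    """IP-based view: the current addresses of every role=frontend pod."""
    return {ip for labels, ip in pods.values() if labels.get("role") == "frontend"}


if __name__ == "__main__":
    alloc = IdentityAllocator()
    pods = {
        "client-allowed": ({"role": "frontend"}, "10.20.0.18"),
        "client-blocked": ({"role": "other"}, "10.20.1.14"),
        "nginx-2gd6m": ({"app": "nginx"}, "10.20.1.12"),
        "nginx-vc6pj": ({"app": "nginx"}, "10.20.1.11"),
    }
    ids = {name: alloc.identity_for(labels) for name, (labels, _) in pods.items()}
    print(ids["nginx-2gd6m"] == ids["nginx-vc6pj"])  # True
    allowed_identity = ids["client-allowed"]
    before = ip_allowlist(pods)

    # Recreate the frontend pod: same labels, new IP (hypothetical address).
    pods["client-allowed"] = ({"role": "frontend"}, "10.20.0.42")
    print("IP allowlist changed:", ip_allowlist(pods) != before)  # True
    print("identity changed:", alloc.identity_for({"role": "frontend"}) != allowed_identity)  # False
```

The IP-based allowlist has to be recomputed on every pod restart; the identity-based one never notices.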
The same request, sent from two different identities. The client labeled role=frontend (identity 51014) gets HTTP 200; the client labeled role=other (6385) times out after 5 seconds — its packets are dropped before nginx ever sees them. The policy is working, but the request itself gives no hint as to why. For that, we watch it in Hubble.
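From the client's side, a silently dropped SYN and an actively rejected one look very different, which separates this timeout (and V1's black hole) from V2's fast fail. Below is a minimal Python probe, assuming plain TCP connects (curl adds HTTP on top); `probe` is a hypothetical helper, and its 5-second default simply mirrors the timeout described here.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 5.0):
    """Classify one TCP connect attempt and report how long it took (seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"   # actively rejected: the fast-fail case
    except socket.timeout:
        outcome = "timeout"   # SYNs silently dropped: black hole or policy drop
    except OSError as exc:
        outcome = f"error ({exc.strerror})"  # e.g. host unreachable
    return outcome, round(time.monotonic() - start, 3)


if __name__ == "__main__":
    # Local demo: a listening port connects; the same port, once closed, refuses.
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    print(probe("127.0.0.1", port))  # expect outcome 'connected'
    server.close()
    print(probe("127.0.0.1", port))  # expect outcome 'refused'
```

Pointed at the nginx Service from the role=other client, a probe like this would be expected to report `timeout` after the full wait, because the drop leaves nothing to answer the SYN.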
The drop, made visible. Hubble reports the blocked client (identity 6385) reaching nginx (18565) as 'Policy denied DROPPED', and it happens at the very first SYN, so the TCP handshake never completes (which is why the curl hung for its whole timeout). Attributing a drop to a specific workload identity and policy decision, in real time, is exactly what the legacy dataplane could never show.
POLICY DIRECTION LABELS (source:key[=value]) PORT/PROTO PROXY PORT AUTH TYPE BYTES PACKETS PREFIX
Allow Ingress reserved:host ANY NONE disabled - - 0
reserved:remote-node
Allow Ingress k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=netflow-test ANY NONE disabled 1078 14 0
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=default
k8s:io.kubernetes.pod.namespace=netflow-test
k8s:role=frontend
Allow Egress ANY ANY NONE disabled 30444 279 0
And here is the enforcement itself — not magic, just an eBPF allowlist keyed by identity, programmed onto the nginx endpoint. There is an Allow entry for the role=frontend label set (the byte/packet counters prove real traffic matched it) and no entry at all for role=other, so identity 6385 falls through to default-deny. The whole arc in one map: YAML intent → label-derived identity → this allowlist → the Hubble verdict.
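The allowlist lookup itself can be sketched as a dictionary keyed by (direction, identity, port). This is a simplified model under stated assumptions, not Cilium's BPF map format: `ANY` as a wildcard and the order of the fallback lookups are this sketch's conventions, the packet size is arbitrary, and the reserved identities use Cilium's numbering for `host` (1) and `remote-node` (6).

```python
from dataclasses import dataclass
from typing import Dict, Tuple

ANY = 0                        # wildcard identity or port in this sketch
HOST, REMOTE_NODE = 1, 6       # Cilium's reserved identities for host / remote-node
FRONTEND, OTHER = 51014, 6385  # identities from the ciliumendpoints output above


@dataclass
class Counters:
    bytes: int = 0
    packets: int = 0


class PolicyMap:
    """Per-endpoint allowlist: no matching entry means default deny."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, int, int], Counters] = {}

    def allow(self, direction: str, identity: int = ANY, port: int = ANY) -> None:
        self.entries[(direction, identity, port)] = Counters()

    def verdict(self, direction: str, identity: int, port: int, size: int) -> str:
        # Most specific key first, then progressively wider wildcards.
        for key in ((direction, identity, port), (direction, identity, ANY),
                    (direction, ANY, port), (direction, ANY, ANY)):
            hit = self.entries.get(key)
            if hit is not None:
                hit.bytes += size
                hit.packets += 1
                return "FORWARDED"
        return "DROPPED"


if __name__ == "__main__":
    nginx = PolicyMap()
    for identity in (HOST, REMOTE_NODE, FRONTEND):
        nginx.allow("ingress", identity)  # Allow Ingress, port ANY
    nginx.allow("egress")                 # Allow Egress ANY
    print(nginx.verdict("ingress", FRONTEND, 80, 74))  # FORWARDED
    print(nginx.verdict("ingress", OTHER, 80, 74))     # DROPPED
    print(nginx.entries[("ingress", FRONTEND, ANY)])   # Counters(bytes=74, packets=1)
```

As in the real map, nothing mentions identity 6385; it is dropped simply because no key matches, and the counters on the matching entry grow only when traffic is actually forwarded.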