Fixing Kubernetes kube-proxy High CPU and Latency in Large-Scale Clusters
What's the problem?
Resolve kube-proxy performance bottlenecks and CPU spikes in large Kubernetes clusters by disabling unnecessary full iptables synchronization cycles.
Why does this happen?
The kube-proxy 'iptables' mode triggers an aggressive, periodic 'full sync' of network rules every 30 minutes, regardless of cluster size. In large-scale environments with over 1,000 endpoints, this creates massive I/O overhead and CPU spikes, causing network latency and rule flapping.
Code Example
// Logic modification in pkg/proxy/iptables/proxier.go

// Original: forces a full sync once the time threshold elapses
doFullSync := proxier.needFullSync || (time.Since(proxier.lastFullSync) > proxyutil.FullSyncPeriod)

// Optimized: respects largeClusterMode to suppress the periodic timer
doFullSync := proxier.needFullSync ||
	((time.Since(proxier.lastFullSync) > proxyutil.FullSyncPeriod) && !proxier.largeClusterMode)
How to fix it
To resolve this, configure kube-proxy to favor incremental updates by suppressing timer-based full synchronizations when operating in large-scale mode. With the 'largeClusterMode' logic in place, a full sync runs only when strictly necessary, such as after state corruption or a failed rule restore, rather than on a fixed time interval. This reduces resource contention and stabilizes node performance during steady-state operation.
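Sync cadence can also be tuned through the KubeProxyConfiguration file. A minimal fragment is sketched below; the field names are from the kubeproxy.config.k8s.io/v1alpha1 API, but the specific durations are illustrative values for this scenario, not upstream recommendations:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
iptables:
  # Lower bound between rule syncs; rate-limits churn from rapid
  # endpoint updates.
  minSyncPeriod: 1s
  # Upper bound between periodic syncs (illustrative value).
  syncPeriod: 30s
```

Raising minSyncPeriod trades a little update latency for batching, which helps most on nodes watching thousands of endpoints.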