Fixing Premature Traffic Drops During Kubernetes Node Draining

#Kubernetes #LoadBalancer #Networking #CloudController #DevOps #NodeAutoscaling

What's the problem?

When a Kubernetes node is cordoned during draining, the cloud load balancer can drop traffic to that node immediately, even though pods on it are still terminating gracefully. This guide shows how to prevent those premature traffic drops by decoupling a node's scheduling status from cloud load balancer backend management.

Why does this happen?

The issue stems from an over-aggressive predicate check in the cloud controller manager that automatically removes cordoned (unschedulable) nodes from load balancer backend pools. Traffic to a node is therefore cut off the moment it is cordoned, rather than after its active connections have drained gracefully.
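The problematic behavior can be sketched as a backend-inclusion predicate keyed on the node's schedulable status. This is a minimal illustration, not the real cloud controller code; the Node type and function names here are hypothetical stand-ins for the actual k8s.io/api and cloud-provider types.

```go
package main

import "fmt"

// Node is a minimal, illustrative stand-in for the Kubernetes node object;
// the real type lives in k8s.io/api/core/v1.
type Node struct {
	Name          string
	Unschedulable bool
	Labels        map[string]string
}

// includeNodeLegacy sketches the over-aggressive predicate: any cordoned
// (Unschedulable) node is dropped from the load balancer backend pool,
// regardless of whether connections on it have finished draining.
func includeNodeLegacy(n Node) bool {
	return !n.Unschedulable
}

func main() {
	cordoned := Node{Name: "node-1", Unschedulable: true}
	active := Node{Name: "node-2", Unschedulable: false}
	// The cordoned node is excluded immediately, the active node kept.
	fmt.Println(includeNodeLegacy(cordoned), includeNodeLegacy(active))
}
```

Because the predicate looks only at the scheduling flag, `kubectl cordon` alone is enough to yank a node out of the backend pool, which is exactly the premature drop described above.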

Code Example

kubectl label node <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true

How to fix it

To keep traffic flowing until a node is fully decommissioned, stop relying on the node's 'unschedulable' status for load balancer membership. Instead, control participation explicitly: apply the 'node.kubernetes.io/exclude-from-external-load-balancers' label only when the node is ready to stop receiving external traffic, and only then cordon and drain it.
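The fixed behavior can be sketched the same way: backend membership is driven solely by the explicit exclusion label, so cordoning no longer affects traffic. Again, this is an illustrative sketch, not the actual cloud controller implementation; the Node type and function name are hypothetical, though the label key itself is the real well-known Kubernetes label.

```go
package main

import "fmt"

// excludeLabel is the real well-known Kubernetes label that opts a node
// out of external load balancer backend pools.
const excludeLabel = "node.kubernetes.io/exclude-from-external-load-balancers"

// Node is a minimal, illustrative stand-in for the Kubernetes node object.
type Node struct {
	Name          string
	Unschedulable bool
	Labels        map[string]string
}

// includeNode keeps a node in the backend pool unless it carries the
// explicit exclusion label; the Unschedulable flag is ignored, so a
// cordoned node continues serving traffic while it drains.
func includeNode(n Node) bool {
	_, excluded := n.Labels[excludeLabel]
	return !excluded
}

func main() {
	cordoned := Node{Name: "node-1", Unschedulable: true, Labels: map[string]string{}}
	labeled := Node{Name: "node-2", Labels: map[string]string{excludeLabel: "true"}}
	// The cordoned-but-unlabeled node stays in the pool; only the
	// explicitly labeled node is removed.
	fmt.Println(includeNode(cordoned), includeNode(labeled))
}
```

With this separation, the operational sequence becomes: label the node (load balancer deregisters it), wait for connections to drain, then cordon and drain the node for decommissioning.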