Fixing Duplicate IP Endpoint Conflicts in Windows L2Bridge Networks

#Kubernetes #Windows #L2Bridge #HNS #kube-proxy #Networking #Troubleshooting

What's the problem?

Pods in Windows Kubernetes clusters that use L2Bridge networking can suffer connectivity failures and DNS timeouts when a stale remote HNS endpoint still holds an IP address that has since been reassigned to a local pod.

Why does this happen?

During high pod churn, pod IPs are reused quickly, and a Windows L2Bridge network can retain a stale remote endpoint in the Host Networking Service (HNS) for an IP that now belongs to a local pod. Because kube-proxy does not distinguish a local endpoint from a remote one when both carry the same IP, the Virtual Filtering Platform (VFP) routes traffic to the outdated node instead of the current local pod.
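The conflict described above comes down to one IP being claimed by both a local and a remote endpoint. The following is a minimal Go sketch of how such overlaps could be detected. The `endpoint` type and the `conflictingIPs` function are hypothetical names made up for illustration and are not part of kube-proxy or the HNS API.

```go
package main

import (
	"fmt"
	"sort"
)

// endpoint is a hypothetical, simplified stand-in for an HNS endpoint record.
type endpoint struct {
	ID      string
	IP      string
	IsLocal bool
}

// conflictingIPs returns, in sorted order, every IP that is claimed by at
// least one local endpoint and at least one remote endpoint.
func conflictingIPs(eps []endpoint) []string {
	hasLocal := map[string]bool{}
	hasRemote := map[string]bool{}
	for _, ep := range eps {
		if ep.IsLocal {
			hasLocal[ep.IP] = true
		} else {
			hasRemote[ep.IP] = true
		}
	}
	var out []string
	for ip := range hasLocal {
		if hasRemote[ip] {
			out = append(out, ip)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	eps := []endpoint{
		{ID: "ep-old", IP: "10.244.1.7", IsLocal: false}, // stale remote entry
		{ID: "ep-new", IP: "10.244.1.7", IsLocal: true},  // current local pod
		{ID: "ep-other", IP: "10.244.2.9", IsLocal: false},
	}
	fmt.Println(conflictingIPs(eps))
}
```

Any IP this sketch reports is one where traffic could be steered to the stale remote entry rather than to the local pod.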

Code Example

// Simplified fragment of the proxy sync cycle: when a local and a remote
// endpoint share an IP, the local endpoint takes priority.
if newEndpoint.isLocal && existingEndpoint.isRemote {
    // Promote the local endpoint and queue the stale remote one for deletion
    remoteEPsWithDupIP.Add(existingEndpoint)
    activeMap[ip] = newEndpoint
}

// Clean up the queued stale remote endpoints when the enclosing function returns
defer deleteAllRemoteEndpointsWithDupIP(remoteEPsWithDupIP)

How to fix it

Upgrade kube-proxy to a version that includes the priority-based HNS reconciliation logic. The update uses a dual-collection strategy in 'getAllEndpointsByNetwork' to detect IP overlaps, then explicitly calls 'deleteAllRemoteEndpointsWithDupIP' during the sync cycle, so a local pod endpoint always takes precedence over a stale remote entry.
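To make the reconciliation easier to follow, here is a self-contained Go sketch of the idea: one collection holds the active endpoint per IP, and a second collects the remote endpoints that lose to a local one. It is an illustration under stated assumptions, not the kube-proxy implementation. The `endpoint` type and the `reconcile` function are hypothetical, and the sketch only returns the stale endpoints instead of deleting them through HNS.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified stand-in for an HNS endpoint record.
type endpoint struct {
	ID      string
	IP      string
	IsLocal bool
}

// reconcile builds the active IP-to-endpoint map and, separately, collects
// remote endpoints whose IP is also claimed by a local endpoint.
func reconcile(eps []endpoint) (active map[string]endpoint, staleRemotes []endpoint) {
	active = make(map[string]endpoint)
	for _, ep := range eps {
		existing, seen := active[ep.IP]
		switch {
		case !seen:
			active[ep.IP] = ep
		case ep.IsLocal && !existing.IsLocal:
			// Local wins: promote it and queue the remote for deletion.
			staleRemotes = append(staleRemotes, existing)
			active[ep.IP] = ep
		case !ep.IsLocal && existing.IsLocal:
			// A local endpoint already holds the IP: the incoming remote is stale.
			staleRemotes = append(staleRemotes, ep)
		}
		// Same-kind duplicates are left alone in this sketch (the first one stays).
	}
	return active, staleRemotes
}

func main() {
	eps := []endpoint{
		{ID: "remote-stale", IP: "10.244.1.7", IsLocal: false},
		{ID: "local-pod", IP: "10.244.1.7", IsLocal: true},
		{ID: "remote-ok", IP: "10.244.2.9", IsLocal: false},
	}
	active, stale := reconcile(eps)
	fmt.Println("active for 10.244.1.7:", active["10.244.1.7"].ID)
	for _, ep := range stale {
		fmt.Println("would delete:", ep.ID)
	}
}
```

The sketch handles both listing orders (local seen first or remote seen first), since the order in which endpoints are enumerated is not something the text above specifies.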