Idea: Pod-level probes or exclude some containers from pod readiness

What's the problem?

I swear we have discussed this before but I cannot find it. Today, probes are per-container. This makes sense in a lot of ways - if a specific container is failing liveness, you usually want to restart that specific container. Also, we know that a large majority of pods run with a single container, so this has rarely been a major issue. That said, I think there are cases where it's imperfect. This came up in a user issue a few weeks ago and I meant to ping the old issue (which I cannot find), so I am opening this to discuss.

The specific case in question was a pod with multiple containers - one main app container and a small number of background-helper containers (think logs/metrics). The user had configured a readiness probe for the "main" app and it was stable, but one of the background helpers was crashy. It had triggered crashloop-backoff and was therefore not-ready. This makes the whole pod not-ready, which was surprising to the user. When I looked at it from thei...

Why does this happen?

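Pod readiness is an aggregate of container readiness: the pod is ready only when every container in it is ready. A container in crashloop-backoff is not running, so it is not-ready, and that alone makes the whole pod not-ready even while the main app's readiness probe keeps passing.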
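
As a rough illustration of the scenario in the report, the sketch below models the all-containers rule in Go. The `containerStatus` type and `podReady` function are hypothetical stand-ins made up for this example, not the real Kubernetes API types or the kubelet's actual code.

```go
package main

import "fmt"

// containerStatus is a simplified, hypothetical stand-in for the
// per-container status; it is NOT the real Kubernetes API type.
type containerStatus struct {
	Name  string
	Ready bool
}

// podReady models the aggregation described above: the pod is ready
// only if every container in it is ready.
func podReady(statuses []containerStatus) bool {
	for _, s := range statuses {
		if !s.Ready {
			return false
		}
	}
	return true
}

func main() {
	statuses := []containerStatus{
		{Name: "app", Ready: true},         // main app, readiness probe passing
		{Name: "log-helper", Ready: false}, // helper stuck in crashloop-backoff
	}
	fmt.Println(podReady(statuses)) // prints "false"
}
```

Under this model a single crashy helper is enough to flip the result, which matches what the user observed.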

How to fix it

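The issue floats two directions: pod-level probes, or a way to exclude some containers from pod readiness. Neither is presented here as a settled fix; refer to the original GitHub issue for discussion and potential fixes.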
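
To make the "exclude some containers from pod readiness" idea concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the `ExcludeFromReadiness` flag, the `helperAwareStatus` type, and the `podReadyExcluding` function are hypothetical names invented for this example, not fields or functions taken from Kubernetes or from the issue.

```go
package main

import "fmt"

// helperAwareStatus is a hypothetical, simplified container status.
type helperAwareStatus struct {
	Name  string
	Ready bool
	// ExcludeFromReadiness is a made-up opt-out flag for this sketch;
	// it is not an actual Kubernetes API field.
	ExcludeFromReadiness bool
}

// podReadyExcluding aggregates readiness over the containers that have
// not opted out, so a crashy helper no longer drags the pod down.
func podReadyExcluding(statuses []helperAwareStatus) bool {
	for _, s := range statuses {
		if s.ExcludeFromReadiness {
			continue // ignored for pod readiness
		}
		if !s.Ready {
			return false
		}
	}
	return true
}

func main() {
	statuses := []helperAwareStatus{
		{Name: "app", Ready: true},
		{Name: "log-helper", Ready: false, ExcludeFromReadiness: true},
	}
	fmt.Println(podReadyExcluding(statuses)) // prints "true"
}
```

The opt-out keeps today's all-containers behavior as the default and changes the outcome only for containers explicitly marked as excluded; whether that is the right shape for a real API is exactly what the issue leaves open.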

#k8s #sig-network #github-issue