This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Enhancement: app readiness should depend on all containers #79

Open
rosenhouse opened this issue Nov 8, 2019 · 6 comments

Comments

@rosenhouse

we're playing with automatic sidecar injection from Istio.

looks like Eirini considers the app "Running" if only 1 of the containers is running.

we probably want to change it so that all (non-init) containers must be Ready before CF sees the App as running.
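A minimal sketch of that readiness rule, for illustration only. It assumes a simplified, hypothetical `ContainerStatus` type rather than the real Kubernetes API structs, and it is not Eirini's actual code. Kubernetes reports init containers in a separate status list, so passing only the regular container statuses already excludes them.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, simplified stand-in for the
// per-container status that Kubernetes reports for a pod.
type ContainerStatus struct {
	Name  string
	Ready bool
}

// allContainersReady reports whether every (non-init) container is Ready.
// An empty list counts as not ready, because nothing has reported yet.
func allContainersReady(statuses []ContainerStatus) bool {
	if len(statuses) == 0 {
		return false
	}
	for _, s := range statuses {
		if !s.Ready {
			return false
		}
	}
	return true
}

func main() {
	// The app container is up, but the injected sidecar is not Ready yet,
	// so the app as a whole should not be reported as running.
	statuses := []ContainerStatus{
		{Name: "app", Ready: true},
		{Name: "istio-proxy", Ready: false},
	}
	fmt.Println("app running:", allContainersReady(statuses))
}
```

With this rule, a pod whose sidecar is still starting is treated the same as a pod whose app container is still starting.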

would y'all be open to a PR?

cc @tcdowney

related:

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/169660862

The labels on this github issue will be updated when the story is started.

@julz
Contributor

julz commented Nov 11, 2019

Hey - sounds good to us & a PR would be very welcome!

@julz julz closed this as completed Nov 11, 2019
@julz julz reopened this Nov 11, 2019
@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/169674530

The labels on this github issue will be updated when the story is started.

@rosenhouse
Author

rosenhouse commented Nov 19, 2019

We're working on this and related things under the heading of "get CATS passing when Eirini has Istio sidecars"

Today's issue: In Diego, with multiple containers (system-provided Envoy, or user-provided sidecar), if any of the containers (garden peas) crashes, Diego tears down and reschedules the whole pod.

In Kubernetes, this doesn't appear to be the behavior. It's not the default, and it's not even something directly supported in a PodSpec -- we'd have to do a bunch of work (extra wiring somehow) to get K8s to mimic the Diego behavior.

It seems the K8s preferred behavior is to restart the crashed container in place, keeping the pod intact.

Does this sound right to folks?

@julz @alex-slynko @JulzDiverse

cc @emalm @zrob

@julz
Contributor

julz commented Nov 19, 2019

I guess off the top of my head the first question is whether you actually need the Diego behaviour. If the sidecar crashes then it'll get restarted, at which point either (a) the main container starts working again or (b) the main container fails its health check, is restarted, works - either way the system is back up and running? Is there a case where we need the whole pod to be torn down if a container fails?

@cwlbraa
Contributor

cwlbraa commented Nov 19, 2019

Using liveness checks to determine when to reschedule is actually much better than the Diego codependent behavior for user-provided sidecars.

The user story that comes to mind is when you have a memory hungry APM agent (in Java or Ruby, for example) running next to a lighter weight app. If the APM sidecar exceeds its memory limits, ideally we OOM kill the APM and restart the pod without taking down the app.

The situation that's interesting here is what to do in the absence of a user-provided health check... what the CC API calls a "process" type healthcheck. Should Eirini provide a liveness probe that confirms that the main process is running?
