Introduce a "Paused" State for Improved Backend Health Check Management #557

Open
shohamyamin opened this issue Nov 29, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@shohamyamin

Current Behavior

Trino Gateway currently deactivates a backend if it fails a health check. Once deactivated, the backend remains inactive until manually reset to active, as automatic recovery is not supported.

Proposed Improvement

Introduce a new state, "Paused," to distinguish between:

  1. Intentionally paused backends: Marked as "Paused" by users so that no traffic is routed to them.
  2. Unhealthy backends: Automatically deactivated by the gateway but continuously monitored via health checks.

When a backend becomes healthy after previously failing, the gateway would automatically transition it back to an active state, resuming request handling without manual intervention.
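
To make the distinction concrete, here is a minimal sketch of the proposed transition rule. It is written in Python purely for illustration and is not Trino Gateway code; the names `BackendState` and `next_state` are hypothetical.

```python
from enum import Enum


class BackendState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"  # deactivated by a failed health check
    PAUSED = "paused"      # set by a user; never changed by health checks


def next_state(current: BackendState, healthy: bool) -> BackendState:
    """Return the state a backend should have after one health check result."""
    if current is BackendState.PAUSED:
        # A user-paused backend stays paused regardless of its health.
        return BackendState.PAUSED
    return BackendState.ACTIVE if healthy else BackendState.INACTIVE
```

The only sticky state is `PAUSED`; `ACTIVE` and `INACTIVE` simply follow the latest health check result.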

Benefits

  • Enhanced automation: Reduces the need for manual intervention when backends recover from transient issues.
  • Operational clarity: Clearly separates user-initiated pauses from automatic deactivations caused by health check failures.
  • Improved availability: Ensures backends are reintroduced to the pool as soon as they recover, minimizing downtime.

Implementation Details

  1. State Management:

    • Add a "Paused" state to the gateway.
    • Backends marked as "Paused" by users would not be eligible for automatic reactivation.
    • Backends that fail health checks would transition to an "inactive" state and remain eligible for automatic reactivation.
  2. Health Check Monitoring:

    • Continue health checks for "inactive" backends.
    • Automatically transition them to "active" once they pass the health checks.
  3. User Interaction:

    • Users can manually set a backend to "Paused."
    • The gateway should provide clear feedback about the reason for a backend's current state (e.g., paused by the user or deactivated due to health check failure).
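
The three points above could fit together roughly as follows. This is a sketch under assumptions, not the gateway's actual data model: the `Backend` class, its `reason` field, the `routable` helper, and the reason strings are all hypothetical and exist only to show how the state and the feedback about it might be kept in one place.

```python
from dataclasses import dataclass
from enum import Enum


class BackendState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    PAUSED = "paused"


@dataclass
class Backend:
    name: str
    state: BackendState = BackendState.ACTIVE
    reason: str = "healthy"  # hypothetical field explaining the current state

    def pause(self, user: str) -> None:
        """User interaction: take the backend out of rotation on purpose."""
        self.state = BackendState.PAUSED
        self.reason = f"paused by {user}"

    def record_health_check(self, healthy: bool) -> None:
        """Health check monitoring: only non-paused backends change state."""
        if self.state is BackendState.PAUSED:
            return
        if healthy:
            self.state, self.reason = BackendState.ACTIVE, "healthy"
        else:
            self.state, self.reason = BackendState.INACTIVE, "health check failed"

    def describe(self) -> str:
        """Feedback for users: the state plus why the backend is in it."""
        return f"{self.name}: {self.state.value} ({self.reason})"


def routable(backends: list[Backend]) -> list[str]:
    """Names of the backends that may receive traffic."""
    return [b.name for b in backends if b.state is BackendState.ACTIVE]


pool = [Backend("trino-1"), Backend("trino-2"), Backend("trino-3")]
pool[0].record_health_check(False)
pool[1].pause("alice")
print(pool[0].describe())  # trino-1: inactive (health check failed)
print(pool[1].describe())  # trino-2: paused (paused by alice)
print(routable(pool))      # ['trino-3']
```

Storing the reason next to the state is one possible way to tell a user-initiated pause from an automatic deactivation; a real implementation might prefer a timestamp or a structured cause instead of a free-form string.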

Example Workflow

  1. A backend fails a health check and is marked "inactive."
  2. The gateway continues monitoring the backend.
  3. Once the backend is healthy, it automatically transitions to "active."
  4. A user intentionally pauses a backend, marking it as "Paused." This backend will not be reactivated automatically, even if healthy.
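
The workflow can also be replayed as a sequence of events to check the expected state after each step. The event names (`check_ok`, `check_fail`, `user_pause`) and the `replay` function are assumptions made for this sketch; the proposal itself does not define an event model.

```python
def replay(events):
    """Apply workflow events in order and return the state after each one."""
    state, timeline = "active", []
    for event in events:
        if event == "user_pause":
            state = "paused"
        elif event in ("check_ok", "check_fail"):
            # Health check results never change a paused backend.
            if state != "paused":
                state = "active" if event == "check_ok" else "inactive"
        else:
            raise ValueError(f"unknown event: {event}")
        timeline.append(state)
    return timeline


# Steps 1-3: fail, keep monitoring, recover. Step 4: user pause, then a passing check.
print(replay(["check_fail", "check_fail", "check_ok", "user_pause", "check_ok"]))
# ['inactive', 'inactive', 'active', 'paused', 'paused']
```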
shohamyamin added the enhancement (New feature or request) label on Nov 29, 2024
@rdsarvar
Contributor

rdsarvar commented Dec 2, 2024

Is this a duplicate of #80?

@mosabua
Member

mosabua commented Dec 2, 2024

I wouldn't call it a duplicate, but the ideas are closely related and overlapping.
