
Split node healthcheck into 3: startup, readiness and liveness #200

Open
dudizimber opened this issue Dec 2, 2024 · 4 comments · May be fixed by #252
Assignees: MuhammadQadora
Labels: bug (Something isn't working)

Comments

dudizimber (Collaborator) commented Dec 2, 2024

Startup: returns OK once the healthcheck program itself is up and running.
Liveness: returns OK if the Redis server returns any response at all, including PONG, BUSY, LOADING, etc.
Readiness: returns OK if:

  • standalone: the node returns PONG
  • replication: master returns PONG; replica returns PONG and is not failing with an error during master synchronization
  • cluster: the node returns PONG and the cluster status is OK
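
A minimal sketch of how the three probes could map onto Redis commands, assuming a small Python helper built on redis-py; the script name, connection details, and the exact fields checked are illustrative, not the implementation in this repo:

```python
# healthcheck.py -- illustrative sketch only; names and thresholds are assumptions.
import sys

import redis


def liveness(client: redis.Redis) -> bool:
    """Alive as long as the server answers anything at all (PONG, BUSY, LOADING, ...)."""
    try:
        client.ping()
        return True
    except (redis.exceptions.BusyLoadingError, redis.exceptions.ResponseError):
        return True  # an error reply still proves the server is responding
    except redis.exceptions.ConnectionError:
        return False


def readiness(client: redis.Redis) -> bool:
    """Ready only when the node can actually serve traffic."""
    try:
        if not client.ping():  # raises on LOADING/BUSY, which also means "not ready"
            return False
        repl = client.info("replication")
        if repl.get("role") == "slave" and repl.get("master_sync_in_progress"):
            return False  # replica is still running its initial sync with the master
        if client.info("cluster").get("cluster_enabled") == 1:
            # redis-py parses CLUSTER INFO into a dict of cluster fields
            if client.execute_command("CLUSTER INFO").get("cluster_state") != "ok":
                return False
        return True
    except redis.exceptions.RedisError:
        return False


if __name__ == "__main__":
    # The startup probe only needs this script to start and exit 0; liveness and
    # readiness are selected by the first CLI argument.
    probe = sys.argv[1] if len(sys.argv) > 1 else "readiness"
    client = redis.Redis(host="localhost", port=6379, socket_timeout=2)
    if probe == "startup":
        ok = True
    elif probe == "liveness":
        ok = liveness(client)
    else:
        ok = readiness(client)
    sys.exit(0 if ok else 1)
```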
@MuhammadQadora MuhammadQadora self-assigned this Dec 15, 2024
@MuhammadQadora MuhammadQadora added the bug Something isn't working label Dec 15, 2024
@MuhammadQadora MuhammadQadora linked a pull request Dec 22, 2024 that will close this issue
@MuhammadQadora MuhammadQadora removed a link to a pull request Dec 29, 2024
@MuhammadQadora MuhammadQadora linked a pull request Dec 29, 2024 that will close this issue
MuhammadQadora (Contributor) commented

To solve this, Omnistrate has to separate the Liveness probe from the Readiness probe. Will reopen the issue once they do.

@dudizimber dudizimber changed the title bug: cluster node initializes as healthy Split node healthcheck into 3: startup, readiness and liveness Jan 5, 2025
dudizimber (Collaborator, Author) commented

@MuhammadQadora
Following the issue where replicas sometimes use the wrong master IP address, we might want to check whether master_link_status is down as part of the health of cluster and replication replicas.
What do you think?
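
One way that could slot into the replica branch of the readiness sketch above, assuming the same redis-py helper (`master_link_status` is a real field of `INFO replication`; the function name is illustrative):

```python
def replica_link_ok(client: redis.Redis) -> bool:
    """Treat a replica as not ready while its link to the master is down."""
    repl = client.info("replication")
    if repl.get("role") != "slave":
        return True  # masters and standalone nodes: nothing extra to check
    # master_link_status stays "down" while the replica cannot reach or sync from
    # its master, e.g. when it was started against a wrong or stale master IP.
    return repl.get("master_link_status") == "up"
```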

MuhammadQadora (Contributor) commented

@dudizimber Agreed, I can't think of a downside to this.

dudizimber (Collaborator, Author) commented

Let's also experiment with using ConfigMaps to provide the healthcheck code, so it can be changed easily if needed.
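
As a hypothetical sketch of that idea (the ConfigMap name, file name, and mount path below are illustrative, not this repo's actual manifests): ship the probe script in a ConfigMap and mount it into the pod, so the probe logic can be edited without rebuilding the image.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: falkordb-healthcheck
data:
  healthcheck.py: |
    # probe script (e.g. the sketch earlier in this thread) goes here
```

In the pod spec, the ConfigMap would be mounted as a volume and each probe pointed at the mounted script, e.g. `readinessProbe.exec.command: ["python", "/healthcheck/healthcheck.py", "readiness"]`, so updating the ConfigMap changes the probe behavior without a new image.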

@MuhammadQadora MuhammadQadora linked a pull request Feb 12, 2025 that will close this issue