Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

200 bug cluster node initializes as healthy #252

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
824a221
added sleep to debug
MuhammadQadora Dec 22, 2024
4298c01
test
MuhammadQadora Dec 22, 2024
ee6cf93
remove sleep 600
MuhammadQadora Dec 24, 2024
c0b14a0
moved the healthcheck in the entry point
MuhammadQadora Dec 29, 2024
a0e4141
removed commented part
MuhammadQadora Dec 29, 2024
705348d
Merge branch 'main' of https://github.com/FalkorDB/falkordb-omnistrat…
MuhammadQadora Dec 29, 2024
7e8128a
updated healtch check app
MuhammadQadora Dec 31, 2024
e0946e0
updated healtch check app
MuhammadQadora Dec 31, 2024
15187ff
updated health check app
MuhammadQadora Jan 1, 2025
55aab7c
updated health check app
MuhammadQadora Jan 1, 2025
c2cf7f5
updated health check app
MuhammadQadora Jan 1, 2025
5e3329d
updated health check app
MuhammadQadora Jan 1, 2025
5c502e7
Merge branch 'main' of https://github.com/FalkorDB/falkordb-omnistrat…
MuhammadQadora Feb 12, 2025
fc4d67e
Separated healthcheck, readiness, startup probes
MuhammadQadora Feb 12, 2025
5e375fa
fixed con
MuhammadQadora Feb 12, 2025
3b3d399
Merge branch 'main' of https://github.com/FalkorDB/falkordb-omnistrat…
MuhammadQadora Feb 12, 2025
0d410d4
fixed cluster entry
MuhammadQadora Feb 12, 2025
131d13c
changed healthcheck to liveness
MuhammadQadora Feb 12, 2025
e470e1e
refactored main.rs
MuhammadQadora Feb 13, 2025
7e3b153
added dollar sign to the if statement in action hooks
MuhammadQadora Feb 13, 2025
5c2151a
fixed get_redis_url
MuhammadQadora Feb 13, 2025
6ab1f12
added print
MuhammadQadora Feb 13, 2025
a15c0c1
added logging
MuhammadQadora Feb 13, 2025
af1e9ed
added logging
MuhammadQadora Feb 13, 2025
a143f2d
added another log
MuhammadQadora Feb 13, 2025
7cc0a63
added more logs
MuhammadQadora Feb 14, 2025
12ced0b
used match
MuhammadQadora Feb 14, 2025
249c4bd
fixed missing return
MuhammadQadora Feb 14, 2025
f4c5d6b
removed logging
MuhammadQadora Feb 15, 2025
c94031c
edited get_memory_limit to add B if the the MEMORY LIMIT is ends with…
MuhammadQadora Feb 15, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions falkordb-cluster/cluster-entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,9 @@ DATE_NOW=$(date +"%Y%m%d%H%M%S")
FALKORDB_LOG_FILE_PATH=$(if [[ $SAVE_LOGS_TO_FILE -eq 1 ]]; then echo $DATA_DIR/falkordb_$DATE_NOW.log; else echo ""; fi)
NODE_CONF_FILE=$DATA_DIR/node.conf


sleep 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to get an explanation for that


if [[ $OMNISTRATE_ENVIRONMENT_TYPE != "PROD" ]];then
DEBUG=1
fi
Expand Down Expand Up @@ -458,10 +461,6 @@ else
echo "Cluster does not exist. Waiting for it to be created"
fi

# Run this before health check to prevent client connections until discrepancies are resolved.
meet_unknown_nodes
ensure_replica_connects_to_the_right_master_ip

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are those removed?


if [[ $RUN_HEALTH_CHECK -eq 1 ]]; then
# Check if healthcheck binary exists
Expand All @@ -473,6 +472,7 @@ if [[ $RUN_HEALTH_CHECK -eq 1 ]]; then
fi
fi


if [[ $RUN_METRICS -eq 1 ]]; then
echo "Starting Metrics"
exporter_url=$(if [[ $TLS == "true" ]]; then echo "rediss://$NODE_HOST:$NODE_PORT"; else echo "redis://localhost:$NODE_PORT"; fi)
Expand Down
2 changes: 1 addition & 1 deletion falkordb-node/node-entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -437,7 +437,7 @@ if [ "$RUN_NODE" -eq "1" ]; then
else
echo "port $NODE_PORT" >>$NODE_CONF_FILE
fi

redis-server $NODE_CONF_FILE --logfile $FALKORDB_LOG_FILE_PATH &
falkordb_pid=$!
tail -F $FALKORDB_LOG_FILE_PATH &
Expand Down
86 changes: 64 additions & 22 deletions healthcheck_rs/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,9 @@ fn main() {
let args: Vec<String> = args().collect();

if args.len() > 1 && args[1] == "sentinel" {
start_health_check_server(true);
start_probes_check_server(true);
} else {
start_health_check_server(false);
start_probes_check_server(false);
}
}

Expand All @@ -26,7 +26,7 @@ fn main() {
/// # Arguments
///
/// * `is_sentinel` - A boolean that indicates whether the health check server is for a Redis Sentinel instance.
fn start_health_check_server(is_sentinel: bool) {
fn start_probes_check_server(is_sentinel: bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for the name change?

let port = if is_sentinel {
env::var("HEALTH_CHECK_PORT_SENTINEL").unwrap_or_else(|_| "8082".to_string())
} else {
Expand All @@ -38,14 +38,26 @@ fn start_health_check_server(is_sentinel: bool) {
let server = Server::new(addr, move |request| {
router!(request,
(GET) (/healthcheck) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change the route to /liveness?

let health = health_check_handler(is_sentinel).unwrap_or_else(|_| false);
let health = probes_check_handler(is_sentinel,false,true).unwrap_or_else(|_| false);

if health {
Response::text("OK")
} else {
Response::text("Not ready").with_status_code(500)
}
},
(GET) (/readiness) => {
let health = probes_check_handler(is_sentinel,true,false).unwrap_or_else(|_| false);

if health {
Response::text("OK")
} else {
Response::text("Not ready").with_status_code(500)
}
},
(GET) (/startup) => {
Response::text("OK")
},
_ => Response::empty_404()
)
})
Expand Down Expand Up @@ -74,7 +86,7 @@ fn start_health_check_server(is_sentinel: bool) {
/// # Errors
///
/// The function returns a RedisError if there is an error connecting to the Redis server.
fn health_check_handler(is_sentinel: bool) -> Result<bool, redis::RedisError> {
fn probes_check_handler(is_sentinel: bool,readiness: bool, healthcheck: bool) -> Result<bool, redis::RedisError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think aadding bool args are a clean solution. Maybe split into 3 functions and separate common components in util functions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that goes for all functions you added bools as params.
Let's make this easier to read

let password = match env::var("ADMIN_PASSWORD") {
Ok(password) => password,
Err(_) => {
Expand All @@ -84,7 +96,7 @@ fn health_check_handler(is_sentinel: bool) -> Result<bool, redis::RedisError> {
.unwrap_or_else(|_| String::new())
}
};

let node_port = if is_sentinel {
env::var("SENTINEL_PORT").unwrap_or_else(|_| "26379".to_string())
} else {
Expand Down Expand Up @@ -117,7 +129,7 @@ fn health_check_handler(is_sentinel: bool) -> Result<bool, redis::RedisError> {
let is_cluster = db_info.contains("cluster_enabled:1");

if is_cluster {
return get_status_from_cluster_node(db_info, &mut con);
return get_status_from_cluster_node(db_info, &mut con,readiness , healthcheck);
}

let role_regex = regex::Regex::new(r"role:(\w+)").unwrap();
Expand All @@ -130,9 +142,9 @@ fn health_check_handler(is_sentinel: bool) -> Result<bool, redis::RedisError> {
let role = role_matches.unwrap().get(1).unwrap().as_str();

if role == "master" {
get_status_from_master(&db_info)
get_status_from_master(&db_info,&mut con,readiness,healthcheck)
} else {
get_status_from_slave(&db_info)
get_status_from_slave(&db_info,&mut con,readiness,healthcheck)
}
}

Expand All @@ -153,13 +165,29 @@ fn health_check_handler(is_sentinel: bool) -> Result<bool, redis::RedisError> {
/// # Errors
///
/// The function returns a RedisError if there is an error querying the Redis server.

//IAM NOT SURE ABOUT CHANGING THIS FUNCTION
fn get_status_from_cluster_node(
_db_info: String,
con: &mut redis::Connection,
readiness: bool,
healthcheck: bool,
) -> Result<bool, redis::RedisError> {
let cluster_info: String = redis::cmd("CLUSTER").arg("INFO").query(con)?;

Ok(cluster_info.contains("cluster_state:ok"))
if healthcheck {
let cluster_state: bool = cluster_info.contains("cluster_state:ok");
let loading: bool = cluster_info.contains("LOADING");
let busy: bool = cluster_info.contains("BUSY"); // This might not exist in Redis.
let master_down: bool = cluster_info.contains("MASTERDOWN");
if cluster_state || loading || busy || master_down {
return Ok(true);
}
} else if readiness {
return Ok(cluster_info.contains("cluster_state:ok"));
}

Ok(false) // Default return to avoid missing a return value
}

/// Checks the status of the Redis master.
Expand All @@ -174,11 +202,20 @@ fn get_status_from_cluster_node(
/// # Returns
///
/// A boolean value that indicates whether the Redis master is ready
fn get_status_from_master(db_info: &str) -> Result<bool, redis::RedisError> {
if db_info.contains("loading:1") {
return Ok(false);
fn get_status_from_master(db_info: &str,con: &mut redis::Connection,readiness: bool, healthcheck: bool) -> Result<bool, redis::RedisError> {
let result : String = redis::cmd("PING").query(con)?;
if healthcheck {
if result.contains("PONG") || result.contains("LOADING") || result.contains("BUSY") || result.contains("MASTERDOWN"){
return Ok(true);
}

} else if readiness {
if result.contains("PONG") && db_info.contains("loading:0") {
return Ok(true);
}
}
Ok(true)

Ok(false)
}

/// Checks the status of the Redis slave.
Expand All @@ -193,16 +230,21 @@ fn get_status_from_master(db_info: &str) -> Result<bool, redis::RedisError> {
/// # Returns
///
/// A boolean value that indicates whether the Redis slave is ready
fn get_status_from_slave(db_info: &str) -> Result<bool, redis::RedisError> {
if db_info.contains("loading:1") {
return Ok(false);
}

if !db_info.contains("master_link_status:up") || db_info.contains("master_sync_in_progress:1") {
return Ok(false);
fn get_status_from_slave(db_info: &str, con: &mut redis::Connection, readiness: bool,healthcheck: bool) -> Result<bool, redis::RedisError> {

let result : String = redis::cmd("PING").query(con)?;
if healthcheck {
if result.contains("PONG") || result.contains("LOADING") || result.contains("BUSY") || result.contains("MASTERDOWN") {
return Ok(true);
}

} else if readiness {
if result.contains("PONG") && db_info.contains("loading:0") && db_info.contains("master_link_status:up") && db_info.contains("master_sync_in_progress:0") {
return Ok(true);
}
}

Ok(true)
Ok(false)
}

/// Resolves the host using the dns_lookup crate.
Expand Down
Loading
Loading