- fix(runtime-watcher): do not exit disruption goroutine on errors
Errors are expected when updating the PDB: for instance, Kubernetes might be updating it at the same time, we might get throttled by the Kube API, or we might simply fail to fetch from Redis. In any of these cases, returning from the disruption goroutine does not close the watcher, because another goroutine is still running; this leaves the watcher running only the game room events loop. For now, if mitigating disruption fails, we log the error and retry in the next loop.
Also:
- fix(runtime-watcher): make Stop() idempotent
- fix(runtime-watcher): default disruption loop to 60s
Previously, the mitigate-disruption loop that updates the PDB ran
every 5s; it now runs every 60s. The logs were also moved to INFO level so
we can track this in production; one log entry per scheduler per minute
shouldn't spam the logs.
Full Changelog: v10.10.5...v10.10.6