#4253 - Upgrade Redis Part 2 #4309

Merged: 7 commits merged on Feb 5, 2025


@guru-aot (Collaborator) commented Jan 31, 2025

As part of removing the existing Redis files, the old files will be removed in a new pull request once the new redis-cluster has been deployed successfully.

  • The PVC size is set to 1GB, in sync with the existing PROD size.
  • A service account is required when creating the cluster, as it gives the Redis pods the RBAC (Role-Based Access Control) permissions needed to interact with the other objects created during installation, such as secrets, configmaps and PVCs.
  • The existing Makefile commands for the old Redis are removed from the devops folder, since the new Redis is installed with Helm from the devops/helm/redis-cluster folder.
  • Redis credentials use a generated 32-character alphanumeric password, as before.
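The PR does not show how the password is generated. As a minimal sketch, assuming any cryptographically secure generator is acceptable, a 32-character alphanumeric password can be produced with Python's standard secrets module; the helper name generate_redis_password is hypothetical and is not the chart's or the workflow's own logic.

```python
import secrets
import string

# Alphanumeric alphabet only: A-Z, a-z, 0-9 (no special characters).
ALPHABET = string.ascii_letters + string.digits


def generate_redis_password(length: int = 32) -> str:
    """Hypothetical helper: random alphanumeric password of the given length."""
    # secrets.choice draws from the OS CSPRNG, unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_redis_password())
```

Restricting the alphabet to letters and digits avoids quoting problems when the value is passed through Kubernetes secrets or shell commands, at the cost of slightly less entropy per character.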

AOF vs RDB
As part of the analysis, choosing the right persistence mechanism for our project was crucial. After checking the official Redis persistence documentation (https://redis.io/docs/latest/operate/oss_and_stack/management/persistence/), these are our findings.

Common functionality of AOF and RDB and how it is used during disaster recovery

  • PVCs are enabled for our DB, so both the AOF and the RDB files are saved to them.
  • Even if we uninstall the Helm chart, the PVCs remain. When the chart is installed again, with a different version or after a disaster recovery, the current Helm configuration reattaches the existing PVCs automatically, so no data is lost.

AOF
AOF (append-only file) persists every write operation by appending it to a file on disk. It usually consists of a series of files: a base file, incremental files and a manifest file. The configuration can be checked by running the command below in a shell on any of the redis-cluster pods, which returns the following:

```
$ cat /opt/bitnami/redis/etc/redis.conf | grep appendonly
appendonly yes
# For example, if appendfilename is set to appendonly.aof, the following file
# - appendonly.aof.1.base.rdb as a base file.
# - appendonly.aof.1.incr.aof, appendonly.aof.2.incr.aof as incremental files.
# - appendonly.aof.manifest as a manifest file.
appendfilename "appendonly.aof"
appenddirname "appendonlydir"
```

These files are stored under the /bitnami/redis/data folder, in the appendonlydir subdirectory named by appenddirname. appendonly.aof.1.base.rdb is the base file, appendonly.aof.1.incr.aof and appendonly.aof.2.incr.aof are the incremental files, and appendonly.aof.manifest is the manifest file, which holds the metadata that tracks the AOF files.
There can be more than one incremental file, such as appendonly.aof.2.incr.aof, because of AOF rewrites. When the AOF grows too large, Redis rewrites it: a child process creates a new base AOF file while the parent keeps logging updates to a new incremental AOF file. Once the rewrite completes, Redis atomically updates the manifest and cleans up the old files, which keeps the dataset consistent. This multi-part AOF is a Redis 7+ feature, and since we are using 7.4.2-debian-12-r0, it is available.
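The manifest is what ties the base and incremental files together. As a sketch, assuming the Redis 7 manifest format of one line per file (file <name> seq <n> type <t>, where b is base, i is incremental and h is history) and file names without spaces, the hypothetical helpers below parse a manifest and list the files in the order Redis replays them on startup: the base file first, then the incremental files by sequence number.

```python
from dataclasses import dataclass
from typing import List

# Manifest type codes (assumed Redis 7 format): b = base, i = incremental, h = history.
TYPE_NAMES = {"b": "base", "i": "incr", "h": "history"}


@dataclass
class AofPart:
    filename: str
    seq: int
    kind: str


def parse_manifest(text: str) -> List[AofPart]:
    """Parse lines such as 'file appendonly.aof.1.base.rdb seq 1 type b'."""
    parts: List[AofPart] = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Tokens alternate key/value: file <name> seq <n> type <t>.
        fields = dict(zip(tokens[0::2], tokens[1::2]))
        parts.append(AofPart(fields["file"], int(fields["seq"]), TYPE_NAMES[fields["type"]]))
    return parts


def replay_order(parts: List[AofPart]) -> List[str]:
    """Base file first, then incremental files by sequence number; history files are skipped."""
    base = [p.filename for p in parts if p.kind == "base"]
    incr = sorted((p for p in parts if p.kind == "incr"), key=lambda p: p.seq)
    return base + [p.filename for p in incr]


if __name__ == "__main__":
    # Illustrative manifest using the file names from the redis.conf comments above.
    manifest = (
        "file appendonly.aof.1.base.rdb seq 1 type b\n"
        "file appendonly.aof.1.incr.aof seq 1 type i\n"
        "file appendonly.aof.2.incr.aof seq 2 type i\n"
    )
    print(replay_order(parse_manifest(manifest)))
```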

PROS and CONS:
The main downside of AOF is that the files are large because of the incremental updates, so recovery takes longer. In exchange, data loss in a disaster is at most about one second, which is what the appendfsync everysec setting provides.

RDB
RDB takes a snapshot of the current dataset at configured intervals, which makes it more like a backup strategy. It is a single file, whose name can be found by running the command below in a shell on any of the redis-cluster pods:

```
$ cat /opt/bitnami/redis/etc/redis.conf | grep dbfilename
# and 'dbfilename') and that aren't usually modified during runtime
dbfilename dump.rdb
# above using the 'dbfilename' configuration directive.
```

The file is stored in /bitnami/redis/data, and dump.rdb contains the snapshot of the dataset. The snapshot frequency is set by the save configuration: a snapshot is taken once the time interval has passed, provided at least the given number of changes has been made.

| Time interval | Minimum number of changes |
|---|---|
| 900 seconds (15 minutes) | 1 |
| 300 seconds (5 minutes) | 10 |
| 60 seconds (1 minute) | 10,000 |
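The save points above are (seconds, changes) pairs. As a minimal sketch, assuming the directive value is "900 1 300 10 60 10000" (reconstructed from the table rather than copied from the chart values), the hypothetical helpers below parse it and decide whether a snapshot is due. This is a simplified model: a snapshot is due as soon as any single rule has both its interval and its change count reached.

```python
from typing import List, Tuple

# Save points from the table, written as the value of a "save" directive (assumed).
SAVE_DIRECTIVE = "900 1 300 10 60 10000"


def parse_save(value: str) -> List[Tuple[int, int]]:
    """Turn '<seconds> <changes> <seconds> <changes> ...' into [(seconds, changes), ...]."""
    nums = [int(n) for n in value.split()]
    if len(nums) % 2 != 0:
        raise ValueError("save expects <seconds> <changes> pairs")
    return list(zip(nums[0::2], nums[1::2]))


def should_snapshot(rules: List[Tuple[int, int]], elapsed_s: int, changes: int) -> bool:
    """Simplified model: due when ANY rule has both its interval and its change count reached."""
    return any(elapsed_s >= secs and changes >= min_changes for secs, min_changes in rules)


if __name__ == "__main__":
    rules = parse_save(SAVE_DIRECTIVE)
    print(rules)                           # [(900, 1), (300, 10), (60, 10000)]
    print(should_snapshot(rules, 300, 9))  # False: 9 changes are below the 5-minute threshold
    print(should_snapshot(rules, 900, 9))  # True: the 15-minute rule needs only 1 change
```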

PROS and CONS:
RDB can recover data quickly because it does not have to replay multiple files and the file is relatively smaller than the AOF. The downside is the snapshot interval: with the current configuration, 10 or more changes are saved after 5 minutes, but fewer than 10 changes are only saved after 15 minutes. So if there are 9 data changes, they are written to disk only after 15 minutes, and a disaster during that window would lose those 9 changes.
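The 9-change example can be checked with a small calculation: for N changes made since the last snapshot, the exposure window is the shortest interval among the rules whose change threshold N meets. The sketch below assumes the three save points from the table and that no further writes arrive (more writes could only shorten the window); the helper name is hypothetical.

```python
from typing import List, Optional, Tuple

# (seconds, minimum changes) pairs from the RDB save table; assumed to match the deployed config.
SAVE_RULES: List[Tuple[int, int]] = [(900, 1), (300, 10), (60, 10000)]


def seconds_until_persisted(changes: int, rules: List[Tuple[int, int]] = SAVE_RULES) -> Optional[int]:
    """Shortest interval after which `changes` unsaved writes trigger an RDB snapshot.

    Assumes no further writes arrive; returns None when no rule can fire (0 changes).
    """
    eligible = [secs for secs, min_changes in rules if changes >= min_changes]
    return min(eligible) if eligible else None


if __name__ == "__main__":
    for n in (1, 9, 10, 10_000):
        print(n, "changes ->", seconds_until_persisted(n), "s at risk with RDB only")
```

With RDB alone, 9 changes stay only in memory for up to 900 seconds, while the AOF with appendfsync everysec narrows that to about one second, which is the motivation for enabling both.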

Conclusion
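To get the best of both RDB and AOF, we enable both at the same time, which covers our recovery strategy: when both are enabled, Redis rebuilds the dataset from the AOF on restart because it is the most complete, while the RDB snapshot remains a compact point-in-time copy. Also, now that Redis is installed with Helm, upgrades and full disaster recovery can be done through GitHub Actions.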
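To confirm that both mechanisms are actually on in a running pod, the output of cat /opt/bitnami/redis/etc/redis.conf can be checked programmatically. This is a minimal sketch with hypothetical helper names: it only understands the appendonly, appendfsync and save directives, treats save "" as disabling RDB, and if the chart passes settings as command-line arguments instead, CONFIG GET in redis-cli is the more reliable source.

```python
from typing import Dict, List, Tuple, Union

Setting = Union[str, None, List[Tuple[int, int]]]


def persistence_settings(conf_text: str) -> Dict[str, Setting]:
    """Summarise AOF and RDB directives from redis.conf text (comments and blank lines skipped)."""
    settings: Dict[str, Setting] = {"appendonly": "no", "appendfsync": None}
    save_points: List[Tuple[int, int]] = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        key, value = key.lower(), value.strip()
        if key == "save":
            if value in ('""', "''", ""):
                save_points = []  # save "" disables RDB snapshots
            else:
                nums = [int(n) for n in value.split()]
                save_points.extend(zip(nums[0::2], nums[1::2]))
        elif key in ("appendonly", "appendfsync"):
            settings[key] = value.strip('"').lower()
    settings["save"] = save_points
    return settings


def aof_and_rdb_enabled(settings: Dict[str, Setting]) -> bool:
    """True when AOF is on and at least one RDB save point is configured."""
    return settings["appendonly"] == "yes" and bool(settings["save"])


if __name__ == "__main__":
    # Illustrative sample, not copied from the deployed pod.
    sample = "appendonly yes\nappendfsync everysec\nsave 900 1 300 10 60 10000\n"
    s = persistence_settings(sample)
    print(s, aof_and_rdb_enabled(s))
```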

Installation and upgrade of Redis
Installing and upgrading the redis-cluster is handled by the GHA workflow "Redis Cluster - Install/Upgrade".

Issues in the Redis Cluster
BC Gov troubleshooting guidance is available at:
https://github.com/bcgov/common-service-showcase/wiki/Redis-Troubleshooting
If the cluster fails completely, Redis can be uninstalled by running helm delete redis-cluster -n {NAMESPACE} from the /devops/helm/redis-cluster folder. This removes the cluster but does not delete the PVCs, so when the redis-cluster is reinstalled using the GHA workflow above, it can be recovered with minimal or no data loss.

Migration from Old Redis

  • Scale the old Redis statefulset pods down to 0.
  • Install the redis-cluster using the GHA workflow "Redis Cluster - Install/Upgrade".
  • Deploy the release tag. This ensures all applications pick up the Redis host and password of the new Redis; once the deployment succeeds, the API, queue-consumers and workers connections should work seamlessly.
  • Backup and recovery of the Redis keys from the old Redis to the new one is currently not requested, but it can be done by port-forwarding the existing Redis locally, backing up its keys and restoring them into the new redis-cluster.
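If the key migration is ever needed, it could be scripted with DUMP/RESTORE, which copies each key's serialized value and lets the remaining TTL be carried over. This is a sketch only: copy_keys is a hypothetical helper that expects redis-py style clients (scan_iter, dump, pttl, restore). If the old Redis is also clustered, the source client must scan every primary node. For the new redis-cluster as target, a cluster-aware client such as redis-py's RedisCluster is needed, and because cluster nodes redirect clients to pod addresses that a local machine usually cannot reach through a port-forward, running the copy from a pod inside the namespace may be simpler.

```python
from typing import Iterable, Optional, Protocol


class RedisLike(Protocol):
    """Subset of the redis-py client API used below."""

    def scan_iter(self, match: Optional[str] = None, count: Optional[int] = None) -> Iterable: ...
    def dump(self, name): ...
    def pttl(self, name) -> int: ...
    def restore(self, name, ttl, value, replace=False): ...


def copy_keys(source: RedisLike, target: RedisLike, match: str = "*", replace: bool = False) -> int:
    """Copy matching keys from source to target with DUMP/RESTORE; returns how many were copied."""
    copied = 0
    # SCAN walks the keyspace incrementally instead of blocking the server like KEYS would.
    for key in source.scan_iter(match=match, count=500):
        ttl_ms = source.pttl(key)
        if ttl_ms == -2:  # key expired or was deleted after SCAN returned it
            continue
        payload = source.dump(key)
        if payload is None:
            continue
        # PTTL returns -1 for keys without expiry; RESTORE treats a TTL of 0 as "no expiry".
        target.restore(key, max(ttl_ms, 0), payload, replace=replace)
        copied += 1
    return copied


# Hypothetical usage (hosts, ports and passwords are placeholders):
# import redis
# old = redis.Redis(host="localhost", port=6379, password="<old-password>")
# new = redis.RedisCluster(host="<node-host>", port=6379, password="<new-password>")
# print(copy_keys(old, new))
```

Passing replace=False makes RESTORE fail on keys that already exist in the target, which is the safer default when the new cluster may already hold live data.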

Rollback Procedures

  • During rollback, scale the new redis-cluster statefulset pods down to 0.
  • Scale the old Redis back up from 0 to 6 pods.
  • Continue with the rollback steps in the release notes.

Note:
Once the deployment is complete and the redis-cluster is in place, the wiki will be updated.

@guru-aot self-assigned this Feb 4, 2025
@guru-aot added the Devops, DB and Technical Debt labels Feb 4, 2025
@guru-aot marked this pull request as ready for review February 4, 2025 17:04
@guru-aot changed the title from "Initial commit" to "#4253 - Upgrade Redis Part 2" Feb 4, 2025

@bidyashish (Collaborator) left a comment

A few points:

  1. Does the old Redis PVC need to be deleted manually, or will it be deleted automatically?
  2. Where is the new command to deploy or upgrade Redis?

Review thread on devops/Makefile (outdated, resolved)
@bidyashish (Collaborator)

Add documentation for:

  • New deployment/upgrade process using helm
  • Disaster recovery procedures, i.e. any GHA scripts?
  • Migration steps from old Redis to the new setup?
  • Any rollback procedures?

@andrewsignori-aot (Collaborator) left a comment

Thanks for sharing the info on AOF vs RDB. I will wait for the final version with the expected decision about release/rollback intention to take a final look.

@bidyashish (Collaborator) left a comment

Thanks @guru-aot

Looks good

sonarqubecloud bot commented Feb 4, 2025

github-actions bot commented Feb 4, 2025

E2E Workflow Workers Coverage Report

Totals Coverage
Statements: 65.59% ( 589 / 898 )
Methods: 59.63% ( 65 / 109 )
Lines: 68.72% ( 468 / 681 )
Branches: 51.85% ( 56 / 108 )

github-actions bot commented Feb 4, 2025

Backend Unit Tests Coverage Report

Totals Coverage
Statements: 22.22% ( 3885 / 17486 )
Methods: 10.18% ( 226 / 2220 )
Lines: 25.54% ( 3351 / 13123 )
Branches: 14.37% ( 308 / 2143 )

github-actions bot commented Feb 4, 2025

E2E Queue Consumers Coverage Report

Totals Coverage
Statements: 87.15% ( 1404 / 1611 )
Methods: 84.66% ( 160 / 189 )
Lines: 89.38% ( 1161 / 1299 )
Branches: 67.48% ( 83 / 123 )

@guru-aot (Collaborator, Author) commented Feb 4, 2025

> A few points:
>
>   1. Does the old Redis PVC need to be deleted manually, or will it be deleted automatically?
>   2. Where is the new command to deploy or upgrade Redis?

The old Redis needs to be deleted manually. This will be done once the new redis-cluster has been in place and running for a sprint, similar to the switchover we did from Patroni to Crunchy.

github-actions bot commented Feb 4, 2025

E2E SIMS API Coverage Report

Totals Coverage
Statements: 68.11% ( 6040 / 8868 )
Methods: 65.72% ( 744 / 1132 )
Lines: 71.97% ( 4731 / 6574 )
Branches: 48.62% ( 565 / 1162 )

@guru-aot (Collaborator, Author) commented Feb 4, 2025

> Add documentation for:
>
>   • New deployment/upgrade process using helm
>   • Disaster recovery procedures, i.e. any GHA scripts?
>   • Migration steps from old Redis to the new setup?
>   • Any rollback procedures?

Updated in the description.

@andrewsignori-aot (Collaborator) left a comment

Changes look good, hence approving it 👍
Once release/rollback instructions are updated in the release instructions I would like to take a closer look. Please share once that is available 😉

@guru-aot added this pull request to the merge queue Feb 5, 2025
Merged via the queue into main with commit 129f923 Feb 5, 2025
21 checks passed
@guru-aot deleted the feature/#4253-Upgrade_Redis-Part_2 branch February 5, 2025 00:42
Labels: DB, Devops, Technical Debt
3 participants