#4253 - Upgrade Redis Part 2 (#4309)
As part of removing the existing Redis files, these files should be
removed in a new pull request once the new redis-cluster is deployed
successfully.

<img width="320" alt="image"
src="https://github.com/user-attachments/assets/db64333f-82c7-41f4-a189-f7a27809584b"
/>

<img width="312" alt="image"
src="https://github.com/user-attachments/assets/c270cfc0-c1f5-4a45-b8a8-90489db9e689"
/>

- The PVC size is set to 1Gi, in sync with the existing PROD size.
- A service account is required when creating the cluster because it
gives the Redis pods the RBAC (Role-Based Access Control) permissions
they need to interact with the other objects created during
installation, such as the secrets, configmaps and PVCs.
- The Makefile commands for the old redis in the devops folder are
removed, since the new redis is installed with helm from the
devops/helm/redis-cluster folder.
- Redis credentials use a generated 32-character alphanumeric password,
like the previously generated ones.
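
The 32-character alphanumeric password from the last point can be illustrated outside Helm. This is a minimal shell sketch, assuming a Linux-like environment with `/dev/urandom`; `gen_redis_password` is a hypothetical helper name, and the chart generates its password through its own template logic, not through this function.

```shell
# Sketch only: print a random alphanumeric string (default length 32).
# gen_redis_password is a hypothetical name, not part of the chart.
gen_redis_password() {
  length="${1:-32}"
  # Read a fixed amount of random bytes, keep only A-Z, a-z and 0-9,
  # then cut the result down to the requested length.
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$length"
}
```

Calling `gen_redis_password` with no argument prints one 32-character value; a different length can be passed as the first argument.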

**AOF vs RDB**
Finding the right persistence mechanism for our project was a crucial
part of the analysis. The answers below are based on the official Redis
persistence documentation:
https://redis.io/docs/latest/operate/oss_and_stack/management/persistence/

**Common functionality of AOF and RDB and how it is used during
disaster recovery**
- We have enabled a PVC for our DB, so both the AOF and RDB files are
saved to it.
- Even if we uninstall the helm chart, the PVCs stay. When the chart is
installed again, with a different version or after disaster recovery,
the existing PVCs are reattached automatically by the current helm
configuration and no data is lost.

**AOF**
AOF (append-only file) persistence logs every write operation by
appending it to a file on disk. It usually consists of a series of
files: a base file, incremental update files and a manifest file. This
can be checked by running the command below in a shell in any of the
redis-cluster pods; its output is shown as well.
```
$ cat /opt/bitnami/redis/etc/redis.conf | grep appendonly
appendonly yes
# For example, if appendfilename is set to appendonly.aof, the following file
# - appendonly.aof.1.base.rdb as a base file.
# - appendonly.aof.1.incr.aof, appendonly.aof.2.incr.aof as incremental files.
# - appendonly.aof.manifest as a manifest file.
appendfilename "appendonly.aof"
appenddirname "appendonlydir"
```
These files are present in the /bitnami/redis/data folder.
appendonly.aof.1.base.rdb is the base file, appendonly.aof.1.incr.aof
and appendonly.aof.2.incr.aof are the incremental files, and
appendonly.aof.manifest is the manifest file, which holds the metadata
of the AOF files.
More than one incremental file, such as appendonly.aof.2.incr.aof, can
exist because of AOF rewrites: during a rewrite, a child process creates
a new base AOF file while the parent logs updates to a new incremental
AOF file; once rewriting completes, Redis atomically updates the
manifest and cleans up old files to ensure a consistent dataset. This is
a feature of Redis 7+; as we are using 7.4.2-debian-12-r0, it is
available.
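
To make the three file roles concrete, here is a small shell sketch that labels the files in an AOF directory by their suffix. The helper names `aof_file_role` and `list_aof_files` are hypothetical, and the default path assumes the `appenddirname` shown above sits directly under /bitnami/redis/data.

```shell
# Sketch only: classify a multi-part AOF file by its name suffix.
aof_file_role() {
  case "$1" in
    *.base.rdb | *.base.aof) echo "base" ;;
    *.incr.aof) echo "incremental" ;;
    *.manifest) echo "manifest" ;;
    *) echo "other" ;;
  esac
}

# Print "<role><TAB><file name>" for every file in the AOF directory.
# The default directory is an assumption based on the paths in the text.
list_aof_files() {
  dir="${1:-/bitnami/redis/data/appendonlydir}"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    printf '%s\t%s\n' "$(aof_file_role "$f")" "${f##*/}"
  done
}
```

Running `list_aof_files` inside a pod would list each file next to its role, which makes it easy to see how many incremental files currently exist.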

PROS and CONS:
The main downside of AOF is that the files are larger because of the
incremental updates, so recovery takes more time. In return, the data
loss in case of a disaster is at most about one second of writes; this
is set by the configuration below.
<img width="248" alt="image"
src="https://github.com/user-attachments/assets/b22b2ada-f175-46d9-a656-1b68f9619272"
/>
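
In Redis, the AOF durability window is governed by the `appendfsync` directive, and the one-second bound corresponds to `everysec`, which is also the Redis default. The sketch below reads that directive from a config file and prints what it implies. `aof_fsync_policy` is a hypothetical helper, and the file path is whatever the caller passes in.

```shell
# Sketch only: report the effective appendfsync policy of a redis.conf.
aof_fsync_policy() {
  conf_file="$1"
  # Keep the last uncommented appendfsync directive; commented lines
  # start with "#", so their first field never equals "appendfsync".
  policy=$(awk '$1 == "appendfsync" { value = $2 } END { print value }' "$conf_file")
  # Redis falls back to everysec when the directive is absent.
  policy="${policy:-everysec}"
  case "$policy" in
    always) echo "always: fsync on every write" ;;
    everysec) echo "everysec: up to about one second of writes can be lost" ;;
    no) echo "no: flushing is left to the operating system" ;;
    *) echo "unknown policy: $policy" ;;
  esac
}
```

Inside a pod this could be called as `aof_fsync_policy /opt/bitnami/redis/etc/redis.conf`, using the config path from the commands in this description.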

**RDB**
RDB persistence takes a SNAPSHOT of the current dataset, more like a
backup strategy that runs at configured intervals. It is a single file,
and its name can be found by running the command below in a shell in the
redis-cluster pods.
```
$ cat /opt/bitnami/redis/etc/redis.conf | grep dbfilename
# and 'dbfilename') and that aren't usually modified during runtime
dbfilename dump.rdb
# above using the 'dbfilename' configuration directive.
```
The file is present in /bitnami/redis/data, and dump.rdb contains the
snapshot of the dataset. The snapshot schedule is set with the save
configuration, as below.
<img width="549" alt="image"
src="https://github.com/user-attachments/assets/13aa1f8a-1e09-4964-8349-50d059b84b46"
/>
| **Time Interval (seconds)** | **Minimum Number of Changes** |
|----------------------------|------------------------------|
| 900 seconds (15 minutes)   | 1 change                     |
| 300 seconds (5 minutes)    | 10 changes                   |
| 60 seconds (1 minute)      | 10,000 changes               |

PROS and CONS:
RDB can recover the data quickly, as it does not have to run through
multiple files and the file is relatively smaller than the AOF. The
downside is the interval at which changes are saved: with the current
configuration, a snapshot is taken after 5 minutes if there are at least
10 changes, and after 15 minutes if there is at least one change. So if
there are only 9 data changes, it can take up to 15 minutes before they
are saved to disk, and if there is a disaster during this time, those 9
changes are lost.
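
The 9-change case can be worked through with a small shell sketch of the save-point rule: a snapshot is due as soon as any "seconds / minimum changes" pair from the table is satisfied. `snapshot_due` is a hypothetical helper that hard-codes the three pairs above; it only models the rule and does not talk to Redis.

```shell
# Sketch only: decide whether an RDB snapshot would be due.
# $1 = seconds since the last save, $2 = changes since the last save.
snapshot_due() {
  elapsed="$1"
  changes="$2"
  # Save points from the table: "<seconds> <minimum changes>".
  for rule in "900 1" "300 10" "60 10000"; do
    seconds="${rule% *}"
    min_changes="${rule#* }"
    if [ "$elapsed" -ge "$seconds" ] && [ "$changes" -ge "$min_changes" ]; then
      echo "yes"
      return 0
    fi
  done
  echo "no"
}

# snapshot_due 300 9    prints "no"  (9 changes miss the 10-change rule)
# snapshot_due 900 9    prints "yes" (the 15-minute rule needs only 1 change)
# snapshot_due 300 10   prints "yes"
```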

**Conclusion**
Enabling both RDB and AOF at the same time gives the best of both
worlds and covers the recovery strategy. Also, now that Redis is
installed with helm, upgrades and full disaster recovery can be done via
GitHub Actions.

**Installation and upgrade of redis**
Installation and upgrade of the redis-cluster are handled by the GHA
`Redis Cluster - Install/Upgrade`.

![image](https://github.com/user-attachments/assets/cc525402-70b2-437f-8531-2e3820415b30)
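
The workflow's actual steps are not shown in this description. As a rough manual equivalent, a helm install/upgrade run from the `devops/helm/redis-cluster` folder might look like the sketch below. The function name `install_redis_cluster` is hypothetical, and the assumption that the values file is named `values-<namespace>.yaml` is inferred from the file names in this commit.

```shell
# Sketch only: install or upgrade the release with the per-namespace values.
# Assumes the current directory is devops/helm/redis-cluster.
install_redis_cluster() {
  namespace="$1"
  helm upgrade --install redis-cluster . \
    --namespace "$namespace" \
    --values "values-${namespace}.yaml"
}

# Example (namespace taken from the values file names in this commit):
#   install_redis_cluster 0c27fb-dev
```

`helm upgrade --install` installs the release when it does not exist yet and upgrades it otherwise, which is why one command can cover both cases.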

**Issues in the Redis Cluster**
The BC Gov troubleshooting guide is available at the link below.

**https://github.com/bcgov/common-service-showcase/wiki/Redis-Troubleshooting**
Also, if the cluster fails completely, we can uninstall redis by running
`helm delete redis-cluster -n {NAMESPACE}` from the
`/devops/helm/redis-cluster` folder. This removes the cluster but does
not delete the PVCs, so when the redis-cluster is installed again using
the GHA from the previous step, it can be recovered with minimal or no
data loss.
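
A sketch of that uninstall step, followed by a check that the PVCs are still there. `helm uninstall` is the current name of `helm delete` in Helm 3. The function name `uninstall_keep_pvcs` is hypothetical, and the label selector `app.kubernetes.io/instance=redis-cluster` is an assumption about how the chart labels its PVCs; adjust it if the labels differ.

```shell
# Sketch only: remove the release, then list the PVCs that remain.
uninstall_keep_pvcs() {
  namespace="$1"
  helm uninstall redis-cluster --namespace "$namespace" &&
    # The label selector below is an assumption, not taken from the chart.
    oc get pvc --namespace "$namespace" -l app.kubernetes.io/instance=redis-cluster
}
```

If the PVC list is empty after the uninstall, stop before reinstalling and check the namespace manually, since the recovery described above depends on those volumes.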

**Migration from Old Redis**
- Scale the old redis statefulset down to 0 pods.

![image](https://github.com/user-attachments/assets/636c9f28-76a3-4b62-a088-c37664d75357)
- Install redis-cluster using the GHA `Redis Cluster - Install/Upgrade`.
- Deploy the release tag. This ensures all the applications get the
updated redis host and password from the new redis; once the deployment
is successful, the API, queue-consumers and workers connections should
work seamlessly.
- Backup and recovery of the redis keys from the old to the new redis
has not been requested so far, but it can be done by port-forwarding the
existing redis locally, backing it up and restoring into the new
redis-cluster.
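
The first step can also be done from the command line instead of the console. A minimal sketch, assuming `oc` is logged in to the cluster; `scale_statefulset` is a hypothetical helper, and the StatefulSet name of the old redis is not given in this description, so it is passed in as an argument.

```shell
# Sketch only: scale a StatefulSet to the given number of replicas.
# $1 = StatefulSet name, $2 = replicas, $3 = namespace.
scale_statefulset() {
  oc scale "statefulset/$1" --replicas="$2" --namespace "$3"
}

# Example, with placeholder values:
#   scale_statefulset <old-redis-statefulset> 0 <namespace>
```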

**Rollback Procedures**
- During rollback, the newly created redis-cluster statefulset pods
should be scaled down to 0.
![image](https://github.com/user-attachments/assets/0d68f9ec-fefb-446d-a03c-c9c2c632237a)
- Scale the old redis from 0 to 6.
![image](https://github.com/user-attachments/assets/404ba468-0197-48a6-9433-a46e3b505b94)
- Continue with the rollback steps in the release notes.
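
The two scaling steps above can be sketched as one shell function. `rollback_redis` is a hypothetical helper; both StatefulSet names are passed in as arguments because they are not spelled out in this description, and the replica count of 6 comes from the step above.

```shell
# Sketch only: scale the new cluster to 0, then bring the old redis back.
# $1 = namespace, $2 = new StatefulSet name, $3 = old StatefulSet name.
rollback_redis() {
  namespace="$1"
  new_sts="$2"
  old_sts="$3"
  # Only scale the old redis up if scaling the new one down succeeded.
  oc scale "statefulset/$new_sts" --replicas=0 --namespace "$namespace" &&
    oc scale "statefulset/$old_sts" --replicas=6 --namespace "$namespace"
}
```

The remaining rollback steps from the release notes still have to be followed after this function returns.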

**Note:**
Once the deployment is complete and the redis-cluster is in place, the
wiki will be updated.
guru-aot authored Feb 5, 2025
1 parent c7a35a5 commit 129f923
Showing 9 changed files with 150 additions and 132 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/env-setup-sysdig-teams.yml
@@ -22,7 +22,7 @@ jobs:
       - name: Log in to OpenShift
         run: |
           oc login --token=${{ secrets.SA_TOKEN }} --server=${{ vars.OPENSHIFT_CLUSTER_URL }}
-      - name: Delete Redis
+      - name: Updating Sysdig Team
         working-directory: "./devops/"
         run: |
           make update-sysdig-team
2 changes: 1 addition & 1 deletion devops/helm/redis-cluster/templates/_helpers.tpl
@@ -167,7 +167,7 @@ Return Redis® password
 {{- else if not (empty .Values.password) -}}
 {{- .Values.password -}}
 {{- else -}}
-{{- randAlphaNum 10 -}}
+{{- randAlphaNum 32 -}}
 {{- end -}}
 {{- end -}}

2 changes: 1 addition & 1 deletion devops/helm/redis-cluster/templates/configmap.yaml
@@ -1396,7 +1396,7 @@ data:
 #
 # Please check https://redis.io/topics/persistence for more information.
-appendonly no
+appendonly yes
 # The base name of the append only file.
 #
51 changes: 27 additions & 24 deletions devops/helm/redis-cluster/values-0c27fb-dev.yaml
@@ -1,40 +1,43 @@
+persistence:
+  size: 1Gi
+
 volumePermissions:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param volumePermissions.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 250m
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 100m
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 redis:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param redis.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 ## Cluster update job settings
 updateJob:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param updateJob.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
51 changes: 27 additions & 24 deletions devops/helm/redis-cluster/values-0c27fb-prod.yaml
@@ -1,40 +1,43 @@
+persistence:
+  size: 1Gi
+
 volumePermissions:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param volumePermissions.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 250m
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 100m
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 redis:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param redis.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 ## Cluster update job settings
 updateJob:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param updateJob.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
51 changes: 27 additions & 24 deletions devops/helm/redis-cluster/values-0c27fb-test.yaml
@@ -1,40 +1,43 @@
+persistence:
+  size: 1Gi
+
 volumePermissions:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param volumePermissions.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 250m
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 100m
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 redis:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param redis.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 ## Cluster update job settings
 updateJob:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param updateJob.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
21 changes: 12 additions & 9 deletions devops/helm/redis-cluster/values-a6ef19-dev.yaml
@@ -1,13 +1,16 @@
+persistence:
+  size: 1Gi
+
 volumePermissions:
   # resourcesPreset: "nano"
   ## @param volumePermissions.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
   resources:
     requests:
-      cpu: 250m
-      memory: 512Mi
+      cpu: 500m
+      memory: 1024Mi
     limits:
-      cpu: 100m
+      cpu: 500m
       memory: 1024Mi

   # resources: {}
@@ -17,10 +20,10 @@ redis:
   ## Example:
   resources:
     requests:
-      cpu: 1
-      memory: 512Mi
+      cpu: 500m
+      memory: 1024Mi
     limits:
-      cpu: 2
+      cpu: 500m
       memory: 1024Mi

   # resources: {}
@@ -31,10 +34,10 @@ updateJob:
   ## Example:
   resources:
     requests:
-      cpu: 1
-      memory: 512Mi
+      cpu: 500m
+      memory: 1024Mi
     limits:
-      cpu: 2
+      cpu: 500m
       memory: 1024Mi

   # resources: {}
51 changes: 27 additions & 24 deletions devops/helm/redis-cluster/values-a6ef19-prod.yaml
@@ -1,40 +1,43 @@
+persistence:
+  size: 1Gi
+
 volumePermissions:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param volumePermissions.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 250m
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 100m
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 redis:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param redis.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}
 ## Cluster update job settings
 updateJob:
-  resourcesPreset: "nano"
+  # resourcesPreset: "nano"
   ## @param updateJob.resources Set container requests and limits for different resources like CPU or memory (essential for production workloads)
   ## Example:
-  # resources:
-  #   requests:
-  #     cpu: 1
-  #     memory: 512Mi
-  #   limits:
-  #     cpu: 2
-  #     memory: 1024Mi
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1024Mi
+    limits:
+      cpu: 500m
+      memory: 1024Mi

   # resources: {}