Issue with GC and keeplast -> stuck at the number of snapshots #204
Comments
What about an option to clean 'everything' except the most recent snapshot? |
That’s a good idea, but I think we need to find a strategy that is as automated as possible. However, the 2 proposals are not incompatible :) |
Speaking from experience of dealing with customers who let their SQL DBs run out of disk space: there isn't much you can do without disk space, except note that you've run out of it and that operations can resume once disk space is restored. One possibility is to set a threshold of required free space, below which zsys snapshotting operations fail until disk space is freed. |
The issue described here is that if the disk space is taken up by the minimum number of snapshots to keep (20 by default), then nothing can happen:
So, nothing is collected and nothing new is created. I think we can push the GC, or make it more aggressive with decaying snapshots, in that situation (with manual user interaction, of course). |
Perhaps a time component should also come into play when deleting snapshots. For example, snapshots could still be deleted under the following condition: free disk space < 20% and there are fewer than 21 snapshots -> delete the oldest snapshot if it is older than 30 days (for example). That way we at least get out of the deadlock over time, without blanket-deleting everything immediately. |
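The rule proposed in the comment above can be sketched as follows. This is only an illustration of the idea, not zsys code: `State`, `pick_state_to_expire` and all parameter names are hypothetical, and the 20% / 20 states / 30 days values are the example numbers from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for a saved state (name and creation time)."""
    name: str
    created: datetime


def pick_state_to_expire(
    states: List[State],
    free_pct: float,
    now: datetime,
    min_free_pct: float = 20.0,            # example threshold from the thread
    keep_last: int = 20,                   # example minimum number of states
    max_age: timedelta = timedelta(days=30),
) -> Optional[State]:
    """Return the oldest state if the time-based escape rule applies, else None."""
    if free_pct >= min_free_pct:
        return None  # enough free space, nothing to do
    if len(states) > keep_last:
        return None  # the regular GC still has states it may collect
    if not states:
        return None
    oldest = min(states, key=lambda s: s.created)
    # Only give up the oldest state once it has aged past the limit.
    return oldest if now - oldest.created > max_age else None
```

With 14% free space and an oldest state from two months ago, the function returns that state; if every state is younger than 30 days it returns `None`, so the pool stays stuck until the user intervenes or a state ages out.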
If I understand this correctly, the problem is that reduced free space causes zsys to stop snapshotting. Theoretically, releasing an old snapshot will release 'some' space; the question is how much space is enough. One option could be a cleanup-size-threshold, which means:
I would strongly argue this shouldn't be part of the GC's automatic behavior, but must be invoked by the user. |
We can’t decide automatically to remove the oldest snapshots:
I agree that those decisions should not be automatic. There is already a hint in the message to free up manually (with |
In ZFS I can place a hold on a snapshot, so that it cannot be deleted by mistake. The same principle should apply in zsys: I should be able to put certain states on hold, and the rest may be deleted according to the rules. Currently the automatic rules lead to the deadlock described above. It is imperative to avoid this. |
Both excellent points. |
Not (quite) true. ZFS shows you exactly what each snapshot takes, by itself. Deleting this (single) snapshot will free the space unique to that snapshot. What's harder is figuring out which range of snapshots shares what amount of space, such that deleting the fewest adjacent ones frees the most space. For this, you can get some information from the 'written' property. After that, in modern ZFS the send space estimator is pretty usable. This does come down to "diffing multiple states", but it is at least somewhat less intricate.
However, I agree that admin intervention and policy are probably required before this point.
As for age, I agree that oldest is not necessarily best / most desirable. A strategy of "thinning" could help: find the two snapshots closest together in time, and remove the one that has the closer neighbour on the opposite side (or the one with the most unique space). Also, zsysd seems to have some properties that mark successful vs unsuccessful boots, and last booted time, which could factor in for candidate selection.
Using (and paying attention to) ZFS holds is absolutely the right way to go, as this is a mechanism that can be shared between multiple parties, including backup/replication tools and the admin's manual choices. |
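A minimal sketch of the "thinning" selection described above, assuming snapshots are reduced to a name, a creation timestamp and a flag saying whether a hold exists. `Snap` and `thinning_candidate` are hypothetical names, the `held` flag would have to be filled in from ZFS (for example from `zfs holds`), and the tie-breaking is arbitrary. The unique-space variant and the boot-status properties are left out.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snap:
    """Hypothetical snapshot record; `held` mirrors a ZFS hold on the snapshot."""
    name: str
    created: float  # seconds since the epoch
    held: bool = False


def thinning_candidate(snaps: List[Snap]) -> Optional[Snap]:
    """Pick one snapshot to remove from the closest-together pair.

    Within that pair, prefer the snapshot whose neighbour on the opposite
    side is closer. Held snapshots are never returned, but they still count
    as neighbours when measuring gaps.
    """
    ordered = sorted(snaps, key=lambda s: s.created)
    best = None  # ((pair_gap, outer_gap), snapshot)
    for i in range(len(ordered) - 1):
        left, right = ordered[i], ordered[i + 1]
        pair_gap = right.created - left.created
        # Distance to the neighbour on the far side of each pair member;
        # the first and last snapshots have no such neighbour.
        left_outer = left.created - ordered[i - 1].created if i > 0 else math.inf
        right_outer = (
            ordered[i + 2].created - right.created
            if i + 2 < len(ordered)
            else math.inf
        )
        for outer_gap, snap in ((left_outer, left), (right_outer, right)):
            if snap.held:
                continue
            key = (pair_gap, outer_gap)
            if best is None or key < best[0]:
                best = (key, snap)
    return best[1] if best else None
```

For snapshots taken at t = 0, 10, 11 and 30, the closest pair is (10, 11); the one at 10 has the closer outer neighbour (gap 10 versus 19), so it is the candidate. If that snapshot is held, the one at 11 is chosen instead.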
Where can I have a current discussion on getting to a work-around for this problem? Thanks... |
Don’t remove individual datasets; remove states instead: https://didrocks.fr/2020/06/02/zfs-focus-on-ubuntu-20.04-lts-zsys-commands-for-state-management/. The other way is to change the |
Apologies, I mean states when I say snapshots. I've spent more time with zfs, but I was referring to zsysctl 'states'. I have tried removing a lot of zsysctl states, which had no effect. Here is the output of
I have changed the keeplast property.
|
Did you check issue #155? Some code is shared there on how to do cleanups. |
Let’s say free space on the pool drops below the required minimum
Then, any snapshot attempt will say, for instance:
ERROR couldn't save system state: Minimum free space to take a snapshot and preserve ZFS performance is 20%. Free space on pool "bpool" is 14%.
That’s OK, but if you hit this threshold while only the minimum number of snapshots to keep (20, for instance) exists, then you are stuck:
We need to define and implement a strategy.
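To make the stuck situation concrete, here is a small sketch that classifies a pool from the two numbers involved. It is not zsys code: the function name, the returned labels and the parameters are hypothetical, and the defaults are the 20% and 20-snapshot figures quoted above.

```python
def snapshot_outlook(
    free_pct: float,
    state_count: int,
    min_free_pct: float = 20.0,  # minimum free space quoted in the error message
    keep_last: int = 20,         # minimum number of snapshots to keep
) -> str:
    """Classify a pool as "ok", "gc-may-help" or "stuck"."""
    if free_pct >= min_free_pct:
        return "ok"           # snapshots can still be taken
    if state_count > keep_last:
        return "gc-may-help"  # the GC still has something it may collect
    return "stuck"            # no new snapshot, and nothing the GC may remove
```

With the values from the error message above (14% free on "bpool") and exactly 20 snapshots, this returns "stuck": snapshotting is refused and the GC has nothing it is allowed to collect.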