TBS: Expired entries stay much longer than TTL and consume disk space #15121

Open
Tracked by #14931
carsonip opened this issue Jan 3, 2025 · 2 comments
carsonip (Member) commented Jan 3, 2025

#14923 focuses on resolving the state where apm-server is stuck exceeding the storage limit indefinitely. This issue, on the other hand, focuses on the fact that badger DB disk space is not used efficiently: entries generally stay around for longer than TTL. That violates the assumption that badger DB disk usage is proportional to event ingest throughput * TTL.

Compactions in badger DB are only triggered by reaching the level size target, and may take a long time to happen. If the value size is greater than ValueThreshold, the value is generally reclaimed during value log GC, and what lingers in the LSM tree beyond TTL is just the key. But if the value size is smaller than ValueThreshold, both the key and the value live in the LSM tree and must wait for the next compaction.

Potential solution: Lmax-to-Lmax compaction in badger v4 could help clear out expired entries without waiting for the level size target to be reached, but the performance impact is unclear.

axw (Member) commented Jan 6, 2025

Another option would be to create a new database for each time interval (say 1 minute), and delete databases after TTL + interval. (It is necessary to add the DB bucketing interval to the retention period, since an event may be written right at the end of its interval and still needs to survive a full TTL.)

This option has the benefit that it doesn't rely on the TTL feature of Badger, so we could potentially consolidate on Pebble for our LSM needs.

carsonip (Member, author) commented Jan 6, 2025

> Another option would be to create a new database for each time interval (say 1 minute), and delete databases after TTL + interval. (It would be necessary to add the DB bucketing interval in case an event falls right at the end of the interval.)

Forgot to write that down in this issue. I have a PoC of a variant of your idea (time interval = TTL) implemented in carsonip@2868f22 using badger. A new DB is created every TTL, and we keep 2 DBs alive at any given point in time, so that the age of any event across the combined DBs is strictly bounded by 2 * TTL. With the refactoring done in #15112 this can be implemented relatively easily on main.

Pebble is another story, but I agree it would then be easier to adopt, as we would no longer rely on any TTL feature.
