#14923 focuses on resolving the state where apm-server is stuck exceeding the storage limit indefinitely. This issue, on the other hand, focuses on the fact that badger DB disk space is not used efficiently: entries generally stay around for longer than the TTL. That violates the assumption that badger DB disk usage is proportional to event ingest throughput * TTL.
Compactions in badger DB are only triggered by a level reaching its size target, which may take a long time. If the value size is greater than ValueThreshold, the value is generally cleared on value log GC, and all that stays in the LSM tree past the TTL is the key. But if the value size is smaller than ValueThreshold, both the key and the value live in the LSM tree and must wait for the next compaction.
Potential solution: Lmax-to-Lmax compaction in badger v4 could help clear out expired entries without waiting for the level size target to be reached, but the performance impact is unclear.
Another option would be to create a new database for each time interval (say 1 minute) and delete each database after TTL + interval. (Adding the bucketing interval is necessary in case an event falls right at the end of its interval.)
This option has the benefit that it doesn't rely on badger's TTL feature, so we could potentially consolidate on Pebble for all our LSM needs.
> Another option would be to create a new database for each time interval (say 1 minute), and delete databases after TTL + interval. (It would be necessary to add the DB bucketing interval in case an event falls right at the end of the interval.)
Forgot to write that down in this issue. I have a PoC of a variant of your idea (time interval = TTL) implemented in carsonip@2868f22 using badger. A DB is created every TTL, and we keep 2 DBs alive at any given point in time, so that the age of any event across the combined DBs is strictly bounded by 2 * TTL. With the refactoring done in #15112 this can be implemented relatively easily on main.
Pebble is another story, but I agree it will then be easier to adopt, as we would no longer rely on any TTL feature.