What would you like to be added or enhanced
If an immudb database is using S3 storage, it is currently impossible to perform an index compaction:
$ immuadmin database clean
rpc error: code = Unknown desc = comapction is unsupported when remote storage is used
I'd like to have some form of compaction for those databases too.
Why is this needed
The index on immudb grows large; normally one would periodically compact it using immuadmin database clean, but this is not possible with S3 storage.
In many cases the index grows faster than the actual data, so without index compaction the local disk space occupied by immudb databases is larger when using S3.
Additional context
As an example, a database on local storage containing 1 million log lines occupies 298 MB on disk after compaction.
The same database, using S3, takes 998 MB of disk space, most of it for the indexes.
@SimoneLazzaris apart from the unsupported index cleanup, there was also a bug with the index not being uploaded to S3. I've fixed this with #855, so the local disk size should no longer be an issue.
An implementation of index compaction should also be added, because a fragmented index may slow down over time. It's not trivial, though.
When done on a local disk, a new index is created in a separate folder and then renamed to the location where the currently used index is stored. S3 does not have such a rename built in: that operation would take a significant amount of time and would not be atomic.
I was thinking about an alternative solution that does not require renaming:
The index would not have a strict folder location such as <db_name>/index; instead it could be placed either in that folder (for backwards compatibility) or in a folder suffixed with an increasing number: <db_name>/index_0000000x.
When opening a DB, the proper index folder is selected (the one with the highest numeric suffix, with an extra check that its compaction has finished).
When a compaction is started, a new folder with a suffix higher than the previous one is created; once the compacted index is stored there, the new folder starts being used and the old one is removed. That requires putting a guard file into the folder during index compaction, to ensure it is skipped if the compaction has not finalized yet.
By doing so, a rename operation won't be needed.