
Drop BoltDB from write-cache #3091

Merged · 2 commits into master · Jan 30, 2025
Conversation

@End-rey (Contributor) commented Jan 27, 2025

Closes #3076.

Do I understand correctly that the workers were only for parallel processing from the BoltDB?


codecov bot commented Jan 27, 2025

Codecov Report

Attention: Patch coverage is 33.61345% with 79 lines in your changes missing coverage. Please review.

Project coverage is 22.23%. Comparing base (25314f5) to head (3364ab5).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
pkg/local_object_storage/writecache/writecache.go 40.25% 45 Missing and 1 partial ⚠️
cmd/neofs-lens/internal/writecache/get.go 0.00% 9 Missing ⚠️
cmd/neofs-lens/internal/writecache/root.go 0.00% 6 Missing ⚠️
pkg/local_object_storage/writecache/flush.go 33.33% 6 Missing ⚠️
cmd/neofs-lens/internal/writecache/list.go 0.00% 4 Missing ⚠️
pkg/local_object_storage/writecache/storage.go 0.00% 4 Missing ⚠️
cmd/neofs-lens/internal/printers.go 0.00% 1 Missing ⚠️
pkg/local_object_storage/writecache/delete.go 0.00% 1 Missing ⚠️
pkg/local_object_storage/writecache/status.go 0.00% 1 Missing ⚠️
pkg/services/control/server/object_status.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3091      +/-   ##
==========================================
- Coverage   22.40%   22.23%   -0.18%     
==========================================
  Files         763      762       -1     
  Lines       58632    58276     -356     
==========================================
- Hits        13136    12956     -180     
+ Misses      44603    44436     -167     
+ Partials      893      884       -9     


@End-rey End-rey force-pushed the 3076-drop-boltdb-from-write-cache branch from 22d9414 to 9957225 Compare January 28, 2025 12:24
@End-rey End-rey force-pushed the 3076-drop-boltdb-from-write-cache branch from 9957225 to b9784dd Compare January 28, 2025 16:48
@End-rey
Copy link
Contributor Author

End-rey commented Jan 28, 2025

An error is currently returned after migration. Can you check which errors may appear? Specifically, I am concerned about the case when there is no default bucket, and about other errors inside the database View.

  // defaultFlushInterval is default time interval between successive flushes.
- defaultFlushInterval = time.Second
+ defaultFlushInterval = 10 * time.Second
Member

This change should be reflected in documentation.

Contributor Author

I didn't find where the documentation specifies the flush interval for the write-cache. It isn't set anywhere.

@carpawell (Member) left a comment

> Do I understand correctly that the workers were only for parallel processing from the BoltDB?

TBH, I do not understand what the key difference between the DB and FSTree is when it comes to flushing. I think it is possible to speed up flushing with 2+ workers: we do spend some time reading an object from the WC and unmarshaling it. @roman-khimov

Also, we store raw objects on the FS now; I think closing this issue should come with more precise WC size handling:

func (c *cache) estimateCacheSize() uint64 {
	return c.objCounters.DB()*c.smallObjectSize + c.objCounters.FS()*c.maxObjectSize
}
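More precise size handling could sum real object sizes on put/delete instead of multiplying the object count by the maximum possible size. A minimal sketch, assuming a hypothetical `sizeCounter` rather than the real `objCounters`:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sizeCounter tracks the write-cache size by accumulating actual object
// sizes, avoiding the numOfObjs*maxObjectSize over-estimate.
type sizeCounter struct {
	size atomic.Uint64
}

func (s *sizeCounter) Put(objSize uint64) { s.size.Add(objSize) }

// Delete subtracts objSize using the documented two's-complement trick
// for atomic unsigned subtraction.
func (s *sizeCounter) Delete(objSize uint64) { s.size.Add(^(objSize - 1)) }

func (s *sizeCounter) Size() uint64 { return s.size.Load() }

func main() {
	var c sizeCounter
	c.Put(100)
	c.Put(250)
	c.Delete(100)
	fmt.Println(c.Size()) // 250
}
```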

@roman-khimov

storagelog.Write(c.log,
storagelog.AddressField(saddr),
storagelog.StorageTypeField(wcStorageType),
storagelog.OpField("db DELETE"),
Member

We can also drop "fstree" from every WC operation log, since it is now clear that every object is stored in (and dropped from) the FSTree.

-// Write-cache has 2 components:
-// 1. Key-value (bbolt) database for storing small objects.
-// 2. Filesystem tree for storing big objects.
+// Write-cache uses filesystem tree for storing objects.
Member

"file system" (separate)?

c.modeMtx.Unlock()

// Migration part
if !readOnly {
Member

can we avoid the `!` and just invert the branches?
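The suggested inversion, as a runnable sketch (function and parameter names are hypothetical, not the actual write-cache code):

```go
package main

import "fmt"

// openPhase sketches the inversion: instead of `if !readOnly { migrate }`,
// return early in the read-only branch so the migration path is unindented.
func openPhase(readOnly bool, migrate func() error) error {
	if readOnly {
		return nil
	}
	return migrate()
}

func main() {
	ran := false
	_ = openPhase(false, func() error { ran = true; return nil })
	fmt.Println(ran) // true: migration runs in read-write mode
}
```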

return nil
}

func (c *cache) migrate() error {
Member

can we have some simple migrate test?
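A simple migrate test could assert that everything in the legacy database ends up in the main storage and the database is left empty. A sketch with maps standing in for BoltDB and the target storage (the real test would use the actual cache in a temporary directory):

```go
package main

import (
	"fmt"
	"reflect"
)

// migrate moves every object from the legacy db to the main storage,
// mirroring what the write-cache migration is expected to do.
func migrate(db, storage map[string][]byte) {
	for addr, obj := range db {
		storage[addr] = obj
		delete(db, addr)
	}
}

func main() {
	db := map[string][]byte{"addr1": {0x01}, "addr2": {0x02}}
	storage := map[string][]byte{}
	want := map[string][]byte{"addr1": {0x01}, "addr2": {0x02}}

	migrate(db, storage)

	fmt.Println(len(db) == 0)                     // true: the DB is drained
	fmt.Println(reflect.DeepEqual(storage, want)) // true: no object was lost
}
```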

CHANGELOG.md

### Updated

### Updating from v0.44.2
`small_object_size`, `workers_number`, `max_batch_size` and `max_batch_delay`
parameters are removed from the `writecache` config. These parameters are related
to the BoltDB part of the write-cache, which is dropped from the code.
Member

Please add a few words saying that the migration will be performed automatically.

db, err := writecache.OpenDB(vPath, true)
// openWC opens and returns read-only writecache.Cache located in vPath.
func openWC() (writecache.Cache, error) {
ws := writecache.New(writecache.WithPath(vPath))
Member

Suggested change:
- ws := writecache.New(writecache.WithPath(vPath))
+ wc := writecache.New(writecache.WithPath(vPath))


// WithNoSync sets an option to allow returning to caller on PUT before write is persisted.
// Note, that we use this flag for FSTree only and DO NOT use it for a bolt DB because
Member

this line should be dropped

@roman-khimov (Member)

Estimations and workers are separate things to be done in separate issues; please open them if they're not there yet. Eventually I'd like the WC to keep its object list in memory (with sizes), and yes, it needs a better flushing mechanism. But that doesn't change anything here: Bolt is to be removed, and it is removed.

Although there is an `ignoreErrors` flag, when an error occurs in `flushObjects` in the FSTree, it should be returned, as with Bolt.

Signed-off-by: Andrey Butusov <[email protected]>
Previously, the write-cache had 2 components: a bolt database and an fstree. BoltDB
is absolutely useless for write caching given that the cache uses an SSD, so it was
decided to drop this database.

Drop all code related to BoltDB. Migrate from this database to the
main storage, by flushing objects. Remove config parameters for the write-cache:
`small_object_size`, `workers_number`, `max_batch_delay`, `max_batch_size`.
Update docs. Rewrite lens commands for the current version of the write-cache.

Closes #3076.

Signed-off-by: Andrey Butusov <[email protected]>
@carpawell (Member)

> Estimations and workers are separate things to be done in separate issues, please open them if they're not here yet.

Do not agree. It used to be parallel and now it is not; that is a degradation. It used to have incorrect estimations only because of the existing bbolt base; now we do not have it, but our expectation of the WC size is still numOfObjs*theBiggestPossibleObj. I see no better place for these fixes than this PR.

Opened: #3100, #3101.

@roman-khimov (Member)

> It used to be parallel

Bolt-only. FSTree is all the same.

@roman-khimov roman-khimov merged commit 36f1d3e into master Jan 30, 2025
22 checks passed
@roman-khimov roman-khimov deleted the 3076-drop-boltdb-from-write-cache branch January 30, 2025 19:25