Watch starvation can cause OOMs #16839
Comments
One clarification, access to prevKey in etcd/server/storage/mvcc/kvstore_txn.go Line 203 in a4f507c
This means that prevKey is not necessarily accessed in apply loop (only if put has
The tradeoff is not clear; we need proper scalability testing to verify performance in different scenarios. |
xref. #17529 (comment). The It just reminds me of #12896 which changes to |
The bigger issue about Hence it is not efficient IMHO. |
Indeed there are two issues:
|
/cc |
Deleted the comment above because it was totally inaccurate.
tl;dr: The issue is not related to prevKV, but to overall watch efficiency during watch starvation. We need to implement #17563 to fix the etcd memory issue.
The main reason that enabling prevKV had such an impact is that it doubles the event size, which puts twice as much strain on the response stream. In my testing I was close to the response throughput limit (~1.6GB/s) even without prevKV, so enabling it caused the memory leak.
I repeated the tests where I compared watch streams based on the response throughput (measured using In the repeated tests I found that prevKV is not at fault at all; the issue is caused by inefficient behavior of
The table above shows two sets of tests with response throughput below the limit (1GB/s) and one above the limit (1.6GB/s). I adjusted the object size between enabled and disabled prevKV to get similar response throughput.
Conclusions: Same as before, the changes by @chaochn47 will not address the issue. There is no need to improve prevKV (at least for now). To address the etcd memory bloat, we need to improve the reuse of event history, as in #17563. |
Have one more idea, reduce the frequency of |
Amazing just reducing the syncWatchers frequency improves watch latency and memory usage.
Looks like the improvements peak around 1s, improving the watch response throughput by 4%, reducing memory allocations by 4-5 times (base memory for puts is 850MB), and reducing latency by 30-45%. This already looks like a great improvement.
A proper implementation of reusing events will be complicated, making a backport questionable, so I propose a simple, backportable solution to reduce the impact of slow watchers.
Proposal: Reduce the syncLoop period from 0.1s to 1s.
I want to confirm how the original value was chosen.
Note: there is no risk of syncLoop not catching up with unsynced watchers, as it also includes a shortcut for when it cannot keep up: etcd/server/storage/mvcc/watchable_store.go Lines 213 to 245 in ddf5471
|
Validated that 1s performs better than 0.1s for different number of streams:
Tested every sensible number of watchers, from the minimum (the smallest value that keeps the write size below the 1MB limit) up to 10,000, which is 20 times more than the limit of watchers per sync. Adjusted the object size to ensure that the watch stream is congested. |
The syncWatchers period has been set to 100ms since the first implementation of watch in #3507. I didn't find any discussion explaining why 100ms was picked. This convinces me that we can change it to improve the handling of watch stream starvation. |
Thanks for the performance test results.
I think it's more prudent to introduce a config item instead of hard-coding the sync loop interval/frequency.
Since the history never changes (of course it can be compacted), the idea of reusing events (#17563) should be OK.
We should also consider limiting the max data/events to read. Currently it always reads all the data/events in the range [minRev, curRev]. Obviously this may consume a huge amount of memory if there is a huge number of events in the range. etcd/server/storage/mvcc/watchable_store.go Line 363 in d639abe
|
Reducing the syncLoop frequency causes issues with resynchronization of new watch requests opened on a concrete revision. Instead I'm proposing two improvements: |
What would you like to be added?
Please read #16839 (comment) for up to date information about the problem