Still return continuous WAL entries when running into ErrSliceOutOfRange #19095
Signed-off-by: Benjamin Wang <[email protected]>
Confirmed that this PR fixes the error in #19038 (comment). @siyuanfoundation, please let me know if you can still reproduce it in your environment.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19095      +/-   ##
==========================================
- Coverage   68.77%   68.71%   -0.06%
==========================================
  Files         420      420
  Lines       35642    35642
==========================================
- Hits        24513    24492      -21
- Misses       9703     9719      +16
- Partials     1426     1431       +5

... and 24 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.
Thanks for the confirmation. Can we get this merged first? PTAL cc @serathius
After syncing my repo, I found that the robustness test still fails even with this fix, because
@siyuanfoundation, how often did you see this error? In other words, is it easy to reproduce? If I understand correctly, the robustness test error means that the last client write, which had already received a successful response, was not persisted in the WAL file. Please let me know if I misunderstood.

Each time we see an issue, the first thing to do is figure out whether it's a real issue from the end user's perspective. Can you manually double-check whether the last successful client write was persisted in the WAL files of a majority of members, and also in the bbolt db?

Also, I suspect the robustness test might not process the WAL records correctly: the longest WAL might not be the correct one. As long as WAL records have not been committed yet, they may be overwritten by subsequent WAL records. See etcd/tests/robustness/report/wal.go, lines 78 to 79 (in fce823a).
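Overwriting uncommitted records explains why the longest WAL is not necessarily the correct one. The following hypothetical Go sketch shows replay with raft-style overwrite semantics. The names and types are illustrative and are not etcd's. A record whose index is not past the current tail truncates the log at that index. As a result, uncommitted entries from an older term are replaced by later records, and the number of records in a file can exceed the number of entries that survive replay.

```go
package main

import "fmt"

// entry is a hypothetical, minimal stand-in for a raft log entry.
type entry struct {
	Index, Term uint64
}

// replay applies WAL records in file order on top of a snapshot at snapIndex.
// A record whose index is not past the current tail truncates the log at that
// index before appending, so uncommitted entries written by an old leader are
// replaced by later records. On a gap, replay simply stops at the continuous
// prefix.
func replay(snapIndex uint64, records []entry) []entry {
	var ents []entry
	for _, r := range records {
		if r.Index <= snapIndex {
			continue // covered by the snapshot
		}
		offset := r.Index - snapIndex - 1
		if offset > uint64(len(ents)) {
			break // gap: keep only the continuous prefix
		}
		ents = append(ents[:offset], r)
	}
	return ents
}

func main() {
	// Entries 12 and 13 from term 2 were never committed; the new leader in
	// term 3 rewrote index 12, so the file holds 4 records but only 2 entries survive.
	records := []entry{{11, 2}, {12, 2}, {13, 2}, {12, 3}}
	ents := replay(10, records)
	fmt.Println(len(records), len(ents), ents[len(ents)-1]) // 4 2 {12 3}
}
```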
I regard it as a test issue for now; please raise a separate issue to track it. Thanks.
You are right that normally the longest WAL does not necessarily contain the longest committed sequence. However, in the robustness test we explicitly make a single additional transaction after the test finishes, which should ensure that there are no other uncommitted transactions. We require this transaction to succeed and later use it to assert that the WAL is complete.
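The completeness check described above can be sketched in Go under stated assumptions. This is an illustration, not the robustness test's actual code in tests/robustness/report. The sketch assumes the final write carries a known payload (the hypothetical `sentinel`) and that the entries have already been replayed. When raft commits an entry, every earlier entry is committed as well. So if the final write appears after a gap-free run of entries, the WAL holds the full committed history up to that point.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry is a hypothetical, minimal stand-in for a replayed WAL entry.
type entry struct {
	Index uint64
	Data  []byte
}

// walContainsFinalWrite reports whether the replayed entries include the
// final write made after the test finished, with no index gaps before it.
// The sentinel payload identifying that write is an assumption of this sketch.
func walContainsFinalWrite(ents []entry, sentinel []byte) bool {
	for i, e := range ents {
		if i > 0 && e.Index != ents[i-1].Index+1 {
			return false // gap before the final write: the WAL is incomplete
		}
		if bytes.Equal(e.Data, sentinel) {
			return true
		}
	}
	return false // final write never found
}

func main() {
	sentinel := []byte("final-put")
	complete := []entry{{1, []byte("a")}, {2, []byte("b")}, {3, sentinel}}
	truncated := complete[:2] // the final write never made it to disk
	fmt.Println(walContainsFinalWrite(complete, sentinel), walContainsFinalWrite(truncated, sentinel))
	// prints: true false
}
```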
The change looks good; however, I haven't validated how it works with repair.
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: ahrtr, serathius

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Please read #19038 (comment) and #19038 (comment)
cc @serathius @siyuanfoundation
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.