
Still return continuous WAL entries when running into ErrSliceOutOfRange #19095

Open · ahrtr wants to merge 1 commit into main

ahrtr (Member) commented Dec 21, 2024

ahrtr (Member, Author) commented Dec 21, 2024

Confirmed that this PR fixes the error reported in #19038 (comment). @siyuanfoundation please let me know if you can still reproduce it in your environment.

codecov bot commented Dec 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.71%. Comparing base (40b856e) to head (152de1f).

Additional details and impacted files
Files with missing lines Coverage Δ
server/storage/wal/wal.go 57.88% <100.00%> (ø)

... and 24 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19095      +/-   ##
==========================================
- Coverage   68.77%   68.71%   -0.06%     
==========================================
  Files         420      420              
  Lines       35642    35642              
==========================================
- Hits        24513    24492      -21     
- Misses       9703     9719      +16     
- Partials     1426     1431       +5     

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40b856e...152de1f.

siyuanfoundation (Contributor) commented

I can confirm this fixes the failure in #19038. Thank you @ahrtr!

ahrtr (Member, Author) commented Dec 24, 2024

> I can confirm this fixes the failure in #19038. Thank you @ahrtr!

Thanks for the confirmation.

Can we get this merged first? PTAL cc @serathius

ahrtr (Member, Author) commented Dec 24, 2024

cc @fuweid @ivanvc @jmhbnz

siyuanfoundation (Contributor) commented Jan 3, 2025

After syncing my repo, I found that the robustness test still fails even with this fix. Because validatePersistedRequestMatchClientRequests requires the lastOp to be persisted, partial WAL entries cannot satisfy this check.
I got the following error:

last succesful client write {"Type":"txn","LeaseGrant":null,"LeaseRevoke":null,"Range":null,"Txn":{"Conditions":null,"OperationsOnSuccess":[{"Type":"put-operation","Range":{"Start":"","End":"","Limit":0},"Put":{"Key":"tombstone","Value":{"Value":"true","Hash":0},"LeaseID":0},"Delete":{"Key":""}}],"OperationsOnFailure":null},"Defragment":null,"Compact":null} was not persisted, required to validate
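To make the failure mode concrete, here is a minimal Go sketch of the kind of check described above, combined with picking the longest member WAL as the robustness test does. All names (`Request`, `longestPersisted`, `validateLastWritePersisted`) are hypothetical simplifications, not the test's actual code. The point it illustrates: if every member's WAL read stops before the last successful write, the check fails even though the entries that were returned are themselves continuous.

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical, simplified stand-in for a client request
// recovered from a member's WAL; the real robustness test uses richer types.
type Request struct {
	Key   string
	Value string
}

// longestPersisted picks the longest per-member request list
// (hypothetical helper mirroring the "longest WAL wins" selection).
func longestPersisted(members [][]Request) []Request {
	var persisted []Request
	for _, memberRequests := range members {
		if len(memberRequests) > len(persisted) {
			persisted = memberRequests
		}
	}
	return persisted
}

// validateLastWritePersisted is a hypothetical simplification of the
// requirement: the last successful client write must appear in the WAL.
func validateLastWritePersisted(members [][]Request, lastWrite Request) error {
	for _, r := range longestPersisted(members) {
		if r == lastWrite {
			return nil
		}
	}
	return errors.New("last successful client write was not persisted, required to validate")
}

func main() {
	last := Request{Key: "tombstone", Value: "true"}
	complete := []Request{{Key: "a", Value: "1"}, last}
	partial := []Request{{Key: "a", Value: "1"}} // WAL read stopped early

	fmt.Println(validateLastWritePersisted([][]Request{complete, partial}, last)) // <nil>
	fmt.Println(validateLastWritePersisted([][]Request{partial, partial}, last))  // error
}
```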

ahrtr (Member, Author) commented Jan 4, 2025

> After syncing my repo, I just found the robustness test still fails even with this fix. Because validatePersistedRequestMatchClientRequests requires the lastOp to be persisted, partial WAL entries would not work for this check. I got the error of:
>
> last succesful client write {"Type":"txn","LeaseGrant":null,"LeaseRevoke":null,"Range":null,"Txn":{"Conditions":null,"OperationsOnSuccess":[{"Type":"put-operation","Range":{"Start":"","End":"","Limit":0},"Put":{"Key":"tombstone","Value":{"Value":"true","Hash":0},"LeaseID":0},"Delete":{"Key":""}}],"OperationsOnFailure":null},"Defragment":null,"Compact":null} was not persisted, required to validate

@siyuanfoundation how often did you see this error? In other words, is it easy to reproduce?

If I understand correctly, the robustness test error means that the last client write received a successful response but was not persisted in the WAL file. Please let me know if I misunderstood it.

Whenever we see an issue, the first thing to do is figure out whether it's a real issue from the end user's perspective. Can you manually double-check whether the last successful client write was persisted in the WAL files of a majority of members, and also in the bbolt db?
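For the manual check suggested here, "a majority of members" means at least n/2 + 1 of the n members (integer division), for example 2 of 3 or 3 of 5. A minimal Go sketch, assuming the per-member result (write found in that member's WAL or not) has already been collected by hand; `persistedOnQuorum` is a hypothetical helper, and the bbolt check would be a separate step.

```go
package main

import "fmt"

// persistedOnQuorum reports whether a write is present on a majority
// (n/2 + 1) of the n members. The per-member booleans are a hypothetical
// input, e.g. the result of manually inspecting each member's WAL files.
func persistedOnQuorum(foundOnMember []bool) bool {
	count := 0
	for _, found := range foundOnMember {
		if found {
			count++
		}
	}
	return count >= len(foundOnMember)/2+1
}

func main() {
	fmt.Println(persistedOnQuorum([]bool{true, true, false}))  // true: 2 of 3
	fmt.Println(persistedOnQuorum([]bool{true, false, false})) // false: 1 of 3
}
```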

Also, I see that the robustness test might not process the WAL records correctly: the longest one is not necessarily the correct one. As long as WAL records are not yet committed, they may be overwritten by subsequent WAL records.

```go
if len(memberRequests) > len(persistedRequests) {
	persistedRequests = memberRequests
}
```

I regard it as a test issue for now; please raise a separate issue to track it. Thanks.
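The overwrite behavior and the fix named in this PR's title can be illustrated together with a minimal Go sketch. It is a hypothetical simplification, not etcd's `wal.ReadAll`: `Entry`, `replay` and `errSliceOutOfRange` are stand-ins. An entry whose index is already present replaces that position and drops everything after it, which is how uncommitted entries get overwritten, so the longest record stream is not automatically the right one. When an entry would leave a gap, replay stops and returns the continuous prefix read so far together with the error, instead of discarding it.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is a hypothetical, minimal stand-in for a raft log entry.
type Entry struct {
	Term  uint64
	Index uint64
}

// errSliceOutOfRange is a local stand-in named after the error in the PR title.
var errSliceOutOfRange = errors.New("slice bounds out of range")

// replay rebuilds the entry list from records in file order, starting after
// startIndex. A record whose index is already present overwrites that
// position and truncates everything after it (an uncommitted tail being
// replaced). A record that would leave a gap stops the replay; the entries
// read so far are continuous, so they are returned along with the error.
func replay(startIndex uint64, records []Entry) ([]Entry, error) {
	var ents []Entry
	for _, e := range records {
		if e.Index <= startIndex {
			continue // already covered by the snapshot
		}
		up := e.Index - startIndex - 1 // position of e in ents
		if up > uint64(len(ents)) {
			return ents, errSliceOutOfRange
		}
		ents = append(ents[:up], e)
	}
	return ents, nil
}

func main() {
	records := []Entry{
		{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
		{Term: 2, Index: 2}, // new leader overwrites uncommitted indexes 2 and 3
		{Term: 2, Index: 5}, // gap: indexes 3 and 4 are missing
	}
	ents, err := replay(0, records)
	fmt.Println(ents, err) // prints: [{1 1} {2 2}] slice bounds out of range
}
```

Returning the prefix keeps everything that is known to be continuous usable by the caller, while the error still signals that the tail of the WAL could not be read.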

serathius (Member) commented

> Also I see that robustness test might not process the WAL records correctly, the longest one might not be the correct one. As long as the WAL records were not committed yet, they may be overwritten by following WAL records.

You are right that, in general, the longest WAL does not necessarily contain the longest committed sequence. However, in the robustness test we explicitly make a single additional transaction after the test finishes, which should ensure that there are no other uncommitted transactions. We require this transaction to succeed and later use it to assert that the WAL is complete.

serathius (Member) left a comment

Change looks good; however, I haven't validated how it works with repair.

k8s-ci-robot
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, serathius


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
