Add retry logic to image layer fetching and decompression #291
}

warn!("Retrying layer image download...");
continue; // Retry fetching the layer image
Not sure, but would it make sense to sleep here for a bit? Presumably we will run up against the deadline that containerd has, so we cannot sleep for too long.
Yes, I agree. It would make sense to sleep a bit if it's truly a network issue. I think the specific timeout is managed by the client (k8s) rather than by containerd itself. Given that, I set the sleep time to 500ms for now.
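For readers following the thread, the idea of pausing briefly between attempts can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `retry_with_delay`, its parameters, and the attempt limit are hypothetical names introduced here, and blocking `std::thread::sleep` stands in for whatever async sleep the real code path would use.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not from the PR): runs `op` up to `max_attempts`
/// times, sleeping `delay` after each failed attempt except the last.
/// Returns the first success, or the last error once attempts run out.
pub fn retry_with_delay<T, E, F>(max_attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                // Short fixed pause so a transient network issue can clear,
                // kept small because the caller has its own deadline.
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

A call such as `retry_with_delay(3, Duration::from_millis(500), |_| download())` would mirror the 500ms pause discussed above, where `download` is a placeholder for the real fetch. Because the total worst-case wait is roughly `(max_attempts - 1) * delay` plus the time of each attempt, both values need to stay well inside the client-side timeout.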
file.rewind().context("failed to rewind the file handle")?;
tarindex::append_index(&mut file).context("failed to append tar index")?;
// Process the layer
let process_result = tokio::task::spawn_blocking({
Should the download itself be part of the spawn_blocking block?
Moved the download itself into the new function that runs inside the while loop.
if let Err(e) = std::io::copy(&mut gz_decoder, &mut file) {
let copy_error = format!("failed to copy payload from gz decoder {:?}", e);
error!("{}", copy_error);
return Err(anyhow::anyhow!(copy_error));
This is the error we hit; should we trigger retry with a new download here as well? Or what are we doing to resolve it?
failed to extract image layer: failed to copy payload from gz decoder Error { kind: UnexpectedEof, message: \"failed to fill whole buffer\" }: unknown
Yes, failing here will trigger a new download.
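To make the "failing here triggers a new download" behavior concrete, here is a small standard-library sketch. It is an assumption-laden illustration rather than the PR's implementation: `copy_with_refetch` and `Truncated` are hypothetical names, a `Vec<u8>` stands in for the layer file, and a plain reader stands in for the gzip decoder. The point it shows is that each attempt starts from a fresh source and discards any partial output left by the previous attempt.

```rust
use std::io::{self, Read};

/// Test double (hypothetical): a reader that always fails the way a
/// truncated stream does, with `UnexpectedEof`.
pub struct Truncated;

impl Read for Truncated {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "failed to fill whole buffer",
        ))
    }
}

/// Hypothetical helper: calls `fetch` to obtain a fresh reader and copies it
/// fully into `out`. Any error from fetching or copying triggers a new
/// fetch, up to `max_attempts` times.
pub fn copy_with_refetch<R, F>(
    max_attempts: u32,
    mut fetch: F,
    out: &mut Vec<u8>,
) -> io::Result<u64>
where
    R: Read,
    F: FnMut() -> io::Result<R>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no attempts made");
    for _ in 0..max_attempts {
        // Drop bytes written by a failed attempt so a retry never appends
        // to a partial payload (the analogue of rewinding/truncating a file).
        out.clear();
        let result = fetch().and_then(|mut reader| io::copy(&mut reader, &mut *out));
        match result {
            Ok(bytes_copied) => return Ok(bytes_copied),
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

In this sketch every error kind is retried. A real implementation might prefer to retry only on kinds that suggest a truncated or interrupted transfer, such as `UnexpectedEof`, and fail fast on everything else.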
Merge Checklist
upstream/missing label (or upstream/not-needed) has been set on the PR.
Summary
Test Methodology