Provide content length for the put method #78
Signed-off-by: Ben Tam <[email protected]>
Thanks!
Signed-off-by: Ben Tam <[email protected]>
Thanks for your message.
Signed-off-by: Ben Tam <[email protected]>
A unit test has been added. Thank you for the reminder.
If this could be merged and released, I'd be very happy. It's currently a blocker for me using Kopia. Thank you 🙏
As Kopia maintainer I would also love to see this merged.
hi all.
my fault, i missed some notifications.
going to review and merge it next week.
thanks for the ping.
On December 13, 2024 5:06:22 AM UTC, Jarek Kowalski ***@***.***> wrote:
> As Kopia maintainer I would also love to see this merged.
thanks for your contribution.
contentLength, err = io.Copy(io.Discard, stream)
buffer := bytes.NewBuffer(make([]byte, 0, 1024 * 1024 /* 1MB */))

contentLength, err = io.Copy(buffer, stream)
this sucks for large files.
Thank you for reporting this. I will take a look first and provide a patch as soon as possible.
@chripo I see that the issue is that bytes.growSlice allocates a lot of memory during copying.
However, given the limitations of the Reader interface, I can think of only three other approaches:
- perform a manual GC after each call to bytes.growSlice, but that needs a custom copy function, and I do not think it is good practice
- copy to disk first to calculate the length, but the extra I/O is more expensive
- add a new function that lets the user provide the content length; this takes the least effort but requires more changes in the user's code base
There is also a fourth option: the server implementation could handle the content length correctly without relying on the client. Still, I think we need to provide the correct value if we send one (although sending 0 is conventional in Go's default HTTP client and in HTTP clients in other languages).
May I know your opinion on how to handle this problem? Thanks a lot :)
Another pattern commonly used in Go is to dynamically check whether an io.Reader can also provide its length by attempting a type assertion to:
interface {
    Len() int
}
(possibly also with other method signatures, such as Length() int64 or Size() int, and so on).
This will support bytes.Buffer and other types that are buffers with Len(). Any other reader can be quickly wrapped in a simple structure that implements Read() and Len() only.
I think it is a terrible idea to buffer the whole stream in memory.
> I see that the issue is that bytes.growSlice allocates a lot of memory during copying.
That's not all, what if the whole thing does not even fit into memory?
> - perform a manual GC after each call to bytes.growSlice, but that needs a custom copy function, and I do not think it is good practice
please no
I think this library should not try too hard to determine a content length. The implementation via io.Seeker is fine. Maybe add another check for whether the stream is a *bytes.Buffer and call its Len() method. This should cover most cases, including *os.File.
Anything else, like guessing a Len() int, Length() int or Size() int via an anonymous interface, which might not contain the value we want for the content length (for example, https://pkg.go.dev/bufio#Reader.Size returns the size of the buffer, not of the underlying reader), or buffering the whole stream in memory, would be unexpected for me as a library user.
In cases where we cannot determine the size of the stream easily, we should just set the contentLength to -1, meaning "unknown size", or keep it at 0.
> - copy to disk first to calculate the length, but the extra I/O is more expensive
If a library user really needs the content length to be set, buffering to disk should be a deliberate decision made beforehand. WriteStream can then use an *os.File as the reader.
> - add a new function that lets the user provide the content length; this takes the least effort but requires more changes in the user's code base
This seems reasonable but would also make the library user responsible for providing the correct value.
Thank you, everyone, for the great ideas. I would like to provide a fix that obtains the content length from buffer types, as http.NewRequestWithContext does, keeps the current seeker method, and leaves the value at 0 if it cannot be determined.
nice collaboration!
Some WebDAV server implementations (such as Nextcloud) behave unexpectedly when no content length is set in the PUT request. Therefore, I introduce a contentLength parameter to the *Client.put() method so that the correct content length of the body is provided to the server, which prevents the behavior mentioned above.