Provide content length for the put method #78

Merged
6 changes: 5 additions & 1 deletion client.go
@@ -435,10 +435,14 @@ func (c *Client) WriteStream(path string, stream io.Reader, _ os.FileMode) (err
 return err
 }
 } else {
-contentLength, err = io.Copy(io.Discard, stream)
+buffer := bytes.NewBuffer(make([]byte, 0, 1024 * 1024 /* 1MB */))
+
+contentLength, err = io.Copy(buffer, stream)
Member

this sucks for large files.

Contributor Author

@murasakiakari murasakiakari Dec 17, 2024

Thank you for reporting this. I will take a look first and provide a patch ASAP.

Contributor Author

@murasakiakari murasakiakari Dec 17, 2024

@chripo I see that the issue is related to bytes.growSlice allocating a lot of memory during copying.
However, I can only think of three other approaches, given the limitations of the Reader interface:

  1. Perform a manual GC after each call to bytes.growSlice. This needs a custom copy function, and I don't think it is good practice.
  2. Copy the stream to disk first to calculate the length. The extra I/O makes this more expensive.
  3. Add a new function that lets the user provide the content length. This takes the least effort here but requires more changes in the user's code base.

There is also a fourth option: the server implementation handles the content length correctly without relying on the client. However, I think we need to provide a correct value if we send one (although sending 0 is conventional in Go's default HTTP client and in HTTP client implementations in other languages).

May I know your opinion on how to handle this problem? Thanks a lot :)

Collaborator

Another pattern commonly used in Go is to check dynamically whether the io.Reader can also report its length, by type-asserting it to:

  interface {
    Len() int
  }

(possibly also with other method signatures, such as Length() int64 or Size() int)

This supports bytes.Buffer and other buffer types that have Len(). Any other reader can quickly be wrapped in a simple struct that implements only Read() and Len().
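
A minimal sketch of that pattern, for illustration only. The names lenReader, lengthOf and sizedReader are hypothetical and not part of this library; the wrapper assumes the caller already knows the total length.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// lenReader names the anonymous interface from the comment above.
// The name is hypothetical; only the Len() int method matters.
type lenReader interface {
	Len() int
}

// lengthOf reports the number of unread bytes when r exposes Len().
// The boolean is false when the length cannot be determined this way.
func lengthOf(r io.Reader) (int64, bool) {
	if lr, ok := r.(lenReader); ok {
		return int64(lr.Len()), true
	}
	return 0, false
}

// sizedReader is a hypothetical wrapper for a reader whose total length the
// caller already knows. Len() returns that fixed value, so it is only
// meaningful before the first Read.
type sizedReader struct {
	io.Reader
	n int
}

func (s sizedReader) Len() int { return s.n }

func main() {
	// *bytes.Buffer already has Len().
	fmt.Println(lengthOf(bytes.NewBufferString("hello"))) // 5 true

	// The reader returned by io.LimitReader has no Len() method.
	opaque := io.LimitReader(strings.NewReader("hello"), 5)
	fmt.Println(lengthOf(opaque)) // 0 false

	// The same reader wrapped with a length supplied by the caller.
	fmt.Println(lengthOf(sizedReader{Reader: opaque, n: 5})) // 5 true
}
```

Note that the assertion trusts whatever Len() returns, so a type whose Len() means something other than "bytes left to read" would silently produce a wrong content length.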

Collaborator

I think it is a terrible idea to buffer the whole stream in memory.

> I see that the issue is related to bytes.growSlice allocating a lot of memory during copying.

That's not all: what if the whole thing does not even fit into memory?

> 1. Perform a manual GC after each call to bytes.growSlice. This needs a custom copy function, and I don't think it is good practice.

please no


I think this library should not try too hard to determine a content length. The implementation via io.Seeker is fine. Maybe add another check for whether the stream is a *bytes.Buffer and, if so, call its Len() method. This should cover most cases, including *os.File.
Anything else would be unexpected for me as a library user. That includes guessing at Len() int, Length() int or Size() int via an anonymous interface, which might not return the value we want for the content length (for example, https://pkg.go.dev/bufio#Reader.Size returns the size of the underlying buffer, not of the wrapped reader), and it includes buffering the whole stream in memory.

In cases where we cannot determine the size of the stream easily, we should just set the contentLength to -1, meaning "unknown size", or keep it at 0.
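
To illustrate the conservative approach described above, here is a rough sketch. streamLength and unknownLength are hypothetical names, and this is not the library's existing io.Seeker code; it only shows the *bytes.Buffer check, a seek-based measurement, the -1 fallback, and why bufio.Reader.Size() is the wrong number.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// unknownLength is the "unknown size" sentinel suggested above.
const unknownLength int64 = -1

// streamLength is a hypothetical helper. It only inspects readers whose
// length is cheap to obtain and never buffers the stream.
func streamLength(r io.Reader) (int64, error) {
	switch v := r.(type) {
	case *bytes.Buffer:
		return int64(v.Len()), nil
	case io.Seeker:
		// Measure the bytes between the current offset and the end,
		// then restore the offset so the stream can still be read.
		cur, err := v.Seek(0, io.SeekCurrent)
		if err != nil {
			return unknownLength, err
		}
		end, err := v.Seek(0, io.SeekEnd)
		if err != nil {
			return unknownLength, err
		}
		if _, err := v.Seek(cur, io.SeekStart); err != nil {
			return unknownLength, err
		}
		return end - cur, nil
	}
	return unknownLength, nil
}

// demo returns the lengths found for a seekable reader, a *bytes.Buffer and
// a *bufio.Reader, plus the value of bufio.Reader.Size() for comparison.
func demo() (seekable, buffered, wrapped int64, bufioSize int) {
	seekable, _ = streamLength(strings.NewReader("hello world"))
	buffered, _ = streamLength(bytes.NewBufferString("abc"))
	br := bufio.NewReader(strings.NewReader("abc"))
	wrapped, _ = streamLength(br)
	return seekable, buffered, wrapped, br.Size()
}

func main() {
	seekable, buffered, wrapped, bufioSize := demo()
	fmt.Println("strings.Reader:", seekable)  // 11
	fmt.Println("bytes.Buffer:", buffered)    // 3
	fmt.Println("bufio.Reader:", wrapped)     // -1
	// Size() is the capacity of bufio's internal buffer, not the 3 bytes
	// of payload, which is why guessing via Size() would be misleading.
	fmt.Println("bufio.Reader.Size():", bufioSize)
}
```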

> 2. Copy the stream to disk first to calculate the length. The extra I/O makes this more expensive.

If a library user really needs the content length to be set, buffering to disk should be a deliberate decision made beforehand. WriteStream can then use an *os.File as the reader.
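
As a sketch of what that deliberate step could look like on the caller's side: spoolToTempFile and spoolDemo are hypothetical helpers, and the actual WriteStream call is left out because client setup is outside the scope of this example.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// spoolToTempFile copies src into a temporary file and rewinds it, so the
// result is a seekable *os.File. The caller must close and remove the file.
func spoolToTempFile(src io.Reader) (*os.File, int64, error) {
	f, err := os.CreateTemp("", "upload-*")
	if err != nil {
		return nil, 0, err
	}
	size, err := io.Copy(f, src)
	if err == nil {
		_, err = f.Seek(0, io.SeekStart)
	}
	if err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, 0, err
	}
	return f, size, nil
}

// spoolDemo spools payload to disk, returns the number of bytes written,
// and cleans up the temporary file afterwards.
func spoolDemo(payload string) (int64, error) {
	f, size, err := spoolToTempFile(strings.NewReader(payload))
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	// At this point f could be passed as the stream argument of WriteStream.
	return size, nil
}

func main() {
	size, err := spoolDemo("streamed payload")
	if err != nil {
		fmt.Println("spooling failed:", err)
		return
	}
	fmt.Println("spooled bytes:", size) // 16
}
```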

> 3. Add a new function that lets the user provide the content length. This takes the least effort here but requires more changes in the user's code base.

This seems reasonable but would also make the library user responsible for providing the correct value.
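
For illustration, such a function could be sketched on top of net/http like this. newPutRequest and explicitLength are hypothetical names, the URL is a placeholder, and nothing is actually sent; the library's real request plumbing is not shown in this diff.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// newPutRequest is a hypothetical helper that builds a PUT request and
// takes the content length from the caller instead of guessing it.
func newPutRequest(url string, body io.Reader, contentLength int64) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, url, body)
	if err != nil {
		return nil, err
	}
	// The caller's value is used as-is, so a wrong value is the caller's
	// responsibility.
	req.ContentLength = contentLength
	return req, nil
}

// explicitLength builds a request for a reader whose concrete type hides
// its length and returns the ContentLength carried by the request.
func explicitLength(payload string) (int64, error) {
	opaque := io.LimitReader(strings.NewReader(payload), int64(len(payload)))
	req, err := newPutRequest("http://example.invalid/file", opaque, int64(len(payload)))
	if err != nil {
		return 0, err
	}
	return req.ContentLength, nil
}

func main() {
	n, err := explicitLength("hello")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println("Content-Length:", n) // 5
}
```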

Contributor Author

Thank you everyone for the great ideas. I would like to provide a fix that gets the content length from buffer types, as is done in http.NewRequestWithContext, keeps the current seeker method, and leaves the value at 0 if it cannot be determined.
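
For reference, the standard-library behaviour mentioned here can be observed with a small program. This only demonstrates what http.NewRequestWithContext does for different body types; requestLength and demo are hypothetical names, the URL is a placeholder, and the eventual patch may look different.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// requestLength builds a PUT request for body and returns the ContentLength
// that net/http filled in. Nothing is sent over the network.
func requestLength(body io.Reader) (int64, error) {
	req, err := http.NewRequestWithContext(context.Background(), http.MethodPut, "http://example.invalid/file", body)
	if err != nil {
		return 0, err
	}
	return req.ContentLength, nil
}

// demo returns the detected lengths for *bytes.Buffer, *bytes.Reader and
// *strings.Reader bodies, and for a reader whose type hides its length.
func demo() ([]int64, error) {
	bodies := []io.Reader{
		bytes.NewBufferString("abc"),
		bytes.NewReader([]byte("abc")),
		strings.NewReader("abc"),
		io.LimitReader(strings.NewReader("abc"), 3),
	}
	lengths := make([]int64, 0, len(bodies))
	for _, b := range bodies {
		n, err := requestLength(b)
		if err != nil {
			return nil, err
		}
		lengths = append(lengths, n)
	}
	return lengths, nil
}

func main() {
	lengths, err := demo()
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	// The first three types are recognised; the last one is left at 0.
	fmt.Println(lengths) // [3 3 3 0]
}
```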

Member

nice collaboration!

 if err != nil {
 return err
 }
+
+stream = buffer
 }

 s, err := c.put(path, stream, contentLength)