Expected S3 behavior: Issues with local S3 server #318

Open
codybum opened this issue Feb 13, 2024 · 7 comments
codybum commented Feb 13, 2024

We are experimenting with using a locally deployed Java-based S3 server (https://github.com/mindmill/ladon-s3-server) with DSA. We have made numerous updates to the base code, which now supports most boto3 (https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) and CyberDuck (https://cyberduck.io/) commands. We are testing with a single 800 MB Aperio SVS file, "test.svs".

We are able to add the local S3 server as an Assetstore without issue. The Girder PUT and DELETE tests complete successfully without error. Import of the test file is slow, but according to DSA it completes successfully. We are able to access the test file in a Collection and view the slide in HistomicsUI. While things mostly work, tile updates are very slow and sporadic.

On the S3 side there are no errors until slide import. In network captures we can see ~50 HTTP requests to the S3 server during the single-file import. DSA first requests the list of objects in the bucket, then makes a complete-object request for test.svs. The S3 server starts the transfer for the complete-object request; the TCP window size on the DSA side stays around 49k for roughly 10 packets, then drops to 8192, and in the next packet we see a TCP reset (RST,ACK) from DSA. An RST,ACK from DSA would typically indicate that the socket on the DSA side has been closed. On the S3 side we observe a write error, which also complains that the socket is already closed.

The packet capture can be seen here:

[screenshot of the packet capture]

In subsequent DSA requests the "Range: bytes=[start_byte-end_byte]" header is set, but as with the full-object request, after some number of packets the TCP window size eventually decreases and we see a reset from the DSA side. The range/partial requests are not identical: the offset (start_byte) changes while the end_byte remains the same. All of the subsequent ranged requests are toward the end of the test.svs file.

In an attempt to debug this issue, I partially re-implemented the downloadFile function (https://girder.readthedocs.io/en/latest/_modules/girder/utility/s3_assetstore_adapter.html#S3AssetstoreAdapter.downloadFile) from Girder in my test code. I took the HTTP requests from the import and re-ran them, confirming that the MD5 hashes of the data provided by the S3 server are identical to the local file. Those transfers end with FIN,ACK, as expected.

[screenshot of the packet capture from the re-run test requests]
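
For reference, a standalone check along those lines might look roughly like the following; the endpoint URL, byte range, and chunk size are placeholders, and this is a sketch of the kind of test described rather than the actual test code:

```python
import hashlib

import requests

# Placeholder endpoint and range; substitute the values captured from the import.
S3_URL = 'http://localhost:8080/bucket/test.svs'
START, END = 0, 8 * 1024 * 1024 - 1

# Re-issue the same ranged GET that DSA made during import.
resp = requests.get(S3_URL, headers={'Range': 'bytes=%d-%d' % (START, END)}, stream=True)
resp.raise_for_status()

# Hash what the S3 server returns, streaming in 64 kB chunks.
remote_md5 = hashlib.md5()
for chunk in resp.iter_content(chunk_size=65536):
    remote_md5.update(chunk)
resp.close()

# Hash the same byte range of the local copy of test.svs for comparison.
with open('test.svs', 'rb') as f:
    f.seek(START)
    local_md5 = hashlib.md5(f.read(END - START + 1))

print(remote_md5.hexdigest() == local_md5.hexdigest())
```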

It is far more likely that something is wrong with the S3 implementation than with DSA, but I am having a hard time reproducing these issues outside of DSA.

A few questions to help me along:

  • Are the closed connections (without a proper FIN) on the DSA side expected behavior, or is an exception on the DSA side prematurely killing the socket? If so, where do I find this in the logs?
  • Is it normal for DSA to make a full object request and then numerous (45+) partial requests on file import?
  • Is there a way to set a proxy server for DSA so I can check the behavior against AWS S3? I tried the HTTP_PROXY and HTTPS_PROXY environment variables, but that didn't seem to work. (See the sketch after this list.)
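
As a side note on the proxy question above, one way to sanity-check the HTTP_PROXY/HTTPS_PROXY behavior outside of DSA is to drive the Python requests library (which, per the reply below, is what the FUSE mount uses to reach S3) through a proxy directly; the proxy address and object URL here are placeholders:

```python
import requests

# Placeholder proxy and object URL; replace with a real proxy and bucket/key.
proxies = {
    'http': 'http://127.0.0.1:3128',
    'https': 'http://127.0.0.1:3128',
}

# Passing proxies= explicitly overrides the HTTP_PROXY/HTTPS_PROXY environment
# variables; omitting it and relying on those variables is the default behavior.
resp = requests.get('https://s3.amazonaws.com/some-bucket/test.svs',
                    headers={'Range': 'bytes=0-1023'},
                    proxies=proxies, stream=True)
print(resp.status_code, resp.headers.get('Content-Range'))
resp.close()
```
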
manthey (Contributor) commented Feb 13, 2024

DSA uses a variety of libraries to read images. Many of these libraries require file-like access to the images, so rather than fetching them directly from S3, we expose the data in Girder in a FUSE file system and the image libraries read those files. Some of the image libraries (notably openslide) do a lot of open-seek-read-close on small fragments of files (especially ndpi files), and these manifest as range requests.

Internally, the FUSE file system uses the python requests library to fetch from s3 with a get call with stream=True. I don't see any code where we explicitly end that call; we let the python garbage collector do whatever it does and trust the requests library to do the right thing. Perhaps this should be explicitly closed, but it isn't.
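
The access pattern described is roughly the following (a simplified sketch, not the actual mount code; the explicit close at the end is the step the comment notes is currently left to garbage collection, which could explain a reset instead of a clean FIN on partially read responses):

```python
import requests

def read_range(url, offset, length):
    """Sketch of a FUSE-style partial read: stream a ranged GET and return
    only the requested bytes."""
    headers = {'Range': 'bytes=%d-%d' % (offset, offset + length - 1)}
    resp = requests.get(url, headers=headers, stream=True)
    resp.raise_for_status()
    data = resp.raw.read(length)
    # If the response is dropped without being closed or fully consumed, the
    # connection teardown is left to garbage collection, which the server may
    # see as an abrupt reset rather than a graceful shutdown.
    resp.close()
    return data
```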

There is an option to enable diskcache on the mount (see https://github.com/DigitalSlideArchive/digital_slide_archive/blob/master/devops/dsa/docker-compose.yml#L26-L28); this will make 128kB range requests (I think) rather than really small range requests and cache the results so you end up with vastly fewer partial requests.

I would expect that reading the first bytes of the file, even if the image library doesn't mean to read the whole thing, will appear as a full object request (since at the file system level it is an open without a seek). This depends on the image library doing the reading as well as on the file format.

I'm not sure how to proxy requests to one S3 server to another.

manthey (Contributor) commented Feb 15, 2024 via email

codybum (Author) commented Feb 16, 2024

@manthey I think I have finally poked around enough to see what is going on. I have a test suite that partially replicates the Girder process (file -> file model -> file handle -> abstract assetstore -> S3 assetstore -> S3 -> file). I posted an issue on the Girder repo in case it is an actual issue (girder/girder#3513).

We will make sure our S3 server can deal with the existing transfer methods and will modify our instance of Girder if needed. If you feel the reported issue is valid, I would be happy to create a pull request and/or a test for an alternative solution.

codybum (Author) commented Feb 17, 2024

Quoting from the comment above: "There is an option to enable diskcache on the mount (see https://github.com/DigitalSlideArchive/digital_slide_archive/blob/master/devops/dsa/docker-compose.yml#L26-L28); this will make 128kB range requests (I think) rather than really small range requests and cache the results so you end up with vastly fewer partial requests."

Is this the correct way to enable caching?

DSA_USER=$(id -u):$(id -g) DSA_GIRDER_MOUNT_OPTIONS="-o diskcache,diskcache_size_limit=2147483648" docker-compose up

I think I have caching enabled; I see new files and directories being created in ~/.cache/girder-mount, but only about 20 MB is being cached out of the many GB being transferred.

It looks like the 128k you mentioned is the size of the cache chunks, but this does not seem to have any impact on the size of the S3 requests:
https://github.com/girder/girder/blob/89ab9976b1b085df279c9082b2df43ab7e24cd60/girder/cli/mount.py#L106

It is possible I don't have caching configured correctly.

manthey (Contributor) commented Feb 27, 2024

Yes, the diskcache 128k is the granularity of the cache. I'd expect the requests to S3 to all start at byte multiples of 128k and to have request lengths that are either unbounded or multiples of 128k. I'll set up a local/minio S3 mount to see if I can get the same results as you.
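
For illustration, the expected mapping from a small read to a cache-aligned request would look something like this (a sketch based on the 128k granularity described above and the cache chunk size in girder/cli/mount.py linked earlier in the thread; clamping at the end of the file is ignored):

```python
BLOCK = 128 * 1024  # cache chunk size; see girder/cli/mount.py linked above

def expected_range(offset, length):
    """Round a read at (offset, length) out to whole 128 kB cache blocks,
    which is roughly the Range header the S3 server should see."""
    start = (offset // BLOCK) * BLOCK
    end = ((offset + length - 1) // BLOCK + 1) * BLOCK - 1
    return 'bytes=%d-%d' % (start, end)

print(expected_range(1000000, 4096))  # bytes=917504-1048575
```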

codybum (Author) commented Apr 15, 2024

@manthey you were right about the 128k requests. I could not initially see the request size because of the Girder issue reported here: girder/girder#3513

If we set the end byte on the download, in this case offset + 128k, we see 128k ranged requests. If we don't set the end byte on the read, the request covers offset through the end of the file.
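
At the HTTP level the difference looks like this (a sketch with placeholder values, not the Girder code itself): with an end byte the S3 server sees a bounded 128k range, and without one it sees an open-ended range that runs from the offset to the end of the object:

```python
offset = 10 * 128 * 1024  # example offset, 128 kB aligned
chunk = 128 * 1024

# End byte supplied: the S3 server sees a bounded 128 kB request.
bounded = {'Range': 'bytes=%d-%d' % (offset, offset + chunk - 1)}

# End byte omitted: the request is open-ended and covers offset..end-of-file,
# which is what showed up as the large offset-to-end-of-file transfers.
open_ended = {'Range': 'bytes=%d-' % offset}

print(bounded['Range'])     # bytes=1310720-1441791
print(open_ended['Range'])  # bytes=1310720-
```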

codybum (Author) commented Apr 15, 2024

@manthey things are working, but they are a bit slow, and I wonder if the number of concurrent requests is constrained. Our test VM has only two vCPUs, so I suspect this is an issue. Is there a setting for concurrent Girder/FUSE requests that I might experiment with? The VM is not under load, but the number of requests to S3 seems gated.
