There is a bug with the new version of boto used by s3fs that prevents writes to non-AWS S3 buckets #1546
Comments
Thanks for raising awareness here @ryanovas. It looks like the lock is at 1.36.1 (lines 406 to 415 in c84dd8d). This is interesting since we test against …
In my case I was using a DigitalOcean Spaces bucket and Lakekeeper, then tried to follow the getting-started guide from the docs with the taxi data. When I got to the append step I started getting the errors.
This occurs for me as well against OCI Object Storage using the S3-compatible API. With a plain Boto3 request I can set the client config needed to disable the checksum validation they added: boto/boto3#4398 (comment)
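For reference, a minimal sketch of that Boto3-level workaround, based on the options described in boto/boto3#4398; the endpoint, bucket, and key below are placeholders for an S3-compatible provider:

```python
import boto3
from botocore.config import Config

# botocore >= 1.36 calculates request checksums "when_supported" by
# default; "when_required" restores the old behaviour of only sending
# them when an operation actually demands one.
config = Config(
    request_checksum_calculation="when_required",
    response_checksum_validation="when_required",
)

# Placeholder endpoint for an S3-compatible provider
# (OCI Object Storage, DigitalOcean Spaces, MinIO, ...).
s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstorage.example.com",
    config=config,
)
s3.put_object(Bucket="my-bucket", Key="test.txt", Body=b"hello")
```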
The issue on botocore was closed because there are apparently params you can pass to turn off the new checksum signing behavior. But I'm not sure how we can apply those flags here given that botocore is so many layers deep. Perhaps we just need to wait until S3-compatible services update their headers to match? That doesn't seem ideal though; we can't stay on 1.35 forever.
We pass configs down to s3fs; I think it would be useful to allow passing arbitrary configs to s3fs so that we don't need to explicitly define all of s3fs's S3FileSystem args. @ryanovas do you know if that option is available in s3fs yet? I couldn't find a match.
I think that would be the move as well: let those of us who are on S3-compatible storage pass in anything extra we need. Sadly I don't know much about s3fs. A quick search of their issues brings up fsspec/s3fs#855, but I don't think that's the same error (or that just turning off SSL like they suggest is ever a good idea). It might be necessary to open an issue with them if the option isn't present in their docs. I can try to help out with this more next week.
It is discussed briefly in fsspec/s3fs#931, though it might be best to track it as a separate issue if necessary. It looks from the docs like it should be passed into …
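Assuming the parameter referenced above is S3FileSystem's config_kwargs (which s3fs forwards to botocore's Config), a sketch of what that would look like; the endpoint and paths are placeholders:

```python
import s3fs

# config_kwargs is forwarded to (aio)botocore's Config, so the two
# checksum options from the botocore issue can be set through it.
fs = s3fs.S3FileSystem(
    endpoint_url="https://objectstorage.example.com",  # placeholder
    config_kwargs={
        "request_checksum_calculation": "when_required",
        "response_checksum_validation": "when_required",
    },
)
fs.pipe("my-bucket/test.txt", b"hello")  # simple write to verify
```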
I'm open to locking it to …
IMO locking the version seems like an OK temp fix, but the better fix is likely to figure out how to get the extra params down to botocore. I commented in the s3fs issue with a link to the botocore issue that mentions the configs to disable this on the botocore side.

Edit: According to the s3fs people there is an env var we can try, and if it fixes the issue they will figure out a way to expose it programmatically; then on the pyiceberg side it should be possible to resolve this without pinning the version. I will attempt to try this in the next day or two unless someone beats me to it.
Just an update: setting the environment variable as described fixes the issue. I updated the s3fs issue as well. I guess now we wait for next steps. In the meantime, this will fix it for people without having to downgrade.
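Assuming the variable in question is the standard AWS SDK checksum setting introduced alongside the change discussed in boto/boto3#4398, it can be set before the first S3 call, e.g.:

```python
import os

# Must be set before botocore builds its client config, i.e. before
# the first S3 request (or exported in the launching shell instead).
os.environ["AWS_REQUEST_CHECKSUM_CALCULATION"] = "when_required"
os.environ["AWS_RESPONSE_CHECKSUM_VALIDATION"] = "when_required"
```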
In the s3fs issue it looks like there's a config we can pass. I believe it was already suggested by @maarek, but it's confirmed in this comment on the issue: fsspec/s3fs#931 (comment)
Thanks for keeping track of this @ryanovas. I think we can do an if-else based on the … The relevant code can be found here: iceberg-python/pyiceberg/io/fsspec.py, line 157 in dfbee4b.
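A hypothetical sketch of what a version-gated if-else around that code might look like; the _s3 factory shape here is an illustrative assumption, not pyiceberg's actual implementation:

```python
import botocore
from packaging import version
from s3fs import S3FileSystem

def _s3(properties: dict) -> S3FileSystem:
    config_kwargs = {}
    # botocore >= 1.36 adds checksum headers that many S3-compatible
    # services reject with MissingContentLength; opt out when present.
    if version.parse(botocore.__version__) >= version.parse("1.36.0"):
        config_kwargs["request_checksum_calculation"] = "when_required"
        config_kwargs["response_checksum_validation"] = "when_required"
    return S3FileSystem(
        endpoint_url=properties.get("s3.endpoint"),
        config_kwargs=config_kwargs,
    )
```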
I might suggest we allow a way to set a flag, maybe when we init a catalog or something similar, since the issue is only present for S3-compatible storage and AWS folks probably want to keep the extra signing headers. Also, it's botocore that has the version incompatibility, which is a couple of layers deep (s3fs -> aiobotocore -> botocore), so it feels a little fragile IMO to tie the fix to the version of botocore. If you agree, and can point me to where we could add this extra flag, I could try to make a PR to address this. Also, is there a contributing guide anywhere?
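From the user side, the proposed flag might look something like this; the property name s3.request-checksum-calculation is made up here to illustrate the idea and is not something pyiceberg currently defines:

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "default",
    **{
        "uri": "http://localhost:8181",  # e.g. a REST catalog such as Lakekeeper
        "s3.endpoint": "https://objectstorage.example.com",  # placeholder
        # Hypothetical property, not part of pyiceberg today:
        "s3.request-checksum-calculation": "when_required",
    },
)
```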
Apache Iceberg version
0.8.1 (latest release)
Please describe the bug 🐞
Here is a link to the relevant boto issue: boto/boto3#4398
Attempting to use `table.append` or `table.overwrite` with the botocore 1.36.x version installed by default with s3fs causes a very confusing error: `botocore.exceptions.ClientError: An error occurred (MissingContentLength) when calling the PutObject operation: None`
The workaround is to downgrade back to botocore 1.35.99 manually.
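One way to express that workaround is a pin in requirements.txt (note that s3fs pulls botocore in via aiobotocore, so the pin may conflict with newer s3fs releases):

```
botocore==1.35.99
```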
I'm unsure if there's work to do on the pyiceberg side to resolve this, but I'm posting this issue for others like me who are googling this error and struggling to find answers.
I might also suggest pyiceberg consider a library that doesn't rely on the AWS-controlled boto and instead uses something more S3-agnostic, like s3cmd, for those of us using Digital Ocean, MinIO, Backblaze, etc.