Unable to use COPY TO to write Parquet file to S3 if IAM only enabled on bucket/account - workaround/fix included - allows for local parquet file writes
#460
Open
sysadminmike opened this issue on Nov 28, 2024 · 5 comments
sysadminmike changed the title on Nov 28, 2024, appending "- allows for local parquet file writes".
The local:// prefix gives you the option of using either the DuckDB engine or the Postgres engine for COPY TO with JSON/CSV or other formats that either or both support (and which may take different options). It would also be easier to extend to other file types in the future, compared to supporting just .parquet, and it allows any file name.
Maybe it would be better to replace local:// with localduck://, so you know you're invoking the DuckDB engine to handle it.
We use parquet files with the extension .pq, so if we get the local:// or localduck:// prefix added, that is much better for our generalised use cases too (it gives us more options to use either engine for other formats).
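The prefix-based engine choice discussed above could be sketched roughly as follows. This is only an illustration of the convention proposed in this thread, not pg_duckdb code; the function name and the exact set of prefixes are assumptions:

```python
def choose_copy_engine(destination: str) -> str:
    """Pick which engine should execute a COPY TO, based on the
    destination URI prefix (hypothetical convention from this thread)."""
    # Prefixes that explicitly route the statement down to DuckDB.
    duckdb_prefixes = ("local://", "localduck://", "s3://")
    if destination.startswith(duckdb_prefixes):
        return "duckdb"
    # Anything else falls through to the regular Postgres COPY path.
    return "postgres"

print(choose_copy_engine("local://tmp/test.parquet"))  # duckdb
print(choose_copy_engine("/tmp/test.csv"))             # postgres
```

With a scheme like this, an unprefixed destination keeps today's Postgres behaviour, so existing COPY statements are unaffected.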
That is a good option, allowing the user to switch between DuckDB and Postgres to do the copy.
The issue preventing access to an S3 bucket with only IAM enabled still exists. I have found mountpoint-s3 works very well, especially with the local cache enabled to speed up repeated queries against parquet files, but it may not be an option for everyone.
What happens?
If the bucket's policy only allows IAM-authenticated access, it is not possible to use COPY TO 's3://...' to write out a parquet file.
The diff below makes it possible to write to the local filesystem (or to S3, with IAM handled by something like https://github.com/awslabs/mountpoint-s3).
This could be used to support any format DuckDB's COPY TO supports, rather than Postgres's, since the statement is passed straight down to DuckDB; you just add "local://" at the start of the file location, e.g.:
COPY (SELECT * FROM mytable) TO 'local://tmp/test.parquet' (FORMAT 'parquet')
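Mapping a local:// URI to a plain filesystem path, as in the example above, might look like this minimal sketch. The helper name is hypothetical, and it assumes (based on the example) that local://tmp/test.parquet should resolve to /tmp/test.parquet, i.e. the remainder of the URI is rooted at /:

```python
def local_uri_to_path(uri: str) -> str:
    """Strip the local:// prefix and root the remainder at /."""
    prefix = "local://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a local:// URI: {uri}")
    return "/" + uri[len(prefix):]

print(local_uri_to_path("local://tmp/test.parquet"))  # /tmp/test.parquet
```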
The diff:
To Reproduce
Set up an S3 bucket with an IAM-only policy and try to COPY TO it.
OS:
Linux
pg_duckdb Version (if built from source use commit hash):
na
Postgres Version (if built from source use commit hash):
17
Hardware:
No response
Full Name:
Michael Wolman
Affiliation:
Not applicable
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a source build
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?
Yes, I have