Provide list of DuckDB extensions used by stac-rs duckdb crate #655

Open
ceholden opened this issue Feb 25, 2025 · 0 comments

Using the DuckDB-related crate in this project requires the user to have ~300MB of DuckDB extensions. The code here runs an INSTALL <extension> as part of client setup, which is great: if the client can initialize, it should work properly without the user having to know about these dependencies. If the user doesn't already have the right extensions in their DuckDB extensions directory (usually under the home folder), DuckDB will fetch them from a remote server.

To help anyone packaging this code into containers/Lambdas/etc. for production use cases, it would be great to have a way to tell which DuckDB extensions are required for each version of stacrs. We want to ship applications that have all of their dependencies encapsulated with them, so that they keep running (if upstream sources have an outage or disappear, if outbound network requests become filtered, etc.) and don't add unnecessary traffic to these remote sources.

The list for stacrs==0.5.5 (the Python wrapper) was:

bash-5.2# du -hs ~/.duckdb/extensions/v1.1.1/linux_amd64_gcc4/
297M    /root/.duckdb/extensions/v1.1.1/linux_amd64_gcc4/
bash-5.2# ls -lh ~/.duckdb/extensions/v1.1.1/linux_amd64_gcc4/
total 297M
-rw-r--r-- 1 root root  65M Feb 25 04:21 aws.duckdb_extension
-rw-r--r-- 1 root root  164 Feb 25 04:21 aws.duckdb_extension.info
-rw-r--r-- 1 root root  65M Feb 25 04:21 httpfs.duckdb_extension
-rw-r--r-- 1 root root  166 Feb 25 04:21 httpfs.duckdb_extension.info
-rw-r--r-- 1 root root  68M Feb 25 04:21 icu.duckdb_extension
-rw-r--r-- 1 root root  163 Feb 25 04:21 icu.duckdb_extension.info
-rw-r--r-- 1 root root 100M Feb 25 04:21 spatial.duckdb_extension
-rw-r--r-- 1 root root  168 Feb 25 04:21 spatial.duckdb_extension.info
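
For anyone wanting to capture the same inventory programmatically, here is a minimal sketch using only the Python standard library. It assumes the `<version>/<platform>/<name>.duckdb_extension` layout seen in the listing above; the function name `list_installed_extensions` is hypothetical and not part of stacrs or DuckDB.

```python
from collections import defaultdict
from pathlib import Path


def list_installed_extensions(root):
    """Map (duckdb_version, platform) -> sorted extension names found under root."""
    found = defaultdict(list)
    # Assumed layout: <root>/<version>/<platform>/<name>.duckdb_extension
    # The pattern does not match the accompanying *.duckdb_extension.info files.
    for path in Path(root).expanduser().glob("*/*/*.duckdb_extension"):
        version, platform = path.parts[-3], path.parts[-2]
        name = path.name[: -len(".duckdb_extension")]
        found[(version, platform)].append(name)
    return {key: sorted(names) for key, names in found.items()}
```

Calling `list_installed_extensions("~/.duckdb/extensions")` after initializing the client once should report which extensions were fetched, keyed by DuckDB version and platform.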

It might be worth noting that DuckDB extensions are specific to the version of DuckDB (as well as the platform/architecture), so any update to the DuckDB version requires reinstalling them. This means that if stacrs bumps the version of DuckDB it compiles against, a user will have to ensure they're installing the matching extension versions. Since this DuckDB version isn't listed (or listable) via the Python package, this seems relatively difficult to manage unless there is a nice way of installing the extensions stacrs needs for the version of DuckDB that stacrs is using (e.g., as we can currently do by initializing the Client at container build time).
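
In the meantime, a container build could at least fail fast when the baked-in extensions don't match what is expected. The sketch below is one possible check, not something stacrs provides: `missing_extensions` is a hypothetical helper, the extension names are only those observed above for stacrs==0.5.5, and the DuckDB version and platform strings have to be supplied by hand because the package doesn't expose them.

```python
from pathlib import Path

# Assumption: taken from the directory listing for stacrs==0.5.5 in this issue,
# not from an official list published by the project.
OBSERVED_EXTENSIONS = ("aws", "httpfs", "icu", "spatial")


def missing_extensions(root, duckdb_version, platform, required=OBSERVED_EXTENSIONS):
    """Return the required extension names that have no file under root/version/platform."""
    ext_dir = Path(root).expanduser() / duckdb_version / platform
    return [
        name
        for name in required
        if not (ext_dir / f"{name}.duckdb_extension").is_file()
    ]
```

For example, a build step could call `missing_extensions("~/.duckdb/extensions", "v1.1.1", "linux_amd64_gcc4")` and abort if the result is non-empty. After a DuckDB version bump the directory name changes, so every extension would be reported as missing until they are reinstalled.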
