Docker image #53

Merged
merged 20 commits into from
Oct 16, 2024

Conversation

Collaborator

@wuputah wuputah commented Jun 21, 2024

  • Usage follows https://hub.docker.com/_/postgres/
  • Images are pushed to pgduckdb/pgduckdb
  • Anytime an image is built, it's pushed to 17-$sha, 16-$sha, 15-$sha
  • Nightly images pushed to 17-main, 16-main, 15-main
  • Version tags (tags beginning with v) will be built automatically and pushed to 17-$tag, 16-$tag, 15-$tag
  • Only built on PR when the Docker-related files are changed
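As a rough illustration of the per-commit tagging scheme in the list above, the sketch below prints the three tags that a single build would produce. The helper name `sha_tags` is hypothetical, and the version list 17, 16, 15 is taken from the bullets rather than from the workflow file.

```shell
# Hypothetical helper: print the per-commit image tags for one build.
# Assumes the PostgreSQL majors are 17, 16 and 15, as listed in the description.
sha_tags() {
  sha=$1
  for pg in 17 16 15; do
    printf 'pgduckdb/pgduckdb:%s-%s\n' "$pg" "$sha"
  done
}

# Example: the commit used in the test instructions.
sha_tags b203b63b052df8a7fece37f79face6b7522a498f
```

Nightly and version tags follow the same pattern, with `main` or the tag name in place of the commit SHA.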

To test, grab a tag, for instance:

TAG=17-b203b63b052df8a7fece37f79face6b7522a498f
PORT=6543

Then:

docker run --name pgduckdb -e POSTGRES_PASSWORD=secret -p $PORT:5432 \
  pgduckdb/pgduckdb:$TAG -c shared_preload_libraries=pg_duckdb

psql postgres://postgres:secret@127.0.0.1:$PORT/postgres

Cleanup:

docker stop pgduckdb
docker rm pgduckdb

This can be made easier with a docker-compose.yml, which will be forthcoming.

@wuputah wuputah requested a review from owenthereal June 21, 2024 01:40
@owenthereal

owenthereal commented Jun 21, 2024

For context: we can't build hydra using pgxman as part of its CI because of a limitation in pgxman's publication mechanism. It requires the buildkit file to exist and be published in a central repo, pgxman/buildkit, which in turn refers to a tag for source download. This creates a circular dependency: tagging an extension requires pgxman to build the buildkit artifacts, but the build artifacts require the tag to exist first. This can be worked around for now by:

  1. Create a buildkit in this repo with source pointing to a local path (this is already working but undocumented). This would generate the Debian files to copy into a Docker image. Even better, add a command pgxman build --export DOCKER_IMAGE that does both.
  2. For publication, we duplicate the same buildkit and save it to pgxman/buildkit with source pointing to the source of a tag. Having two canonical copies of buildkit files is not ideal, but it's not the end of the world.

When we can self-publish extensions from individual repos in the future, the circular dependency will be broken, and this workaround will no longer be necessary.

@wuputah
Collaborator Author

wuputah commented Jun 21, 2024

heya @owenthereal, sorry for the lack of context. I added you here in part because of pgxman, but also because I'm a bit of a noob when it comes to Docker, so I wanted you to check my work and see if you had any suggestions for improvement. For instance, I'm currently creating a checker image to run the tests, but I guess Docker determines that this isn't necessary to make the final image, so it skips this step. Of course, I could just run the tests in builder instead.

I do think it would be possible to build with pgxman based on the docs here; as you suggest, a buildkit yaml would be needed as well.
https://docs.pgxman.com/building_an_extension#test-the-extension

IMO this would just be for making "dev builds" as desired, though for local testing the duckdb build takes a long time and the ccache setup seems to not work super well, so it's not a great local developer tool.

@owenthereal

> For instance, I'm currently creating a checker image to run the tests, but I guess Docker determines that this isn't necessary to make the final image, so it skips this step.

You could build a specific target with https://docs.docker.com/reference/cli/docker/image/build/#target, e.g. docker build ... --target checker
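A minimal sketch of that idea, assuming the Dockerfile defines a stage named checker as discussed in this thread. `--target` selects the stage to build, while `-t` only names the resulting image. The wrapper name `build_checker` and the local image name `pgduckdb-checker` are made up for illustration.

```shell
# Hypothetical wrapper; "pgduckdb-checker" is an illustrative local image name.
build_checker() {
  # --target stops the build at the named stage, so the test stage is
  # built even though the final image does not depend on it.
  docker build --target checker -t pgduckdb-checker "${1:-.}"
}

# Usage (requires Docker and a Dockerfile with a "checker" stage):
# build_checker .
```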

> though for local testing the duckdb build takes a long time and the ccache setup seems to not work super well

Not saying we should replace this Docker build with pgxman build right now, but being able to specify cache dir would be a nice pgxman feature to add in the future.

@wuputah wuputah force-pushed the jd/docker-image branch 2 times, most recently from a811b49 to d9167ef Compare August 7, 2024 20:03
@wuputah wuputah removed the request for review from owenthereal September 5, 2024 15:56
@mike-luabase

this is what ultimately worked for me:

FROM postgres:16-bookworm as base

###
### BUILDER
###
FROM base as builder

RUN --mount=type=cache,target=/var/cache/apt \
  apt-get update -qq && \
  apt-get install -y build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libxslt-dev \
  libssl-dev libxml2-utils xsltproc pkg-config libc++-dev libc++abi-dev libglib2.0-dev libtinfo5 cmake \
  libstdc++-12-dev postgresql-server-dev-16 liblz4-dev ccache git

WORKDIR /build

ENV PATH=/usr/lib/ccache:$PATH
ENV CCACHE_DIR=/ccache

# Clone the pg_duckdb repository and initialize submodules
RUN git clone --branch main https://github.com/duckdb/pg_duckdb.git . && \
    git submodule update --init --recursive

# permissions so we can run as `postgres` (uid=999,gid=999)
RUN chown -R postgres:postgres .
RUN chown -R postgres:postgres /usr/lib/postgresql /usr/share/postgresql
RUN mkdir /out && chown postgres:postgres /out
RUN rm -f .depend

USER postgres

# Build and install
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 make install
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 DESTDIR=/out make install

###
### CHECKER
###
FROM builder as checker

USER postgres
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 make installcheck

###
### OUTPUT
###
# This creates a usable postgres image but without the packages needed to build
FROM base as output
COPY --from=builder /out /

@jorinvo

jorinvo commented Sep 19, 2024

I would love to install pg_duckdb in our Postgres Docker image. I tried the Dockerfile from above, but it was stuck at 100% at make install for 30 minutes.

@mike-luabase

> I would love to install pg_duckdb in our Postgres Docker image. I tried the Dockerfile from above, but it was stuck at 100% at make install for 30 minutes.

Took a very long time to run for me too. Might have been more than 30 minutes before it was complete.

@jorinvo

jorinvo commented Sep 20, 2024

Thanks @mike-luabase, that's good to know.
I am afraid that's not usable for us for now. But I am looking forward to having a prebuilt image available some day. pg_duckdb is definitely an exciting project 🤩

@JelteF
Collaborator

JelteF commented Sep 20, 2024

To make the build faster it would help a lot if you changed the make install commands to be parallel, by using e.g. make -j10 install. Or maybe make -j$(nproc) install
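A small sketch of how the job count could be chosen, assuming `nproc` may or may not be present in the build image. The helper name `make_jobs` is hypothetical, and the sketch only prints the command instead of running it.

```shell
# Pick a job count: use nproc when available, otherwise fall back to 1.
make_jobs() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    echo 1
  fi
}

# Print (rather than run) the command so the sketch is safe to execute anywhere.
echo "make -j$(make_jobs) install"
```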

@wuputah wuputah force-pushed the jd/docker-image branch 3 times, most recently from c38f089 to 4419031 Compare September 27, 2024 18:37
@wuputah wuputah changed the base branch from main to jd/makefile September 27, 2024 18:39
@wuputah
Collaborator Author

wuputah commented Sep 27, 2024

> To make the build faster it would help a lot if you changed the make install commands to be parallel, by using e.g. make -j10 install. Or maybe make -j$(nproc) install

Even after #211, there remains an issue with -j being passed to the duckdb build that I haven't managed to solve (though I tried). You'll see this printed in the log when running make duckdb:

make[2]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.

This is despite the fact we are using $(MAKE) as suggested for this issue. This probably has something to do with DuckDB's use of cmake, but I don't know anything about cmake.

@wuputah wuputah force-pushed the jd/docker-image branch 10 times, most recently from 7d442d8 to 0801d3f Compare September 27, 2024 20:27
Collaborator

@Y-- Y-- left a comment

Looks good. We will need a green build to merge though no?

*.cache-to=type=gha,mode=max
*.cache-from=type=gha
postgres.tags=pgduckdb/pgduckdb:${{ matrix.postgres }}-${{ github.sha }}
${{ !contains(github.ref_name, '/') && format('postgres.tags=pgduckdb/pgduckdb:{0}-{1}', matrix.postgres, github.ref_name) || '' }}
Collaborator

Will this override the one above or add to it? Also, instead of skipping when the branch name has a /, could we replace it with -? (Seems arbitrary to just exclude these, no?)

Collaborator Author

It adds to the one above it, so we will push with the sha anytime it gets built, including PRs.

Mm yeah we could replace / with -, but this was sorta an accidental "feature" -- by doing it this way we only get tags and main pushed as "named tags" rather than PRs.

Collaborator

In that case, might as well be explicit cause if I name my branch hello it would be pushed but not yl/hello :-)

Suggested change
- ${{ !contains(github.ref_name, '/') && format('postgres.tags=pgduckdb/pgduckdb:{0}-{1}', matrix.postgres, github.ref_name) || '' }}
+ ${{ github.ref_name == 'main' && format('postgres.tags=pgduckdb/pgduckdb:{0}-main', matrix.postgres) || '' }}

I'm fine either way!

Collaborator Author

@wuputah wuputah Oct 16, 2024

That won't work for tags though (v0.1.0).

The / will show up in the ref_name for any PR, even if you don't put a / in your branch name.
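The effect of the `!contains(github.ref_name, '/')` condition can be sketched in shell as follows. The helper name `named_tag` is hypothetical, and `53/merge` is used as an example of the ref name a build of this PR would see.

```shell
# Hypothetical helper mirroring the `!contains(github.ref_name, '/')` condition.
named_tag() {
  pg=$1
  ref_name=$2
  case "$ref_name" in
    */*) ;;  # ref names with a slash (e.g. PR refs) get no named tag
    *) printf 'pgduckdb/pgduckdb:%s-%s\n' "$pg" "$ref_name" ;;
  esac
}

named_tag 17 main      # nightly-style tag
named_tag 17 v0.1.0    # version tag
named_tag 17 53/merge  # prints nothing
```

This keeps both `main` and version tags such as `v0.1.0`, which a check against `main` alone would not.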

@wuputah
Collaborator Author

wuputah commented Oct 16, 2024

> Looks good. We will need a green build to merge though no?

Yeah, looks like the azure ubuntu mirror is broken atm.

This reverts commit 6063ac8.

This did not work because this is the initdb directory, which needs to
not exist at the time the container is started. We'll need to set the
config another way.
@wuputah
Collaborator Author

wuputah commented Oct 16, 2024

> I think a quick mention in the readme, contributing.md or PR description would be very helpful.

PR description updated.

I'd like to get this merged here then continue on in separate PRs with docker-compose, docs changes, etc. Especially since there won't be an image tagged with main until it's merged. 😁

@wuputah wuputah enabled auto-merge (squash) October 16, 2024 20:01
Collaborator

@JelteF JelteF left a comment

Overall this looks good. I left one comment, but that can also be done later. Feel free to merge as you see fit.


# A more selective copy might be nice, but the git submodules are not cooperative.
# Instead, use .dockerignore to not copy files here.
COPY . .
Collaborator

I think it would be good to automatically run CREATE EXTENSION pg_duckdb and configure shared_preload_libraries=pg_duckdb. That'll make it easier to run.

Using an initialization script should make automating that possible I think: https://github.com/docker-library/docs/blob/master/postgres/README.md#initialization-scripts
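A minimal sketch of what such an initialization script might look like, not what this PR ships. It assumes the script is placed in /docker-entrypoint-initdb.d/ under a hypothetical name such as 0001-pg_duckdb.sh, and that shared_preload_libraries=pg_duckdb is already in effect when init scripts run (for example via the -c flag from the PR description). The function name is made up for illustration.

```shell
# Hypothetical init script body for /docker-entrypoint-initdb.d/.
# Assumption: the server was started with shared_preload_libraries=pg_duckdb,
# so the extension can be created during initialization.
create_pg_duckdb_extension() {
  psql -v ON_ERROR_STOP=1 \
    --username "${POSTGRES_USER:-postgres}" \
    --dbname "${POSTGRES_DB:-${POSTGRES_USER:-postgres}}" \
    -c 'CREATE EXTENSION IF NOT EXISTS pg_duckdb;'
}

# In a real init script, call the function at the end:
# create_pg_duckdb_extension
```

POSTGRES_USER and POSTGRES_DB are the environment variables documented for the official postgres image; the fallbacks here follow its documented defaults.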

Collaborator Author

yeah we can handle that at the docker compose level

@wuputah wuputah merged commit 39eb71c into main Oct 16, 2024
10 checks passed
@wuputah wuputah deleted the jd/docker-image branch October 16, 2024 21:13
Labels
developer experience Improves our own lives