-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: enable local packages and PROJECT_ROOT expansion in requirements files #2517
base: main
Are you sure you want to change the base?
Conversation
dd19d39
to
5d9e158
Compare
Signed-off-by: Amaury Pouly <[email protected]>
Signed-off-by: Amaury Pouly <[email protected]>
Signed-off-by: Amaury Pouly <[email protected]>
5d9e158
to
8e5751d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description of your situation sounds like the problem is not that rules_python
is lacking features but rather that the organization you are in does not like/prefer bazel
as a build system and I think the solution to that problem is a non-technical. I very well understand this pain, but I am not sure rules_python
features will ever solve this fully.
In order to make rules_python
more maintainable we would like to keep it focused only to standards and it seems that PROJECT_ROOT
concept is not something that I can easily find when googling. That means that the project would have the burden of teaching users what this new rules_python
-specific thing means and how to use it in their setup.
uv
treatsPROJECT_ROOT
as an env var, but I don't see any docs in their GH project except for this comment herepoetry
doesn't have any mentions of that in their GH projectpdm
seems to have it documented according to their GH project
What is more, I think that staying compatible with #2345 is something that I would prefer.
@pamaury, if you had #2345 available as part of rules_python
feature set, would it be useful to you?
Other ways to make the PROJECT_ROOT
work would be to extend the envsubst
to also expand the requirements sources instead of creating an extra attribute just for a single string value.
if subst_req.startswith("file://"): | ||
_, path = subst_req.split("file://", 1) | ||
logger.info(lambda: "watching tree {} for wheel library {}".format(path, rctx.name)) | ||
rctx.watch_tree(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is the correct behaviour - will we start watching more than just the intended python files? What if the path
is not a dir? The watch_tree
will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if this is a directory then the correct behaviour should be to fall back to what #2345, or to use watch
instead of watch_tree
. Yes, it could end up watching more than the intended python files but as far as I know, pip
simply copies all of the files as well so I think this is the correct thing to do in that sense. This is definitely a trade-off in that using a local package means giving up on precise dependency tracking of the files within that package.
@@ -193,6 +193,10 @@ def _whl_library_impl(rctx): | |||
# Manually construct the PYTHONPATH since we cannot use the toolchain here | |||
environment = _create_repository_execution_environment(rctx, python_interpreter, logger = logger) | |||
|
|||
# Add a PROJECT_ROOT environment variable | |||
if rctx.attr.project_root: | |||
environment["PROJECT_ROOT"] = str(rctx.path(rctx.attr.project_root).dirname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need an extra value here instead of adding PROJECT_ROOT
into envsubst
attr?
# Assume that such packages are imported using a single line requirement | ||
# of the form [<name> @] file://<path> | ||
# We might have to perform some substitutions in the string before searching. | ||
subst_req = envsubst(rctx.attr.requirement, environment.keys(), lambda x, dft: environment[x]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/bazelbuild/rules_python/pull/2517/files#diff-c007ed21502bf8ea19b98b3f1b402e7071615f8520e4291b00a71bca2cd451e8R114 how the envsubst
should be used instead.
Thank you for your comments, I understand that The issue with pip.parse(
hub_name = "pypi",
# If we want to expand the PROJECT_ROOT variable in the requirement lock file,
# we need to pass a label to the file defining the project root.
envlabels = {
"PROJECT_ROOT: "//:pyproject.toml",
},
# We need to use the same version here as in the `python.toolchain` call.
python_version = "3.9.13",
requirements_lock = "//:requirements_lock.txt",
) which has the advantage of being more general, although this is still quite a specific use case. That being said, the environment variable problem is less of an issue because I think I could work around it by creating a custom repository rule for the sole purpose of expanding this variable and then make this expanded version the input to |
@aignas I have thought about the problem a bit more. I think the |
Thanks. The probem is that there is no way to provide a label to a repository. We can only provide a label to something within the repository. I'm still wondering if this is being solved in the right domain - this sounds a little bit what gazelle is trying to do - make python code automatically available as bazel targets. Having such code in the repository rule or module extension stage is high maintenance and long term it doesn't really work. Hence my (and other maintainers') preference to keep the logic to a minimum. Is using gazelle out of question in your case? |
The goal of this PR is to make working with local packages easier.
Motivation:
We have a large multilanguage repository with a nontrivial amount of Python code. Unfortunately, most of the Python developers do not like/use Bazel and prefer to run the code directly and/or would like it to be managed using
pip
(e.g.pip install -e
). On the other hand, some of the python libraries are also used by bazel managedpy_binary
s. As a result, the BUILD files frequently become out of date and the Python developers are unhappy because the Python codebase is not managed by pip. To make matters worse, it is very easy to escape the sandbox in python by importing file based on the__file__
path. As a result, a python script can work in Bazel even with missing dependencies, leading to subtle caching bugs. The ideal situation would be that the Python codebase become a local Python package which is picked-up bypip
and made available to the Bazel codebase like any other package.Changes:
pip
already knows how to install local package. There are however, two missing pieces to make it work in Bazel:rules_python
should ask bazel to watch the directory so that the wheel gets rebuilt when the content has changed. This PR makeswhl_library
automatically watch any package specified using afile:...
path.pip
does not accept relative paths with thefile://
protocol, which makes dealing with local packages a bit tricky. Since environment variables are expanded bypip
in requirements lock files, the usual approach is to use an environment variable to point to the top of the project. It seems thatpoetry
,pdm
anduv
have converged on thePROJECT_ROOT
variable which is set automatically. This PR adds a newproject_root
attribute towhl_library
andpip.parse
which, if set, will automatically set thePROJECT_ROOT
env var. By default, thePROJECT_ROOT
variable is not set.Examples:
This PR adds an example of local package installed by
pip.parse
. The showcases theproject_root
attribute as well. To make it more interesting, the local package exports an entry point and depends on another python package. Two test are provided in the example.Issues/limitations/Comments:
There is some ambiguity about how local packages should be specified in requirements files. As far as I understand, the only officially supported way is to the use the
file://
protocol which is explicitely supported in the specification. Support for any other non-standard paths is very spotty anyway in the Python ecosystem, and subtly broken in different ways in most tools. This PR does not add any non-standard behaviour and stick to thefile://
protocol.The requirement lock file was compiled by
uv
. It seems that modern managers likeuv
,poetry
andpdm
are aware of environment variables and internally expand them when resolving them but keeps them intact when writing the requirement lock file. This is not the case ofpip-compile
which expands it to a full path. I guess this is whyrules_python
automatically rewrites absolute paths after runningpip-compile
. Unfortunately this breaks in the presence offile:
. Here is an example:But now
file://hello
is considered invalid bypip
:non-local file URIs are not supported on this platform
.As a result, the requirement file cannot be created by the
compile_pip_requirements
rule. I am not sure how this should be fixed.This PR may be incompatible with #2345 but I think it may supercede it in a way: if a wheel file is passed with a
file://
then it will be watched by Bazel (the code does not care whether it is a file or a directory) so this should produce exactly the same result. I haven't yet confirmed that this is the case.