`rust-analyzer` discoverConfig integration #3073

bobozaur · 2024-12-10T09:53:21Z

Adds a target that can be used for project auto-discovery by using the discoverConfig settings as described in the rust-analyzer user manual.

Unlike the gen_rust_project target, this can be used for dynamic project discovery, and passing {arg} to discoverConfig.command can split big repositories into multiple, smaller workspaces that rust-analyzer switches between as needed. This matters because large repositories can otherwise make rust-analyzer run out of memory (OOM).
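A minimal Rust sketch of one way a discover tool could turn the file path that rust-analyzer passes through `{arg}` into a narrower Bazel target pattern, so that each workspace only covers a single package. It assumes the tool walks up from the file to the nearest directory containing a BUILD file and queries `//<package>:all`; the function name `package_pattern_for` and this strategy are illustrative, not taken from the PR. The filesystem check is passed in as a closure so the logic can be exercised without a real checkout.

```rust
use std::path::Path;

/// Hypothetical helper: returns a target pattern such as `//foo/bar:all` for
/// the package that owns `file`, or `None` if no BUILD file is found between
/// `file` and `workspace` (or if `file` is outside the workspace).
///
/// `has_build_file` reports whether a directory contains BUILD or
/// BUILD.bazel; a real tool would check the filesystem here.
fn package_pattern_for(
    workspace: &Path,
    file: &Path,
    has_build_file: impl Fn(&Path) -> bool,
) -> Option<String> {
    // Start at the file's directory and walk upwards.
    let mut dir = file.parent()?;
    loop {
        // Never walk outside the workspace root.
        let rel = dir.strip_prefix(workspace).ok()?;
        if has_build_file(dir) {
            // Bazel package names always use '/' as the separator.
            let pkg: Vec<String> = rel
                .components()
                .map(|c| c.as_os_str().to_string_lossy().into_owned())
                .collect();
            return Some(format!("//{}:all", pkg.join("/")));
        }
        if dir == workspace {
            return None;
        }
        dir = dir.parent()?;
    }
}
```

With `{arg}` in discoverConfig.command, rust-analyzer would then only ask about the package being edited instead of the whole repository.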

At amo, we've used a similar implementation for a while with great success, which is why we figured we might upstream it. The changes also include two additional output groups to ensure that proc-macros and build script targets are built, as rust-analyzer depends on these to provide complete IDE support.

Additionally, the PR makes use of the output_base value in bazel invocations. We found it helpful to have tools such as rust-analyzer and clippy run on a bazel server separate from the one used for building. A config_group argument was also added so that a config group can be passed to bazel invocations.

I also attempted to get code lens actions working, particularly for tests and binaries. They seem to work, but I'm not 100% sure whether the approach taken is the right one.

Closes #2755.

google-cla · 2024-12-10T09:53:27Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

bobozaur · 2024-12-16T14:06:02Z

@sam-mccall I noticed you've been working on things adjacent to this in other recent PRs. Just adding you here for visibility reasons. Any advice you can offer regarding the PR is more than welcome.

sam-mccall · 2024-12-18T23:16:08Z

@sam-mccall I noticed you've been working on things adjacent to this in other recent PRs. Just adding you here for visibility reasons. Any advice you can offer regarding the PR is more than welcome.

Thanks! Yes I had a draft version running locally that I'd planned to clean up and send, but I'm happy you got there first.
I'm not an owner here, but happy to provide feedback and test this out.

This PR combines a few logically-separate changes. I get it - there are a bunch of details that all need to be right for this to work end-to-end. (I ran into some overlapping-but-different set of these myself). I'll try to understand them all and provide high-level comments first, and then let you figure out whether to split them out and land the more "obvious" stuff first. (e.g. having a separate output-base and injecting blaze configuration are useful, but different users will have different needs here).

(I did spot this last friday and started to review, but have been unwell this week...)

sam-mccall

This is great stuff, really thorough job! Some parts of the protocol (logging & runnables) were totally new to me.

High level:

- starlark changes (proc macros & build scripts) are good. Related: rust_analyzer: generate more reqired sources #3031 includes some other generated files.
- A separate binary for discover sounds good to me (I had this as a flag on gen_rust_project, but it's messy).
- This is background infra that should "do what I mean"; as such, we want fewer flags and more detection.
- Some things we will eventually want should inform flags/detection:
  - become a standalone binary (not invoked through blaze run)
  - automatically select an output_base
- A BlazeSpec (argv0, workspace, execroot, ...) would be a useful abstraction, replacing Config and long param lists.
- A clearer split between main (config + protocol) and library (build+query+describe) would make the code easier to follow.
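A rough Rust sketch of what the suggested BlazeSpec abstraction might look like (spelled `BazelSpec` here). The fields follow the comment (argv0, workspace, execroot) plus an optional output base; the struct and its `command` and `exec_path` methods are hypothetical, not code from this PR.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical bundle of everything needed to invoke Bazel, replacing a
/// `Config` struct and long parameter lists.
#[derive(Debug, Clone)]
struct BazelSpec {
    /// Path to the bazel (or bazelisk) binary.
    argv0: PathBuf,
    /// Workspace root; commands run from here.
    workspace: PathBuf,
    /// Execution root, e.g. as reported by `bazel info execution_root`.
    execroot: PathBuf,
    /// Optional dedicated output base for tool invocations.
    output_base: Option<PathBuf>,
}

impl BazelSpec {
    /// Builds `bazel [--output_base=...] <subcommand>` rooted at the workspace.
    fn command(&self, subcommand: &str) -> Command {
        let mut cmd = Command::new(&self.argv0);
        cmd.current_dir(&self.workspace);
        // --output_base is a startup option, so it must precede the subcommand.
        if let Some(base) = &self.output_base {
            cmd.arg(format!("--output_base={}", base.display()));
        }
        cmd.arg(subcommand);
        cmd
    }

    /// Resolves a path relative to the execution root.
    fn exec_path(&self, rel: &str) -> PathBuf {
        self.execroot.join(rel)
    }
}
```

Passing one `&BazelSpec` around keeps call sites short and makes it easier to later swap how each field is detected.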

Let me know what you think - as I said I'm not an owner here. I can nag one to get you a stamp :-) but also feel free to get a second opinion instead.

If you're interested, this could be a few PRs - might go quicker, but totally up to you:

- proc macros + build scripts
- refactor current bin+lib to make lib reusable
- add discover tool
- runnables

sam-mccall · 2024-12-19T15:59:41Z

tools/rust_analyzer/rust_project.rs

@@ -108,6 +223,31 @@ pub fn generate_rust_project(
        sysroot: Some(sysroot.into()),
        sysroot_src: Some(sysroot_src.into()),
        crates: Vec::new(),
+        runnables: vec![


(no action needed) runnables is not documented in the rust-project.json spec; we should fix that.
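Since the format is undocumented, here is a hedged Rust sketch of what a `testOne` runnable entry could carry. The field names (`program`, `args`, `cwd`, `kind`) and the `{label}` / `{test_id}` template placeholders reflect our understanding of rust-analyzer's project model and should be treated as assumptions; `test_one_runnable` and `expand` are hypothetical helpers, with `expand` only illustrating the substitution rust-analyzer is expected to perform.

```rust
/// Hypothetical mirror of a rust-project.json runnable entry.
#[derive(Debug, Clone, PartialEq)]
struct Runnable {
    program: String,
    args: Vec<String>,
    cwd: String,
    kind: RunnableKind,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum RunnableKind {
    /// Offered by the "Run Test" code lens above a single test.
    TestOne,
}

/// Builds a `testOne` runnable that runs one test through bazel.
/// `{label}` and `{test_id}` are left as templates (assumed names).
fn test_one_runnable(workspace: &str) -> Runnable {
    Runnable {
        program: "bazel".to_string(),
        args: vec![
            "test".to_string(),
            "{label}".to_string(),
            "--test_filter={test_id}".to_string(),
        ],
        cwd: workspace.to_string(),
        kind: RunnableKind::TestOne,
    }
}

/// Illustrates the template expansion done on the rust-analyzer side.
fn expand(args: &[String], label: &str, test_id: &str) -> Vec<String> {
    args.iter()
        .map(|a| a.replace("{label}", label).replace("{test_id}", test_id))
        .collect()
}
```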

sam-mccall · 2024-12-19T16:05:48Z

tools/rust_analyzer/rust_project.rs

@@ -108,6 +223,31 @@ pub fn generate_rust_project(
        sysroot: Some(sysroot.into()),
        sysroot_src: Some(sysroot_src.into()),
        crates: Vec::new(),
+        runnables: vec![
+            Runnable {


What's the intended use of build?

If it's to get diagnostics (which RunnableKind::Check suggests), it's not clear to me whether we should pass --config, --output_base, etc. because we're a tool, or not pass them because we're acting as a simple proxy for the user.

I'm fine with either answer but let's say why in a comment.

Part of this is: are we expecting --config to be something like --config=generate_simplified_sources_for_rust_analyzer (should be ignored) or --config=macos (should be passed)?

What's the intended use of build?

Don't laugh, but I have absolutely no idea. I can't find any place where it is used in rust-analyzer so it might be a leftover. I initially just copied this from the buck2 implementation, only refining it later. I don't see a reason to keep it, to be honest.

But I think the best course of action would be to raise an issue with rust-analyzer and see if this serves any purpose before removing it, just to ensure we're not omitting something here.

Regarding passing --config and other args, I'm not entirely sure either. It's a very good question so thanks for spotting this, I missed it.

Off the top of my head, I'd argue that we should not pass them. The main idea behind the flags is to get the tool itself running in a way that does not obstruct regular usage, like building, running or testing targets. Runnables, while invoked by the tool, do qualify as "regular usage", I think. It's just that instead of typing a command in the terminal, you invoke them through the IDE code lens.

No strong opinions here though.

Don't laugh, but I have absolutely no idea

That's perfectly reasonable!

I see TestOne and Bin used in rust-analyzer/src/target_spec.rs, but not Check, so maybe drop Check?

For explicit user actions (--test) I agree we don't need to take any workspace-related shortcuts. Sometimes there's code that doesn't build in the default config at all, though. In any case, if we punt on --config for now, we can leave this :-)

Note that I ended up removing the --config_group CLI argument to err on the side of simplicity.

Regarding the RunnableKind::Check variant, it seems that it was recently added here: rust-lang/rust-analyzer@71a78a9. If I were to guess, it is perhaps meant to help implement cargo check-like support.

I have no strong opinions, though. We could remove this entirely, simply comment it out or leave it in preparation for better flycheck support for non-cargo projects from rust-analyzer's side.

sam-mccall · 2024-12-19T16:27:35Z

tools/rust_analyzer/lib.rs

        .env_remove("BAZELISK_SKIP_WRAPPER")
        .env_remove("BUILD_WORKING_DIRECTORY")
        .env_remove("BUILD_WORKSPACE_DIRECTORY")
+        .arg(format!("--output_base={output_base}"))
        .arg("build")


(no action needed) When we're running in the background, we want to be as forgiving as possible if the user's current state is broken. So eventually we should have --nocheck_visibility here and possibly elsewhere.
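One way to encode the two positions from this thread is to make the invocation context explicit: background tool invocations get the tool's own output base and forgiving flags such as --nocheck_visibility, while user-facing runnables are passed through untouched. A minimal Rust sketch; the `Invocation` enum and `bazel_args` function are hypothetical, and which flags belong in each bucket is still an open question in the discussion above.

```rust
/// Who is going to run the bazel command (hypothetical distinction).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Invocation {
    /// Background work by the tool itself (discovery, building proc macros).
    Tool,
    /// An explicit user action such as a code lens runnable.
    Runnable,
}

/// Assembles `[startup flags] <subcommand> [command flags] <targets>`.
fn bazel_args(
    invocation: Invocation,
    output_base: &str,
    subcommand: &str,
    targets: &[&str],
) -> Vec<String> {
    let mut args = Vec::new();
    if invocation == Invocation::Tool {
        // Startup option: keeps tool builds off the user's bazel server.
        args.push(format!("--output_base={output_base}"));
    }
    args.push(subcommand.to_string());
    if invocation == Invocation::Tool {
        // Be forgiving when the user's tree is in a broken state.
        args.push("--nocheck_visibility".to_string());
    }
    args.extend(targets.iter().map(|t| t.to_string()));
    args
}
```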

bobozaur · 2024-12-23T10:51:48Z

@sam-mccall Sorry for the delay and thank you for the thorough review! I'm not going to have access to my laptop for a few more days but I plan to address all of this after Christmas. Happy holidays :)!

bobozaur · 2025-01-06T01:24:23Z

@sam-mccall My apologies for the delay. Winter break wasn't as restful as I hoped it would be, so I got delayed a bit 🙃.

Nevertheless, I addressed the bulk of your concerns. While all your ideas are great, I do feel that some of them might be out of scope for this PR. In my opinion, the main objective here is getting a working solution for project auto-discovery.

While some of the design decisions are questionable, I think an iterative approach might be better, considering that a few issues pertain to code that already existed or was simply moved around for the sake of DRY. In particular, the points revolving around a BazelSpec, the CLI args and bazel info autodetection seem to me like issues separate from what this PR tries to address. I've no problem helping with those as well, though I'd definitely be more comfortable doing that in a subsequent PR.

A separate binary for discover sounds good to me (I had this as a flag on gen_rust_project, but it's messy)

Hah, I know what you mean. When I started out on this I attempted the same thing but, while doable, it only led to convoluted code and pain 😂.

If you're interested, this could be a few PRs - might go quicker, but totally up to you

I think that multiple PRs would've been better if done from the start. However, I've only worked with bazel for a couple of months, and I'm slightly afraid that if we break this down now, there's a risk of me missing individual pieces that all have to come together for this to work correctly. With the bulk of the work done and all of it present here, I find it easier to just march on with this single PR. If that's not acceptable for any reason, do let me know.

Let me know what you think - as I said I'm not an owner here. I can nag one to get you a stamp :-) but also feel free to get a second opinion instead.

Perhaps it might be valuable to get a code owner to review this as well, especially since some of your questions are related to pre-existing code.

sam-mccall · 2025-01-08T01:17:39Z

@sam-mccall My apologies for the delay. Winter break wasn't as restful as I hoped it would be, so I got delayed a bit 🙃.

No worries at all! I hope you got some time off at least :-)

Nevertheless, I addressed the bulk of your concerns. While all your ideas are great, I do feel that some of them might be out of scope for this PR.

Yes, fair enough! I'm happy if we can find ways to reduce the scope here.

We should be mindful this isn't a small change: it's more than doubling the size of tools/rust_analyzer, and reusing code across binaries. When adding a lot of complexity, we will need to spend some effort improving design.

On the command-line interface: I think if we're adding a new tool, it should not expose unnecessary options. That is, copying over the flags/knobs from gen_rust_project in the hope of removing some later is not a reasonable way to reduce the scope of a PR, even if it saves some code.

This interface is important - I think more so than the code that implements it. Once released, it will be difficult to remove features from it, as we cannot see or reason about the callers. Each option is a constraint that makes maintenance harder.

(Concrete example: gen_rust_project supports specifying/inferring workspace and execroot in many combinations. This makes it really awkward to add support for automatic ephemeral --output_base. For discover_rust_project this is an important feature, while the ability to pass --execution_root is borderline useless).

While some of the design decisions are questionable, I think an iterative approach might be better, considering that a few issues pertain to code that already existed or was simply moved around for the sake of DRY.

DRY is library design. If we lift some single-use code into a reusable module, now changes to each user are constrained by the other, and by the module boundary. If we want to iterate cheaply, we should duplicate code and customize it, and defer generalization until later. (I think this is the right approach for the CLI).

If you're interested, this could be a few PRs - might go quicker, but totally up to you

I think that multiple PRs would've been better if done from the start. However, I've only worked with bazel for a couple of months, and I'm slightly afraid that if we break this down now, there's a risk of me missing individual pieces that all have to come together for this to work correctly. With the bulk of the work done and all of it present here, I find it easier to just march on with this single PR. If that's not acceptable for any reason, do let me know.

Let me know what you think - as I said I'm not an owner here. I can nag one to get you a stamp :-) but also feel free to get a second opinion instead.

Perhaps it might be valuable to get a code owner to review this as well, especially since some of your questions are related to pre-existing code.

Ok, summoning some owners: @UebelAndre @illicitonion @krasimirgg

- are you happy with what this PR is trying to do? (I hope so!)
- is it useful/sufficient if I continue to review? would someone prefer to take over now/soon?
- anyone have opinions/historical context on the configuration surface of gen_rust_project (flags, env vars, defaults, inference) and how much of it we should support?

bobozaur · 2025-01-09T18:30:09Z

DRY is library design. If we lift some single-use code into a reusable module, now changes to each user are constrained by the other, and by the module boundary. If we want to iterate cheaply, we should duplicate code and customize it, and defer generalization until later. (I think this is the right approach for the CLI).

@sam-mccall thanks for thinking about this. On second thought, I agree with you. I'll "duplicate" the config on each binary so we can tweak the discovery one without worrying about the impact on the gen_rust_project target.

bobozaur · 2025-01-10T10:45:53Z

@sam-mccall thanks for thinking about this. On second thought, I agree with you. I'll "duplicate" the config on each binary so we can tweak the discovery one without worrying about the impact on the gen_rust_project target.

I split up the configs so each binary has its own, but I did factor out a get_bazel_info() function (which I believe to be general enough to be "lib worthy") and added an extension trait for std::process::Command to help with deduplicating bazel command instantiations.
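A sketch of what such an extension trait for std::process::Command could look like. The environment variables removed are the ones the existing bazel invocation already clears; the trait name `BazelCommandExt` and method name `bazel_defaults` are hypothetical, not the names used in the PR.

```rust
use std::process::Command;

/// Hypothetical extension trait: shared setup applied to every bazel
/// invocation made by the rust-analyzer tools.
trait BazelCommandExt {
    /// Clears variables inherited from an outer `bazel run` / bazelisk
    /// invocation, then applies the tool's output base (a startup option,
    /// so it must be added before the subcommand).
    fn bazel_defaults(&mut self, output_base: &str) -> &mut Self;
}

impl BazelCommandExt for Command {
    fn bazel_defaults(&mut self, output_base: &str) -> &mut Self {
        self.env_remove("BAZELISK_SKIP_WRAPPER")
            .env_remove("BUILD_WORKING_DIRECTORY")
            .env_remove("BUILD_WORKSPACE_DIRECTORY")
            .arg(format!("--output_base={output_base}"))
    }
}
```

Usage: `let mut cmd = Command::new("bazel"); cmd.bazel_defaults(&output_base).arg("build");`. Because the trait only touches shared setup, each binary can still assemble its own subcommand and flags.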

I also removed the config_group CLI arg for discover_rust_project; however, another use case occurred to me, and I'm really sorry for not remembering this sooner: besides --watchfs and --build_tag_filters (which, after more thought, are "useless"), the config group also allowed passing in specific --platforms.

The --platforms matter is very important and we should figure out a way to address it. We could simply add a CLI arg for it, and I might do that to kickstart the process, but my main rationale behind the config group was enabling users to specify as many configuration settings as desired without bloating the CLI interface.

Curious about what other people think regarding this.

krasimirgg · 2025-01-10T11:20:27Z

Thank you @bobozaur This is exciting!

As an owner, I'm happy to go over this PR and help out with the review.

On @sam-mccall 's questions:

are you happy with what this PR is trying to do? (I hope so!)

Yes! I think this is a great feature to have.

is it useful/sufficient if I continue to review? would someone prefer to take over now/soon?

Yes, I think you have a lot of good context on this and can provide some good feedback.

anyone have opinions/historical context on the configuration surface of gen_rust_project (flags, env vars, defaults, inference) and how much of it we should support?

Unfortunately I'm not familiar with this part of the codebase... @hlopko might have some insights. My opinions:

- it's very nice to keep the new project discovery as a separate tool.
- we should try to keep the existing gen_rust_project mechanism feature set.
- we should come up with a minimal set of supported features for the new project discovery. I think for a first version, the common set of features that @bobozaur and @sam-mccall need would be great (maybe with priorities; they don't all have to be added to this PR, as it might be simpler to review the addition of a single feature as a smaller follow-up PR).

It would be nice to set up some sort of test that exercises the new workflow, since this work touches a lot of places and there are quite a few nontrivial interactions that come into play to get this all to work. Maybe we can look into copying and adapting some of the pre-existing gen_rust_project tests.

bobozaur · 2025-01-10T13:36:49Z

It would be nice to set up some sort of test that exercises the new workflow, since this work touches a lot of places and there are quite a few nontrivial interactions that come into play to get this all to work. Maybe we can look into copying and adapting some of the pre-existing gen_rust_project tests.

@krasimirgg I would like to point out that rust-project.json and the message passed to rust-analyzer for autodiscovery share the same structure. Well, technically DiscoverProject::Finished wraps the JSON that would otherwise be written to rust-project.json, but you get my point. Considering this, and the fact that the rust-project.json payload is handled entirely on the library side, I don't really see a reason why tests that pass for gen_rust_project would not pass for discover_rust_project, or vice versa.

I'm on board with testing the discover_rust_project target too, but I'm just not sure what's worth testing. The only thing that comes to mind is a deserialization test to ensure that the discover_rust_project target only outputs DiscoverProject JSON messages. Should I implement this? Are there any other ideas of tests to implement?
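To make the "same structure" point concrete, here is a hedged Rust sketch of the messages a discover tool prints, one JSON object per line, with `Finished` embedding the rust-project.json text verbatim. The `kind` tag and the `finished` / `error` / `progress` shapes mirror rust-analyzer's discover protocol as we understand it; treat the exact field names as assumptions. JSON is hand-written here only to keep the sketch dependency-free (a real tool would likely use serde). A deserialization test like the one proposed could assert that every stdout line parses into one of these shapes.

```rust
/// Hypothetical sketch of the messages printed to stdout.
enum DiscoverProject {
    /// `project_json` is the exact text that would otherwise be written to
    /// rust-project.json; it is embedded as-is.
    Finished { buildfile: String, project_json: String },
    Error { error: String, source: Option<String> },
    Progress { message: String },
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl DiscoverProject {
    /// Serializes the message as a single line of JSON.
    fn to_json_line(&self) -> String {
        match self {
            DiscoverProject::Finished { buildfile, project_json } => format!(
                r#"{{"kind":"finished","buildfile":{},"project":{}}}"#,
                json_str(buildfile),
                project_json
            ),
            DiscoverProject::Error { error, source } => format!(
                r#"{{"kind":"error","error":{},"source":{}}}"#,
                json_str(error),
                source.as_deref().map_or("null".to_string(), json_str)
            ),
            DiscoverProject::Progress { message } => format!(
                r#"{{"kind":"progress","message":{}}}"#,
                json_str(message)
            ),
        }
    }
}
```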

bobozaur · 2025-01-10T14:04:30Z

(Concrete example: gen_rust_project supports specifying/inferring workspace and execroot in many combinations. This makes it really awkward to add support for automatic ephemeral --output_base. For discover_rust_project this is an important feature, while the ability to pass --execution_root is borderline useless).

@sam-mccall I keep thinking about this and the overall config situation. I'm wondering what you have in mind in terms of automatic ephemeral --output_base.

If you refer to some temp dir output base, I'm not sure that's a good idea. rust-analyzer will invoke discover_rust_project multiple times as you're working (like when modifying a build file or switching workspaces). Reusing the same output base would be helpful so that generated source files and proc macro dylibs are cached between invocations. I agree though that we should allow users to pass a custom output base and perhaps other configurations (--platforms comes to mind) as well.

That's what I wanted to achieve with the --config (config group) CLI argument, but you're right that this isn't really enough. How do you feel about allowing a .bazelrc file instead? It would be a single argument that can encapsulate output_base, platforms and a wider variety of settings than a single config group would allow. I'll be proactive about it, but let me know if you do not agree with the approach and I can remove the commit.

EDIT: Also, another aspect to keep in mind is that the workspace, execution_root and output_base trio are all used for string substitution in the final JSON string. So while we do not have to accept all of them as arguments, we need all of them.
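A minimal Rust sketch of why all three values are needed: `bazel info` prints one `key: value` pair per line (including `workspace`, `execution_root` and `output_base`), and every placeholder in the generated JSON must be resolved. The placeholder strings `__WORKSPACE__`, `__EXEC_ROOT__` and `__OUTPUT_BASE__`, and the helper names, are illustrative assumptions rather than the exact ones used by the aspect.

```rust
use std::collections::HashMap;

/// Parses `bazel info` output, which prints one `key: value` pair per line.
fn parse_bazel_info(stdout: &str) -> HashMap<String, String> {
    stdout
        .lines()
        .filter_map(|line| line.split_once(": "))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Replaces path placeholders in the generated JSON. Returns `None` if any
/// of the three values is missing, even if none of them was a CLI argument.
fn substitute_paths(json: &str, info: &HashMap<String, String>) -> Option<String> {
    let mut out = json.to_string();
    for (placeholder, key) in [
        ("__WORKSPACE__", "workspace"),
        ("__EXEC_ROOT__", "execution_root"),
        ("__OUTPUT_BASE__", "output_base"),
    ] {
        let value = info.get(key)?;
        out = out.replace(placeholder, value);
    }
    Some(out)
}
```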

LATER EDIT: There's also the matter of autodiscovery in polyglot workspaces where no argument is passed to the discoverConfig command, so the tool will run on the target pattern //.... This can cause issues, as it would not look only for Rust targets (hence the idea with --build_tag_filters).

bobozaur · 2025-01-12T12:03:29Z

One other scenario popped up that might be worth considering (if not now, then in a future PR).

Since the tool builds build scripts and proc macro targets to provide full support, if you are working on a proc macro crate or a build script and restart rust-analyzer, the build will fail and rust-analyzer will crash. This is, in my opinion, completely understandable.

However, if you're using the split workspace approach (passing {arg} to the discoverConfig command in rust-analyzer settings) then rust-analyzer will treat the current crate as the workspace root. It still crashes because it tries to build the crate, but the reality is that it doesn't have to, since there are no downstream consumers of the crate in the workspace.

So, essentially, the aspect should not output the proc macro dylib or build info out dir for the workspace root, thereby not triggering a build of these targets and providing a better experience. This is definitely doable, but I'm not sure what the right approach would be. We could have some sort of config setting where we pass the package to the aspect, and if the target it's currently observing is in that package, omit the outputs I mentioned above.

Are there better ideas for handling this?

sam-mccall · 2025-01-13T02:01:16Z

First my apologies - I had review comments on the previous revision, and then replied without sending them. (Why do we have that button?) I'm going to send them now, then go through your latest changes.

My thoughts on the topics you raised below - mostly these are "let's not try to solve it in this patch".

@sam-mccall I keep thinking about this and the overall config situation. I'm wondering what you have in mind in terms of automatic ephemeral --output_base.

The default output_base would be /tmp/rust_analyzer/<hash>, where the hash is deterministically derived from the workspace root. This means that repeated invocations will hit the same bazel instance (until it times out), but it won't block regular user builds, and if you do work in multiple projects they won't stomp on each other.
(Ephemeral is probably the wrong word, sorry if that was confusing).
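The default described above can be sketched in a few lines of Rust. This is an illustration under stated assumptions: it hashes the workspace path with 64-bit FNV-1a (chosen here because, unlike std's `DefaultHasher`, whose algorithm is unspecified and may change between Rust releases, its output is stable) and hard-codes the `/tmp/rust_analyzer` prefix from the comment; `default_output_base` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// 64-bit FNV-1a: a tiny hash whose output is stable across runs and
/// toolchain versions.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical default: `/tmp/rust_analyzer/<hash of workspace root>`.
/// The same workspace always maps to the same bazel server, while
/// different workspaces get separate ones.
fn default_output_base(workspace: &Path) -> PathBuf {
    let hash = fnv1a64(workspace.to_string_lossy().as_bytes());
    Path::new("/tmp/rust_analyzer").join(format!("{hash:016x}"))
}
```

Because the result depends only on the workspace path, repeated discover invocations reuse the same server and cached outputs without ever sharing the user's regular output base.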

We don't need that feature in this PR. But I think it will be a good default, so until we have it I think we should refrain from offering other ways to configure the output base.

Specifying platforms, config, bazelrc...

Please leave these out of this patch if you don't mind, it's already large enough.

(Probably my favorite mechanism is allowing the user to specify extra flags for the bazel invocation. Then they can pass --bazelrc=... for full flexibility, but also --watchfs directly if they only need that.)

autodiscovery in polyglot workspaces where no argument is being passed to the discoverConfig command, so the tool will run on the target pattern //...

The no-argument-passed-to-discoverconfig case creates a lot of ambiguity and problems. Why do we need to support it?

build scripts and proc macro targets

Workflows around this are interesting, and I don't understand them well.
Can I suggest we leave them out initially as there's clearly some risk, and then subsequently add them behind a discover_rust_project flag so we can see how well it works in practice?
(It's difficult to "just patch in" this PR, since the aspect changes need to be live in the codebase you're working on)

Ultimately, I think treating the directly-queried "active" package differently from its dependencies is going to cause problems. Users edit code in multiple packages, different packages are "active" at different times, and we're going to get complex behavior based on sequencing and how rust-analyzer invalidates its cache.

I'm on board with testing the discover_rust_project target too, but I'm just not sure what's worth testing.

It would be useful to have an end-to-end integration test, which creates a bazel workspace, runs the tool against a source file in it, and verifies the output is sensible.
Such tests are difficult to write cleanly. (There are some under test/rust_analyzer, they're not great).

I think this should wait until we're able to run the tool as a standalone binary (rather than under blaze run) as that is valuable and significantly changes how the test should be set up.
