-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add which function for finding executables in PATH #2440
base: master
Are you sure you want to change the base?
Conversation
Closes casey#2109 (but with a function name that is shorter and more familiar)
Thanks for the PR! I this is definitely useful. One thought is that perhaps this function should actually fail, as in, produce an error and terminate execution, if the binary does not exist? It's very common that if a binary isn't available, you can't do anything useful, and so you just want to give up. In that case, you would be required to check that Tests which are just testing the functionality of Some thoughts:
|
The use cases I have in mind don't involve throwing an error. For instance, the example from #2109 suggests using If the user wants to give up, they could write, for instance: git := if which("git") == "" { error("git is not installed") } else { which("git") } An alternative is to allow
That makes sense. Should I remove the tests I currently have?
I think that's right, though I haven't tested it myself to confirm. I chose to use
Yeah
I like the word choice of x := require("ls")
@test:
echo {{x}} # what does this print? I guess it could just return the full path like Design-wise, I'm starting to lean toward having |
Gotcha, that's good to know. In that case I think returning the empty string is ideal. I'm actually thinking about adding Python-style
Or, if you have a fallback:
If we added require, which I agree we can add later:
(require would behave like
I think the tests you have are reasonable, and we should have at least one test, not to test the dependency, but test the function implementation, i.e. that we're calling the right dependency.
I think this is good, and we should use
Ooo, good call. Yah,
Yah, I agree. And
I think we should just do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random comments.
I took a look at the |
Yeah, I agree. I'm happy to give it a shot, though I'm not familiar with how executables work on Windows, nor what the conventions are, so I'll start based on what the
Yeah I saw you had mentioned that in another issue somewhere, and I really like that proposal. Besides, it would be nice to clarify what the semantics of conditional expressions and "Booleans" are---introducing those operators would act as a forcing function for that clarification. |
I just pushed a draft implementation, but I haven't written any new tests (for which coverage is probably more important now). This implementation doesn't use the Please note that The following script runs #!/usr/bin/env bash
set -e
tmpd="$(mktemp -d)"
pushd "$tmpd" >/dev/null
# Add an ordinary directory with an ordinary executable to PATH
mkdir bin
printf '#!/bin/sh\necho "this is expected"\n' > bin/cmd
chmod 755 bin/cmd
NEWPATH="$(pwd)/bin"
# An unreadable empty directory to PATH
mkdir unreadable-dir
chmod 000 unreadable-dir
NEWPATH="$(pwd)/unreadable-dir:$NEWPATH"
# Add a broken symlink to PATH
ln -s nowhere broken-dir
NEWPATH="$(pwd)/broken-dir:$NEWPATH"
# Add a directory with an unreadable file to PATH
mkdir unreadable
printf '#!/bin/sh\necho "this is unexpected"\n' > unreadable/cmd
chmod 000 unreadable/cmd
NEWPATH="$(pwd)/unreadable:$NEWPATH"
# Add a directory with a broken symlink to PATH
mkdir broken
ln -s nowhere broken/cmd
NEWPATH="$(pwd)/broken:$NEWPATH"
sh="$(which sh)"
which="$(sh -c 'which which')"
printf "Executing \`cmd'...\n\t"
PATH="$NEWPATH" "$sh" -c "cmd"
# executes ./bin/cmd
printf "Running \`which cmd'...\n\t"
PATH="$NEWPATH" "$sh" -c "$which cmd"
# prints the absolute path of ./bin/cmd
popd >/dev/null
rm -rf "$tmpd" The shadowing use case you discuss in this comment is interesting, but I think it can be pretty hard to distinguish between an unmounted directory or unreadable file from unrelated directory in That said, I think was still worthwhile to rewrite a simplified version of the
Something else I wanted to respond to:
I'm only just learning about this myself, but |
@casey I finally got around to updating the tests to try out the internal I've only run this on my local macOS machine; I'm wondering if you could run those tests in CI to see if my implementation works on other OSes (Windows especially)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally took another look at this, sorry for the delay!
I set the tests to run, and it looks like they all pass.
I think delegating to is_executable
is fine, and that what you wrote about swallowing I/O errors is reasonable, given that it's what the external which
command does.
Good to know that PATHEXT is a standard windows thing, I wasn't familiar with it at all.
I didn't do a super in-depth review, my only comment is about trying to avoid a dependency on either
.
We could also consider adding a variant of which()
, in a follow-up PR (doesn't have to be by you!) which fails if the binary isn't found. I think it's pretty common to need a bunch of commands available, and not want to run if they aren't present.
No worries! Thanks for running the CI tests, I'm glad that everything works on Windows too.
In all honesty I'm not familiar with this either... I will ask some of my friends from the Windows universe whether we are handling this correctly.
Yeah, maybe something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Left a bunch of comments, check them out.
@@ -661,6 +662,61 @@ fn uuid(_context: Context) -> FunctionResult { | |||
Ok(uuid::Uuid::new_v4().to_string()) | |||
} | |||
|
|||
fn which(context: Context, s: &str) -> FunctionResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about this version? I found the original logic a little hard to follow:
(Also included some comments which don't need to go into the final PR.)
fn which(context: Context, command: &str) -> FunctionResult {
use std::path::Component;
let command = Path::new(command);
let relative = match command.components().next() {
None => return Err("empty command".into()),
// Any path that starts with `.` or `..` can't be joined to elements of `$PATH` and should be considered on its own. (Is this actually true? What about `C:foo` on windows? Is that a thing?
Some(Component::CurDir) | Some(Component::ParentDir) => vec![command.into()],
_ => {
let paths = env::var_os("PATH").ok_or("`PATH` environment variable not set")?;
env::split_paths(&paths)
.map(|path| path.join(command))
.collect()
}
};
let working_directory = context.evaluator.context.working_directory();
let absolute = relative
.into_iter()
// note that an path.join(absolute_path) winds up being absolute_path
// lexiclean is hear to remove unnecessary `.` and `..`
.map(|relative| working_directory.join(relative).lexiclean())
.collect::<Vec<PathBuf>>();
for candidate in absolute {
if is_executable::is_executable(&candidate) {
return candidate
.to_str()
.map(str::to_string)
.ok_or_else(|| format!("Executable path not unicode: {}", candidate.display()));
}
}
Ok(String::new())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally find my implementation easier to understand, but that might only be because I wrote it.
The main difference between our implementations seems to be how we determine whether to resolve via PATH
. Mine checks command.components.count()
: anything with more than one component is considered a relative path and skips PATH
resolution. Meanwhile, yours checks command.components.next()
: anything whose first element is .
, ..
, or /
is considered a relative path and skips PATH
resolution.
Crucially, they differ on how they handle something like which("dir/cmd")
, which has more than one component, but the first component is a Component::Normal("dir")
. Thus, my implementation would consider it the same as ./dir/cmd
(i.e., resolved relative to CWD), while yours would resolve it using the PATH
variable.
I wrote a small test, and (at least on macOS with sh
, bash
, and zsh
) it seems like my implementation has the correct behavior in this instance:
#!/bin/sh
set -e
tmpdir="$(mktemp -d)"
cd "$tmpdir"
mkdir -p path/dir dir
printf "#!%s\necho resolved via PATH" "$SHELL" > path/dir/cmd
chmod +x path/dir/cmd
printf "#!%s\necho resolved via CWD" "$SHELL" > dir/cmd
chmod +x dir/cmd
PATH="$(realpath path)" dir/cmd # prints 'resolved via CWD'
rm -rf "$tmpdir"
The other differences are:
- You use
Path::new(command)
rather thanPathBuf::from(command)
. I've adopted this change since it's more efficient (and also - You use
lexiclean()
to remove extraneous..
or.
components. However, at least thewhich
implementation on my system (macOS) does not do this. - You handle absolute paths (those beginning with
/
) implicitly, through the behavior ofPath::join()
(TIL about this behavior!). But I personally think it's clearer to handle this case explicitly. - You make two passes through the candidates, one to prepend the working directory, and one to check whether it is an executable. I didn't do this to avoid making another copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, you're right, in addition to your test, I looked at the [POSIX shell standard](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html_ (search for "command search and execution"), and it says that for any path with a, the PATH
variable isn't consulted.
So I think command.components.count()
is best. (Which, if we're doing that, it makes sense to use that to check for an empty command, and not .is_empty()
)
GNU which does actually remove ..
:
$ gwhich ../just/bin/forbid
/Users/rodarmor/src/just/bin/forbid
So I'm in favor of using lexiclean
, since it makes the returned paths easier to read, in case they return ..
or .
.
I personally like using the relative path behavior to do joining, but I think that's kind of a wash, since it's a little weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GNU which does actually remove
..
[...] So I'm in favor of using lexiclean, since it makes the returned paths easier to read, in case they return..
or.
.
I wasn't aware of that, thanks for finding that behavior! But if someone really wants to clean up the path, can't they do something like clean(which("some/../funny/path"))
? Or even canonicalize(which("some/../funny/path"))
?
The argument against cleaning the path is that doing so removes information that can no longer be recovered. I'm wary of over-normalization for weird edge cases like the example mentioned in the Rust Path::components() docs.
Admittedly, I can't personally foresee any scenario where this relative path information might actually be useful, so I'm strongly against calling lexiclean()
here. But since the user can opt into normalization with other Just functions like clean
and canonicalize
, shouldn't we leave this to the user?
- use Path::new(s) instead of PathBuf::from(s) - revert early handling of empty path - use PathBuf::join() instead of PathBuf::push()
@casey I made some changes according to your suggestions, and also added some test cases. It might be nice to test the behavior of prefixed-paths on Windows but I don't know the platform well enough to say what the expected behavior should be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, see minor comments.
Closes #2109.
I considered calling it
executable_exists
but that seemed limiting---which
also returns the path of the executable, not just whether it exists. Then one can implementexecutable_exists(name)
usingwhich(name) == ""
.I wasn't sure whether to add more tests. Most of the interesting test cases are the responsibility of the
which
crate to get right (eg executable existing in multiple directories in PATH, executable existing in PATH but not having executable permissions, etc). At the end of the day this function is just a wrapper aroundwhich::which
, so there is not much more to test here than what should already be tested bywhich::which
.Also not sure if documentation was added in the right place.