implement parallel job execution #1562
base: master
Conversation
@syphar: Thanks for working on this! Regarding

[parallel]
foo: recipe_1 recipe_2
    echo hello

Somehow I don't have a good feeling about that syntax. Is there a reason why something like the following wouldn't work?

foo: parallel(recipe_1 recipe_2)
    echo hello

That seems more specific to me.
This came from this comment by @casey: #626 (comment)
This would definitely be more complex to implement, since it is a change to the file format.
I think that would be an advantage. Basically, we'd be able to define groups of dependency tasks:

foo: dependency_of_dependencies parallel(dep_1 dep_2) teardown_of_dependencies
    echo hello
I'm not sure if it's worth adding new syntax for this feature, especially when you could do the same with another parallel rule. Also, personally I probably won't be able to invest that much more additional time to dig into how the parser / lexer works to change the syntax.
Hello, thanks for bringing this good feature @syphar! Dropping my two cents: what if, instead of marking the dependencies on the depending recipe, the [parallel] attribute were placed on the dependency recipes themselves? And if parallel groups are desired, we may do this inside the attribute declaration, like [parallel(group)]. I think this may solve the concerns of @d4h0:
Which may lead to something like:

main: branch_a1 branch_a2 branch_b1 branch_b2
    echo "main!"

[parallel(a)]
branch_a1:
    echo "a1"

[parallel(a)]
branch_a2:
    echo "a2"

[parallel(b)]
branch_b1:
    echo "b1"

[parallel(b)]
branch_b2:
    echo "b2"
@ykis-0-0 thank you for the feedback! But since there is no feedback from any maintainer yet, I won't invest (sink) more time into this feature without knowing if a specific (or even any) design would be accepted & merged at some point. Don't get me wrong, I'm happy to invest more time if there is a good chance of a certain design being merged at some point.
Hey guys. I was looking for a replacement for Just on some projects since it doesn't support task parallelization (and in my opinion it's a must these days). Just itself is super fast, which is great and refreshing, especially compared to annoyingly slow npm scripts. But without task parallelization it falls short compared to some other tools. While doing my research I noticed this PR, so I thought I'd add my 5c, as I really like Just for its simplicity and minimalism and would actually prefer staying with it if it supported this feature. I would take the approach used by a competing tool, Task (taskfile). With Just that would translate to something like the sketch below.
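The original snippet did not survive extraction; a rough illustration of that "parallel by default" idea, with invented recipe names, might be:

# dependencies would run in parallel by default (hypothetical behaviour)
build: build-js build-css

# sequential composition done by calling just from the recipe body
release:
    just build
    just deploy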
Note: there are of course known caveats when running just from the recipe, as mentioned here, but that's a whole different topic and could be solved separately. However, I understand that going parallel by default would be a significant breaking change, so even if such a dramatic change were considered by the maintainer, it could only land in the next major release. As for the possibility of mixing together sequential and parallel recipes mentioned in the comments above:
IMO that's redundant. If you need to mix and match, just create another sequential recipe and make the parallel one a dependency. Composition FTW. With the "parallel by default" approach, the first snippet in my message shows how that would work, essentially by calling just from the recipe (at least in the beginning, unless some more advanced recipe referencing mechanism is implemented). And with the syntax proposed by this MR, that would look like the sketch below.
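Again the original snippet is missing; with the [parallel] attribute from this PR, the same composition might be written roughly as (recipe names invented):

[parallel]
assets: build-js build-css

# sequential recipe that pulls in the parallel group as a dependency
release: prepare assets deploy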
Also, even though I'm not a maintainer, I'd like to address the question from the PR description from the user perspective:
This is a very common case for any tool that does parallel execution. I believe the most common approach is to let already-running tasks finish and report every failure, with fail-fast available as an opt-in.
IMO use of the fail-fast option should be an explicit user choice as killing some of the tasks could have detrimental effects (e.g. you probably don't want to kill your
Sorry for being so delinquent reviewing this! Another project of mine has gotten popular this year, so I haven't had as much time for just. I started reviewing this, and it looks like a good approach. I wasn't sure if I liked the task scope, so I tried to rewrite it without it. I came up with a version that uses Rust's native scoped threads and mpsc channels to communicate back errors, so that it can error out when any thread returns an error, instead of waiting for each thread in turn to complete. It's in #1738, what do you think?
Okay, back to this PR. I've definitely come around to your approach, i.e., creating a struct which handles scope creation and abstracts whether or not to run in parallel. I used std::sync::mpsc in #1738 because I thought it helped with early return, but it doesn't, so using join handles to communicate errors back to the main thread seems like the way to go.
- Let's implement this in terms of std::thread::scope instead of crossbeam::scope. It would be nice to use the stdlib, unless it's not possible or there's a compelling reason to use crossbeam.
- Let's change Ran into a type alias: type Ran = Arc<Mutex<HashSet<Vec<String>>>>;. It can then be initialized with Ran::default(). insert and contains will be a little noisy, but it's nice to avoid a wrapper type. It can be in src/lib.rs. (A minimal sketch follows after this list.)
- This is a big feature, so let's land it as unstable first, so we can get it in and iterate on it. You can easily generate the right error with config.require_unstable("The `--parallel` flag is currently unstable.")?;
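A minimal sketch of that alias in use (the key shape comes from the alias itself; the helper function and recipe name are just for illustration):

use std::collections::HashSet;
use std::sync::{Arc, Mutex};

type Ran = Arc<Mutex<HashSet<Vec<String>>>>;

// Returns true if this recipe invocation had not run before.
fn mark_ran(ran: &Ran, recipe: &[String]) -> bool {
    ran.lock().unwrap().insert(recipe.to_vec())
}

fn main() {
    let ran = Ran::default();
    assert!(mark_ran(&ran, &["build".to_string()]));
    assert!(!mark_ran(&ran, &["build".to_string()]));
}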
Things we can figure out later:
- Are dependencies handled correctly? I think maybe if you have:
[parallel]
foo: a b
a: c
b: c
Then a and b might wind up launching c independently.
- How to do early return? We'll probably have to thread through a Mutex<bool> used to signal if an error has occurred on another thread, and a helper function that launches a command and then loops: checking the mutex for an error, checking if the child has exited, and then sleeping for 100ms or something. (A rough sketch of such a helper follows this list.)
- Are chars written to stdout intermingled?
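A rough illustration of that early-return helper; the name run_command and the kill-on-failure behaviour are assumptions for this sketch, not part of the PR:

use std::process::Command;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

// Launch `command`, then poll: bail out if another thread has recorded an
// error in `failed`, otherwise wait for our own child to exit.
fn run_command(command: &mut Command, failed: &Mutex<bool>) -> std::io::Result<bool> {
    let mut child = command.spawn()?;
    loop {
        // Another thread failed: kill this child and stop early.
        if *failed.lock().unwrap() {
            let _ = child.kill();
            return Ok(false);
        }
        // Has our own child exited?
        if let Some(status) = child.try_wait()? {
            if !status.success() {
                *failed.lock().unwrap() = true;
            }
            return Ok(status.success());
        }
        thread::sleep(Duration::from_millis(100));
    }
}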
@@ -225,4 +225,3 @@ pwd:
# Local Variables:
# mode: makefile
# End:
# vim: set ft=make :
Is there a reason to remove this?
@@ -12,6 +12,7 @@ pub(crate) enum Attribute {
  NoCd,
  NoExitMessage,
  Private,
  Parallel,
These should be sorted.
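Sorted, the new variant would sit between NoExitMessage and Private, i.e. roughly (only the variants visible in this hunk shown):

pub(crate) enum Attribute {
  NoCd,
  NoExitMessage,
  Parallel,
  Private,
}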
@@ -1,12 +1,14 @@
use parallel::Ran;
use std::sync::Arc;
We can move this use into src/lib.rs so it doesn't need to be imported in multiple files.
evaluated.push(
  evaluator
    .evaluate_expression(argument)
    .expect("error evaluating expression"),
task_scope should accept a function that returns an error, so we can use ? here.
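That is, the call site could then collapse to something like this sketch, assuming task_scope propagates the closure's error:

evaluated.push(evaluator.evaluate_expression(argument)?);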
type ScopeResult<'src> = RunResult<'src, ()>;

pub(crate) struct TaskScope<'env, 'src, 'inner_scope> {
Suggested change:
- pub(crate) struct TaskScope<'env, 'src, 'inner_scope> {
+ pub(crate) struct Parallelizer<'env, 'src, 'inner_scope> {
/// The first error will be returned as result of this `task_scope`.
///
/// Only works for tasks with a `RunResult<'src, ()>` result type.
pub(crate) fn task_scope<'env, 'src, F>(parallel: bool, f: F) -> ScopeResult<'src>
Let's change this to a function on Parallelizer:
impl Parallelizer {
  // Sketch: lifetime parameters on `Self` are elided here for brevity.
  fn parallelize<'src, F>(parallel: bool, f: F) -> RunResult<'src, ()>
  where
    F: FnOnce(&mut Self) -> RunResult<'src, ()>,
  {
    thread::scope(|scope| {
      let mut parallelizer = Self {
        parallel,
        scope,
        join_handles: Vec::new(),
      };
      f(&mut parallelizer)?;
      for handle in parallelizer.join_handles {
        handle.join().expect("could not join thread")?;
      }
      Ok(())
    })
  }
}
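A call site could then look roughly like this; the run method, run_dependency helper, and config.parallel field are assumptions for illustration, not part of the PR:

Parallelizer::parallelize(config.parallel, |parallelizer| {
  for dependency in &dependencies {
    // hypothetical: runs inline when sequential, spawns onto the scope when parallel
    parallelizer.run(|| run_dependency(dependency))?;
  }
  Ok(())
})?;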
How does this handle output? Ninja does buffering for parallel jobs, and that is nice because it makes it easy to trace back where errors come from; their docs describe that buffering behaviour.
I only wanted to say, I didn't forget about this PR, and still want to finish up all the details at some point. (My open source time was/is focused on some major docs.rs improvements, and I'm hoping to come back to this change here when that's done.)
@syphar: So happy if you finish this up, it's a game changer!
@syphar If you'd like some assistance with this, hit me up. The beef of this may be a bit out of my depth, but maybe I can assist with some minor parts to lift a bit off your plate.
This would be great to have in just!
Seeing #626 I started investing some time into a draft for parallel execution.
This PR would:
- add a [parallel] attribute
- add a --parallel argument

while keeping execution / dependency order, including when having dependencies of dependencies etc.
This implementation here compiles & works.
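For illustration, usage could look roughly like this (recipe names are invented; semantics follow the description above):

# dependencies of build run in parallel
[parallel]
build: build-js build-css

build-js:
    echo "building js"

build-css:
    echo "building css"

and, with the flag, something like just --parallel build-js build-css to run the given recipes themselves in parallel.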
From my side, to finish this up, I would:
- re-add the ran logic to prevent duplicate task execution, probably by wrapping this in an Arc<Mutex<>>, or some message passing
- get rid of the repetition / duplication
- add a --parallel argument to control if the main tasks are run in parallel
- add documentation

From @casey (or another maintainer?) I would need feedback on:
- should we do it the same way as Taskfile, so dependencies are run in parallel by default? IMO this would then be a backwards incompatible change.
- or should this also be controlled by the --parallel argument?
- or optional via an annotation?

Also, std::thread::scope was only (relatively) recently added in 1.63, but if we have a lower MSRV we can just switch to using crossbeam::scope.