Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please consider using parallelly::availableCores() instead of parallel::detectCores() #128

Open
HenrikBengtsson opened this issue May 31, 2024 · 2 comments

Comments

@HenrikBengtsson
Copy link

Hello, I came to you R package from an HPC user asking how to parallelize with it.

I see that n_processors = "auto" defaults to:

if (identical(n_processors, "auto")) n_processors <- floor(parallel::detectCores(logical = FALSE)/2)

Unfortunately, this doesn't play well on systems with multiple users. There's a great risk it will overuse the CPU resources, resulting in grumpy sysadms, and unhappy fellow users on the same machine.

At a minimum, would you mind replacing that line with the following backward compatible solution:

 if (identical(n_processors, "auto")) n_processors <- floor(parallelly::availableCores(logical = FALSE)/2) 

The advantage of that, is that it respects limits that the sysadm and job schedulers assigns to the user/process. You can read more about the problems with detectCores() in my 2022-12-05 blog post 'Please Avoid detectCores() in your R Packages' (https://www.jottr.org/2022/12/05/avoid-detectcores/). That post also explains how availableCores() makes your code play nicer.

There are actually more issues with your current floor(parallel::detectCores(logical = FALSE)/2). Note that parallel::detectCores() will return 1 on some systems (e.g. virtual machines and Posit Cloud). That means you end up with floor(1/2), which becomes 0. So, you need to do something like max(1, ...) on that. As the above blog post mentions, there are also system where detectCores() returns NA_integer_.

@ekatsevi
Copy link
Member

ekatsevi commented Jun 3, 2024

Thanks for bringing these issues to our attention! We will look into them. In the meantime, we would recommend using the sceptre Nextflow pipeline for much more scalable application of sceptre on HPC, which avoids the issues you bring up.

@ekatsevi
Copy link
Member

ekatsevi commented Jun 7, 2024

Hi @HenrikBengtsson, thanks again for bringing up these subtle issues. Could you please suggest a concrete line (or lines) of code to replace the line in question, which addresses all of the issues you bring up? Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants