Fix unsoundness in tokenizers::utils::parallelism #1492

Closed
16 changes: 12 additions & 4 deletions tokenizers/src/utils/parallelism.rs
@@ -2,6 +2,8 @@
 //! This module defines helpers to allow optional Rayon usage.
 //!

+use std::sync::Mutex;
+
 use rayon::iter::IterBridge;
 use rayon::prelude::*;
 use rayon_cond::CondIterator;
@@ -12,7 +14,7 @@ pub use rayon::current_num_threads;
 pub const ENV_VARIABLE: &str = "TOKENIZERS_PARALLELISM";

 // Reading/Writing this variable should always happen on the main thread
-static mut USED_PARALLELISM: bool = false;
+static USED_PARALLELISM: Mutex<bool> = Mutex::new(false);

 /// Check if the TOKENIZERS_PARALLELISM env variable has been explicitly set
 pub fn is_parallelism_configured() -> bool {
@@ -21,7 +23,9 @@ pub fn is_parallelism_configured() -> bool {

 /// Check if at some point we used a parallel iterator
 pub fn has_parallelism_been_used() -> bool {
-    unsafe { USED_PARALLELISM }
+    *USED_PARALLELISM
+        .lock()
+        .expect("`USED_PARALLELISM` should only be accessed on the main thread.")
 }

 /// Get the currently set value for `TOKENIZERS_PARALLELISM` env variable
@@ -70,7 +74,9 @@ where
     fn into_maybe_par_iter(self) -> CondIterator<P, S> {
         let parallelism = get_parallelism();
         if parallelism {
-            unsafe { USED_PARALLELISM = true };
+            *USED_PARALLELISM
+                .lock()
+                .expect("`USED_PARALLELISM` should only be accessed on the main thread.") = true;
         }
         CondIterator::new(self, parallelism)
     }
@@ -159,7 +165,9 @@ where
         let iter = CondIterator::from_serial(self);

         if get_parallelism() {
-            unsafe { USED_PARALLELISM = true };
+            *USED_PARALLELISM
+                .lock()
+                .expect("`USED_PARALLELISM` should only be accessed on the main thread.") = true;
             CondIterator::from_parallel(iter.into_parallel().right().unwrap())
         } else {
             iter
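
Note on the `expect` messages in this diff: `Mutex::lock` returns an error only when the mutex is poisoned (a thread panicked while holding the guard), so the lock does not enforce main-thread-only access; it makes concurrent access safe. Since the flag is a single `bool`, one possible alternative is an `AtomicBool`, which needs no lock and cannot be poisoned. The following is a minimal sketch under that assumption, not the change made in this PR. `mark_parallelism_used` is a hypothetical helper introduced only for illustration, and `Ordering::SeqCst` is a conservative choice rather than a requirement.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the flag in this diff, stored as an atomic instead of a
// `Mutex<bool>`. Safe to read and write from any thread without `unsafe`.
static USED_PARALLELISM: AtomicBool = AtomicBool::new(false);

/// Check if at some point we used a parallel iterator.
pub fn has_parallelism_been_used() -> bool {
    USED_PARALLELISM.load(Ordering::SeqCst)
}

/// Hypothetical helper (not in the original module): record that a
/// parallel iterator was handed out.
pub fn mark_parallelism_used() {
    USED_PARALLELISM.store(true, Ordering::SeqCst);
}
```

With this shape, the two call sites that set the flag would call `mark_parallelism_used()` in place of locking the mutex and assigning `true`.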