Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/block rejections timeout heuristic #5731

Merged
merged 44 commits into from
Feb 3, 2025
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
ea771ca
added heuristic for block rejections timeout
rdeioris Jan 22, 2025
daffba1
Merge branch 'develop' into feat/block_rejections_heuristic
rdeioris Jan 22, 2025
6ab756b
added rejections_timeout test exposure via BLOCK_REJECTIONS_CURRENT_T…
rdeioris Jan 22, 2025
bfd4793
improved test logic
rdeioris Jan 22, 2025
3cebb35
fixed integration test
rdeioris Jan 22, 2025
db14fba
Merge branch 'develop' into feat/block_rejections_heuristic
rdeioris Jan 22, 2025
c3e163e
fixed comment for BLOCK_REJECTIONS_CURRENT_TIMEOUT
rdeioris Jan 22, 2025
6b751c4
fixed test comment
rdeioris Jan 22, 2025
bb8df99
fmt
rdeioris Jan 22, 2025
280483b
added more comments
rdeioris Jan 22, 2025
05b3f92
fixed formatting
rdeioris Jan 23, 2025
9971d1a
added wanr! when timeout
rdeioris Jan 24, 2025
45ae015
initial prototype for configurable block_rejction timeout steps
rdeioris Jan 28, 2025
3071d36
improved config system
rdeioris Jan 29, 2025
f2ab500
fmt-stacks
rdeioris Jan 29, 2025
5c0805a
fixed default rejections timeout
rdeioris Jan 29, 2025
f22a990
improved comment for rejections timeout
rdeioris Jan 29, 2025
4f4b584
fixed typo
rdeioris Jan 29, 2025
463839f
ensure 0 key is specified for rejections timeout
rdeioris Jan 29, 2025
0e7461c
message typo
rdeioris Jan 29, 2025
420f861
use Counters instead of global test vars
rdeioris Jan 30, 2025
57f80fb
restored original wait_for_block_status
rdeioris Jan 30, 2025
d5abfdc
removed empty line
rdeioris Jan 30, 2025
ed48786
Merge branch 'develop' into feat/block_rejections_heuristic
rdeioris Jan 30, 2025
5269714
updated CHANGELOG
rdeioris Jan 30, 2025
5a70c43
updated CHANGELOG
rdeioris Jan 30, 2025
c424212
Secp256k1PrivateKey::random() instrad of new()
rdeioris Jan 30, 2025
d713290
removed useless test attributes
rdeioris Jan 30, 2025
adaabe6
use u32 for block_rejection_timeout_steps keys
rdeioris Jan 30, 2025
7cfbfdd
refactored block_rejection_timeout_steps percentage setup
rdeioris Jan 30, 2025
d4171d7
do not reset rejections_timer on block_status updates
rdeioris Jan 30, 2025
fbc3b7d
moved block_rejection_timeout_steps computation directly in SignerCoo…
rdeioris Jan 30, 2025
33642c3
Merge branch 'develop' into feat/block_rejections_heuristic
rdeioris Jan 31, 2025
790c1e1
merged with develop, reintroduces old logic for condition variables, …
rdeioris Jan 31, 2025
54aed16
rollback BlockStatus Default and PartialEq
rdeioris Jan 31, 2025
eb883ee
improved error messages on timeout
rdeioris Jan 31, 2025
fe429f4
report rejection step in miner log, ensure signe_test shutdown in int…
rdeioris Jan 31, 2025
6a238a7
added set_miner_current_rejections_timeout and set_miner_current_reje…
rdeioris Jan 31, 2025
8d2bb0d
refactored block_validation_check_rejection_timeout_heuristic test
rdeioris Feb 1, 2025
151c844
use From based cast for u32->u64
rdeioris Feb 1, 2025
b813afa
merged with develop
rdeioris Feb 1, 2025
f66c894
removed useless line
rdeioris Feb 1, 2025
0c86012
Merge branch 'develop' into feat/block_rejections_heuristic
rdeioris Feb 3, 2025
400c806
use u32 for rejections counter
rdeioris Feb 3, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 53 additions & 10 deletions testnet/stacks-node/src/nakamoto_node/signer_coordinator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,11 @@
// along with this program. If not, see <http://www.gnu.org/licenses/>.

use std::sync::atomic::AtomicBool;
#[cfg(test)]
use std::sync::LazyLock;
use std::sync::{Arc, Mutex};
use std::thread::JoinHandle;
use std::time::{Duration, Instant};

use libsigner::v0::messages::{MinerSlotID, SignerMessage as SignerMessageV0};
use libsigner::{BlockProposal, SignerSession, StackerDBSession};
Expand All @@ -33,14 +36,26 @@ use stacks::types::chainstate::{StacksBlockId, StacksPrivateKey};
use stacks::util::hash::Sha512Trunc256Sum;
use stacks::util::secp256k1::MessageSignature;
use stacks::util_lib::boot::boot_code_id;
#[cfg(test)]
use stacks_common::util::tests::TestFlag;

use super::stackerdb_listener::StackerDBListenerComms;
use super::Error as NakamotoNodeError;
use crate::event_dispatcher::StackerDBChannel;
use crate::nakamoto_node::stackerdb_listener::{StackerDBListener, EVENT_RECEIVER_POLL};
use crate::nakamoto_node::stackerdb_listener::{
BlockStatus, StackerDBListener, EVENT_RECEIVER_POLL,
};
use crate::neon::Counters;
use crate::Config;

#[cfg(test)]
/// Test-only value for storing the current rejection based timeout
pub static BLOCK_REJECTIONS_CURRENT_TIMEOUT: LazyLock<TestFlag<Duration>> =
LazyLock::new(TestFlag::default);

/// Base timeout for rejections heuristic
pub static BLOCK_REJECTIONS_TIMEOUT_BASE: u64 = 600;

/// The state of the signer database listener, used by the miner thread to
/// interact with the signer listener.
pub struct SignerCoordinator {
Expand Down Expand Up @@ -293,17 +308,24 @@ impl SignerCoordinator {
sortdb: &SortitionDB,
counters: &Counters,
) -> Result<Vec<MessageSignature>, NakamotoNodeError> {
// this is used to track the start of the waiting cycle
let mut rejections_timer = Instant::now();
// the amount of current rejections to eventually modify the timeout
let mut rejections: u32 = 0;
// default timeout
let mut rejections_timeout = Duration::from_secs(BLOCK_REJECTIONS_TIMEOUT_BASE);
// this is used for comparing block_status to identify if it has been changed from the previous event
let mut block_status_tracker = BlockStatus::default();
loop {
// At every iteration wait for the block_status.
// Exit when the amount of confirmations/rejections reaches the threshold (or until timeout)
// Based on the amount of rejections, eventually modify the timeout.
let block_status = match self.stackerdb_comms.wait_for_block_status(
block_signer_sighash,
&mut block_status_tracker,
rejections_timer,
rejections_timeout,
EVENT_RECEIVER_POLL,
|status| {
status.total_weight_signed < self.weight_threshold
&& status
.total_reject_weight
.saturating_add(self.weight_threshold)
<= self.total_weight
},
)? {
Some(status) => status,
None => {
Expand Down Expand Up @@ -336,10 +358,29 @@ impl SignerCoordinator {
return Err(NakamotoNodeError::BurnchainTipChanged);
}

if rejections_timer.elapsed() > rejections_timeout {
return Err(NakamotoNodeError::SigningCoordinatorFailure(
"Gave up while tried reaching the threshold".into(),
));
rdeioris marked this conversation as resolved.
Show resolved Hide resolved
}

continue;
}
};

if rejections != block_status.total_reject_weight {
rejections_timer = Instant::now();
rdeioris marked this conversation as resolved.
Show resolved Hide resolved
rejections = block_status.total_reject_weight;
rejections_timeout = Duration::from_secs_f32(
BLOCK_REJECTIONS_TIMEOUT_BASE as f32
- (BLOCK_REJECTIONS_TIMEOUT_BASE as f32
* ((rejections as f32 / self.weight_threshold as f32).powf(2.0))),
);
rdeioris marked this conversation as resolved.
Show resolved Hide resolved

#[cfg(test)]
BLOCK_REJECTIONS_CURRENT_TIMEOUT.set(rejections_timeout);
}

if block_status
.total_reject_weight
.saturating_add(self.weight_threshold)
Expand All @@ -357,10 +398,12 @@ impl SignerCoordinator {
"block_signer_sighash" => %block_signer_sighash,
);
return Ok(block_status.gathered_signatures.values().cloned().collect());
} else {
} else if rejections_timer.elapsed() > rejections_timeout {
return Err(NakamotoNodeError::SigningCoordinatorFailure(
"Unblocked without reaching the threshold".into(),
"Gave up while tried reaching the threshold".into(),
));
} else {
continue;
}
}
}
Expand Down
45 changes: 20 additions & 25 deletions testnet/stacks-node/src/nakamoto_node/stackerdb_listener.rs
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ pub static TEST_IGNORE_SIGNERS: LazyLock<TestFlag<bool>> = LazyLock::new(TestFla
/// waking up to check timeouts?
pub static EVENT_RECEIVER_POLL: Duration = Duration::from_millis(500);

#[derive(Debug, Clone)]
#[derive(Debug, Clone, Default, PartialEq)]
pub struct BlockStatus {
pub responded_signers: HashSet<StacksPublicKey>,
pub gathered_signatures: BTreeMap<u32, MessageSignature>,
Expand Down Expand Up @@ -337,10 +337,8 @@ impl StackerDBListener {
block.gathered_signatures.insert(slot_id, signature);
block.responded_signers.insert(signer_pubkey);

if block.total_weight_signed >= self.weight_threshold {
// Signal to anyone waiting on this block that we have enough signatures
cvar.notify_all();
}
rdeioris marked this conversation as resolved.
Show resolved Hide resolved
// Signal to anyone waiting on this block that we have a new status
cvar.notify_all();

// Update the idle timestamp for this signer
self.update_idle_timestamp(
Expand Down Expand Up @@ -378,6 +376,7 @@ impl StackerDBListener {
}
};
block.responded_signers.insert(rejected_pubkey);

block.total_reject_weight = block
.total_reject_weight
.checked_add(signer_entry.weight)
Expand All @@ -396,14 +395,8 @@ impl StackerDBListener {
"server_version" => rejected_data.metadata.server_version,
);

if block
.total_reject_weight
.saturating_add(self.weight_threshold)
> self.total_weight
{
// Signal to anyone waiting on this block that we have enough rejections
cvar.notify_all();
}
// Signal to anyone waiting on this block that we have a new status
cvar.notify_all();

// Update the idle timestamp for this signer
self.update_idle_timestamp(
Expand Down Expand Up @@ -487,30 +480,32 @@ impl StackerDBListenerComms {

/// Get the status for `block` from the Stacker DB listener.
/// If the block is not found in the map, return an error.
/// If the block is found, call `condition` to check if the block status
/// satisfies the condition.
/// If the condition is satisfied, return the block status as
/// `Ok(Some(status))`.
/// If the condition is not satisfied, wait for it to be satisfied.
/// If the block is found, return it.
/// If the timeout is reached, return `Ok(None)`.
pub fn wait_for_block_status<F>(
pub fn wait_for_block_status(
&self,
block_signer_sighash: &Sha512Trunc256Sum,
block_status_tracker: &mut BlockStatus,
rejections_timer: std::time::Instant,
rejections_timeout: Duration,
timeout: Duration,
condition: F,
rdeioris marked this conversation as resolved.
Show resolved Hide resolved
) -> Result<Option<BlockStatus>, NakamotoNodeError>
where
F: Fn(&BlockStatus) -> bool,
{
) -> Result<Option<BlockStatus>, NakamotoNodeError> {
let (lock, cvar) = &*self.blocks;
let blocks = lock.lock().expect("FATAL: failed to lock block status");

let (guard, timeout_result) = cvar
.wait_timeout_while(blocks, timeout, |map| {
if rejections_timer.elapsed() > rejections_timeout {
return true;
}
let Some(status) = map.get(block_signer_sighash) else {
return true;
};
condition(status)
if status != block_status_tracker {
*block_status_tracker = status.clone();
return false;
}
return true;
})
.expect("FATAL: failed to wait on block status cond var");

Expand Down
89 changes: 89 additions & 0 deletions testnet/stacks-node/src/tests/signer/v0.rs
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,7 @@ use crate::event_dispatcher::{MinedNakamotoBlockEvent, TEST_SKIP_BLOCK_ANNOUNCEM
use crate::nakamoto_node::miner::{
TEST_BLOCK_ANNOUNCE_STALL, TEST_BROADCAST_STALL, TEST_MINE_STALL,
};
use crate::nakamoto_node::signer_coordinator::BLOCK_REJECTIONS_CURRENT_TIMEOUT;
use crate::nakamoto_node::stackerdb_listener::TEST_IGNORE_SIGNERS;
use crate::neon::Counters;
use crate::run_loop::boot_nakamoto;
Expand Down Expand Up @@ -7802,6 +7803,94 @@ fn block_validation_response_timeout() {
);
}

// Verify that the miner timeout while waiting for signers will change accordingly
// to rejections.
#[test]
#[ignore]
fn block_validation_check_rejection_timeout_heuristic() {
if env::var("BITCOIND_TEST") != Ok("1".into()) {
return;
}

info!("------------------------- Test Setup -------------------------");
let num_signers = 5;
let timeout = Duration::from_secs(30);
let sender_sk = Secp256k1PrivateKey::new();
let sender_addr = tests::to_addr(&sender_sk);
let send_amt = 100;
let send_fee = 180;

let mut signer_test: SignerTest<SpawnedSigner> = SignerTest::new_with_config_modifications(
num_signers,
vec![(sender_addr, send_amt + send_fee)],
|config| {
config.block_proposal_validation_timeout = timeout;
},
|_| {},
None,
None,
);

let all_signers: Vec<_> = signer_test
.signer_stacks_private_keys
.iter()
.map(StacksPublicKey::from_private)
.collect();

signer_test.boot_to_epoch_3();

// note we just use mined nakamoto_blocks as the second block is not going to be confirmed

rdeioris marked this conversation as resolved.
Show resolved Hide resolved
info!("------------------------- Check Rejections-based timeout with 1 rejection -------------------------");

let blocks_before = test_observer::get_mined_nakamoto_blocks().len();

TEST_REJECT_ALL_BLOCK_PROPOSAL.set(vec![all_signers[4]]);
TEST_IGNORE_ALL_BLOCK_PROPOSALS.set(vec![
all_signers[0],
all_signers[1],
all_signers[2],
all_signers[3],
]);

next_block_and(
&mut signer_test.running_nodes.btc_regtest_controller,
30,
|| Ok(test_observer::get_mined_nakamoto_blocks().len() > blocks_before),
)
.unwrap();

signer_test
.wait_for_block_rejections(timeout.as_secs(), &[all_signers[4]])
.unwrap();

assert_eq!(BLOCK_REJECTIONS_CURRENT_TIMEOUT.get().as_secs(), 551);

info!("------------------------- Check Rejections-based timeout with 2 rejections -------------------------");

let blocks_before = test_observer::get_mined_nakamoto_blocks().len();

TEST_REJECT_ALL_BLOCK_PROPOSAL.set(vec![all_signers[3], all_signers[4]]);
TEST_IGNORE_ALL_BLOCK_PROPOSALS.set(vec![all_signers[0], all_signers[1], all_signers[2]]);

next_block_and(
&mut signer_test.running_nodes.btc_regtest_controller,
30,
|| Ok(test_observer::get_mined_nakamoto_blocks().len() > blocks_before),
)
.unwrap();

signer_test
.wait_for_block_rejections(timeout.as_secs(), &[all_signers[3], all_signers[4]])
.unwrap();

assert_eq!(BLOCK_REJECTIONS_CURRENT_TIMEOUT.get().as_secs(), 404);

// reset reject/ignore
TEST_REJECT_ALL_BLOCK_PROPOSAL.set(vec![]);
TEST_IGNORE_ALL_BLOCK_PROPOSALS.set(vec![]);
}
rdeioris marked this conversation as resolved.
Show resolved Hide resolved

/// Test scenario:
///
/// - when a signer submits a block validation request and
Expand Down
Loading