I opened two PRs to resolve flakiness in two different tests (#5733 and #5769). In both cases, the flake was caused by the "wait for a block commit" logic. In one, the block commit being waited on wasn't submitted for the correct burn block; in the other, the commit pointed at the wrong Stacks tip. There is a function that attempts to do the "right thing" when mining a tenure:
```rust
pub fn next_block_and_wait_for_commits(
    btc_controller: &mut BitcoinRegtestController,
    timeout_secs: u64,
    coord_channels: &[&Arc<Mutex<CoordinatorChannels>>],
    commits_submitted: &[&Arc<AtomicU64>],
    wait_for_stacks_block: bool,
) -> Result<(), String> {
    let commits_submitted: Vec<_> = commits_submitted.to_vec();
    let blocks_processed_before: Vec<_> = coord_channels
        .iter()
        .map(|x| {
            x.lock()
                .expect("Mutex poisoned")
                .get_stacks_blocks_processed()
        })
        .collect();
    let commits_before: Vec<_> = commits_submitted
        .iter()
        .map(|x| x.load(Ordering::SeqCst))
        .collect();

    let mut block_processed_time: Vec<Option<Instant>> = vec![None; commits_before.len()];
    let mut commit_sent_time: Vec<Option<Instant>> = vec![None; commits_before.len()];
    next_block_and(btc_controller, timeout_secs, || {
        for i in 0..commits_submitted.len() {
            let commits_sent = commits_submitted[i].load(Ordering::SeqCst);
            let blocks_processed = coord_channels[i]
                .lock()
                .expect("Mutex poisoned")
                .get_stacks_blocks_processed();
            let now = Instant::now();
            if blocks_processed > blocks_processed_before[i] && block_processed_time[i].is_none() {
                block_processed_time[i].replace(now);
            }
            if commits_sent > commits_before[i] && commit_sent_time[i].is_none() {
                commit_sent_time[i].replace(now);
            }
        }

        if !wait_for_stacks_block {
            for i in 0..commits_submitted.len() {
                // just wait for the commit
                let commits_sent = commits_submitted[i].load(Ordering::SeqCst);
                if commits_sent <= commits_before[i] {
                    return Ok(false);
                }

                // if two commits have been sent, one of them must have been after
                if commits_sent >= commits_before[i] + 1 {
                    continue;
                }
                return Ok(false);
            }
            return Ok(true);
        }

        // waiting for both commit and stacks block
        for i in 0..commits_submitted.len() {
            let blocks_processed = coord_channels[i]
                .lock()
                .expect("Mutex poisoned")
                .get_stacks_blocks_processed();
            let commits_sent = commits_submitted[i].load(Ordering::SeqCst);

            if blocks_processed > blocks_processed_before[i] {
                // either we don't care about the stacks block count, or the block count advanced.
                // Check the block-commits.
                let block_processed_time = block_processed_time[i]
                    .as_ref()
                    .ok_or("TEST-ERROR: Processed block time wasn't set")?;
                if commits_sent <= commits_before[i] {
                    return Ok(false);
                }
                let commit_sent_time = commit_sent_time[i]
                    .as_ref()
                    .ok_or("TEST-ERROR: Processed commit time wasn't set")?;
                // try to ensure the commit was sent after the block was processed
                if commit_sent_time > block_processed_time {
                    continue;
                }
                // if two commits have been sent, one of them must have been after
                if commits_sent >= commits_before[i] + 2 {
                    continue;
                }
                // otherwise, just timeout if the commit was sent and its been long enough
                // for a new commit pass to have occurred
                if block_processed_time.elapsed() > Duration::from_secs(10) {
                    continue;
                }
                return Ok(false);
            } else {
                return Ok(false);
            }
        }
        Ok(true)
    })
}
```
That logic is prone to flakiness (the `Duration::from_secs(10)` being a dead giveaway). But many tests don't even rely on that function, and instead embed some of the commit-waiting logic themselves. So a refactor here would probably replace that function with one that relies on the new `Counters` variables, which actually capture the committed-to Stacks height and burn height. We'd then also need to go through the other tests and figure out some common logic that could be factored out (or just be lazy about it, and fix their logic whenever flakiness manifests).
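As a rough illustration of the direction described above, here is a minimal sketch of a wait helper that polls for a commit targeting a specific burn height and Stacks height, rather than counting commits and falling back on a fixed 10-second delay. Everything in it is an assumption made for illustration: `CommitCounters`, its two fields, and `wait_for_commit_at` are hypothetical stand-ins, not the actual `Counters` API, and the `>=` comparison assumes the recorded heights only move forward during the test.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the test counters: records the burn height
/// and Stacks height that the most recent block commit was built for.
#[derive(Default)]
pub struct CommitCounters {
    pub last_commit_burn_height: AtomicU64,
    pub last_commit_stacks_height: AtomicU64,
}

/// Poll until the latest recorded commit targets at least `burn_height`
/// and `stacks_height`, or return an error once `timeout` has elapsed.
pub fn wait_for_commit_at(
    counters: &CommitCounters,
    burn_height: u64,
    stacks_height: u64,
    timeout: Duration,
) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        // The two loads are not atomic as a pair; this sketch assumes the
        // writer stores the Stacks height before the burn height.
        let burn = counters.last_commit_burn_height.load(Ordering::SeqCst);
        let stacks = counters.last_commit_stacks_height.load(Ordering::SeqCst);
        if burn >= burn_height && stacks >= stacks_height {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(format!(
                "timed out waiting for commit at burn height {burn_height}, \
                 stacks height {stacks_height} (last seen: {burn}, {stacks})"
            ));
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A test would compute the expected burn and Stacks heights before mining the next block, then call `wait_for_commit_at` with those values, so the wait condition states exactly which commit it needs instead of inferring it from timing.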