Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: cleanup nakamoto/signer test commit logic #5770

Open
kantai opened this issue Jan 29, 2025 · 0 comments
Open

Refactor: cleanup nakamoto/signer test commit logic #5770

kantai opened this issue Jan 29, 2025 · 0 comments

Comments

@kantai
Copy link
Member

kantai commented Jan 29, 2025

I opened two PRs to resolve some flakiness in two different tests (#5733 and #5769). In both cases, the test flake was caused by the "wait for a block commit" logic. The issue in one is that the block commit being waited on wasn't submitted for the correct burn block and in the other, the commit was pointing at the wrong stacks tip. There is a function which attempts to do the "right thing" when mining a tenure:

pub fn next_block_and_wait_for_commits(
    btc_controller: &mut BitcoinRegtestController,
    timeout_secs: u64,
    coord_channels: &[&Arc<Mutex<CoordinatorChannels>>],
    commits_submitted: &[&Arc<AtomicU64>],
    wait_for_stacks_block: bool,
) -> Result<(), String> {
    let commits_submitted: Vec<_> = commits_submitted.to_vec();
    let blocks_processed_before: Vec<_> = coord_channels
        .iter()
        .map(|x| {
            x.lock()
                .expect("Mutex poisoned")
                .get_stacks_blocks_processed()
        })
        .collect();
    let commits_before: Vec<_> = commits_submitted
        .iter()
        .map(|x| x.load(Ordering::SeqCst))
        .collect();

    let mut block_processed_time: Vec<Option<Instant>> = vec![None; commits_before.len()];
    let mut commit_sent_time: Vec<Option<Instant>> = vec![None; commits_before.len()];
    next_block_and(btc_controller, timeout_secs, || {
        for i in 0..commits_submitted.len() {
            let commits_sent = commits_submitted[i].load(Ordering::SeqCst);
            let blocks_processed = coord_channels[i]
                .lock()
                .expect("Mutex poisoned")
                .get_stacks_blocks_processed();
            let now = Instant::now();
            if blocks_processed > blocks_processed_before[i] && block_processed_time[i].is_none() {
                block_processed_time[i].replace(now);
            }
            if commits_sent > commits_before[i] && commit_sent_time[i].is_none() {
                commit_sent_time[i].replace(now);
            }
        }

        if !wait_for_stacks_block {
            for i in 0..commits_submitted.len() {
                // just wait for the commit
                let commits_sent = commits_submitted[i].load(Ordering::SeqCst);
                if commits_sent <= commits_before[i] {
                    return Ok(false);
                }

                // if two commits have been sent, one of them must have been after
                if commits_sent >= commits_before[i] + 1 {
                    continue;
                }
                return Ok(false);
            }
            return Ok(true);
        }

        // waiting for both commit and stacks block
        for i in 0..commits_submitted.len() {
            let blocks_processed = coord_channels[i]
                .lock()
                .expect("Mutex poisoned")
                .get_stacks_blocks_processed();
            let commits_sent = commits_submitted[i].load(Ordering::SeqCst);

            if blocks_processed > blocks_processed_before[i] {
                // either we don't care about the stacks block count, or the block count advanced.
                // Check the block-commits.
                let block_processed_time = block_processed_time[i]
                    .as_ref()
                    .ok_or("TEST-ERROR: Processed block time wasn't set")?;
                if commits_sent <= commits_before[i] {
                    return Ok(false);
                }
                let commit_sent_time = commit_sent_time[i]
                    .as_ref()
                    .ok_or("TEST-ERROR: Processed commit time wasn't set")?;
                // try to ensure the commit was sent after the block was processed
                if commit_sent_time > block_processed_time {
                    continue;
                }
                // if two commits have been sent, one of them must have been after
                if commits_sent >= commits_before[i] + 2 {
                    continue;
                }
                // otherwise, just timeout if the commit was sent and its been long enough
                //  for a new commit pass to have occurred
                if block_processed_time.elapsed() > Duration::from_secs(10) {
                    continue;
                }
                return Ok(false);
            } else {
                return Ok(false);
            }
        }
        Ok(true)
    })
}

That logic is prone to flakiness (the Duration::from_secs(10) being a dead give away). But many tests don't even rely on that function, instead embedding some of the commit waiting logic themselves. So, a refactor here would probably replace that function with one that relied on the new Counters variables that actually capture the committed to stacks height and burn heights, and then we'd also need to go through the other tests and figure out some common logic that could be factored (or just be lazy about it, and fix their logic whenever flakiness manifests).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Status: 🆕 New
Development

No branches or pull requests

1 participant