Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce missing-at/last-missing timestamps #1811

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

evanlinjin
Copy link
Member

@evanlinjin evanlinjin commented Jan 24, 2025

Fixes #1740.

Description

This PR replaces #1765. For context and the original discussion that led to this change, please refer to this comment.

This PR addresses a potential malicious double-spending issue by introducing improvements to unconfirmed transaction tracking. Key changes include the addition of TxUpdate::missing that tracks transactions that have been replaced and are no longer in the mempool, and the inclusion of last_missing and missing_at timestamps in TxGraph to track when a transaction was last deemed missing.

SpkWithExpectedTxids is introduced in SpkClient to track expected Txids for each spk. During a sync, if any Txids from SpkWithExpectedTxids are not in the current history of an spk obtained from the chain source, those Txids are considered missing. Support for SpkWithExpectedTxids has been added to both bdk_electrum and bdk_esplora chain source crates.

The canonicalization algorithm is updated to disregard transactions with a last_missing timestamp greater than or equal to their last_seen timestamp, except in cases where transitivity rules apply.

Changelog notice

  • TxUpdate::missing tracks transactions that have been replaced and are no longer present in mempool.
  • last_missing and missing_at timestamps in TxGraph track when a transaction was last marked as missing from the mempool.
  • The canonicalization algorithm now disregards transactions with a last_missing timestamp greater than or equal to it's last_seen timestamp, except when a canonical descendant exists due to rules of transitivity.
  • SpkWithExpectedTxids in SpkClient keeps track of expected Txids for each spk.
  • SpkWithExpectedTxids support added for bdk_electrum and bdk_esplora.
  • SyncRequestBuilder::expected_txids_of_spk adds an association between Txid and spk.
  • SyncRequestExt::check_unconfirmed_statuses adds unconfirmed transactions alongside their spks during sync.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

This may, if we introduce new fields to `TxUdpate`, they can be minor
non-breaking updates.
@evanlinjin evanlinjin changed the title Introduce MissingMarker Introduce missing-at/last-missing timestamps Jan 24, 2025
This is a set of txids missing from the mempool.
@notmandatory notmandatory added the bug Something isn't working label Jan 24, 2025
@notmandatory
Copy link
Member

Looks like only breaking changes are on core and chain, so we'll be able to roll this out in a non-breaking 1.x wallet release?

@LLFourn
Copy link
Contributor

LLFourn commented Jan 28, 2025

@notmandatory core breaking change is also a wallet crate breaking change unfortunately. It is a minor breaking change (some struct from core is changing that no one inspects directly in their code probably) but it technically is a breaking change so it would have to be in bdk_wallet v2. This is a pretty important improvement so I'd say it justifies a v2 release by itself when it's merged.

Copy link
Contributor

@LLFourn LLFourn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConceptACK. Nice ideas here. I think chain API needs a clean up.

@@ -879,6 +881,23 @@ pub trait SyncRequestBuilderExt<K> {

/// Add [`Script`](bitcoin::Script)s that are revealed by the `indexer` but currently unused.
fn unused_spks_from_indexer(self, indexer: &KeychainTxOutIndex<K>) -> Self;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I don't really see the attraction of a extension trait for this stuff. I think this should all be just member methods.

fn check_for_missing_txs<A, C>(
self,
indexer: &KeychainTxOutIndex<K>,
canonical_iter: CanonicalIter<A, C>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems extremely awkward. What about ChainOracles that are not infallible? Why does it need be an KeychainTxOutIndex -- it just calls inner in the implementation.

How about you just put this method on the tx graph. Like:

impl<I, D: AsRef<SpkTxOutIndex<I>>> IndexedTxGraph<D> {
    fn sync_request_add_expected_unconfirmed_txs<O: ChainOracle>(&self, sync_request: &mut SyncRequestBuilder, chain_oracle: &O, range: impl RangeBounds<I>) -> Result<(), O::Error> { ... }
}

impl<K> IndexedTxGraph<KeychainTxOutIndex<K>> {
    fn keychain_sync_request_add_expected_unconfirmed_txs<O: ChainOracle>(&self, sync_request: &mut SyncRequestBuilder, chain_oracle: &O, range: impl RangeBounds<K>) -> Result<(), O::Error> { ... }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on this comment I put some ideas for a IndexedTxGraph method in evanlinjin#14

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LLFourn, the infallible part was pure laziness.

I wanted to have the extension traits since bdk_core::spk_client types have a builder API. However, @ValuedMammal's has already provided a superior solution - I'm happy to go with that!

@@ -334,4 +334,21 @@ impl<I: Clone + Ord + core::fmt::Debug> SpkTxOutIndex<I> {
.any(|output| self.spk_indices.contains_key(&output.script_pubkey));
input_matches || output_matches
}

/// Find relevant script pubkeys associated with a transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ doesn't define relevant here.


/// Find relevant script pubkeys associated with a transaction.
///
/// The script pubkeys of the transaction's outputs and previous outputs as scanned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unhelpful sentence. Why. What do I use this method for.

Also needs to explain that inputs would not be detected as relevant unless the parents have been scanned.

@@ -715,6 +717,30 @@ impl<A: Anchor> TxGraph<A> {
changeset
}

/// Inserts the given `missing_at` for `txid`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 there is no explanation of missing_at.

@evanlinjin evanlinjin marked this pull request as ready for review January 30, 2025 12:33
@@ -145,6 +145,7 @@ pub struct TxGraph<A = ConfirmationBlockTime> {
spends: BTreeMap<OutPoint, HashSet<Txid>>,
anchors: HashMap<Txid, BTreeSet<A>>,
last_seen: HashMap<Txid, u64>,
last_missing: HashMap<Txid, u64>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if we had a data structure like this and then last_seen would become HashMap<Txid, LastSeen>

#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct LastSeen {
    /// seen at
    seen_at: u64,
    /// evicted at
    evicted_at: Option<u64>,
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite like this idea since there has been talks about adding more timestamps.

I.e. first-seen timestamp, which will help with tx ordering when listing transactions.

struct MempoolTimestamps {
    pub first_seen: u64,
    pub last_seen: u64,
    pub last_evicted: Option<u64>,
}

I also like the term evicted as it's more descriptive than missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and +1 on this idea, it makes it clear, and easy to "derive" its lifecycle and it's and can be useful info to expose to the users.

.flat_map(|(txid, anchors)| anchors.into_iter().map(move |a| (a, txid)))
.collect();
tx_update.seen_ats = graph.last_seen.into_iter().collect();
tx_update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like TxUpdate would need another field evicted_at in order to maintain roundtrip convertibility between TxUpdate and TxGraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the convertibility between TxGraph and TxUpdate is important? I'm thinking maybe we should just get rid of it. One can just construct an empty TxGraph and apply an TxUpdate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably ok to forego complete fungibility between graphs and updates, but at the same time I wonder how useful are the seen-ats without the corresponding evictions?

crates/chain/src/tx_graph.rs Outdated Show resolved Hide resolved
crates/chain/src/tx_graph.rs Outdated Show resolved Hide resolved
crates/chain/src/indexer/keychain_txout.rs Show resolved Hide resolved
crates/core/src/spk_client.rs Outdated Show resolved Hide resolved
crates/core/src/spk_client.rs Outdated Show resolved Hide resolved
@@ -145,6 +145,7 @@ pub struct TxGraph<A = ConfirmationBlockTime> {
spends: BTreeMap<OutPoint, HashSet<Txid>>,
anchors: HashMap<Txid, BTreeSet<A>>,
last_seen: HashMap<Txid, u64>,
last_missing: HashMap<Txid, u64>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite like this idea since there has been talks about adding more timestamps.

I.e. first-seen timestamp, which will help with tx ordering when listing transactions.

struct MempoolTimestamps {
    pub first_seen: u64,
    pub last_seen: u64,
    pub last_evicted: Option<u64>,
}

I also like the term evicted as it's more descriptive than missing.

.flat_map(|(txid, anchors)| anchors.into_iter().map(move |a| (a, txid)))
.collect();
tx_update.seen_ats = graph.last_seen.into_iter().collect();
tx_update
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the convertibility between TxGraph and TxUpdate is important? I'm thinking maybe we should just get rid of it. One can just construct an empty TxGraph and apply an TxUpdate.

fn check_for_missing_txs<A, C>(
self,
indexer: &KeychainTxOutIndex<K>,
canonical_iter: CanonicalIter<A, C>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LLFourn, the infallible part was pure laziness.

I wanted to have the extension traits since bdk_core::spk_client types have a builder API. However, @ValuedMammal's has already provided a superior solution - I'm happy to go with that!

evanlinjin and others added 10 commits February 3, 2025 01:00
The evicted-at and last-evicted timestamp informs the `TxGraph` when the
transaction was last deemed as missing from the mempool.

The canonicalization algorithm is changed to disregard transactions with
a last-missing timestamp greater or equal to it's last-seen timestamp.
The exception is when we have a canonical descendant due to rules of
transitivity.
This is for conveniently adding associations of txid <-> spk. We expect
that these txids exist in the spk history. Otherwise, it means the tx is
evicted from the mempool and we need to update the `missing_at` value in
the sync response.
This is a convenience method for adding unconfirmed txs alongside their
associated spks the the sync request. This way, we will be able to
detect evictions of these transactions from the mempool.
Rename `SyncRequestBuilderExt::check_unconfirmed_statuses` to
`SyncRequestBuilderExt::check_for_missing_txs` and update docs.
Copy link
Contributor

@oleonardolima oleonardolima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general ConceptACK, some small nits.

/// Transactions without last-seens are excluded.
pub fn txids_by_descending_last_seen(&self) -> impl ExactSizeIterator<Item = (u64, Txid)> + '_ {
self.txs_by_last_seen.iter().copied().rev()
/// Transactions without last-seens are excluded. Transactions with a last-missing timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Transactions without last-seens are excluded. Transactions with a last-missing timestamp
/// Transactions without last-seens are excluded. Transactions with a last-evicted timestamp

@@ -1139,6 +1182,8 @@ pub struct ChangeSet<A = ()> {
pub anchors: BTreeSet<(A, Txid)>,
/// Added last-seen unix timestamps of transactions.
pub last_seen: BTreeMap<Txid, u64>,
/// Added timestamps of when a transaction is last missing from the mempool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Added timestamps of when a transaction is last missing from the mempool.
/// Added timestamps of when a transaction is last evicted from the mempool.

@@ -145,6 +145,7 @@ pub struct TxGraph<A = ConfirmationBlockTime> {
spends: BTreeMap<OutPoint, HashSet<Txid>>,
anchors: HashMap<Txid, BTreeSet<A>>,
last_seen: HashMap<Txid, u64>,
last_missing: HashMap<Txid, u64>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and +1 on this idea, it makes it clear, and easy to "derive" its lifecycle and it's and can be useful info to expose to the users.

@@ -765,6 +791,14 @@ impl<A: Anchor> TxGraph<A> {
changeset.merge(self.insert_seen_at(txid, seen_at));
}
}
for txid in update.missing {
Copy link
Contributor

@oleonardolima oleonardolima Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd help to update apply_update_at documentation in order to mention the evicted_at too, in lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Undetected double-spent
6 participants