CIP-???? | Removal of Epoch Boundary Blocks #974

Draft

nfrisby

@nfrisby nfrisby commented Jan 29, 2025

A CIP for the community to use to discuss removing Epoch Boundary Blocks (EBBs) from the historical Cardano chain.

Doing so would ultimately simplify the reference node's specification and implementation, as well as those of any newly developed node and, potentially, various tooling --- EBBs are an onerous historical design mistake and it would be a relief to stop preserving them.

Rendered: https://github.com/nfrisby/CIPs/blob/nfrisby/EBBs-CIP/CIP-XXXX/README.md

@nfrisby
Author

nfrisby commented Jan 29, 2025

@kderme @jpraynaud As promised, I'm pinging you now that I've written up the basic plan for removing EBBs.

Please --- anyone who is reading this --- ping folks who author Cardano tools that would likely be affected by this retcon of the historical chain. Thank you!

Relevant summary so you can quickly judge who might care:

  • No regular block's content nor hash would change.
  • EBBs would simply be trimmed out of the chain --- nodes would no longer send (accept) them when serving (syncing) the historical chain.
  • EBBs are useless blocks: they do not at all affect the protocol nor ledger state.
  • As of the culmination of the CIP's plan, the resulting gaps in the prev-hash links would be patched up using a lookup table that contains the hashes and prev-hashes of all EBBs that have ever existed.
  • The on-disk files would change, since they currently contain the EBBs.
  • The code's various corner cases for EBBs could finally be removed; all consolidated down to that very isolated prev-hash fixup.
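The prev-hash fixup described in the list above can be sketched roughly as follows. This is only an illustration, not the CIP's or the node's implementation: the `Block` type, the `EBB_TABLE` contents, and both function names are hypothetical, and the hashes are toy strings rather than real Byron hashes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Block:
    hash: str       # this block's own hash (unchanged by the trimming)
    prev_hash: str  # prev-hash as recorded in the block (also unchanged)


# Hypothetical lookup table: EBB hash -> that EBB's own prev-hash.
EBB_TABLE: Dict[str, str] = {"ebb1": "a"}


def effective_prev_hash(block: Block, table: Dict[str, str]) -> str:
    """Resolve a prev-hash that points at a removed EBB to the EBB's predecessor."""
    return table.get(block.prev_hash, block.prev_hash)


def links_up(chain: List[Block], table: Dict[str, str]) -> bool:
    """Check that every block's effective prev-hash names the block before it."""
    return all(
        effective_prev_hash(cur, table) == prev.hash
        for prev, cur in zip(chain, chain[1:])
    )


# The original chain was a <- ebb1 <- b <- c; the EBB has been trimmed out.
trimmed = [Block("a", "genesis"), Block("b", "ebb1"), Block("c", "b")]
```

With the table, `links_up(trimmed, EBB_TABLE)` holds even though block `b` still records the EBB's hash as its prev-hash; with an empty table the gap is visible and the check fails.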

@jpraynaud
Contributor

@scarmuega this will probably have an impact on Pallas

@nfrisby
Author

nfrisby commented Jan 30, 2025

@abailly @KtorZ @agaffney @AndrewWestberg I'd appreciate your feedback on how disruptive this CIP would be for the nodes/tooling you design and maintain. Thanks!

(This CIP made me starkly realize that Mithril snapshots are something the various nodes are going to have to be able to interoperate on, so I suppose there's a whole other conversation to have there too cc @jpraynaud .)

Edit: FYI, on a side-conversation in a call, Arnaud just expressed that the long-term plan for Mithril was/has been to have a dedicated format for it (specified in a CIP), but that existing files were used directly for now, to get the ball rolling.

Edit 2: Arnaud elaborated in a comment below, #974 (comment)

@jpraynaud
Contributor

@nfrisby if the immutable files from the Byron era are rewritten at Cardano node startup, this could create problems with the certification made by Mithril for the Cardano database:

Certification could stall because of a split between signers: some would sign with the digests of the previous immutable files (because those digests are in their Mithril signer cache or because they have not yet updated their Cardano node), while the others would use the digests of the new immutable files. That split would prevent reaching the quorum needed to create a multi-signature.

Today the digest is computed as a hash of the files' contents. There may be some other way to compute this digest of the immutable chunk files: by using the blocks inside each file and skipping EBBs during the first stage you propose. This would need a new Mithril era (i.e. all the signer nodes sign differently at a given epoch transition, which means that a very high majority of the signers must run a compatible version), but it could work. However, I don't know if we could compute the digests for the primary and secondary indexes with the same trick.
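As a rough illustration of the difference between the two digest strategies: the sketch below assumes a chunk file can already be parsed into `(is_ebb, raw_bytes)` pairs, and it uses SHA-256 purely for demonstration. Neither the function names nor the hashing scheme are Mithril's actual ones.

```python
import hashlib
from typing import Iterable, Tuple


def file_digest(raw_file: bytes) -> str:
    """Digest over the raw file contents (changes if EBBs are removed from the file)."""
    return hashlib.sha256(raw_file).hexdigest()


def block_digest(blocks: Iterable[Tuple[bool, bytes]]) -> str:
    """Digest over the blocks in a chunk, skipping EBBs (hypothetical scheme)."""
    h = hashlib.sha256()
    for is_ebb, raw in blocks:
        if is_ebb:
            continue  # EBBs do not contribute, so old and new files agree
        h.update(len(raw).to_bytes(8, "big"))  # length prefix keeps block boundaries unambiguous
        h.update(raw)
    return h.hexdigest()


old_chunk = [(True, b"ebb"), (False, b"block-1"), (False, b"block-2")]
new_chunk = [(False, b"block-1"), (False, b"block-2")]
```

Under this scheme a signer still holding the old chunk and a signer holding the rewritten chunk would compute the same `block_digest`, whereas their `file_digest` values differ.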

Regarding the certification of the Cardano transactions, as there are no transactions in the EBBs, there should be no impact.

We also have plans to certify the blocks themselves, and in that case we would simply skip the EBBs (as we can detect them with the Pallas library). Depending on when this happens, this could also require a new Mithril era.

@abailly

abailly commented Jan 30, 2025

The fact that mithril signs and serves the cardano-node DB is a historical artefact: it was the most useful and easiest thing we could do when we started the project, and it proved quite effective at bootstrapping cardano-node. But of course, we would like to avoid depending on the specific details of one implementation.

Ideally, I would like to:

  • Have mithril provide a certified chain of blocks in some specified (in a CIP?) interchange format, perhaps something as simple as a zstandard concatenation of all blocks grouped by 100s or something
  • Have mithril provide a certified ledger snapshot at some point in some specified interchange format
  • Have a tool/command in cardano-node to import such raw data. This is the path we have started following with @KtorZ in Amaru (see import command) because it helps us
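To make the first bullet above concrete, here is a minimal sketch of packing blocks in groups of 100 into compressed archives. It is not a proposed format: the 4-byte length prefix is an assumption made for the sketch, and `zlib` from the Python standard library stands in for zstandard only to keep the example dependency-free.

```python
import zlib
from typing import Iterable, Iterator, List


def batches(blocks: Iterable[bytes], size: int = 100) -> Iterator[List[bytes]]:
    """Group a stream of serialized blocks into lists of at most `size` blocks."""
    batch: List[bytes] = []
    for block in blocks:
        batch.append(block)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def pack(batch: List[bytes]) -> bytes:
    """Concatenate length-prefixed blocks and compress them (zlib as a zstd stand-in)."""
    payload = b"".join(len(b).to_bytes(4, "big") + b for b in batch)
    return zlib.compress(payload)


def unpack(blob: bytes) -> List[bytes]:
    """Inverse of `pack`: decompress and split back into individual blocks."""
    payload = zlib.decompress(blob)
    out: List[bytes] = []
    i = 0
    while i < len(payload):
        n = int.from_bytes(payload[i:i + 4], "big")
        i += 4
        out.append(payload[i:i + n])
        i += n
    return out


blocks = [bytes([i % 256]) * (i % 7 + 1) for i in range(250)]
archives = [pack(batch) for batch in batches(blocks)]
```

A consumer would call `unpack` on each archive in order to recover the original sequence of blocks.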

@agaffney

agaffney commented Jan 30, 2025

@nfrisby would this eventually end up leaving a gap in the block type IDs returned in the NtC chainsync payload? Could it go even further and get rid of that gap to make those NtC block type IDs the same as the NtN block header IDs and era ID?

I'm a bit dubious on the static mapping of known EBBs and their prev-hash values. While the information has already been gathered and is relatively easy to copy, it kinda becomes a "tribal knowledge" sort of thing that will trip up people doing new implementations. There's also no good way to verify that information in the future if it's removed from the chain, so it may be possible for an attacker to trick someone into using a "bad" list.

@nfrisby
Author

nfrisby commented Jan 30, 2025

would this eventually end up leaving a gap in the block type IDs returned in the NtC chainsync payload? Could it go even further and get rid of that gap to make those NtC block type IDs the same as the NtN block header IDs and era ID?

I don't follow this question. What are the "block type IDs" and "era ID" etc? My guess is that my answer is "no", since there would be no observable change in behavior except for the fact that the ChainSync and BlockFetch protocols would not send EBBs.

it kinda becomes a "tribal knowledge" sort of thing that will trip up people doing new implementations

I agree with this. I don't know how to handle it, other than to put up loud blinking marquees throughout the Byron ledger spec about this CIP.

so it may be possible for an attacker to trick someone into using a "bad" list.

My current thought process is that the list will be baked in the source code, so if an attacker could manipulate my list then they could manipulate everything about my node 🤷. There is still the question of how to ensure the correct list of known EBBs is securely preserved indefinitely --- I don't know how to do that on the level of a long-lived community like this one. We'd need a "secure" version of this CIP repo, eg?

@agaffney

would this eventually end up leaving a gap in the block type IDs returned in the NtC chainsync payload? Could it go even further and get rid of that gap to make those NtC block type IDs the same as the NtN block header IDs and era ID?

I don't follow this question. What are the "block type IDs" and "era ID" etc? My guess is that my answer is "no", since there would be no observable change in behavior except for the fact that the ChainSync and BlockFetch protocols would not send EBBs.

In the node-to-client version of the chainsync protocol, each block payload has a header that includes the block type ID. This is 0/1 for Byron EBB/main blocks, 2 for Shelley, 3 for Allegra, etc.

The node-to-node version of the chainsync protocol also has a payload header with a similar identifier, but there it's just the era ID (0 for Byron, 1 for Shelley, 2 for Allegra, etc.). This ID also matches the TX type IDs used when submitting a transaction via LocalTxSubmission.

It's the difference in chainsync payload format combined with Byron having 2 block types that causes this discrepancy in the type IDs used for blocks in chainsync. It would be nice to get these synced back up, and it's relatively doable since it all happens at the protocol level.
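The discrepancy between the two numbering schemes can be written down as a small sketch. The function names are hypothetical. The era list beyond Allegra is filled in from the known Cardano era sequence, and the Byron tags follow the statement elsewhere in this thread that the existing Byron codec uses 0 for EBBs and 1 for regular blocks.

```python
# Cardano eras in order; the index in this list is the node-to-node era ID.
ERAS = ["Byron", "Shelley", "Allegra", "Mary", "Alonzo", "Babbage", "Conway"]


def ntn_era_id(era: str) -> int:
    """Era ID as used in node-to-node chainsync headers (0 for Byron, 1 for Shelley, ...)."""
    return ERAS.index(era)


def ntc_block_type(era: str, is_ebb: bool = False) -> int:
    """Block type ID as used in node-to-client chainsync payloads."""
    if era == "Byron":
        return 0 if is_ebb else 1  # Byron occupies two tags: EBB and main block
    return ERAS.index(era) + 1     # every later era is shifted up by one


# Gap between the two schemes for each era's regular blocks.
offsets = {era: ntc_block_type(era) - ntn_era_id(era) for era in ERAS}
```

Every regular block's node-to-client tag is one higher than its era ID, which is exactly the slot taken by the EBB tag.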

@abailly

abailly commented Jan 30, 2025

Perhaps the delineation of a common format for the chain could be led by the Mithril team? It seems to me that simply reusing the existing node format, perhaps with some adjustments, but with a clear specification, would go a long way towards the goal of standardising. Then a Mithril node, as a standalone node or as a component within a node, would only need to follow the chain and construct that data itself, and each node would be free to store whatever it wants the way it wants.

@abailly

abailly commented Jan 30, 2025

Also I wonder: Would it not be possible for the cardano-node to optimise its storage, yet keep a backward compatible behaviour by emitting those pesky EBB nodes "on-the-fly"?
It seems to me important to separate the official part (the protocol) from the unofficial part (what people do with the cardano-node storage).

@nfrisby
Author

nfrisby commented Jan 30, 2025

@agaffney Thanks for pointing this discrepancy out to me. I had never had cause to notice this; it was designed before my time and hasn't broken anything for my team since. I agree that it's burdensome and surprising.

I see the discrepancy in the code, but I do not ultimately know why it is this way. Specifically, I don't know why the (modern) HFC on-the-wire codec for blocks directly reuses the on-the-disk codec --- I agree with you that it doesn't have to. (The HFC instead does "the compositional thing" for headers.) I could imagine some reasons for why it was this way at first (eg the prevailing HFC design goal that a single-era HFC block is indistinguishable from that single era's block), but I don't see why it would need to still be this way.

It would be nice to get these synced back up, and it's relatively doable since it all happens at the protocol level.

By "these", do you mean the codecs for the HFC envelope on blocks versus on headers and on txs? Or do you mean the tag in the envelope actually matching the era index (as is currently the case in headers)?

If this CIP were to go through, then there would be no EBBs. Unfortunately, the existing Byron codec uses the "era tag" 0 for EBBs and 1 for regular blocks. So if we left all bytes as-is (except for deleting the EBBs) the era tags would be 1-based instead of 0-based 🤷. But at least it'd be one tag reserved per era, which is less surprising.

We could change the on-the-wire codec for Cardano blocks to have the same kind of envelope that headers and txs do today, ie even if this CIP is rejected. So I hesitate to bundle that change up with this CIP. If this CIP is accepted, then the codec for Byron itself could be simplified: its 0-versus-1 tag would be unnecessary. Moreover, if we also want to simplify the Byron on-disk codec, then we'd need another database migration --- so maybe it would be nice to combine that migration with this CIP's migration.

(I'm so far assuming that this 0-versus-1 tag in Byron does not influence any of the identifying hashes, but that must be double-checked at some point.)

@nfrisby
Author

nfrisby commented Jan 30, 2025

@abailly

It seems to me important to separate the official part (the protocol) from the unofficial part (what people do with the cardano-node storage).

I agree with this sentence.

Also I wonder: Would it not be possible for the cardano-node to optimise its storage, yet keep a backward compatible behaviour by emitting those pesky EBB nodes "on-the-fly"?

But I disagree with this one. My goal with this CIP is not merely to simplify the storage. I also want to simplify the protocol itself. It just so happens that most of the explicit corner cases for EBBs are in the storage layer. But EBBs themselves obstruct other aspects of the era-agnostic design as well, not just the storage layer. And it'd be really nice if no one would ever have to consider EBBs in the future beyond "they were a historical mistake that necessitates today's Byron block codec doing the prev-hash swap thing with that one static table".

Edit: to clarify, it would be possible to remove EBBs from the storage layer but inject them into the ChainSync and BlockFetch protocols. And perhaps it'd be easier to maintain the complexity there (or maybe it could even be done more simply still in the storage layer now that there's just a finite list of them). But I think it'd be more beneficial to many people in the future to go even further and eliminate them as much as possible (ie confine them to an aberration in the Byron codec).

@agaffney

It would be nice to get these synced back up, and it's relatively doable since it all happens at the protocol level.

By "these", do you mean the codecs for the HFC envelope on blocks versus on headers and on txs? Or do you mean the tag in the envelope actually matching the era index (as is currently the case in headers)?

I was just referring to the block/header type tag/IDs in the chainsync payload header

@abailly

abailly commented Jan 31, 2025

@nfrisby Understood, what I meant was to provide an incremental and smooth migration path, something like:

| Proto version | Feature | Node version | Storage |
|---|---|---|---|
| $v_n$ | serve EBB | $v_m$ | store EBB |
| $v_n$ | serve EBB | $v_{m+1}$ | does not store EBB |
| $v_{n+1}$ | does not serve EBB | $v_{m+1}$ | does not store EBB |

@nfrisby
Author

nfrisby commented Jan 31, 2025

@abailly the CIP draft includes the following in the Specification section. It's different than the table in your comment, but I'm not seeing which difference is crucial. What do you think?

| Node Version | Proto Upstream | Proto Downstream | Storage |
|---|---|---|---|
| Stage Zero (ie today's node) | 🔴 require EBBs | 🔴 serve EBBs | 🔴 store EBBs |
| Stage One | 🟡 treat EBBs as optional | 🔴 serve EBBs | 🔴 store EBBs |
| Stage Two | 🟡 treat EBBs as optional | 🟢 skip EBBs | 🟢 do not store EBBs |
| Stage Three | 🟢 reject EBBs | 🟢 skip EBBs | 🟢 do not store EBBs |
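One way to read the stage table is as a compatibility matrix between upstream and downstream peers. The sketch below encodes the "Proto Upstream" and "Proto Downstream" columns as two small predicates; the names are hypothetical and the model deliberately ignores everything except whether a served chain contains EBBs.

```python
from enum import IntEnum


class Stage(IntEnum):
    ZERO = 0   # today's node
    ONE = 1
    TWO = 2
    THREE = 3


def serves_ebbs(stage: Stage) -> bool:
    """'Proto Downstream' column: Stages Zero and One serve EBBs, later stages skip them."""
    return stage <= Stage.ONE


def accepts(stage: Stage, stream_has_ebbs: bool) -> bool:
    """'Proto Upstream' column: require, treat as optional, or reject EBBs."""
    if stage == Stage.ZERO:
        return stream_has_ebbs       # requires EBBs
    if stage == Stage.THREE:
        return not stream_has_ebbs   # rejects EBBs
    return True                      # Stages One and Two: optional


def can_sync(downstream: Stage, upstream: Stage) -> bool:
    """Can a `downstream` node sync the historical chain from an `upstream` node?"""
    return accepts(downstream, serves_ebbs(upstream))


# Peers at adjacent stages can always sync from each other in this model.
adjacent_ok = all(
    can_sync(Stage(s), Stage(s + 1)) and can_sync(Stage(s + 1), Stage(s))
    for s in range(3)
)
```

In this simplified model, nodes one stage apart always interoperate, while a Stage Zero node cannot sync from a Stage Two node and a Stage Three node cannot sync from a Stage One node.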

@abailly

abailly commented Jan 31, 2025

I think that I should have read the CIP more carefully before commenting, thanks for pointing that out @nfrisby

@abailly

abailly commented Jan 31, 2025

On second thought, I think the difference is that I would not make a difference based on whether you are upstream or downstream, but based on version negotiation, so that an updated node can still serve/receive EBBs depending on the peer's version. But perhaps that's already covered in the CIP too ;)

As of Stage Three, all nodes would relay a chain that contains no EBBs and would never store them.
EBBs could now be entirely removed from the node specification and implementation, except for the prev-hash overrides described in this CIP.

It would be possible to accelerate this plan to merely two stages, at the cost of the additional complexity necessary for the Stage One node to conditionally send EBBs depending on the mini protocol version negotiated with each downstream peer.
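The version-dependent behaviour mentioned in the quoted paragraph could look roughly like the sketch below, for a node that still stores EBBs. Everything here is hypothetical: the function name, the `(is_ebb, payload)` representation, and the `first_ebb_free` threshold, which stands for whichever mini-protocol version would first promise an EBB-free chain.

```python
from typing import Iterable, Iterator, Tuple


def serve(
    blocks: Iterable[Tuple[bool, str]],
    negotiated: int,
    first_ebb_free: int,
) -> Iterator[str]:
    """Yield the blocks to send to one downstream peer.

    `blocks` is the stored chain as (is_ebb, payload) pairs. Peers that
    negotiated a version at or above `first_ebb_free` never see EBBs;
    older peers receive the chain unchanged.
    """
    for is_ebb, payload in blocks:
        if is_ebb and negotiated >= first_ebb_free:
            continue
        yield payload


stored = [(True, "ebb0"), (False, "b1"), (False, "b2")]
to_old_peer = list(serve(stored, negotiated=1, first_ebb_free=2))
to_new_peer = list(serve(stored, negotiated=2, first_ebb_free=2))
```

The per-peer branch is the "additional complexity" in question: the serving side has to carry the negotiated version into its block iterator.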
Author

@nfrisby nfrisby Jan 31, 2025

@abailly I think this paragraph is the extent of what the current draft says relevant to your suggestion in this comment

Author

Maybe I'm overlooking something? Let me know!

Author

FYI, I elaborated this section, adding the tables we were using in our other discussion.

@KtorZ
Member

KtorZ commented Feb 5, 2025

@nfrisby I'd appreciate your feedback on how disruptive this CIP would be for the nodes/tooling you design and maintain. Thanks!

In my case, not disruptive at all.

Ogmios

Doesn't really care about EBB, as Ogmios will "simply" proxy answers from the (local-)chain-sync protocol down to clients. It will make my life slightly easier as it removes the need to document them, and explain why people should just ignore them.

Kupo

Kupo interfaces with the chain via transactions more than blocks. So I mostly leverage the extractTxs method from the HasTx class defined in ouroboros-consensus, which yields an empty list for an EBB.

When plugged into Ogmios as a data-source instead of a Haskell node, I simply default to an empty list of transactions when running into an EBB (so, similar behaviour down the line)

Aiken

No impact whatsoever, blocks aren't visible at the smart contract level, nor do they surface when simulating / evaluating transactions.

Amaru

In Amaru, we always bootstrap from somewhat recent snapshots. So we completely sidestep the problem: Byron doesn't even exist for Amaru. The strategy is to keep producing snapshots so that we only ever need to support two eras: the current one and the next one.

This is possible because Ouroboros gives us strong immutability guarantees after 2160 blocks. And by leveraging Mithril, we can hopefully circumvent dynamic availability attacks that come with it (lengthy discussion probably, but not the point of this CIP I reckon).


So all-in-all, all good. But also, little impact too IMO. I don't know many dApps that are interested in re-validating old data; the snapshot strategy sounds like something we could make more generally adopted overall:

  • Make mithril a first-class citizen in (all) node implementations
  • Possibly add or revise incentives to accommodate for running Mithril signers as part of the SPO duties
  • Bootstrap from stake-based-signed snapshots, and get rid of old eras altogether.

@rphair changed the title from "CIP-XXXX | Removal of Epoch Boundary Blocks" to "CIP-???? | Removal of Epoch Boundary Blocks" on Feb 5, 2025
@nfrisby
Author

nfrisby commented Feb 6, 2025

@KtorZ thanks very much for the response! Glad to hear it wouldn't be very disruptive for you.

@kderme
Contributor

kderme commented Feb 11, 2025

DBSync indexes all blocks, including EBBs, so an integration will require adjusting the existing schema. This is not disruptive, though, and could be handled by a small migration that deletes existing EBB entries from the block table and adjusts some previous-block references.

This change may push some complexity downstream of DBSync, but it will only impact clients that care about EBBs, which is likely no one.
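A migration of the kind described above might look like the following sketch. It is not DBSync's migration: the `block` table here is a hypothetical four-column simplification, the `is_ebb` flag is an assumption made for the example, and SQLite (via Python's `sqlite3`) stands in for the real database.

```python
import sqlite3


def remove_ebbs(conn: sqlite3.Connection) -> None:
    """Re-point references that target an EBB at the EBB's predecessor, then delete EBBs."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            """
            UPDATE block
            SET previous_id = (
                SELECT e.previous_id FROM block AS e WHERE e.id = block.previous_id
            )
            WHERE previous_id IN (SELECT id FROM block WHERE is_ebb = 1)
            """
        )
        conn.execute("DELETE FROM block WHERE is_ebb = 1")


# Toy data: a <- ebb <- b <- c
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE block (id INTEGER PRIMARY KEY, hash TEXT, previous_id INTEGER, is_ebb INTEGER)"
)
conn.executemany(
    "INSERT INTO block VALUES (?, ?, ?, ?)",
    [(1, "a", None, 0), (2, "ebb", 1, 1), (3, "b", 2, 0), (4, "c", 3, 0)],
)
remove_ebbs(conn)
rows = conn.execute("SELECT id, previous_id FROM block ORDER BY id").fetchall()
```

After the migration, block `b` (id 3) refers directly to block `a` (id 1), and the EBB row is gone. The update relies on an EBB never directly following another EBB, so each reference needs only one hop.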

@kderme
Contributor

kderme commented Feb 11, 2025

It would be nice to have an easy way to extract the lookup table of old EBBs.
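As a sketch of what such an extraction could look like, the snippet below walks a chain that still contains EBBs and collects each EBB's hash and prev-hash into a JSON document. The `ChainBlock` type, the field names, and the JSON layout are all hypothetical; a real tool would read blocks from a node's database or over ChainSync rather than from an in-memory list.

```python
import json
from typing import Dict, Iterable, NamedTuple


class ChainBlock(NamedTuple):
    hash: str
    prev_hash: str
    is_ebb: bool


def extract_ebb_table(chain: Iterable[ChainBlock]) -> Dict[str, str]:
    """Map every EBB's hash to that EBB's own prev-hash."""
    return {b.hash: b.prev_hash for b in chain if b.is_ebb}


chain = [
    ChainBlock("ebb0", "genesis", True),
    ChainBlock("b1", "ebb0", False),
    ChainBlock("b2", "b1", False),
    ChainBlock("ebb1", "b2", True),
    ChainBlock("b3", "ebb1", False),
]
table = extract_ebb_table(chain)
table_json = json.dumps(table, indent=2, sort_keys=True)
```

The resulting JSON could then be diffed against the table shipped with a node to check that the two agree.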

6 participants