Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Big Decontamination Plan #386

Open
edsko opened this issue May 27, 2020 · 15 comments
Open

The Big Decontamination Plan #386

edsko opened this issue May 27, 2020 · 15 comments
Labels
technical debt Technical debt
Milestone

Comments

@edsko
Copy link
Contributor

edsko commented May 27, 2020

This would be quite a large refactoring, but it would simplify a large part of the codebase, and means that going forward we can forget about EBBs entirely.

If we didn't care about preserving history, we could have rewritten the existing chain, rewriting nothing but the prev-hashes of every block, skipping any EBBs. We do want to preserve history, and so we cannot do this, but we can do the next best thing: we can introduce a hashmap mapping the hash of every EBB to its predecessor. This way we can do a local translation (in the HasHeader instance for ByronBlock) so that if we have blocks A - EBB - B, we rewrite the prev-hash of B on the fly to be A rather than the EBB. Now we can pretend EBBs don't exist, we don't store them, we don't send them, we have no SlotNo clashes, we have no BlockNo clashes, soooo many things in the consensus layer and storage layer could be made simpler, at the cost of one static value and HasHeader instance for a block type that is anyway no longer going to be used.

@edsko edsko added the technical debt Technical debt label May 27, 2020
@mrBliss
Copy link
Contributor

mrBliss commented Jun 17, 2020

Possible plan:

To transition to an EBB-free chain, there will need to be a transition period in which we are both able to serve an EBB-contaminated chain and an EBB-free chain. Old Byron nodes (NodeToNodeV1/2 and NodeToClientV2) still expect EBBs and that's what they'll get. However, new Byron-to-Shelley nodes (NodeToNodeV3 and NodeToClient3) should not receive EBBs anymore. Once we have switched to Shelley, we will know that only nodes that are capable of handling an EBB-free chain are active, after which we can start making more invasive changes to get rid of EBBs once and for all. E.g., migrate the database, remove all EBB-related edge cases, etc.

This means that new Byron-to-Shelley nodes should both be capable of serving an EBB-contaminated chain (to remain compatible with old Byron nodes) and serving an EBB-free chain (to new Byron-to-Shelley nodes). The ChainDB will need to provide EBB-contaminated Iterators and Readers as well as EBB-free ones. The latter can be implemented on top of the former as it knows which blocks are EBBs (it can use the GetIsEBB block component internally).

Something else we should check: there is a staging net, which might also contain EBBs. To keep that working, we'd need to include its EBB mapping in our hard-coded one. Alternatively, they can rewrite it to be EBB-free 😁.

If we want to use the Shelley release as the opportunity to start this process, we should do the following before the Shelley release:

  • Hard-code the EBB mapping as discussed in the ticket description
  • Provide EBB-free variants of ChainDB Iterators and Readers (without losing the existing EBB-contaminated ones). In the BlockFetchServer and ChainSyncServer, we get the NodeToNode/Client version and use that to decide which iterator/reader to open.
  • Think hard about other things we might be forgetting 😁

Note that LedgerCursor doesn't need to change, because the in-memory representation of the LedgerDB will still have snapshots for EBBs (until we get rid of them altogether, which is after the Shelley release), so a new Byron-to-Shelley node will just never request a cursor for an EBB. In other words, we'll have more points to roll back to, which is backwards-compatible.

To test this properly, it would be nice to be able to run a mix of old and new nodes.

@edsko
Copy link
Contributor Author

edsko commented Jun 25, 2020

We'll need to remove blockPrevHash (and ideally blockInvariant) from HasHeader: rewriting of the prev hash must be conditional, because the Volatile DB needs to construct fragments of headers that optionally include EBBs (depending on the network version of the node requesting the blocks).

edsko referenced this issue in IntersectMBO/ouroboros-network Jun 26, 2020
This is the first step towards the master plan for getting rid of EBBs (#2156).
Given blocks `A, EBB, B`, we will reinterpret the prev hash of `B` to be `A`.
However, in order to support legacy nodes, we must do this rewrite
conditionally. The network layer does not, and should not, need to know
anything about this, but `HasHeader` _did_ insist that the prevhash of a block
was known, even though the network layer actually never depended on that
information at all, except in some assertion checks -- and in one other place
(see below). Those assertion checks were useful in early developemnt but header
validation is of course the responsibility of consensus, and the assertions
don't add much on top.

This means `HasHeader` now no longer has `blockPrevHash` or indeed
`blockInvariant`; they are both still available as `HasFullHeader`, used in the
network tests only.

The one other place where the network layer depended on this was in joining
`FetchRequest`: it was joining to fetch requests if the two fragments happened
to fit together, and it was using the prev hash for this. This is no longer
possible, _but_ there was no reason for `FetchRequest` to use `ChainFragment`
instead of `AnchoredFragment`: we start with an `AnchoredFragment`, and then
_lose that information_ by dropping the anchor. Now `FetchRequest` _does_ use
`AnchoredFragment` all the way, which means that we can use the anchor to see
if two fetch requests fit together.
iohk-bors bot referenced this issue in IntersectMBO/ouroboros-network Jun 26, 2020
2322: Remove blockPrevHash r=edsko a=edsko

This is the first step in the remove-EBB-masterplan (#2156).

Co-authored-by: Edsko de Vries <[email protected]>
edsko referenced this issue in IntersectMBO/ouroboros-network Jun 26, 2020
This is the first step towards the master plan for getting rid of EBBs (#2156).
Given blocks `A, EBB, B`, we will reinterpret the prev hash of `B` to be `A`.
However, in order to support legacy nodes, we must do this rewrite
conditionally. The network layer does not, and should not, need to know
anything about this, but `HasHeader` _did_ insist that the prevhash of a block
was known, even though the network layer actually never depended on that
information at all, except in some assertion checks -- and in one other place
(see below). Those assertion checks were useful in early developemnt but header
validation is of course the responsibility of consensus, and the assertions
don't add much on top.

This means `HasHeader` now no longer has `blockPrevHash` or indeed
`blockInvariant`; they are both still available as `HasFullHeader`, used in the
network tests only.

The one other place where the network layer depended on this was in joining
`FetchRequest`: it was joining to fetch requests if the two fragments happened
to fit together, and it was using the prev hash for this. This is no longer
possible, _but_ there was no reason for `FetchRequest` to use `ChainFragment`
instead of `AnchoredFragment`: we start with an `AnchoredFragment`, and then
_lose that information_ by dropping the anchor. Now `FetchRequest` _does_ use
`AnchoredFragment` all the way, which means that we can use the anchor to see
if two fetch requests fit together.
iohk-bors bot referenced this issue in IntersectMBO/ouroboros-network Jun 26, 2020
2322: Remove blockPrevHash r=edsko a=edsko

This is the first step in the remove-EBB-masterplan (#2156).

Co-authored-by: Edsko de Vries <[email protected]>
edsko referenced this issue in IntersectMBO/ouroboros-network Jun 26, 2020
This is the first step towards the master plan for getting rid of EBBs (#2156).
Given blocks `A, EBB, B`, we will reinterpret the prev hash of `B` to be `A`.
However, in order to support legacy nodes, we must do this rewrite
conditionally. The network layer does not, and should not, need to know
anything about this, but `HasHeader` _did_ insist that the prevhash of a block
was known, even though the network layer actually never depended on that
information at all, except in some assertion checks -- and in one other place
(see below). Those assertion checks were useful in early developemnt but header
validation is of course the responsibility of consensus, and the assertions
don't add much on top.

This means `HasHeader` now no longer has `blockPrevHash` or indeed
`blockInvariant`; they are both still available as `HasFullHeader`, used in the
network tests only.

The one other place where the network layer depended on this was in joining
`FetchRequest`: it was joining to fetch requests if the two fragments happened
to fit together, and it was using the prev hash for this. This is no longer
possible, _but_ there was no reason for `FetchRequest` to use `ChainFragment`
instead of `AnchoredFragment`: we start with an `AnchoredFragment`, and then
_lose that information_ by dropping the anchor. Now `FetchRequest` _does_ use
`AnchoredFragment` all the way, which means that we can use the anchor to see
if two fetch requests fit together.
iohk-bors bot referenced this issue in IntersectMBO/ouroboros-network Jun 26, 2020
2322: Remove blockPrevHash r=edsko a=edsko

This is the first step in the remove-EBB-masterplan (#2156).

Co-authored-by: Edsko de Vries <[email protected]>
@edsko
Copy link
Contributor Author

edsko commented Jun 27, 2020

We also have to be careful with the other direction: nodes that do translate EBBs away but are syncing from nodes using version 1 or 2 of the protocol.

@edsko
Copy link
Contributor Author

edsko commented Jun 27, 2020

As of IntersectMBO/ouroboros-network#2335 the context for translating prev hashes is now available in principle. We have to make sure, however, that the Byron ledger itself does not check prev hashes when applying blocks. It shouldn't, consensus is responsible for header validation, but it might; if it does, we'll have to remove that check from Byron (it would have been superfluous anyway).

@edsko edsko changed the title Get rid of EBBs once and for all The Big Decontamination Plan Jun 30, 2020
@edsko
Copy link
Contributor Author

edsko commented Jul 3, 2020

Since it seems we won't get a chance to push this out quickly, it makes sense to instead opt for a longer-term but more ideal plan:

  1. Reconsider parameterisation of VolatileDB and ImmutableDB ouroboros-network#2264: Remove the distinction between ImmDb/ImmutableDB and VolDB/VolatileDB (LgrDB and LedgerDB should probably stay: one is pure, one is not)

  2. ImmutableDB: introduce BlockBits #661 (in particular, ImmutableDB: introduce BlockBits #661): Migrate the immutable DB secondary index, and as we do so, add

    data PrevHint = PrevIsEBB (ChainHash ByronBlock) | PrevIsRegularBlock

    into the nested context for Byron. This will then allow us to do the on-fly rewrite of the chain based on local information only, without the need for any global EBB map. This will make testing also significantly easier. This migration can be done on the fly ("JIT migration"), and does not need to read the blocks, except to establish the PrevHint of the first regular block: if there is an EBB present, we need to read it to get its prev hash (for PrevIsEBB); if there is no EBB present, it would just be PrevIsRegularBlock.

  3. We can now proceed as above, except with local information.

@edsko
Copy link
Contributor Author

edsko commented Jul 3, 2020

Possibly related/required: Add IsEBB to HeaderFields #639

@mrBliss
Copy link
Contributor

mrBliss commented Jul 3, 2020

We should implement #645 first so that we can test that a node can both serve EBB-contaminated and EBB-free chains.

@edsko
Copy link
Contributor Author

edsko commented Jul 4, 2020

Observation from @mrBliss : how will this work with nodes syncing? Even if they don't want EBBs, they must still request them in order to be able to give them to nodes that do want them?

One thought: perhaps we should be able to get nodes in the system that only speak the new protocol, and so cannot support such "legacy" nodes..? 🤔

@edsko
Copy link
Contributor Author

edsko commented Jul 7, 2020

Observation from @mrBliss : how will this work with nodes syncing? Even if they don't want EBBs, they must still request them in order to be able to give them to nodes that do want them?
One thought: perhaps we should be able to get nodes in the system that only speak the new protocol, and so cannot support such "legacy" nodes..? thinking

Perhaps version negotiation could help here also, and it might in fact help with the phasing out. What if nodes could declare "I cannot serve EBBs", by declaring "I don't support this version number"? If you are an old style node, you need to find an upstream peer that does do it (this will require #646). This means that now we don't need this fixed support anymore within a single node: either you have EBBs, or you don't. Instead of having a single node support both EBB-contaminated and EBB-free chains, instead we can have a network of nodes, some running version 1, some version 2, so that's where we get the heterogeneity.

@edsko
Copy link
Contributor Author

edsko commented Jul 10, 2020

After EBBs are removed entirely, we can probably scrap the whole CodecConfig concept (and remove the then-redundant parameters from ProtocolClientInfo).

mrBliss referenced this issue in IntersectMBO/ouroboros-network Sep 7, 2020
We originally planned to store the EBB rewrite mapping in the `CodecConfig`, but
our new plan is to use the `NestedContext` for it. See step 2 in
https://github.com/input-output-hk/ouroboros-network/issues/2156#issuecomment-653378396.
github-merge-queue bot referenced this issue May 9, 2023
# Description

Running `cabal run ouroboros-consensus-cardano:byron-test
--ghc-options="-fno-ignore-asserts" -- -p '/simple convergence/'
--quickcheck-replay=323717` on commit
2d50007 shows a test failure for Byron
ThreadNet tests.
```
byron
  Byron
    simple convergence: FAIL (39.92s)
      *** Failed! (after 76 tests):
      Exception:
        precondition violated: fragments aren't both non-empty or don't intersect
        CallStack (from HasCallStack):
          error, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/Assert.hs:13:30 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.Util.Assert
          assertWithMsg, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/AnchoredFragment.hs:88:5 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.Util.AnchoredFragment
          compareAnchoredFragments, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/AnchoredFragment.hs:135:5 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.Util.AnchoredFragment
          preferAnchoredCandidate, called at src/ouroboros-consensus/Ouroboros/Consensus/MiniProtocol/BlockFetch/ClientInterface.hs:305:9 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.MiniProtocol.BlockFetch.ClientInterface
          plausibleCandidateChain, called at src/ouroboros-consensus/Ouroboros/Consensus/MiniProtocol/BlockFetch/ClientInterface.hs:173:35 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.MiniProtocol.BlockFetch.ClientInterface
          plausibleCandidateChain, called at src/Ouroboros/Network/BlockFetch.hs:204:9 in ouroboros-network-0.5.0.0-f5391f998845aaf34991fdbae42479f49285b923ee25ccb4dd95fc194b95735e:Ouroboros.Network.BlockFetch
          plausibleCandidateChain, called at src/Ouroboros/Network/BlockFetch/Decision.hs:245:7 in ouroboros-network-0.5.0.0-f5391f998845aaf34991fdbae42479f49285b923ee25ccb4dd95fc194b95735e:Ouroboros.Network.BlockFetch.Decision
      TestSetup {setupEBBs = ProduceEBBs, setupK = SecurityParam 1, setupTestConfig = TestConfig {initSeed = Seed 5839900652819622642, nodeTopology = NodeTopology (fromList [(CoreNodeId 0,fromList []),(CoreNodeId 1,fromList [CoreNodeId 0]),(CoreNodeId 2,fromList [CoreNodeId 1])]), numCoreNodes = NumCoreNodes 3, numSlots = NumSlots 18}, setupNodeJoinPlan = NodeJoinPlan (fromList [(CoreNodeId 0,SlotNo 0),(CoreNodeId 1,SlotNo 0),(CoreNodeId 2,SlotNo 9)]), setupNodeRestarts = NodeRestarts (fromList [(SlotNo 15,fromList [(CoreNodeId 0,NodeRestart)]),(SlotNo 17,fromList [(CoreNodeId 2,NodeRestart)])]), setupSlotLength = SlotLength 16.923s, setupVersion = (NodeToNodeV_8,ByronNodeToNodeVersion1)}
      Use --quickcheck-replay=323717 to reproduce.

1 out of 1 tests failed (39.92s)
```

This test failure occurs in the implementation of the
`plausibleCandidateChain` field of the `BlockFetchConsensusInterface`
interface in Byron.

```haskell
       -- | Given the current chain, is the given chain plausible as a
       -- candidate chain. Classically for Ouroboros this would simply
       -- check if the candidate is strictly longer, but for Ouroboros
       -- with operational key certificates there are also cases where
       -- we would consider a chain of equal length to the current chain.
       --
       plausibleCandidateChain :: HasCallStack
                               => AnchoredFragment header
                               -> AnchoredFragment header -> Bool,
```

In the presence of EBBs, the implementation of `plausibleCandidateChain`
might violate a precondition down the line in
`compareAnchoredFragments`, which is ultimately caused by the fact that
EBBs share the block number of their predecessor. We've added
documentation to the code to describe this corner case.

For this to be a problem in practice, assertions would have to be
enabled (which isn't the case for a running node) and the current
evolving chain would have to be in the Byron era (which is not the
case). Since violation of the precondition is therefore highly unlikely,
we chose no to include a case for EBBs here because it would complicate
the code. In addition, the Big Decontamination plan
(https://github.com/input-output-hk/ouroboros-network/issues/2156#issuecomment-656713901)
would hopefully fix the problem
@nfrisby nfrisby transferred this issue from IntersectMBO/ouroboros-network Oct 2, 2023
@nfrisby
Copy link
Contributor

nfrisby commented Jan 22, 2025

I'm finally revisiting this goal. I've current identified a few plans. They're organized along these Yes-No dimensions.

  1. OnDisk: Whether to remove EBBs from the on-disk files.
  2. Wire: Whether to remove EBBs from the fringes of ChainSync and BlockFetch.
  3. Innards: Whether to remove EBB-specific logic from everywhere else in the Consensus Layer.
  • Plan YYY: remove EBBs entirely. This is the original plan, based on Thomas and Edsko's comments above. It's very appealing to us. However, nowadays, it would cause some disruption to 3rd party tooling. We'd need to start at least with a CPS to garner discussion about those disruptions and how to mitigate them.

  • Plan NYY: remove EBBs from everything except the on-disk ImmDB files. This is a compromise of plan A. We get most of the benefits, but we moreover don't alter the on-disk state, which benefits at least Mithril's bootstrapping snapshots.

  • Plan NNY: remove EBBs from everything except the very fringes of ChainSync and BlockFetch and the on-disk ImmDB files. This simplifies our codebase without changing the observable behavior of the node. This would minimize the amount of communication necessary to get this change through.

  • Plan YNY: remove EBBs from everything except the very fringes of ChainSync and BlockFetch. Like plan NNY, but would cause a slight disruption to Mithril and anyone similar to it. So it'd still deserve a CPS etc, but many more 3rd parties will be able to reply with "we don't care".

  • Plan NNN: maybe we mostly give up our dreams of removing them. In particular, to write the CPS I'll need to re-audit the pain points. I anticipate most of them are in the Storage Layer, and so compromising on OnDisk might lose a large portion of our benefits. But perhaps, eg, we could still isolate the <= of validateEnvelope and other such spoiled natural invariants into the Byron era, at least?

Some context:

  • There are 176 EBBs on mainnet. (I haven't checked any of the extant testnets; there used to be some in past testnets.)
  • The EBB at the beginning of epoch X contains the Ouroboros Classic leader schedule for epoch X. I don't know the exact encoding (the Ledger team might), but it looks like CBOR for a map, possibly from Word16.
  • Each is about 650 kilobytes ~ 635 kibibytes. When individually compressed via LZ4 High Compression (which should not significantly increase decompression costs), they're 26 kilobytes ~ 25.4 kibibytes. The full list of EBBs, each individually compressed with LZ4 HC, is 4,632,800 bytes. Three headers (uncompressed) are 82 bytes, then twenty are 84 bytes, and the rest are 85 bytes (I'm guessing this is CBOR small integer thing).
  • There will never be new EBBs on mainnet.
  • No EBB affects subsequent ledger states whatsoever. (That's why we could get away with removing them.)
  • Mithril maintains various hashes. They hash the on-disk files (see one of the dimensions above). They depend on the hashes of blocks that contain transactions (EBBs do not and we're not proposing changing the bytes/hashes of non-EBBs). They're also working now towards hashing the sequence of block hashes -- EBBs would disrupt this, but the same table Edsko described in the comment above would suffice for them to patch-up their logic to maintain parity with snapshots made by EBB-polluted nodes etc.
  • I've reached out to Kostas regarding db-sync, but haven't heard back yet.
  • I haven't reached out to 3rd parties yet, but I've gathered some names --- planning to leverage the CPS for this.

@nfrisby
Copy link
Contributor

nfrisby commented Jan 23, 2025

These Slack links are for my benefit --- DMs with Jean-Philippe (Mithril) and Kostaz (db-sync) taking their temperature on the idea to prepare for my first CPS draft.

@nfrisby
Copy link
Contributor

nfrisby commented Jan 24, 2025

My plan going forward.

  1. Create a PR that completely removes EBBs from the codebase. The direct goal of this PR is to very concretely demonstrate some of the benefits. The indirect goal is to help me formulate explanations of the less-concrete benefits of removing EBBs.

  2. Then I'll draft a CPS that argues it is worthwhile to remove EBBs. That CPS will need to discuss the kinds of disruptions this would have to 3rd party tools, since that is the primary cost of removing EBBs. The CPS should likely also discuss the compromise where the EBBs remain on the disk (for the sake of Mithril, eg), but are never otherwise revealed --- unless the outcome of step 1 suggests this would ultimately preserve most of the undue complexity caused by EBBs.

  3. Poke the community to respond to the CPS.

  4. Depending on the response, draft a CIP for how to proceed (eg explaining the two stage process).

@nfrisby
Copy link
Contributor

nfrisby commented Jan 29, 2025

I drafted a CIP, to gather community feedback. cardano-foundation/CIPs#974

@nfrisby
Copy link
Contributor

nfrisby commented Feb 13, 2025

I asked in the dx-pub Slack channel how the community might go about an eternal archive of the EBBs, in case people need them once they're no longer on the chain https://input-output-rnd.slack.com/archives/CG1FBSDMM/p1739462509257889

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
technical debt Technical debt
Projects
Status: 🔖 Ready
Development

No branches or pull requests

4 participants