The Big Decontamination Plan #386
Possible plan: To transition to an EBB-free chain, there will need to be a transition period in which we are both able to serve an EBB-contaminated chain and an EBB-free chain. Old Byron nodes (NodeToNodeV1/2 and NodeToClientV2) still expect EBBs and that's what they'll get. However, new Byron-to-Shelley nodes (NodeToNodeV3 and NodeToClientV3) should not receive EBBs anymore.

Once we have switched to Shelley, we will know that only nodes that are capable of handling an EBB-free chain are active, after which we can start making more invasive changes to get rid of EBBs once and for all. E.g., migrate the database, remove all EBB-related edge cases, etc.

This means that new Byron-to-Shelley nodes should both be capable of serving an EBB-contaminated chain (to remain compatible with old Byron nodes) and serving an EBB-free chain (to new Byron-to-Shelley nodes). The ChainDB will need to provide EBB-contaminated Iterators and Readers as well as EBB-free ones. The latter can be implemented on top of the former as it knows which blocks are EBBs (it can use the

Something else we should check: there is a staging net, which might also contain EBBs. To keep that working, we'd need to include its EBB mapping in our hard-coded one. Alternatively, they can rewrite it to be EBB-free 😁.

If we want to use the Shelley release as the opportunity to start this process, we should do the following before the Shelley release:
Note that To test this properly, it would be nice to be able to run a mix of old and new nodes.
We'll need to remove |
This is the first step towards the master plan for getting rid of EBBs (#2156). Given blocks `A, EBB, B`, we will reinterpret the prev hash of `B` to be `A`. However, in order to support legacy nodes, we must do this rewrite conditionally. The network layer does not, and should not, need to know anything about this, but `HasHeader` _did_ insist that the prevhash of a block was known, even though the network layer actually never depended on that information at all, except in some assertion checks -- and in one other place (see below). Those assertion checks were useful in early development, but header validation is of course the responsibility of consensus, and the assertions don't add much on top. This means `HasHeader` now no longer has `blockPrevHash` or indeed `blockInvariant`; they are both still available as `HasFullHeader`, used in the network tests only. The one other place where the network layer depended on this was in joining `FetchRequest`: it was joining two fetch requests if the two fragments happened to fit together, and it was using the prev hash for this. This is no longer possible, _but_ there was no reason for `FetchRequest` to use `ChainFragment` instead of `AnchoredFragment`: we start with an `AnchoredFragment`, and then _lose that information_ by dropping the anchor. Now `FetchRequest` _does_ use `AnchoredFragment` all the way, which means that we can use the anchor to see if two fetch requests fit together.
2322: Remove blockPrevHash r=edsko a=edsko This is the first step in the remove-EBB-masterplan (#2156). Co-authored-by: Edsko de Vries <[email protected]>
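To illustrate the last point of the commit message above, here is a minimal sketch of joining two fragments by comparing the anchor of the second with the head of the first, without ever looking at a prev hash. The types `Hash`, `Point`, `Header`, `Fragment` and the functions `headPoint` and `joinFragments` are simplified, hypothetical stand-ins invented for this example; they are not the actual `AnchoredFragment` or `FetchRequest` API from ouroboros-network.

```haskell
-- Hypothetical, simplified stand-ins (NOT the ouroboros-network types).
newtype Hash = Hash String
  deriving (Eq, Show)

-- | A point on the chain: genesis, or the block with the given hash.
data Point = Genesis | At Hash
  deriving (Eq, Show)

-- | A header that deliberately does not expose its prev hash.
newtype Header = Header { headerHash :: Hash }
  deriving (Eq, Show)

-- | A fragment that remembers the point it is anchored at.
-- Headers are stored oldest first.
data Fragment = Fragment
  { anchor  :: Point
  , headers :: [Header]
  }
  deriving (Eq, Show)

-- | The point of the newest header, or the anchor if the fragment is empty.
headPoint :: Fragment -> Point
headPoint (Fragment a []) = a
headPoint (Fragment _ hs) = At (headerHash (last hs))

-- | Join two fragments if the second one is anchored exactly at the
-- head of the first. Only anchors and hashes are compared.
joinFragments :: Fragment -> Fragment -> Maybe Fragment
joinFragments f1 f2
  | headPoint f1 == anchor f2 =
      Just (Fragment (anchor f1) (headers f1 ++ headers f2))
  | otherwise = Nothing
```

With the anchor dropped (as with a plain chain fragment), the only way to decide whether two fragments fit together is the prev hash of the first header of the second fragment, which is exactly the dependency the commit removes.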
We also have to be careful with the other direction: nodes that do translate EBBs away but are syncing from nodes using version 1 or 2 of the protocol.
As of IntersectMBO/ouroboros-network#2335 the context for translating prev hashes is now available in principle. We have to make sure, however, that the Byron ledger itself does not check prev hashes when applying blocks. It shouldn't, since consensus is responsible for header validation, but it might; if it does, we'll have to remove that check from Byron (it would have been superfluous anyway).
Since it seems we won't get a chance to push this out quickly, it makes sense to instead opt for a longer-term but more ideal plan:
Possibly related/required: Add IsEBB to HeaderFields #639
We should implement #645 first so that we can test that a node can both serve EBB-contaminated and EBB-free chains.
Observation from @mrBliss : how will this work with nodes syncing? Even if they don't want EBBs, they must still request them in order to be able to give them to nodes that do want them? One thought: perhaps we should be able to get nodes in the system that only speak the new protocol, and so cannot support such "legacy" nodes..? 🤔
Perhaps version negotiation could help here also, and it might in fact help with the phasing out. What if nodes could declare "I cannot serve EBBs", by declaring "I don't support this version number"? If you are an old style node, you need to find an upstream peer that does do it (this will require #646). This means that now we don't need this fixed support anymore within a single node: either you have EBBs, or you don't. Instead of having a single node support both EBB-contaminated and EBB-free chains, we can have a network of nodes, some running version 1, some version 2, so that's where we get the heterogeneity.
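The version-negotiation idea in the comment above could be sketched roughly as follows. Everything here is an assumption made for illustration: `ProtocolVersion`, `Peer`, `negotiate` and `usableUpstreams` are hypothetical names, and the real node-to-node version negotiation in ouroboros-network is considerably richer than this.

```haskell
import Data.List (find)

-- Hypothetical versions: one that includes EBBs on the wire, one that
-- does not. Not the real NodeToNodeVersion type.
data ProtocolVersion = LegacyWithEBBs | EBBFree
  deriving (Eq, Show)

-- | A peer together with the versions it declares it can speak.
-- A node that cannot serve EBBs simply omits 'LegacyWithEBBs'.
data Peer = Peer
  { peerName     :: String
  , peerVersions :: [ProtocolVersion]
  }

-- | Pick the first of our versions (in order of preference) that the
-- peer also supports, if any.
negotiate :: [ProtocolVersion] -> Peer -> Maybe ProtocolVersion
negotiate ours peer = find (`elem` peerVersions peer) ours

-- | The upstream peers we can actually sync from, with the version
-- agreed with each of them.
usableUpstreams :: [ProtocolVersion] -> [Peer] -> [(String, ProtocolVersion)]
usableUpstreams ours peers =
  [ (peerName p, v) | p <- peers, Just v <- [negotiate ours p] ]
```

In this sketch an old-style node that only lists `LegacyWithEBBs` ends up with no connection to peers that only declare `EBBFree`, so it has to find an upstream peer that still supports the legacy version, which is the heterogeneous-network situation the comment describes.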
After EBBs are removed entirely, we can probably scrap the whole |
We originally planned to store the EBB rewrite mapping in the `CodecConfig`, but our new plan is to use the `NestedContext` for it. See step 2 in https://github.com/input-output-hk/ouroboros-network/issues/2156#issuecomment-653378396.
# Description

Running `cabal run ouroboros-consensus-cardano:byron-test --ghc-options="-fno-ignore-asserts" -- -p '/simple convergence/' --quickcheck-replay=323717` on commit 2d50007 shows a test failure for Byron ThreadNet tests.

```
byron
  Byron
    simple convergence: FAIL (39.92s)
      *** Failed! (after 76 tests):
      Exception:
        precondition violated: fragments aren't both non-empty or don't intersect
        CallStack (from HasCallStack):
          error, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/Assert.hs:13:30 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.Util.Assert
          assertWithMsg, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/AnchoredFragment.hs:88:5 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.Util.AnchoredFragment
          compareAnchoredFragments, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/AnchoredFragment.hs:135:5 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.Util.AnchoredFragment
          preferAnchoredCandidate, called at src/ouroboros-consensus/Ouroboros/Consensus/MiniProtocol/BlockFetch/ClientInterface.hs:305:9 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.MiniProtocol.BlockFetch.ClientInterface
          plausibleCandidateChain, called at src/ouroboros-consensus/Ouroboros/Consensus/MiniProtocol/BlockFetch/ClientInterface.hs:173:35 in ouroboros-consensus-0.5.0.0-inplace:Ouroboros.Consensus.MiniProtocol.BlockFetch.ClientInterface
          plausibleCandidateChain, called at src/Ouroboros/Network/BlockFetch.hs:204:9 in ouroboros-network-0.5.0.0-f5391f998845aaf34991fdbae42479f49285b923ee25ccb4dd95fc194b95735e:Ouroboros.Network.BlockFetch
          plausibleCandidateChain, called at src/Ouroboros/Network/BlockFetch/Decision.hs:245:7 in ouroboros-network-0.5.0.0-f5391f998845aaf34991fdbae42479f49285b923ee25ccb4dd95fc194b95735e:Ouroboros.Network.BlockFetch.Decision
      TestSetup {setupEBBs = ProduceEBBs, setupK = SecurityParam 1, setupTestConfig = TestConfig {initSeed = Seed 5839900652819622642, nodeTopology = NodeTopology (fromList [(CoreNodeId 0,fromList []),(CoreNodeId 1,fromList [CoreNodeId 0]),(CoreNodeId 2,fromList [CoreNodeId 1])]), numCoreNodes = NumCoreNodes 3, numSlots = NumSlots 18}, setupNodeJoinPlan = NodeJoinPlan (fromList [(CoreNodeId 0,SlotNo 0),(CoreNodeId 1,SlotNo 0),(CoreNodeId 2,SlotNo 9)]), setupNodeRestarts = NodeRestarts (fromList [(SlotNo 15,fromList [(CoreNodeId 0,NodeRestart)]),(SlotNo 17,fromList [(CoreNodeId 2,NodeRestart)])]), setupSlotLength = SlotLength 16.923s, setupVersion = (NodeToNodeV_8,ByronNodeToNodeVersion1)}
      Use --quickcheck-replay=323717 to reproduce.

1 out of 1 tests failed (39.92s)
```

This test failure occurs in the implementation of the `plausibleCandidateChain` field of the `BlockFetchConsensusInterface` interface in Byron.

```haskell
-- | Given the current chain, is the given chain plausible as a
-- candidate chain. Classically for Ouroboros this would simply
-- check if the candidate is strictly longer, but for Ouroboros
-- with operational key certificates there are also cases where
-- we would consider a chain of equal length to the current chain.
--
plausibleCandidateChain :: HasCallStack
                        => AnchoredFragment header
                        -> AnchoredFragment header -> Bool,
```

In the presence of EBBs, the implementation of `plausibleCandidateChain` might violate a precondition down the line in `compareAnchoredFragments`, which is ultimately caused by the fact that EBBs share the block number of their predecessor. We've added documentation to the code to describe this corner case.

For this to be a problem in practice, assertions would have to be enabled (which isn't the case for a running node) and the current evolving chain would have to be in the Byron era (which is not the case). Since violation of the precondition is therefore highly unlikely, we chose not to include a case for EBBs here because it would complicate the code.
In addition, the Big Decontamination plan (https://github.com/input-output-hk/ouroboros-network/issues/2156#issuecomment-656713901) would hopefully fix the problem
I'm finally revisiting this goal. I've currently identified a few plans. They're organized along these Yes-No dimensions.
Some context:
These Slack links are for my benefit --- DMs with Jean-Philippe (Mithril) and Kostaz (db-sync) taking their temperature on the idea to prepare for my first CPS draft.
My plan going forward.
I drafted a CIP, to gather community feedback. cardano-foundation/CIPs#974
I asked in the |
This would be quite a large refactoring, but it would simplify a large part of the codebase, and means that going forward we can forget about EBBs entirely.
If we didn't care about preserving history, we could have rewritten the existing chain, rewriting nothing but the prev-hashes of every block, skipping any EBBs. We do want to preserve history, and so we cannot do this, but we can do the next best thing: we can introduce a hashmap mapping the hash of every EBB to its predecessor. This way we can do a local translation (in the `HasHeader` instance for `ByronBlock`) so that if we have blocks `A - EBB - B`, we rewrite the prev-hash of `B` on the fly to be `A` rather than the EBB. Now we can pretend EBBs don't exist, we don't store them, we don't send them, we have no `SlotNo` clashes, we have no `BlockNo` clashes, soooo many things in the consensus layer and storage layer could be made simpler, at the cost of one static value and a `HasHeader` instance for a block type that is anyway no longer going to be used.
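The prev-hash translation described above could look roughly like the following sketch. It assumes a simplified header type and builds the mapping from an in-memory chain purely for demonstration, whereas the proposal is to ship it as a static, hard-coded value. `Hash`, `RawHeader`, `EbbPredecessors`, `effectivePrevHash` and `withoutEBBs` are hypothetical names invented for this example, not part of the actual `ByronBlock` code.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-ins (NOT the real Byron types).
newtype Hash = Hash String
  deriving (Eq, Ord, Show)

-- | A header as stored on the EBB-contaminated chain.
data RawHeader = RawHeader
  { rawHash     :: Hash
  , rawPrevHash :: Hash
  , rawIsEBB    :: Bool
  }

-- | Maps the hash of every EBB to the hash of its predecessor.
type EbbPredecessors = Map.Map Hash Hash

-- | Build the mapping from a chain. In the proposal this would be a
-- static value; it is computed here only to keep the example small.
ebbPredecessors :: [RawHeader] -> EbbPredecessors
ebbPredecessors hs =
  Map.fromList [ (rawHash h, rawPrevHash h) | h <- hs, rawIsEBB h ]

-- | The prev hash as seen in the EBB-free view: if the stored prev
-- hash points at an EBB, skip over it to the EBB's predecessor.
effectivePrevHash :: EbbPredecessors -> RawHeader -> Hash
effectivePrevHash ebbs h =
  Map.findWithDefault (rawPrevHash h) (rawPrevHash h) ebbs

-- | The EBB-free view of a chain, as (hash, prev hash) pairs.
withoutEBBs :: EbbPredecessors -> [RawHeader] -> [(Hash, Hash)]
withoutEBBs ebbs hs =
  [ (rawHash h, effectivePrevHash ebbs h) | h <- hs, not (rawIsEBB h) ]
```

For a chain `A - EBB - B`, the mapping contains a single entry from the EBB's hash to `A`'s hash, so `B` reports `A` as its predecessor while the stored bytes of `B` stay untouched.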