clean up owned assets involving offline participants #212
+379
−30
Motivation
If a node wipes its storage for any reason, any previously generated assets whose participant set includes that node are unusable. Such assets owned by other nodes are stored indefinitely until an attempt is made to use them, leading to failed computations. We need a recovery strategy to address this scenario.
Implementation
Once a node has filled its store with the desired quantity of an asset, it will gradually verify its stored assets and discard those that depend on offline participants.
To recover the system, the wiped node would intentionally remain offline for some time. The other nodes would notice its absence and discard any owned assets that depend on it.
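As a rough illustration of the gradual check described above, here is a minimal Python sketch. It is not the code in this PR: the `Asset` type, the `sweep_batch` helper, the node names, and the batch size are all hypothetical, and it assumes an asset is usable only if every one of its participants is currently alive.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    # Hypothetical asset record: an id plus the nodes that took part in generating it.
    asset_id: int
    participants: frozenset


def sweep_batch(unchecked, verified, alive, batch_size):
    """Check up to batch_size owned assets and return the ids of those discarded."""
    discarded = []
    for _ in range(min(batch_size, len(unchecked))):
        asset = unchecked.popleft()
        if asset.participants <= alive:
            # Every participant is alive, so the asset is kept.
            verified.append(asset)
        else:
            # At least one participant is offline, so the asset is dropped.
            discarded.append(asset.asset_id)
    return discarded


unchecked = deque([
    Asset(1, frozenset({"a", "b"})),
    Asset(2, frozenset({"a", "c"})),
    Asset(3, frozenset({"b", "c"})),
])
verified = []
alive = {"a", "b"}  # node "c" wiped its storage and is staying offline

first = sweep_batch(unchecked, verified, alive, batch_size=2)
second = sweep_batch(unchecked, verified, alive, batch_size=2)
```

Checking a small batch per call keeps the sweep incremental, so it can run alongside normal work instead of scanning the whole store at once.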
Visibility
Each owned asset type is tracked using three counters: `available`, `online`, and `offline`.
When the alive participant set changes, both `online` and `offline` are reset to 0, indicating that none of the stored assets have been checked yet against the new participant set.
If the participant set then remains stable, the stored assets will be verified over time. When `available` and `online` are equal, and `offline` is 0, we will know that all assets depending on the offline node have been found and discarded.
Closes #213. See also #207.
A time-based expiration on all assets in the DistributedAssetStore will be implemented separately to prevent unbounded accumulation of unowned assets.
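The counter behaviour described under Visibility can be sketched as follows. This is an illustrative Python model, not the implementation in this PR: the `OwnedAssetCounters` class and its method names are hypothetical. It also assumes that `offline` counts assets found to depend on an offline participant but not yet discarded, and that a discard decrements both `available` and `offline`.

```python
from dataclasses import dataclass


@dataclass
class OwnedAssetCounters:
    # Hypothetical model of the three per-asset-type counters.
    available: int = 0  # assets currently stored
    online: int = 0     # checked: all participants alive
    offline: int = 0    # checked: depends on an offline participant (assumed: not yet discarded)

    def on_participants_changed(self):
        # No stored asset has been checked against the new participant set yet.
        self.online = 0
        self.offline = 0

    def record_check(self, all_online):
        if all_online:
            self.online += 1
        else:
            self.offline += 1

    def record_discard(self):
        # Assumption: discarding removes the asset from both totals.
        self.available -= 1
        self.offline -= 1

    def cleanup_complete(self):
        return self.available == self.online and self.offline == 0


counters = OwnedAssetCounters(available=3)
counters.on_participants_changed()

counters.record_check(all_online=True)
in_progress = counters.cleanup_complete()  # two assets are still unchecked

for _ in range(2):
    counters.record_check(all_online=False)
    counters.record_discard()

done = counters.cleanup_complete()
```

Under these assumptions, an operator watching the counters would see `online` climb toward `available` while `offline` returns to 0 once the sweep has covered every stored asset.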