shares-storage: replace GOB encoding with SSZ, deduplicate validator Index field #1837
Conversation
I probably should have asked sooner - why do we want SSZ for encoding shares in the first place? It doesn't feel like the simplest option we can go for (see one, two, three) - why not just go with […]? Anyway, this was my 1st time using […]. Another relevant question: I see other entities in […]
Just for future reference (or in case we'll need it), the way to properly solve it would be: gob […]
(force-pushed: 34bef36 → c279ecf)
Added a commit 2565389 that addresses #1473, bundling it with this PR because it needs a DB migration (and it's just simpler to do both #1575 and #1473 as 1 DB migration). The general idea is to "flatten" […], but I'm going 1 step further and also inlining […] since […]
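Roughly, the direction being described - a hedged before/after sketch where all type and field names are illustrative, not the PR's final layout:

// Before (illustrative): the stored share nests the spec share and beacon metadata,
// so the validator's index effectively exists twice.
package storage

type specShare struct { // stand-in for spectypes.Share
	ValidatorIndex  uint64
	ValidatorPubKey []byte
}

type beaconMetadata struct { // stand-in for the beacon metadata
	Index uint64 // duplicates the spec share's ValidatorIndex
}

type storageShareBefore struct {
	specShare
	Metadata beaconMetadata
}

// After (illustrative): fields are inlined into one flat struct and the index is kept exactly once.
type storageShareAfter struct {
	ValidatorIndex  uint64
	ValidatorPubKey []byte
	// ...remaining share and metadata fields, inlined
}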
(force-pushed: c279ecf → c4f0089)
(force-pushed: 47c5e6a → efcc1f5)
(force-pushed: efcc1f5 → 5be2303)
Interesting change. I agree we need to clarify next time what the rationale behind such a change is - it'll help define the scope and make it easier to make decisions.
(force-pushed: b25008e → 8addd97)
…nChainData, group share on-chain data into separate struct
…e synced from smart-contract), and mark DB migration as completed
(force-pushed: 82c64d7 → 3cd1e7b)
registry/storage/shares.go (Outdated)
ValidatorPubKey []byte `ssz-max:"48"`
SharePubKey     []byte `ssz-max:"48"`
can you explain why it's needed? the size is actually fixed usually and not a max..
we should follow the spec here, and this seems to be in line
current spec seems to use ssz-size:
https://github.com/ssvlabs/ssv-spec/blob/main/types/share.go#L8C1-L9C54
> can you explain why it's needed? the size is actually fixed usually and not a max..

I think I saw various tests failing with exact size (as opposed to max) - so I thought maybe it is meant to support pubkeys of a different format (with size lower than 48 as well), e.g.:
=== RUN TestSubmitProposal
2025-01-20T10:36:10.106079Z DEBUG logger is ready {"level": "debug", "encoder": "capital", "format": "console", "file_options": null}
2025-01-20T10:36:10.106420Z DEBUG logger is ready {"level": "debug", "encoder": "capital", "format": "console", "file_options": null}
controller_test.go:157:
Error Trace: /Users/iurii/work/ssv/operator/fee_recipient/controller_test.go:157
/Users/iurii/work/ssv/operator/fee_recipient/controller_test.go:45
Error: Received unexpected error:
failed to serialize share: marshal ssz: storageShare.SharePubKey (bytes array does not have the correct length): expected 48 and 0 found
Test: TestSubmitProposal
--- FAIL: TestSubmitProposal (0.00s)
but if you are saying the size always must be 48, I guess we simply need to adjust every test to have those pubkeys initialized - let me see if I can do that ^
on the other hand, it could be quite a bit of work to try and maintain this kind of validation at the DB-encoding level (it's a burden on all unit tests that use Share) - so perhaps leaving it as max here and validating it at the moment of "share creation" is a simpler way to go about it, WDYT?
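For context, a hedged illustration (not code from this PR) of how the two fastssz/sszgen struct tags behave; the type and field names below are made up for the example:

// ssz-size vs ssz-max, as handled by sszgen-generated code (illustrative types only).
package storage

type fixedPubKeyShare struct {
	// ssz-size: a fixed 48-byte vector - marshalling a nil/empty slice fails with
	// "bytes array does not have the correct length" (the error in the test output above).
	ValidatorPubKey []byte `ssz-size:"48"`
}

type boundedPubKeyShare struct {
	// ssz-max: a variable-length byte list capped at 48 bytes, so uninitialized
	// pubkeys in unit tests still marshal successfully.
	ValidatorPubKey []byte `ssz-max:"48"`
}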
looks good! one comment
will do spec alignment once @moshe-blox approves
// Share represents a storage share.
// The better name of the struct is storageShareGOB,
// but we keep the name Share to avoid conflicts with gob encoding.
type Share struct {
if possible it's better to not export in the migrations package
Originally the idea was to copy whatever structures we have from "the original code" and adjust as necessary (keeping it as close to the originals as possible so it's easy to drop-in replace it), but now that all adjustments are made we can make it package-private - done in 04d5c63
@moshe-blox I guess we cannot change this after all, since gob complains it cannot decode data fetched from the DB (the error goes away once I revert back to using public fields 0219b21):
{
"level": "\u001b[31mFATAL\u001b[0m",
"time": "2025-01-20T14:35:08.440521Z",
"msg": "could not setup db",
"error": "failed to run migrations: migration \"migration_5_change_share_format_from_gob_to_ssz\" failed: GetAll: decode gob share: decode storageShareGOB: gob: type mismatch: no fields matched compiling decoder for storageShareGOB",
"errorVerbose": "GetAll: decode gob share: decode storageShareGOB: gob: type mismatch: no fields matched compiling decoder for storageShareGOB\nmigration \"migration_5_change_share_format_from_gob_to_ssz\" failed\ngithub.com/ssvlabs/ssv/migrations.Migrations.Run\n\t/go/src/github.com/ssvlabs/ssv/migrations/migrations.go:98\ngithub.com/ssvlabs/ssv/migrations.Run\n\t/go/src/github.com/ssvlabs/ssv/migrations/migrations.go:33\ngithub.com/ssvlabs/ssv/cli/operator.setupDB\n\t/go/src/github.com/ssvlabs/ssv/cli/operator/node.go:515\ngithub.com/ssvlabs/ssv/cli/operator.init.func2\n\t/go/src/github.com/ssvlabs/ssv/cli/operator/node.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:989\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041\ngithub.com/ssvlabs/ssv/cli.Execute\n\t/go/src/github.com/ssvlabs/ssv/cli/cli.go:27\nmain.main\n\t/go/src/github.com/ssvlabs/ssv/cmd/ssvnode/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695\nfailed to run migrations\ngithub.com/ssvlabs/ssv/cli/operator.setupDB\n\t/go/src/github.com/ssvlabs/ssv/cli/operator/node.go:517\ngithub.com/ssvlabs/ssv/cli/operator.init.func2\n\t/go/src/github.com/ssvlabs/ssv/cli/operator/node.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:989\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041\ngithub.com/ssvlabs/ssv/cli.Execute\n\t/go/src/github.com/ssvlabs/ssv/cli/cli.go:27\nmain.main\n\t/go/src/github.com/ssvlabs/ssv/cmd/ssvnode/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"
}
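For reference, a minimal sketch (not from this repo) of why gob behaves this way: encoding/gob only serializes exported struct fields and matches them by name on decode, so data written with exported fields cannot be decoded into a struct that exposes none.

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// exportedShare mimics the original layout: gob encodes its exported field.
type exportedShare struct {
	ValidatorIndex uint64
}

// unexportedShare mimics the attempted package-private layout: gob ignores
// unexported fields entirely.
type unexportedShare struct {
	validatorIndex uint64
}

func main() {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(exportedShare{ValidatorIndex: 42}); err != nil {
		panic(err)
	}

	var out unexportedShare
	// Fails with a "no fields matched" style error, similar to the log above,
	// because none of the wire fields map to an exported field of the target.
	fmt.Println(gob.NewDecoder(&buf).Decode(&out))
}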
great work 💪

i think we should have a unit test for the migration, or perhaps at least for the conversion funcs, to make sure we're not losing/mutating any field by accident during migration (keep in mind that some won't perform this migration in the first release version due to upgrading late to a future version). for testing the migration, i have an idea: […]

unless we expect non-test-related code to change, this can be done in parallel as we progress with merging and QA'ing this PR. however, we never tested migrations before, so pls lmk if you have another idea.

other than that, i would've preferred separation like before, except for […]:

type SSVShare struct {
	spectypes.Share // Contract data, with the unfortunate exception of ValidatorIndex, however it's only set once
	Liquidated bool // Contract data, but the spec is missing it and it's unrelated to Beacon
	BeaconMetadata // Beacon data only
}

i think it's a bit less confusing to work with, given the metadata and contract are unrelated and update in parallel; plus, eventually we'd likely store metadata in a separate database collection (to optimize updates & prevent race conditions). however, i suspect it would be a significant refactor, and i'm not strongly convicted on this (would love to hear counter-arguments), so perhaps this should be postponed to a future PR (maybe together with the metadata database separation).
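A possible shape for the conversion-func test suggested above - just a hedged sketch; toStorageShare, fromStorageShare and the SSVShare fields used are placeholders for whatever helpers this PR actually exposes:

package storage

import (
	"testing"

	"github.com/stretchr/testify/require"
)

// Hedged sketch of a round-trip test: convert a fully-populated domain share to the
// storage (SSZ-encodable) representation and back, asserting nothing is lost or mutated.
func TestShareConversionRoundTrip(t *testing.T) {
	original := &SSVShare{} // populate every field with distinct non-zero values

	stored := toStorageShare(original)   // domain -> storage representation (placeholder helper)
	restored := fromStorageShare(stored) // storage -> domain (placeholder helper)

	require.Equal(t, original, restored, "conversion must not lose or mutate any field")
}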
I don't really see any obvious way to group […], and I'm not sure grouping by "where data comes from" (especially with […]) […] which is why to me it seems the most "straightforward" way to go about it is to flatten […] at least for now; if we really want to group […]
…ommits that change SSZ annotation max->size)
… pkg-public to pkg-private due to GOB complaining about it (can't decode data fetched from DB)
…s have no share pubkey on it
…tional sanity-checks are added
Rebased onto #1778 (until it's merged - this diff is more representative of the contents of this PR); now rebased onto the latest commit from stage.

This PR additionally closes #1473 (see this comment below for why), but primarily this PR closes #1575 (targeting both v2 and genesis code - hence it should work regardless of whether it's deployed before/after the Alan fork - this is because both v2 and genesis use the same code from package storage, as far as I can tell, and we don't use gob anywhere else).

we could, but I'm assuming we don't want to re-sync everything from the Ethereum smart contract just so we can re-populate the database with shares in ssz format.

The question is - do we want to support a proper full-blown rollback back to gob, just so we have some basic "safety" for production (and once we are confident there were no issues with the migration - maybe a couple of weeks later - we can get rid of gob completely and even erase the old shares key-range with a separate easy-to-apply DB migration)? But also, when I roll this PR out to stage - if somebody redeploys something else to the same operator/cluster after this change has been running there for a while (and before their branch includes my change) - the whole cluster will just break for them (the stage branch too, until this PR is merged).

Hence the approach I'm taking in this PR (for now) covers most rollback scenarios (but not all): the new code migrates shares into a new shares prefix range in ssz, while older code uses only gob (and works with the old shares data that are still present in the DB).

The issue with rolling back to gob that this PR doesn't address is that new shares might have been added during the time we were running this new ssz code version - hence for a proper rollback to gob we'd need to copy those shares over to the old key-range too. And vice versa can also happen: if the node versions run are gob (current) -> ssz (migration successful) -> gob (rollback) -> (added new shares to DB) -> ssz, we won't see those newly added shares because they'll end up in the old gob-related key-range (while the migration that was intended to move them over applies only once, and has previously been run to success).

It's unlikely we'll need to "rollback with a delay" once everything has been proven to work on stage (or to actually need a 100%-rollback-supporting approach in practice) ... but you never know, WDYT?
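To make the dual-key-range idea above concrete, here is a rough sketch; the prefixes, the kv interface and the decode helper are hypothetical stand-ins, not this repo's actual storage API:

package migrations

import "fmt"

// kv is a minimal key-value abstraction used only for this sketch.
type kv interface {
	IteratePrefix(prefix []byte, fn func(key, value []byte) error) error
	Set(prefix, key, value []byte) error
}

// sszShare stands in for the ssz-encodable storage share introduced by this PR.
type sszShare interface {
	MarshalSSZ() ([]byte, error)
}

// Hypothetical prefixes: old gob-encoded shares stay where they are so a rolled-back
// node keeps working; migrated ssz-encoded shares live under a new prefix.
var (
	oldSharesPrefix = []byte("shares")
	newSharesPrefix = []byte("shares_ssz")
)

// decodeGobShare is a placeholder for decoding the legacy gob layout.
func decodeGobShare(b []byte) (sszShare, error) { panic("sketch only") }

// migrateShares copies every share from the gob key-range into the ssz key-range.
// The old range is intentionally left untouched to keep a rollback path open.
func migrateShares(db kv) error {
	return db.IteratePrefix(oldSharesPrefix, func(key, gobValue []byte) error {
		s, err := decodeGobShare(gobValue)
		if err != nil {
			return fmt.Errorf("decode gob share: %w", err)
		}
		sszValue, err := s.MarshalSSZ()
		if err != nil {
			return fmt.Errorf("encode ssz share: %w", err)
		}
		return db.Set(newSharesPrefix, key, sszValue)
	})
}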
Before merging:
- […] "shares prefix range" is done, after this code has been running in prod for a while - both can be added to migration_5)
- once we are done with testing on stage, add a commit to enable the commented-out completed(txn) code in this PR
- (after we are 100% sure ssz is working and we are not going back to gob) delete the old shares prefix range in a separate migration (migration_6 or higher) - since share data isn't super precious, we decided to delete it right away as a part of migration_5