Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gen): implement struct generation from schema registry subjects and versions #498

Merged
merged 13 commits into from
Feb 6, 2025

Conversation

ekazakas
Copy link
Contributor

@ekazakas ekazakas commented Feb 4, 2025

  • Add an option to generate Go structs by providing schema registry URL and subject:version pairs.
  • When using schema registry, additional utility functions (Subject() string + Version() int) will be generated for the root struct

Example usage:

go run cmd/avrogen/main.go \
    -pkg test \
    -encoders \
    -strict-types \
    -schemaregistry http://localhost:8081 \
    produce_report:1

I will be happy to help with anything that could potentially speed up the review, merge and release of this change. 🙏

gen/gen.go Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
gen/gen.go Outdated Show resolved Hide resolved
gen/gen.go Outdated
}

func (g *Generator) generate(schema avro.Schema) string {
func (g *Generator) generate(schema avro.Schema, schemaMetadata *SchemaMetadata, level int) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure I see the purpose of schemaMetadata. It is a very intrusive change for something that brings little value. They are also values that are meaningless outside of a specific registry.

Copy link
Contributor Author

@ekazakas ekazakas Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see your point. It does feel intrusive. These values are mostly useful when using Schema registry. The metadata provides information about the subject and the version of some specific schema that is fetched from Schema registry.

However the benefit of these is that the subject and version can be embedded in the generated struct. Whoever will be working with schema registry might certainly appreciate the extra information provided by the generated struct.

I wonder if there could be some other way to embed this information into struct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to attach it to the first struct is not really correct in this case. For instance a schema that is a union -> array -> record would attach it in a weird place. IMO it would be better to make them consts in the package, meaning it could be attached to the Generator itself, and handle in the Write. This would be simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good idea. This will keep the method signatures backwards compatible and still provide information about subject and version. I will implement these changes tomorrow and ask again for a review.

Copy link
Contributor Author

@ekazakas ekazakas Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes, the API is backwards compatible, but there are a few things and limitations I would like to point out and clarify.

At the moment, I can only generate structs for a single subject and version pair, because the SchemaMetadata can only be attached on the generator config, so only once. With files, we can still do multiple files per single generate command.

The subject and version are now constants on the package, but I would really like to attach them on the "root" type, the reason being that it would allow us to work with schemaregistry (de)serializer, e.g.:

type AvroMessage interface {
    Schema() avro.Schema
    Subject() string
    Version() int
}

func (s *Deserializer) DeserializeInto(payload []byte, msg interface{}) error {
        avroMsg, ok := v.(AvroMessage)
	if !ok {
		return nil, errors.WithStack(NoAvroMessageError{v})
	}
	
	schemaMetadata, err := m.schemaRegistryClient.GetSchemaMetadata(avroMsg.Subject(), avroMsg.Version())
	if err != nil {
		return nil, err
	}
	
	fmt.Println(schemaMetadata.ID)
   <...>
}

func (m *AvroMarshaler) Marshal(v interface{}) ([]byte, error) {
	avroMsg, ok := v.(AvroMessage)
	if !ok {
		return nil, errors.WithStack(NoAvroMessageError{v})
	}

	schemaMetadata, err := m.schemaRegistryClient.GetSchemaMetadata(avroMsg.Subject(), avroMsg.Version())
	if err != nil {
		return nil, err
	}

	config := avrov2.NewSerializerConfig()
	config.AutoRegisterSchemas = false
	config.UseSchemaID = schemaMetadata.ID

	serializer, err := avrov2.NewSerializer(m.schemaRegistryClient, serde.ValueSerde, config)
	if err != nil {
		return nil, err
	}

	b, err := serializer.Serialize(avroMsg.Subject(), avroMsg)
	if err != nil {
		return nil, err
	}

	return b, nil
}

I am not sure what would be the best way to achieve this and whether it's possible at all. I could also generate the Subject() and Version() functions on each of the types that come from current subject and version pair

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one of the options here would be to provide a map that could be used to generate custom functions on specified types. E.g.:

Customizations: map[string]map[string]any

Where root map key would be type name, child map key would be function name and value would be the value it returns.

e.g.:

	test := map[string]map[string]any{
		"SomeType": map[string]string{
			"AFunction": "a",
			"BFunction": "b",
		},
	}
============================================
	type SomeType struct {
		// SomeString is a string.
		SomeString string `avro:"someString"`
		SomeInt    int    `avro:"someInt"`
	}
	
	func (o *SomeType) AFunction() string {
		return "a"
	}
	
	func (o *SomeType) BFunction() string {
		return "b"
	}	

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are hoping to generate seems out of the scope of avrogen, but you could use gen and a custom template to achieve what you need. In this case, the only change necessary is a permissive (read any) attachment of metadata in the generator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's really a very specific usecase... I guess it's a bit too much to ask without affecting other users.

I have updated avrogen to support reading from schema registry subject versions. It can do it with multiple entries.

And I have also added the any metadata field for gen to be able to implement a custom version of avrogen

@ekazakas ekazakas requested a review from nrwiersma February 5, 2025 17:27
Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor nitpicks

cmd/avrogen/main.go Outdated Show resolved Hide resolved
for _, entry := range flgs.Args() {
var schema avro.Schema

if cfg.SchemaRegistry != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A switch here would be better, as else is in general a code smell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@nrwiersma nrwiersma merged commit 559723e into hamba:main Feb 6, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants