Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add CRC64-AVRO-LE fingerprint type #491

Merged
merged 2 commits into from
Jan 30, 2025
Merged

Conversation

kimgr
Copy link
Contributor

@kimgr kimgr commented Jan 20, 2025

The Avro specification details a Single Object Encoding using a header
to associate a schema ID with an Avro payload. The ID is defined as the
CRC64 fingerprint in little-endian encoding.

The pkg/crc64 module only provides big-endian CRC64, and the CRC64-AVRO
fingerprint type is implemented as such. The specification does not
detail endianness of the CRC64-AVRO fingerprint itself (only when
embedded in an SOE header).

To avoid breaking existing CRC64-AVRO fingerprints, add a new
fingerprint type CRC64-AVRO-LE, identical to CRC64-AVRO except
little-endian.

Add NewWithByteOrder and SumWithByteOrder top-level functions to crc64
so users can configure the hasher to use a specific byte order.

Add tests and benchmarks for the SumWithByteOrder function.

Fixes #489.

@kimgr
Copy link
Contributor Author

kimgr commented Jan 20, 2025

See also #490.

pkg/crc64/crc64.go Outdated Show resolved Hide resolved
@kimgr kimgr force-pushed the sum-encoding branch 3 times, most recently from c76586d to e126cff Compare January 23, 2025 13:06
Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor issue, otherwise it is looking good.

pkg/crc64/crc64.go Outdated Show resolved Hide resolved
The Avro specification details a Single Object Encoding using a header
to associate a schema ID with an Avro payload. The ID is defined as the
CRC64 fingerprint in little-endian encoding.

The pkg/crc64 module only provides big-endian CRC64, and the CRC64-AVRO
fingerprint type is implemented as such. The specification does not
detail endianness of the CRC64-AVRO fingerprint itself (only when
embedded in an SOE header).

To avoid breaking existing CRC64-AVRO fingerprints, add a new
fingerprint type CRC64-AVRO-LE, identical to CRC64-AVRO except
little-endian.

Add NewWithByteOrder and SumWithByteOrder top-level functions to crc64
so users can configure the hasher to use a specific byte order.

Add tests and benchmarks for the SumWithByteOrder function.

Fixes hamba#489.
Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nrwiersma nrwiersma merged commit e9a408d into hamba:main Jan 30, 2025
12 checks passed
@kimgr
Copy link
Contributor Author

kimgr commented Jan 30, 2025

Thanks! Oh, I forgot about the README in all the revisions. Were you thinking something like "if you want to use a schema fingerprint for SOE, use CRC64-AVRO-LE because reasons"?

@nrwiersma
Copy link
Member

No stress, it will be documented in the release. I think that should be fine.

@kimgr
Copy link
Contributor Author

kimgr commented Jan 30, 2025

Alright, I'll leave it to you. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

CRC64 schema fingerprint endianness
2 participants