Fix serializing of time/date columns using yaml serializer #1458

Open: 1 commit to be merged into master

fatkodima
Contributor

We currently use paper_trail and have billions of rows in the versions table, so the table is huge.

I noticed that with the YAML serializer, date/time objects are serialized as Ruby objects, something like:

--- !ruby/object:ActiveSupport::TimeWithZone
utc: 2024-01-27 18:01:54.627764000 Z
zone: !ruby/object:ActiveSupport::TimeZone
  name: Etc/UTC
time: 2024-01-27 18:01:54.627764000 Z

This generates 179 bytes per field.

But they should be serialized as a string, something like:

--- 2024-01-27 18:03:07 UTC

That is 28 bytes long, so a difference of about 150 bytes (151, to be exact).

Considering that most tables have at least 2 datetime columns (created_at and updated_at), that saves about 300 bytes per row in the versions table.
For example, if we have a table with 4 billion rows, that is 4 * 10^9 * 300 / 10^9 = 1200 GB 😱 saved.
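
The figures above can be checked with a few lines of Ruby. This is only a sketch of the arithmetic: it measures the two payloads exactly as quoted (assuming each line ends with a single newline) and reuses the 4-billion-row, 2-column scenario from the description. The constant names are illustrative.

```ruby
# Byte counts for the two YAML payloads quoted above (each line ends in "\n").
OBJECT_YAML = <<~YAML
  --- !ruby/object:ActiveSupport::TimeWithZone
  utc: 2024-01-27 18:01:54.627764000 Z
  zone: !ruby/object:ActiveSupport::TimeZone
    name: Etc/UTC
  time: 2024-01-27 18:01:54.627764000 Z
YAML
STRING_YAML = "--- 2024-01-27 18:03:07 UTC\n"

SAVED_PER_FIELD = OBJECT_YAML.bytesize - STRING_YAML.bytesize

# Hypothetical table: 4 billion version rows, 2 datetime columns each,
# with the per-field saving rounded down to 150 bytes as in the text.
ROWS = 4 * 10**9
SAVED_GB = ROWS * 2 * 150 / 10**9

puts "object: #{OBJECT_YAML.bytesize} bytes"
puts "string: #{STRING_YAML.bytesize} bytes"
puts "saved per field: #{SAVED_PER_FIELD} bytes"
puts "saved in total: #{SAVED_GB} GB"
```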

  • Wrote good commit messages.
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Added an entry to the Changelog if the new
    code introduces user-observable changes.
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences.

@fatkodima force-pushed the fix-serializing-of-time-columns branch 2 times, most recently from ed4f74e to a4730dc, on January 27, 2024 19:01
@fatkodima
Contributor Author

Ok, this is not working for MySQL (type_cast still returns a Date/Time object instead of String).
But, json serializer in this gem already works as expected and converts it using .as_json. Maybe we should use this too instead of type_cast?

@viral810

> Ok, this is not working for MySQL (type_cast still returns a Date/Time object instead of String). But, json serializer in this gem already works as expected and converts it using .as_json. Maybe we should use this too instead of type_cast?

@fatkodima That makes sense to me. I tested it with MySQL, and using .as_json returns a string and works as expected.

@jaredbeck
Member

> Ok, this is not working for MySQL ...

It sounds like we have insufficient test coverage, then?

> But, json serializer in this gem already works as expected ...

We've had a few issues with YAML serialization over the years, and I've often thought of changing the default from YAML to JSON. In fact, I think the best choice is a jsonb column, which obviates PT-serialization, and enables performant db queries.

You can also write a custom serializer. In fact, it would be great if you could try that first in your own app, and report back after a few months in production.

Please see https://github.com/paper-trail-gem/paper_trail?tab=readme-ov-file#6b-custom-serializer
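
As a rough illustration of the custom-serializer route: the README section linked above describes a serializer as a module or class that defines `load` and `dump`. The sketch below is not the code from this PR. The module name `StringifiedTimeYAML` and its `stringify` helper are hypothetical, and it uses only the Ruby standard library, so `iso8601` stands in for the `.as_json` conversion discussed earlier.

```ruby
require "yaml"
require "date"
require "time"

# Hypothetical custom serializer; the module and helper names are illustrative.
module StringifiedTimeYAML
  module_function

  # Convert Time/Date/DateTime values to ISO 8601 strings, recursing into
  # hashes and arrays; every other value is returned unchanged.
  def stringify(value)
    case value
    when Time, Date then value.iso8601
    when Hash then value.transform_values { |v| stringify(v) }
    when Array then value.map { |v| stringify(v) }
    else value
    end
  end

  # Dump plain strings instead of tagged Ruby time objects.
  def dump(object)
    YAML.dump(stringify(object))
  end

  # Only plain YAML types are expected, so safe_load is sufficient here.
  def load(string)
    YAML.safe_load(string)
  end
end
```

Assuming the README's configuration option, this could be enabled with `PaperTrail.serializer = StringifiedTimeYAML`. Note the trade-off in this sketch: time values load back as strings rather than Time objects, so any casting back to a time type would have to happen elsewhere.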

@fatkodima
Contributor Author

> You can also write a custom serializer. In fact, it would be great if you could try that first in your own app, and report back after a few months in production.

We are currently running this patch in production for postgres for 4 months already.

@jaredbeck
Member

> We are currently running this patch in production for postgres for 4 months already.

Oh, great! Which RDBMS are you using in production? I'm trying to understand the above statement "this is not working for MySQL"

  expect(attrs["created_at"]).to match(/2015/)
else
  expect(attrs["created_at"].to_i).to eq(time.to_i)
end
Member

Could you please add a comment here explaining the difference between RDBMS?

  @klass.connection.type_cast(serialized)
else
  serialized
end
Member

This seems similar to TypeSerializers::PostgresArraySerializer, in that it wraps a particular type (in this case Date, Time). Should we follow that pattern, and have a TypeSerializers::DateTimeSerializer?

@fatkodima force-pushed the fix-serializing-of-time-columns branch from a4730dc to 5a6ff78 on May 30, 2024 11:56
@fatkodima
Contributor Author

Updated with the suggestion, please take a look.


This PR has been automatically marked as stale due to inactivity.
The resources of our volunteers are limited.
If this is something you are committed to continue working on, please address any concerns raised by review and/or ping us again.
Thank you for all your contributions.

@github-actions bot added the Stale label on Aug 29, 2024
@fatkodima
Contributor Author

Ping.

@github-actions bot removed the Stale label on Aug 30, 2024

This PR has been automatically marked as stale due to inactivity.
The resources of our volunteers are limited.
If this is something you are committed to continue working on, please address any concerns raised by review and/or ping us again.
Thank you for all your contributions.

@github-actions bot added the Stale label on Nov 28, 2024
@fatkodima removed the Stale label on Nov 28, 2024