Consider storing more field details in schema #55

Open
Shelnutt2 opened this issue Jan 28, 2018 · 0 comments


Consider changing the on-disk Cap'n Proto schema so that each column is stored as a field struct. There is a lot of metadata we are not storing right now, such as a column's nullability or its default value. Default values could be stored when they are constants, but when they are expressions we must rely on MariaDB to store them in the .frm file.

In order to do auto table discovery, we have to store everything needed in the table data (data files or schema files) ourselves.

The current limitation of in-place ALTER TABLE, where adding a nullable column with a default leaves that column always NULL for existing rows, exists because we are not storing this information.

The downside is that each row is currently stored simply. Does it really make sense to pack all this extra information into every single row? It would greatly increase disk space and the processing of data that is constant for a given schema version.
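To put a rough number on that duplication, here is a back-of-the-envelope sketch in Python. `FieldMeta`, its fields, and the byte accounting are all hypothetical and chosen only for illustration; the real Cap'n Proto encoding adds pointers and word padding, so actual sizes would differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical per-column metadata; the field set is an assumption."""
    name: str
    nullable: bool
    default: bytes  # serialized constant default, empty if there is none

    def encoded_size(self) -> int:
        # Naive estimate: name bytes + 1 byte for the nullable flag + default bytes.
        return len(self.name.encode("utf-8")) + 1 + len(self.default)


def metadata_overhead(fields: Sequence[FieldMeta], row_count: int, per_row: bool) -> int:
    """Bytes spent on metadata when it is copied into every row or written once."""
    per_copy = sum(f.encoded_size() for f in fields)
    copies = row_count if per_row else 1
    return per_copy * copies


fields = [FieldMeta("id", False, b""), FieldMeta("status", True, b"new")]
inline_bytes = metadata_overhead(fields, 1_000_000, per_row=True)
separate_bytes = metadata_overhead(fields, 1_000_000, per_row=False)
print(inline_bytes, separate_bytes)
```

Even under this naive accounting the per-row cost grows linearly with the row count, while the write-once cost stays fixed for a given schema version.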

Perhaps we introduce a new data file that goes along with the schema and contains the table's metadata? We keep the "rows" struct compact and simple, and we create a new struct to represent the table and all of its metadata. The advantage is that the data files stay compact and we only have to write the "table metadata" once per schema change. The downside is that any single row data file would be missing the data required for logic, such as when a column has no value but uses a default expression. New struct and metadata files also increase complexity. Right now storage is simple, at the cost of not supporting virtual columns.
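A minimal sketch of how that split might look, written in Python rather than as a Cap'n Proto schema so it can be run directly. Every name here (`ColumnMeta`, `TableMeta`, `Row`, `read_column`) is hypothetical, and it assumes only constant defaults are stored, with expression defaults still living in the .frm file.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical per-column metadata, written once per schema change."""
    nullable: bool
    default: Optional[Any] = None  # constant defaults only (assumption)


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical contents of the separate table metadata file."""
    schema_version: int
    columns: Dict[str, ColumnMeta]


@dataclass(frozen=True)
class Row:
    """Rows stay compact: only values, no per-column metadata."""
    schema_version: int
    values: Dict[str, Any]


def read_column(row: Row, current: TableMeta, column: str) -> Any:
    """Return the stored value, or fall back to the table metadata."""
    if column in row.values:
        return row.values[column]
    meta = current.columns[column]
    if meta.default is not None:
        return meta.default  # column added later with a constant default
    if meta.nullable:
        return None  # column added later, nullable, no default
    raise ValueError(f"column {column!r} has no value and no usable default")


current = TableMeta(2, {
    "id": ColumnMeta(nullable=False),
    "status": ColumnMeta(nullable=True, default="new"),
    "note": ColumnMeta(nullable=True),
})
old_row = Row(schema_version=1, values={"id": 7})  # written before "status" existed
print(read_column(old_row, current, "status"))
```

The sketch also shows the downside noted above: `old_row` cannot be interpreted on its own, because the fallback for `status` lives only in the metadata.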

I'm leaning toward a new struct and new data files. It does not make sense to expand the row structure so that every column is a field struct containing all of the metadata; it would be a massive amount of duplication, and we have no compression (yet)!
