
Create model schema from a JSON file #1054

Merged: 12 commits merged into main from 992-schema-json-file on Mar 5, 2025

Conversation

@datejada (Member) commented Feb 26, 2025

This pull request reads the schema from a JSON file, which also includes the description, units, default value, and constraints of each parameter in the model. The main changes are:

Configuration Management:

Code Enhancements:

Schema Handling:

  • src/input-schemas.jl: Refactored the schema definitions to read from a JSON file instead of hardcoding them in the script. This change simplifies the maintenance and updates of schema definitions.
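
  A minimal sketch (not the PR's actual code; it assumes the JSON.jl and OrderedCollections.jl packages and a hypothetical layout in which each column entry carries a "type" field alongside the extra metadata) of what the refactored loading step could look like:

  using JSON: parsefile
  using OrderedCollections: OrderedDict

  # Hypothetical layout: table name => column name => {"type", "UoM", "default", "description", "constraints"}
  raw = parsefile("input-schemas.json"; dicttype = OrderedDict)

  # Rebuild the plain column => SQL-type mapping that the rest of the code already uses.
  schema_per_table_name = OrderedDict(
      table => OrderedDict(col => props["type"] for (col, props) in cols) for (table, cols) in raw
  )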

Documentation updates:

  • docs/src/50-schemas.md: Updated the documentation to inform users that schemas can now be found in the input-schemas.json file and provided an example code snippet for generating Markdown documentation from the JSON schema.
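
  The snippet from the docs is not reproduced here; as a hedged sketch (field names such as "type", "UoM", "default", and "description" are assumed from the discussion in this thread), generating Markdown tables from the JSON could look roughly like this:

  using JSON: parsefile

  schemas = parsefile("input-schemas.json")
  cell(x) = x === nothing ? "" : string(x)   # render JSON null as an empty cell

  open("schemas.md", "w") do io
      for (table, columns) in schemas
          println(io, "### ", table, "\n")
          println(io, "| Name | Type | UoM | Default | Description |")
          println(io, "|------|------|-----|---------|-------------|")
          for (col, props) in columns
              println(io, "| ", col,
                  " | ", cell(get(props, "type", nothing)),
                  " | ", cell(get(props, "UoM", nothing)),
                  " | ", cell(get(props, "default", nothing)),
                  " | ", cell(get(props, "description", nothing)), " |")
          end
          println(io)
      end
  end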

Related issues

Closes #992, #1039

Checklist

  • I am following the contributing guidelines
  • Tests are passing
  • Lint workflow is passing
  • Docs were updated and workflow is passing


codecov bot commented Feb 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.79%. Comparing base (aefad2e) to head (87cf5ee).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1054   +/-   ##
=======================================
  Coverage   97.79%   97.79%           
=======================================
  Files          29       29           
  Lines         951      951           
=======================================
  Hits          930      930           
  Misses         21       21           

☔ View full report in Codecov by Sentry.

@datejada added the benchmark (PR only - Run benchmark on PR) label on Feb 26, 2025
github-actions bot (Contributor) commented Feb 26, 2025

Benchmark Results

Time:

                                       aefad2e... (base)    87cf5ee... (head)    aefad2e... / 87cf5ee...
energy_problem/create_model            34.5 ± 3.4 s         35.1 ± 2.8 s         0.981
energy_problem/input_and_constructor   26.1 ± 0.062 s       26.2 ± 0.086 s       0.998
time_to_load                           2.59 ± 0.015 s       2.61 ± 0.0048 s      0.991

Memory:

                                       aefad2e... (base)          87cf5ee... (head)          aefad2e... / 87cf5ee...
energy_problem/create_model            0.204 G allocs: 11.4 GB    0.204 G allocs: 11.4 GB    1
energy_problem/input_and_constructor   26.8 M allocs: 0.952 GB    26.8 M allocs: 0.952 GB    1
time_to_load                           0.159 k allocs: 11.2 kB    0.159 k allocs: 11.2 kB    1

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@datejada (Member, Author):

@suvayu and @abelsiqueira these are the changes to have the schema (and more) in a JSON file. I have updated the documentation to create the parameter section from this file, including the units of measurement, description, constraints, and default values. Your comments and suggestions are more than welcome 😉

@clizbe (Member) commented Mar 1, 2025

This is great! But now that we want the true schema inside the model, we should probably (after merging this) do another update that makes this JSON whatever format we want the user to send to TEM. I guess that's the User Format and then we use pre-processing to make the internal format. Let's discuss on Monday. :)

@clizbe (Member) left a comment

Thanks for doing this! I'm happy about this change but have a couple of questions.

@@ -2,7 +2,7 @@

 The optimization model parameters with the input data must follow the schema below for each table. To create these tables we currently use CSV files that follow this same schema and then convert them into tables using TulipaIO, as shown in the basic example of the [Tutorials](@ref basic-example) section.

-The schemas can be accessed at any time after loading the package by typing `TulipaEnergyModel.schema_per_table_name` in the Julia console. Here is the complete list of model parameters in the schemas per table (or CSV file):
+The schemas can be found in the `input-schemas.json` or can be accessed at any time after loading the package by typing `TulipaEnergyModel.schema_per_table_name` in the Julia console. Here is the complete list of model parameters in the schemas per table (or CSV file):
Member:

Also, does this command produce a proper output now? When I tried it, it only gave me a statement of "structure with 20 rows" or something lame but not the actual output.
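
As a side note, here is a minimal sketch (not from this thread) of printing the full contents in the Julia REPL, assuming `schema_per_table_name` maps table names to column => type pairs:

using TulipaEnergyModel

for (table, schema) in TulipaEnergyModel.schema_per_table_name
    println(table)
    for (column, type) in schema
        println("  ", column, " => ", type)
    end
end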

@suvayu (Member) left a comment

Good first pass!

Comment on lines +193 to +194
null,
"basic"
Member:

What is the intention here? Does null mean no unit_commitment? In that case does TEM still expect this column in the tables?

I think the logically correct way to do this would be something like this:

{
  "unit_commitment_integer": {...},
  "unit_commitment_method": {
    ...
    "constraints": {
      "oneOf": ["basic", "something-else"]
    }
  },
  "required": ["unit_commitment_integer"]
}

This means there will always be a unit_commitment_integer column, and unit_commitment_method will be present only when the previous one is true (which could be checked in code).

I understand if the current structure in TEM does not allow that; in that case, this could be a future improvement.
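
For illustration, a rough sketch (hypothetical table name, not TEM's actual validation code) of how such a conditional requirement could be checked:

using DataFrames

allowed_methods = ("basic", "something-else")
# `unit_commitment_method` only matters when `unit_commitment_integer` is true for that row.
for row in eachrow(assets_df)   # assets_df: hypothetical DataFrame built from the asset table
    if row.unit_commitment_integer
        @assert row.unit_commitment_method in allowed_methods
    end
end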

@abelsiqueira (Member) left a comment

I made some comments. I don't know much about JSON Schemas, so take it with a grain of salt.
Another comment is that since the JSON schema is a particular type of file, it could benefit from a linter (quick googling gave this option: https://github.com/python-jsonschema/check-jsonschema). @suvayu might have a better recommendation.

I got a bit confused on how you're planning to use this, but since you're already in discussion with @suvayu on this, I will leave the details to you

"assets_profiles": {
"asset": {
"UoM": null,
"constraints": null,
Member:

There might be a way to constrain the value of asset to be one of the values from the asset table. I don't actually know if it is possible.

Member:

Technically it's possible, but we are not going to use the schemas fully as intended (I'll respond to your earlier "how to use" question separately), so this is something we could do explicitly in code for now.

Member Author:

ok! I will check if it is possible. Thanks!

@suvayu (Member) commented Mar 3, 2025

> I got a bit confused on how you're planning to use this, but since you're already in discussion with @suvayu on this, I will leave the details to you

So while discussing this before starting on the change, I realised that doing this "properly" might be difficult because I didn't know of any tooling within Julia to do this kind of schema-based validation. So we essentially simplified the schema into a map of mini schemas. Using it then becomes simpler, because you don't do complex validation anymore; it's just for one column.

E.g. you will see it doesn't define the columns properly; something like this would be the proper way to do it:

{
  "name": "string",
  "type": "object",
  "properties": [
    {
      "name": "type",
      "type": "string",
      "oneOf": ["INTEGER", "DOUBLE", "VARCHAR", "BOOL"]
    },
    {
      "name": "constraints",
      "type": "object"
    },
    {
      "name": "UoM",
      "type": "string"
    },
    {
      "name": "default"
    }
  ],
  "required": ["type"]
}

But then you need libraries that generate these schemas, and/or that can then validate data using these later (e.g. pydantic, or marshmallow in Python).

So we simplified our approach to what we can use effectively. It is now a slightly more extendable version of the OrderedDict already in input-schemas.jl, adding a few additional fields such as units, default, description, etc. The use in code is unchanged: you get the same OrderedDict as before, but in the docs you can have richer, more informative lists/tables.

I hope that answers your question.

EDIT: If you do it "properly", then you can use references and restrict a value based on what you had somewhere else. But there are still some limits to it: since JSON Schema does not have a generic conditional primitive, it's all predefined, so what you can do is limited.

@abelsiqueira (Member) left a comment

Thanks for the extra info, @suvayu.
LGTM

@datejada force-pushed the 992-schema-json-file branch from 039310f to d796722 on March 4, 2025 17:40
@datejada (Member, Author) commented Mar 4, 2025

Hi @suvayu and @clizbe, after giving it some more thought, I decided to adopt different schemas for assets_profiles and assets_timeframe_profiles. While they share the same columns, their constraints are different. This distinction wasn't relevant before we introduced the JSON file, which is why we previously reused the schema. Now that it is relevant, having separate schemas for each file allows us to establish a direct 1:1 relationship between the table definitions and their schemas. This change will streamline the code, making it cleaner and less repetitive.

Please let me know if you agree or if you have any additional comments or suggestions.

Thank you!

@datejada requested review from suvayu and clizbe on March 4, 2025 18:49
@datejada requested a review from suvayu on March 5, 2025 08:38
@clizbe (Member) commented Mar 5, 2025

@datejada WOW that was a good decision because the new implementation is gorgeous! 😍

@datejada requested a review from clizbe on March 5, 2025 10:41
@clizbe (Member) left a comment

Looks GREAT!
Sorry for the slow review.

@clizbe merged commit ae662fe into main on Mar 5, 2025
8 checks passed
@datejada deleted the 992-schema-json-file branch on March 6, 2025 07:16
@clizbe mentioned this pull request on Mar 6, 2025
Labels: benchmark (PR only - Run benchmark on PR)
Successfully merging this pull request may close these issues.

Store schema in a JSON file
4 participants