Introduce an automated build process #989

Merged
merged 2 commits into from
Nov 13, 2024

fkleedorfer
Collaborator

@fkleedorfer fkleedorfer commented Oct 28, 2024

This is a - for now prototypical - implementation of a build process based on maven.

Structure

  • pom.xml: defines the build process
  • src/main/rdf - rdf sources
  • src/main/docs - the old docs folder
  • src/main/build - anything needed during the build that is not part of the final result
  • src/main/build/assembly/releaseZip.xml - defines the contents of the release zip
  • src/main/build/inference - generate triples via shacl inference
  • target - the folder generated during the build process
  • target/dist - everything in there goes into the release zip
  • target/inferences - ttl files generated during the build go here (=inferred triples)
  • target/validation - SHACL validation report

Operations

  • check format of all ttl files under src/main/rdf
    • (fix formatting by calling spotless plugin explicitly)
  • calculate applicableUnits
  • check the SHACL rules on the union of all vocab files, the inferred data and the schema.

Note

  • Using this build process, the GitHub repo will no longer contain all triples that are considered part of the official distribution. Instead, each user can run the build to create the contents of the distribution (as they would be if a release were made from the current state of the repo).

Still TODO

  • add inferred triples to the appropriate file ( applicable units --> quantitykinds file)
  • make sure the release zip contains everything we need (and nothing else) <-- @steveraysteveray @ralphtq @jhodgesatmb pls check!
  • add a github action to trigger build on push to PR and make a snapshot release when a PR is merged
  • add a github action for making a release
  • create a changelog and include it in the release process
  • decide whether we want to leave the rdf sources in the folder structure we have currently on the main branch or if moving them to src/main/rdf is acceptable
  • during the build, set the current version in all places where it is used (in target): either use a variable in those places or replace the previous version with the current one

TODOs that we should do after merging this PR but before the next release

  • verify that build-on-PR merge release really works
  • verify that manual release action really works and that we like the process
  • remove qudt version from filenames and other places to allow for adopting semantic versioning (because in the Changelog we talk about semantic versioning, so if we tell people, we should be in the position to execute)

TODOs we can do later

  • transfer the old release messages into the changelog

Addresses #959

@steveraysteveray
Collaborator

One question comes to mind - how do we still support distributing the OWL schema as well as our default SHACL schema?

@fkleedorfer
Collaborator Author

how do we still support distributing the OWL schema as well as our default SHACL schema?

I don't see how this question is related to this PR - I will try not to change any RDF content (only the severity of one SHACL shape needs to be touched). What am I missing?

@steveraysteveray
Collaborator

steveraysteveray commented Nov 6, 2024

(Answers relate to the build process as it is currently implemented)

I'm a little unclear whether a user can access a working set of vocabulary and schema graphs without running the build.

No, not without running the build.

Will a zip file exist on the repository?

No, not in the git repo - but the latest state of the main branch will always be tagged snapshot and a corresponding release will be on github. That release will contain the release zip. The release/tag is updated with each merged PR to main.

Will the quantity kind file exist with applicableUnit values somewhere in the repository, either under src/main/rdf, or elsewhere?

No, that file is generated during the build and ends up in target/dist/vocab/quantitykinds/ as well as in the zip file in target/qudt-public-repo-{version}.zip

@fkleedorfer
Collaborator Author

Note: with the introduction of the changelog in its current form we state that we adhere to semantic versioning, and we should require/encourage each PR to update the changelog.

@steveraysteveray
Collaborator

I have installed Maven as per your instructions and it works as you state. However, is there not a way to make the latest version of QUDT available on our repository so that all I would need is to do a git pull of the repo and I'm back in business? Currently that is how my environment works in TopBraid. I worry that we could lose users if they have to do the extra build steps or even extract the zip file.

Stated another way, even if we adopt the architecture you describe, could we not set up a separate folder somewhere that just contains the "built" graphs? Can we not invoke Maven on the Git host?

@fkleedorfer
Collaborator Author

fkleedorfer commented Nov 6, 2024

It is definitely possible to include the build results in the repo. All we would need to do is un-ignore the target folder (or maybe just target/dist plus some extra files) and add those files to the repo. I have never seen that done anywhere else, though. It is probably fair to consider it an antipattern.

Let's weigh advantages and disadvantages of including the build results in the git repo:
Advantages:

  1. Users working with QUDT off a clone of the git repo
    1. can continue to do so (they would have to change the paths of the files they use, though)
    2. are not required to
      1. either trigger a maven build in their own build process (that could be hard)
      2. or change their git pull/clone to a zip download (which is not so hard)
  2. The changes to the final output files will be tracked over time (you'll be able to do a diff on these files).
  3. Users that include qudt-public-repo as a git submodule can continue to do so (but have to use different files in that repo)

Disadvantages:

  1. Every commit will have changes in the built files if the committer ran maven before committing
  2. If the committer of a change does not commit the built files or does not run maven, there will be 'implicit changes' to the built files which materialize the next time maven is run. Currently every push to a PR to main triggers a maven build on github - during that build, the implicit changes would materialize - the repository files would get additional changes, which would require an additional commit - and if we did that, the user would have to pull that commit before continuing their work on their checked out branch. That would be odd, so we would probably have to fail the build on github if it creates additional changes (which would be easy to do and not be too weird, but still annoying).
  3. The repository becomes bigger, especially if we also check in the release zip, which over time leads to noticeably slower cloning of the repo and operations like history for a file. As the repo is already big and those operations aren't super fast, that may become an issue (but probably not a showstopper)
  4. New users - who might just as well use the release zip for their work - may instead decide to work off a clone and perpetuate this situation
  5. New users might get confused as to which files to use (the source files or the build results)
  6. The advantages cited above apply mostly to users who directly use the git repo. We do not know how many there are, but I do not think there are many. The disadvantages are more general and affect (annoy) many users
  7. If we do it this way, we'll never know if we could have done it the other way because nobody turned out to have been affected
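
The "fail the build if it creates additional changes" idea from disadvantage 2 could be sketched roughly as follows. This is only an illustration in Python, not part of the PR: the function names are hypothetical, and it assumes the CI job captures the output of `git status --porcelain` after running the build.

```python
def changed_paths(porcelain_output: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain_output.splitlines():
        # Each line is "XY <path>": two status characters, a space, the path.
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def build_left_repo_clean(porcelain_output: str) -> bool:
    """True if the build did not modify or create any files git knows about."""
    return not changed_paths(porcelain_output)
```

In a CI step, the input could come from `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`, and the job would exit with a non-zero status whenever `build_left_repo_clean` returns `False`.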

I am leaning toward only tracking the source files in the git repo, not the results:

  • I believe most users consume qudt via the linked data endpoint. I assume that not many (if any) are in a situation that will be very negatively affected by this decision.
  • If we get a lot of substantiated complaints, we can still reconsider. If we decide otherwise at the outset, we will never know.
  • The end result we deliver will be exactly the same as before, and that should be the most important goal.
  • We should be free in choosing how we want to create our artifacts. People should not depend on our way of doing it, as it may change in the future.

Two alternative ideas, if we must have the result files in a git repo:

  1. create an additional branch, let's say, build_output, which at all times contains the build result (ie, the contents of the zip) of the main branch.
  2. create an additional repo, that the build results of the main branch are always pushed onto.

@fkleedorfer
Collaborator Author

... But I might have misunderstood your specific problem, which is working with the source of any branch (not necessarily main or even a PR). You now expect to find the final (built) version of all files on any branch upon checkout, ie, without running a build. I don't think that will be possible (unless we add the built files to the repo as normal files). However, I think it will only be a problem for people doing qudt admin work.

@steveraysteveray
Collaborator

@fkleedorfer, you are raising some great issues, and some great architectural candidate solutions. We should definitely talk these through and come to a decision soon. You have captured my concerns as well.

@fkleedorfer
Collaborator Author

Well, if we agree that we do not want generated files as part of the normal repo, we can just leave the build system as is in that regard and work on its other functionality (eg version number replacement)

If we ever determine that we really need it, we can create the built-branch or built-repo later, recreate the commits from our release tags (check out tag, build project, copy to branch/repo, commit), and include a step in our github action that pushes the built files onto the branch/repo.

@fkleedorfer
Collaborator Author

Regarding the version number replacement:

I think the best solution is to put a placeholder in every position where we want the current version to be put.
Here are all occurrences of the version in our files:

$ grep -R 2\\.1\\.44 *
CHANGELOG.md:## [2.1.44] - 2024-10-27
CHANGELOG.md:[2.1.44]: https://github.com/qudt/qudt-public-repo/compare/v2.1.43...v2.1.44
README.md:The QUDT ontologies (Release 2.1.44) have been tested to load without error in Protege 5.6.4.
src/main/rdf/collections/COLLECTION_QUDT_QA_TESTS_ALL-v2.1.ttl:  rdfs:label "QUDT Collection - QA TESTS - ALL - v 2.1.44" ;
src/main/rdf/collections/COLLECTION_QUDT_USER_TESTS-v2.1.ttl:  rdfs:label "QUDT Collection - USER TESTS - v 2.1.44" ;
src/main/rdf/community/mappings/SSSOM/IFC/README.md:* Subsequent version of [QUDT 2.1.44](https://github.com/qudt/qudt-public-repo/releases/tag/v2.1.44) 
src/main/rdf/schema/SCHEMA-FACADE_QUDT-v2.1.ttl:  rdfs:label "QUDT SCHEMA Facade graph - v2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl:  vaem:graphTitle "QUDT Schema for Datatypes - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl:  vaem:title "QUDT Schema for Datatypes - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl:  rdfs:label "QUDT Schema for Datatypes - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  rdfs:label "QUDT Schema - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  dcterms:title "QUDT Schema - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  vaem:graphTitle "Quantities, Units, Dimensions and Types (QUDT) Schema - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  rdfs:label "QUDT Schema - Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  dcterms:title "QUDT SHACL Schema - Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  vaem:graphTitle "Quantities, Units, Dimensions and Types (QUDT) SHACL Schema - Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Metadata Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Supplement Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  dcterms:title "QUDT SHACL Schema Overlay - Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  vaem:graphTitle "Quantities, Units, Dimensions and Types (QUDT) SHACL Schema Overlay - Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Overlay Metadata Version 2.1.44" ;
src/main/rdf/vocab/constants/VOCAB_QUDT-CONSTANTS-v2.1.ttl:  rdfs:label "QUDT VOCAB Physical Constants Release 2.1.44" ;
src/main/rdf/vocab/constants/VOCAB_QUDT-CONSTANTS-v2.1.ttl:  vaem:graphTitle "QUDT Constants Version 2.1.44" ;
src/main/rdf/vocab/constants/VOCAB_QUDT-CONSTANTS-v2.1.ttl:  rdfs:label "Physical Constant Vocabulary Version 2.1.44 Metadata" ;
src/main/rdf/vocab/currency/VOCAB_QUDT-UNITS-CURRENCY-v2.1.ttl:  rdfs:label "QUDT VOCAB Currency Units Release 2.1.44" ;
src/main/rdf/vocab/currency/VOCAB_QUDT-UNITS-CURRENCY-v2.1.ttl:  vaem:graphTitle "QUDT Currency Units Version 2.1.44" ;
src/main/rdf/vocab/currency/VOCAB_QUDT-UNITS-CURRENCY-v2.1.ttl:  rdfs:label "QUDT Currency Unit Vocabulary Metadata Version 2.1.44" ;
src/main/rdf/vocab/dimensionvectors/VOCAB_QUDT-DIMENSION-VECTORS-v2.1.ttl:  rdfs:label "QUDT VOCAB Dimension Vectors Release 2.1.44" ;
src/main/rdf/vocab/dimensionvectors/VOCAB_QUDT-DIMENSION-VECTORS-v2.1.ttl:  vaem:graphTitle "QUDT Dimension Vectors Version 2.1.44" ;
src/main/rdf/vocab/dimensionvectors/VOCAB_QUDT-DIMENSION-VECTORS-v2.1.ttl:  rdfs:label "QUDT Dimension Vector Vocabulary Metadata Version 2.1.44" ;
src/main/rdf/vocab/prefixes/VOCAB_QUDT-PREFIXES-v2.1.ttl:  rdfs:label "QUDT VOCAB Decimal Prefixes Release 2.1.44" ;
src/main/rdf/vocab/prefixes/VOCAB_QUDT-PREFIXES-v2.1.ttl:  vaem:graphTitle "QUDT Prefixes Version 2.1.44" ;
src/main/rdf/vocab/prefixes/VOCAB_QUDT-PREFIXES-v2.1.ttl:  rdfs:label "QUDT Prefix Vocabulary Version Metadata 2.1.44" ;
src/main/rdf/vocab/quantitykinds/VOCAB_QUDT-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT Quantity Kind Vocabulary Version 2.1.44" ;
src/main/rdf/vocab/quantitykinds/VOCAB_QUDT-QUANTITY-KINDS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Quantity Kinds Version 2.1.44" ;
src/main/rdf/vocab/quantitykinds/VOCAB_QUDT-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT Quantity Kind Vocabulary Metadata Version 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT VOCAB Systems of Quantity Kinds Release 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  dcterms:description "QUDT Systems of Quantity Kinds Vocabulary Version 2.1.44"^^rdf:HTML ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Systems of Quantity Kinds Version 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT System of Quantity Kinds Vocabulary Version 2.1.44 Metadata" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT VOCAB Systems of Units Release 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  dcterms:description "QUDT Systems of Units Vocabulary Version 2.1.44"^^rdf:HTML ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Systems of Units Version 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT System of Units Vocabulary Metadata Version v2.1.44" ;
src/main/rdf/vocab/types/VOCAB_QUDT-DATATYPES-v2.1.ttl:  rdfs:label "QUDT Vocabulary of Datatypes v2.1.44" ;
src/main/rdf/vocab/types/VOCAB_QUDT-DATATYPES-v2.1.ttl:  vaem:title "QUDT Vocabulary for Datatypes - Version 2.1.44" ;
src/main/rdf/vocab/types/VOCAB_QUDT-DATATYPES-v2.1.ttl:  rdfs:label "QUDT Vocabulary for Datatypes - Version 2.1.44" ;
src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT VOCAB Units of Measure Release 2.1.44" ;
src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Units Version 2.1.44" ;
src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.44" ;

All of these are RDF string properties or Text, so putting a placeholder like ${qudt.version} there will not make the data unusable prior to replacement. So, that would be my preferred solution.

Example:

in the source file src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl the line

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.44" ;

becomes

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version ${qudt.version}" ;

In a normal build (not a release), this would be replaced by the snapshot version, such as:

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.45-SNAPSHOT" ;

Whereas in a release build, this would be replaced by the release version, such as:

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.45" ;
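
To make the replacement step concrete, here is a minimal Python sketch of the idea. It is an illustration only: the actual build would presumably do this with a Maven plugin, and the function names and the choice to process only `*.ttl` files are assumptions, not taken from the PR.

```python
from pathlib import Path

# Placeholder syntax as proposed in this comment.
PLACEHOLDER = "${qudt.version}"


def fill_version(text: str, version: str) -> str:
    """Replace every occurrence of the placeholder with the given version."""
    return text.replace(PLACEHOLDER, version)


def fill_version_in_tree(root: Path, version: str) -> int:
    """Rewrite all .ttl files under root in place; return how many changed."""
    changed = 0
    for path in sorted(root.rglob("*.ttl")):
        original = path.read_text(encoding="utf-8")
        updated = fill_version(original, version)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running this on a copy of the sources (for example under `target`) rather than on `src/main/rdf` keeps the placeholder intact in the repository.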

@steveraysteveray
Collaborator

Just so I understand, if we start with a Release of 2.1.44, and several merges are performed from various PRs, do we get:
Existing Release: 2.1.44
First PR merge: 2.1.45-SNAPSHOT
Second PR merge: 2.1.45-SNAPSHOT
Next Release: 2.1.45

Is that how it works, or does the 45 keep getting incremented with each merge?

@steveraysteveray
Collaborator

From your earlier comment:

Two alternative ideas, if we must have the result files in a git repo:

  1. create an additional branch, let's say, build_output, which at all times contains the build result (ie, the contents of the zip) of the main branch.
  2. create an additional repo, that the build results of the main branch are always pushed onto.

I'm leaning toward your second idea. We could name the repo something like qudt-distribution as a read-only repo to support users who have git-aware applications that want to just 'git pull' the latest main branch, or even pull some earlier version. This way, people could use the qudt-public-repo with Maven that will support contributions and edits if they want, or qudt-distribution that will reflect the latest Release or snapshot.

@fkleedorfer
Collaborator Author

Just so I understand, if we start with a Release of 2.1.44, and several merges are performed from various PRs, do we get:
Existing Release: 2.1.44
First PR merge: 2.1.45-SNAPSHOT
Second PR merge: 2.1.45-SNAPSHOT
Next Release: 2.1.45

Is that how it works, or does the 45 keep getting incremented with each merge?

Sorry, I did not explain this.

It is actually very simple because there is no magic at all, it's all manual:

The version number is in the pom.xml file, near the top:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.qudt</groupId>
    <artifactId>qudt-public-repo</artifactId>
    <version>2.1.45-SNAPSHOT</version>  <!-- the version is set here -->
    <packaging>pom</packaging>

This is the current version. The convention, as far as I've seen it, is this:

  1. while you work on the code and keep making commits, the version does not change and is always the version you are aiming for with '-SNAPSHOT' appended (e.g. 12.34-SNAPSHOT).
  2. when you make a release, the maven-release-plugin is run behind the scenes, which requires as parameters the version you want to release and the next 'development' (=snapshot) version. We pass these arguments to the github action that makes the release; they are the only two arguments it requires. The plugin changes the version in the pom file to the release version, we package that for the release, and then it changes the version in the pom file to the next development version. (Maybe you wouldn't call that whole process 'manual' as I did above, but at least there is no automatic version increment.)

So, it's like you suspected:

  • existing Release: 2.1.44
  • First PR merge: 2.1.45-SNAPSHOT
  • Second PR merge: 2.1.45-SNAPSHOT
  • Next Release: 2.1.45 - unless you decide to release a different version, which may be warranted depending on what has changed (see semantic versioning, if we want to adhere to that). You'd pass 2.1.45 as the release version and 2.1.46-SNAPSHOT as the next development version in the github user interface to achieve that.
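
As a small illustration of how the two release parameters relate to the snapshot version, the following Python sketch derives the "obvious" values one might type into the release action. The helper names are hypothetical and nothing in the build computes these automatically; as described above, both values are supplied by hand.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"


def default_release_version(dev_version: str) -> str:
    """Strip '-SNAPSHOT' from a development version, e.g. from pom.xml."""
    if not dev_version.endswith(SNAPSHOT_SUFFIX):
        raise ValueError(f"not a snapshot version: {dev_version!r}")
    return dev_version[: -len(SNAPSHOT_SUFFIX)]


def default_next_dev_version(release_version: str) -> str:
    """Increment the last numeric component and append '-SNAPSHOT'."""
    parts = release_version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts) + SNAPSHOT_SUFFIX
```

If a change warrants a minor or major bump under semantic versioning, the release version would simply be chosen differently instead of using these defaults.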

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 8, 2024 via email

@fkleedorfer
Collaborator Author

Rebased on main and made all necessary changes.

I think we can remove DRAFT status.

Also, I think it would be a good idea to switch our settings such that the default action for PRs is to squash and rebase.

In case of a squash I would use the commit message of the first commit in this branch as the commit message.

@steveraysteveray
Collaborator

I'm about to jump into another meeting, but I think your applicableUnits algorithm needs to ignore any deprecated units. I'm getting lots of validation errors because quantity kinds are referring to some deprecated units. Somewhere in your nested SELECT calls you need a

FILTER NOT EXISTS {?unit qudt:deprecated true}

@fkleedorfer
Collaborator Author

Excellent catch @steveraysteveray - I had lowered the severity of the shacl shape to warning to make the build not fail. I had not seen the connection with the applicableUnits calculation. Fixed now.

@steveraysteveray
Collaborator

Thanks @fkleedorfer, the target files that are in the imports closure now pass the validation cleanly!

I stepped back to take a look at where we are now, and I have some observations/questions. I used the quantitykind file to focus my observations.

  1. Before running mvn clean install there is no version of quantitykinds that is ready "out of the box" (i.e. including the applicableUnit triples)

  2. After running mvn clean install there are two places to find the usable version of quantitykinds:
    a. In target/dist/vocab/quantitykinds
    b. In target/qudt-public-repo-2.1.45-SNAPSHOT.zip
    I suppose that's OK, but people might wonder if they are the same, or which is the "right one". We could discuss whether we want a zip file outside of Releases.

  3. After running a subsequent mvn clean, a git status shows the following untracked files:

	src/main/resources/docs/2020-04-28 Intro to QUDT.pdf~01e299f62ba843befaf5266de2c8cc38c32d3682
	src/main/resources/docs/2020-04-28 Intro to QUDT.pdf~HEAD
	src/main/resources/docs/2020-04-28 Intro to QUDT.pptx~01e299f62ba843befaf5266de2c8cc38c32d3682
	src/main/resources/docs/2020-04-28 Intro to QUDT.pptx~HEAD
	src/main/resources/docs/test.txt~01e299f62ba843befaf5266de2c8cc38c32d3682
	src/main/resources/docs/test.txt~HEAD

Are these remnants of a merge conflict? They should be removed, right? Or even ignore that whole folder in your scripts?

  4. There is a top-level folder named vocab, with two empty subfolders, quantitykinds and unit. Are they forgotten leftovers, or are they needed for some reason?

  5. I'm still undecided whether it would be better to deposit a set of "usable" files (i.e. including the quantitykind file containing applicableUnits) in some separate branch on this repo, or in a separate repo like qudt-dist. My own pros and cons:

  • Separate branch in qudt-public-repo
    • Pro: Only one repo for users to know about and keep up to date
    • Con: People who want to just use the files without running maven would go to a branch other than main, which seems counterintuitive.
  • Separate repo (qudt-dist)
    • Pro: People who are just users and don't expect to contribute just need to know about qudt-dist, and clone or fork the main branch, or download a Release
    • Con: People need to know about different repos for contribution vs. use

For item 5 above, I'm thinking we have 2 communities we want to support:

  1. Contributors/developers - they could be expected to use Maven locally to validate their contributions, or at least look at the pipeline results when they git push
  2. Users who do not contribute - they would either download a Zip file or git pull from somewhere without invoking Maven

For my own work, I expect to jump back and forth between both communities. Group 1 for my own updates to QUDT, and Group 2 for my use of QUDT in committee work.

@fkleedorfer
Collaborator Author

3 and 4 are accidents.

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 9, 2024 via email

@steveraysteveray
Collaborator

@jhodgesatmb, see the qudt-board Slack channel at 2:42pm on October 28.

@steveraysteveray
Collaborator

@fkleedorfer, I'm still impressed by your applicableUnit query. I thought I would try to improve it because instances of qudt:SystemOfQuantityKinds also use the qudt:hasQuantityKind relation, so they could be accidentally picked up in other scenarios. I realize that in your pom.xml you only included the unit and quantitykind graphs to do the inferencing, but still I thought it would be good to make it more robust.

However, I clearly messed up with my addition of

?unit a qudt:Unit

line in the query, so in the end I just put things back the way they were...

@steveraysteveray
Collaborator

@jhodgesatmb, to your first question, yes, I would imagine people in group 1 would create profile distributions, and people in group 2 would use such distributions.

@fkleedorfer
Collaborator Author

@fkleedorfer, I'm still impressed by your applicableUnit query. I thought I would try to improve it because instances of qudt:SystemOfQuantityKinds also use the qudt:hasQuantityKind relation, so they could be accidentally picked up in other scenarios. I realize that in your pom.xml you only included the unit and quantitykind graphs to do the inferencing, but still I thought it would be good to make it more robust.

I will look into it, thanks!

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 9, 2024 via email

@fkleedorfer
Collaborator Author

@steveraysteveray another good catch - if I include the 'systems' files, the applicableUnits.ttl contains entries such as:

quantitykind:AbsoluteHumidity
  qudt:applicableUnit soqk:IMPERIAL ;
  qudt:applicableUnit soqk:USCS ;
  ...

Added the requirement for type qudt:Unit and verified it produces the same result as the original query without the systems files

?unit qudt:hasQuantityKind ?qk
?unit a qudt:Unit
?unit qudt:hasQuantityKind ?qk .
?unit rdf:type qudt:Unit .
Collaborator Author

The same statement (?unit rdf:type qudt:Unit) is also needed in line 79. I did that in a later commit, so don't worry about it.
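
The two fixes discussed in this thread (requiring `?unit` to be a `qudt:Unit` and skipping deprecated units) can be illustrated with a simplified Python sketch over in-memory triples. This is not the SPARQL query from the PR, which is not shown here; the function name and the sample IRIs in the usage note are made up for illustration, and the real inference may apply further conditions.

```python
Triple = tuple[str, str, object]


def infer_applicable_units(triples: list[Triple]) -> set[Triple]:
    """Derive (quantityKind, qudt:applicableUnit, unit) triples."""
    triple_set = set(triples)
    inferred = set()
    for subject, predicate, obj in triple_set:
        if predicate != "qudt:hasQuantityKind":
            continue
        # Only units qualify; this excludes e.g. qudt:SystemOfQuantityKinds
        # instances, which also use qudt:hasQuantityKind.
        if (subject, "rdf:type", "qudt:Unit") not in triple_set:
            continue
        # Counterpart of FILTER NOT EXISTS {?unit qudt:deprecated true}.
        if (subject, "qudt:deprecated", True) in triple_set:
            continue
        inferred.add((obj, "qudt:applicableUnit", subject))
    return inferred
```

Given a unit, a deprecated unit and a system of quantity kinds that all point at the same quantity kind, only the non-deprecated unit yields an `qudt:applicableUnit` triple.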

@fkleedorfer
Collaborator Author

fkleedorfer commented Nov 9, 2024

@steveraysteveray your points 3 and 4 from earlier I cannot reproduce. When I clone the repo into a new folder and check out the branch 959/maven, it looks clean to me.

@fkleedorfer
Collaborator Author

To your point 2 - it is totally possible to not make the zip file upon mvn install, and only build the zip during a release build.

@fkleedorfer
Collaborator Author

it is totally possible to not make the zip file upon mvn install, and only build the zip during a release build.

Did that in the last commit. mvn -Pzip install now builds with the zip file

@dr-shorthair
Contributor

Watching from a distance. Very pleased to see all this.

@fkleedorfer
Collaborator Author

In terms of tasks left to do in this PR (copied from above):

These were the tasks:

  • add inferred triples to the appropriate file ( applicable units --> quantitykinds file)
  • [] make sure the release zip contains everything we need (and nothing else) <-- @steveraysteveray @ralphtq @jhodgesatmb pls check!
  • add a github action to trigger build on push to PR and make a snapshot release when a PR is merged
  • [] verify that build-on-PR merge release really works
  • add a github action for making a release
  • [] verify that manual release action really works and that we like the process
  • create a changelog and include it in the release process
  • decide whether we want to leave the rdf sources in the folder structure we have currently on the main branch or if moving them to src/main/rdf is acceptable
  • during the build, set the current version in all places where it is used (in target): either use a variable in those places or replace the previous version with the current one

I've decided that:

  • the contents of the zip file have been reviewed multiple times, I believe it is ok
  • the github actions cannot be verified within the PR. The manual one is not available, and the automatic one will not be triggered until we merge. We'll move these tasks to the 'later' section

Which leaves us with the question whether or not we should leave the directory structure the way it is now: The former toplevel directories are moved to src/main/rdf (and src/main/resources/docs respectively), leaving at the top level just the src folder and a git-ignored target folder that is generated by the build.

A bit of background about this structure (why 'main'?): it's a Maven convention. The main source code and other files go into src/main/. You usually also have test sources (unit/integration tests). Those go into src/test/.

I would recommend that we keep the structure as it is in this PR at the moment:

Advantages:

  • it's a breaking change for everyone who uses the sources. They will have to look at what changed and adapt their code. This is good, because if we leave the toplevel folders where they were previously, the changes will be subtle (e.g. no applicableUnits triples in the quantitykinds file), and their code will break in subtle ways they might not notice for years.
  • we stick with the maven convention and those who know it will be able to work with it without problems
  • we might have actual tests in the future - we may provide some modelling examples, these could go into src/test

Disadvantages:

  • It's a change. We may have to update some scripts (but we will have to update some scripts either way)
  • It may look a bit disorienting. Having said that, the current structure does not necessarily look less disorienting. (Where is the qudt file?)

other considerations?

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 11, 2024 via email

@fkleedorfer
Collaborator Author

@jhodgesatmb there is some documentation on how to build the project in the README.md file in this PR.

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 11, 2024 via email

## Functionality of the build system:

Run the build with `mvn install`

- check RDF source formatting
   - fail build if there are violations
   - enable to fix formatting via `mvn spotless:apply`
- remove all `qudt:applicableUnits` triples from the quantitykinds file
- copy all relevant sources to `target/dist`
- replace the version placeholder, `$$QUDT_VERSION$$`, everywhere in `target/dist`
- infer `qudt:applicableUnits` by applying `src/build/inference/inferApplicableUnits.ttl` and
  add those triples to the quantitykinds file in `target/dist`
- evaluate all SHACL shapes on the build result in `target/dist` and fail the build if
  there are violations
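
As a rough illustration of the placeholder step above (the actual build does this through Maven plugins configured in `pom.xml`; this is not that implementation), the replacement can be sketched in a few lines of Python. The function name `replace_version`, the restriction to `.ttl` files, and the version string in the usage note are assumptions made for this sketch only.

```python
from pathlib import Path

# Placeholder string as described in the build steps above.
PLACEHOLDER = "$$QUDT_VERSION$$"


def replace_version(dist_dir, version, suffixes=(".ttl",)):
    """Replace the version placeholder in files below dist_dir.

    Only files whose suffix is in `suffixes` are touched (an assumption
    of this sketch). Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(Path(dist_dir).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        if PLACEHOLDER in text:
            path.write_text(text.replace(PLACEHOLDER, version), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_version("target/dist", "1.2.3")` (with an illustrative version string) would rewrite every `.ttl` file under `target/dist` that still contains the placeholder and leave all other files untouched.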

Profile `zip` builds the release zip:

`mvn -Pzip install`

## Github Actions

Github actions are defined in `.github/workflows`:
- maven.yml - runs the build upon push to a PR or when a PR is merged to `main`. In the
  latter case, the action makes a github release `snapshot` and a tag with that name,
  which will overwrite the previous such snapshot release
- release.yml - manually invokable action that makes a release. The parameters `release_version`
  and `next_development_version` are required and will be used for making the release and
  preparing the repo for the next development cycle. This action makes changes to the repo
  (`pom.xml`, `CHANGELOG.md`), which are committed to a new branch, and a PR to `main` is
  created during its execution. This PR has to be merged manually when we are happy with
  the results of the release.

## Changes to sources that were required

- introduce a placeholder, `$$QUDT_VERSION$$`, wherever the current version is needed
- move the rdf source folders from the root dir to `src/main/rdf`
   (not technically required, but makes clear what is source and what is generated)
- add CHANGELOG.md

## Other changes

- Folder structure: the `collections` folder ended up as `src/main/validation`

## Documentation

- build phases and associated plugin executions are listed in the comments in `pom.xml`
@fkleedorfer
Collaborator Author

I've updated the pom.xml for the new folder structure and squashed everything into one commit with a nice commit message. I think this is ready for merge (preferably rebase, actually)

@steveraysteveray steveraysteveray marked this pull request as ready for review November 11, 2024 21:54
Collaborator

@steveraysteveray steveraysteveray left a comment

The dist folder loads cleanly into TopBraid 7.1.1, and validates there without errors.

The src/main/rdf folder loads successfully into a separate workspace in TopBraid 7.0 that I can use for the publication.

I will pause to let others do their testing.

@fkleedorfer
Collaborator Author

> The src/main/rdf folder loads successfully into a separate workspace in TopBraid 7.0 that I can use for the publication.

@steveraysteveray you'll have to change the code that replaces the version number in those files.

Would it not be better to create the release off the results in target/dist ?

@steveraysteveray
Collaborator

We will need some careful sequencing here. Publication of the web pages is distinct from creation of the GitHub Release. We need access to the src files to update the version numbers and metadata like the date of publication, for both publication and the Release. I cannot have both the src graphs and the dist graphs present in the TopBraid workspace at the same time because of base URI conflicts. So, the sequence could be something like:

  1. Update the version numbers in src (just increment to 2.1.45 this time?)
  2. Run the Maven build
  3. Load just the dist folder into TopBraid (along with lmdoc, the private QUDT repo, (and mathjax and QUDT Customizations)).
  4. Run the publication script which invokes many webservices from TopBraid. Problem here is it will not be able to update the graph metadata in the src folder. It will update the metadata in the dist graphs, but those will be overwritten with the next Maven build.

So in the near term, I could do the sequence above. The published web pages would have the correct metadata, but the src files would not. Then I could manually invoke just the metadata update routines in a separate workspace, operating on the src files, and push those in a separate PR. This will be a little tricky, but doable I believe.

Then we could discuss the testing and execution of the GitHub Release.

How does this sound?

@fkleedorfer
Collaborator Author

I think, for now, we don't need to release a new version. I expect some bugfixing and tweaking after we merge this PR. So we have time to get the publication workflow fixed.

I think it is a good idea to work on that after we merge. Maybe we can automate it fully, too.

However, I don't seem to fully understand the problems you describe. For one, if the solution is to do a manual find/replace in src and then run the maven build (which replaces the version placeholder with the version), why do the replacing in the first place? And why not just use the release zip (once it has been released on github)?

I don't understand the problem with the metadata either... Is it that we should do a second replace for the date of publication? It would be no problem to add that to this PR! Using the placeholder approach, it's really quite easy. We can also adapt that approach for IRIs or non-string literals if needed.
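
To make the idea concrete, here is a minimal sketch of what a second placeholder for the publication date could look like. It is an assumption for discussion, not what the PR implements: the placeholder name `$$QUDT_RELEASE_DATE$$`, the helper `fill_placeholders`, the example graph IRI, and the version and date values are all hypothetical.

```python
import re
from datetime import date


def fill_placeholders(text, values):
    """Replace every $$NAME$$ placeholder in text with values[NAME].

    Raises ValueError if a placeholder is left over, so a forgotten
    value cannot slip into the build result unnoticed.
    """
    for name, value in values.items():
        text = text.replace(f"$${name}$$", value)
    leftover = re.search(r"\$\$[A-Z_]+\$\$", text)
    if leftover:
        raise ValueError(f"unresolved placeholder: {leftover.group(0)}")
    return text


# Hypothetical Turtle snippet: one string literal and one typed literal.
template = (
    '<http://example.org/graph> owl:versionInfo "$$QUDT_VERSION$$" ;\n'
    '    dcterms:modified "$$QUDT_RELEASE_DATE$$"^^xsd:date .\n'
)

# Illustrative values only.
values = {
    "QUDT_VERSION": "1.2.3",
    "QUDT_RELEASE_DATE": date(2024, 1, 31).isoformat(),
}

result = fill_placeholders(template, values)
```

Because the substitution is purely textual, the same mechanism works inside a typed literal such as the `xsd:date` above, and would work inside an IRI as well; the leftover check is there so that a placeholder without a value fails loudly instead of ending up in the distribution.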

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 12, 2024 via email

@steveraysteveray
Collaborator

OK, here's the excerpt from the webservices call to update the metadata in each of the graphs:

                                ?meta dcterms:modified ?newMod .
                                ?meta vaem:latestPublishedVersion ?newLatestU .
                                ?meta vaem:previousPublishedVersion ?newPreviousU .

and for the catalog:

                            ?entry lmcat:publicationDate ?yearMonthDay .

So if we can do that in this PR, that could work, along with your placeholder approach for the version. Want to take a shot at that? Then I can comment out the respective calls in the script and proceed with publishing the web pages.

Happy to drive all this from either the dist folder or the zip file.

@steveraysteveray
Collaborator

...or we could just merge the current PR and keep working on the publication from there.

@jhodgesatmb
Collaborator

jhodgesatmb commented Nov 12, 2024 via email

@fkleedorfer
Collaborator Author

> ...or we could just merge the current PR and keep working on the publication from there.

That's the better approach.

Sounds like we can make these webservice calls from the github action. Let's work on that in a subsequent PR.

@fkleedorfer
Collaborator Author

@ralphtq can we merge?

@fkleedorfer
Collaborator Author

Oops, I had forgotten to update the changelog!

@steveraysteveray steveraysteveray merged commit 332ad2c into main Nov 13, 2024
1 check passed
@steveraysteveray steveraysteveray deleted the 959/maven branch November 13, 2024 15:52