Introduce an automated build process #989

fkleedorfer · 2024-10-28T18:30:11Z

This is a (for now, prototypical) implementation of a build process based on Maven.

Structure

pom.xml: defines the build process
src/main/rdf: RDF sources
src/main/docs: the old docs folder
src/main/build: anything needed during the build that is not part of the final result
src/main/build/assembly/releaseZip.xml: defines the contents of the release zip
src/main/build/inference: generates triples via SHACL inference
target: the folder generated during the build process
target/dist: everything in there goes into the release zip
target/inferences: TTL files generated during the build go here (i.e., the inferred triples)
target/validation: the SHACL validation report

Operations

check the format of all TTL files under src/main/rdf
- (formatting can be fixed by invoking the Spotless plugin explicitly)
calculate applicableUnits
validate the union of all vocab files, the inferred data and the schema against the SHACL shapes

Note

With this build process, the GitHub repo will no longer contain all triples that are considered part of the official distribution. Instead, any user can run the build to create the contents of the distribution as it would be if a release were made from the current state of the repo.

Still TODO

add inferred triples to the appropriate file (applicable units --> quantitykinds file)
make sure the release zip contains everything we need (and nothing else) <-- @steveraysteveray @ralphtq @jhodgesatmb pls check!
add a GitHub action to trigger a build on each push to a PR and to make a snapshot release when a PR is merged
add a GitHub action for making a release
create a changelog and include it in the release process
decide whether we want to keep the RDF sources in the folder structure we currently have on the main branch or whether moving them to src/main/rdf is acceptable
during the build, set the current version in all places where it is used (in target), either by using a variable in those places or by replacing the previous version with the current one

TODOs that we should do after merging this PR but before the next release

verify that the snapshot release on PR merge really works
verify that the manual release action really works and that we like the process
remove the QUDT version from filenames and other places so that we can adopt semantic versioning (the changelog says we follow semantic versioning, so if we tell people that, we should be in a position to deliver on it)
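If the version is dropped from filenames, the rename could be scripted. A minimal Python sketch, assuming (hypothetically, since no naming scheme has been decided here) that the new names simply drop the trailing "-v2.1" before ".ttl"; `versionless_name` is an illustrative helper, not part of the build:

```python
import re

# Matches a trailing "-v<major>.<minor>" immediately before a final ".ttl",
# e.g. the "-v2.1" in "VOCAB_QUDT-UNITS-ALL-v2.1.ttl".
_VERSION_SUFFIX = re.compile(r"-v\d+\.\d+(?=\.ttl$)")


def versionless_name(filename: str) -> str:
    """Return the filename without its '-vX.Y' suffix (hypothetical naming scheme).

    Files that do not end in '-vX.Y.ttl' are returned unchanged.
    """
    return _VERSION_SUFFIX.sub("", filename)
```

For example, "VOCAB_QUDT-UNITS-ALL-v2.1.ttl" would become "VOCAB_QUDT-UNITS-ALL.ttl", while "README.md" stays as it is.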

TODOs we can do later

transfer the old release messages into the changelog

Addresses #959

steveraysteveray · 2024-11-06T14:35:42Z

One question comes to mind: how do we still support distributing the OWL schema as well as our default SHACL schema?

fkleedorfer · 2024-11-06T14:47:56Z

how do we still support distributing the OWL schema as well as our default SHACL schema?

I don't see how this question is related to this PR: I will try not to change any RDF content (only the severity of one SHACL shape needs to be touched). What am I missing?

steveraysteveray · 2024-11-06T14:56:02Z

(Answers relate to the build process as it is currently implemented)

I'm a little unclear whether a user can access a working set of vocabulary and schema graphs without running the build.

No, not without running the build.

Will a zip file exist on the repository?

No, not in the git repo. However, the latest state of the main branch will always be tagged 'snapshot', and a corresponding release containing the release zip will be on GitHub. The release/tag is updated with each PR merged to main.

Will the quantity kind file exist with applicableUnit values somewhere in the repository, either under src/main/rdf, or elsewhere?

No, that file is generated during the build and ends up in target/dist/vocab/quantitykinds/ as well as in the zip file in target/qudt-public-repo-{version}.zip

fkleedorfer · 2024-11-06T15:14:14Z

Note: with the introduction of the changelog in its current form, we state that we adhere to semantic versioning, and we should require (or at least encourage) each PR to update the changelog.

steveraysteveray · 2024-11-06T15:25:41Z

I have installed Maven as per your instructions, and it works as you state. However, is there not a way to make the latest version of QUDT available on our repository so that all I would need to do is a git pull of the repo and I'm back in business? Currently that is how my environment works in TopBraid. I worry that we could lose users if they have to do the extra build steps or even extract the zip file.

Stated another way, even if we adopt the architecture you describe, could we not set up a separate folder somewhere that just contains the "built" graphs? Can we not invoke Maven on the Git host?

fkleedorfer · 2024-11-06T20:08:20Z

It is definitely possible to include the build results in the repo. All we would need to do is un-ignore the target folder (or maybe just target/dist plus some extra files) and add those files to the repo. I have never seen that done anywhere else, though. It is probably fair to consider it an antipattern.

Let's weigh advantages and disadvantages of including the build results in the git repo:
Advantages:

Users working with QUDT off a clone of the git repo
  1. can continue to do so (they would have to change the paths of the files they use, though)
  2. are not required to
     1. either trigger a Maven build in their own build process (that could be hard)
     2. or change their git pull/clone to a zip download (which is not so hard)
The changes to the final output files will be tracked over time (you'll be able to do a diff on these files).
Users that include qudt-public-repo as a git submodule can continue to do so (but have to use different files in that repo)

Disadvantages:

Every commit will contain changes to the built files if the committer ran Maven before committing.
If the committer of a change does not commit the built files or does not run Maven, there will be 'implicit changes' to the built files, which materialize the next time Maven is run. Currently, every push to a PR against main triggers a Maven build on GitHub. During that build, the implicit changes would materialize: the repository files would get additional changes, which would require an additional commit, and if we made that commit, the user would have to pull it before continuing their work on their checked-out branch. That would be odd, so we would probably have to fail the build on GitHub if it creates additional changes (which would be easy to do and not too weird, but still annoying).
The repository becomes bigger, especially if we also check in the release zip. Over time this leads to noticeably slower cloning and slower operations such as viewing a file's history. As the repo is already big and those operations aren't super fast, that may become an issue (but probably not a showstopper).
New users, who might just as well use the release zip for their work, may instead decide to work off a clone and perpetuate this situation.
New users might get confused as to which files to use (the source files or the build results).
The advantages cited above apply mostly to users who work directly with the git repo. We do not know how many there are, but I do not think there are many. The disadvantages are more general and affect (annoy) many users.
If we do it this way, we will never find out whether the other way would have worked, i.e., whether anyone would actually have been affected.
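The "fail the build on GitHub if it creates additional changes" check mentioned above could look roughly like this Python sketch. It assumes the CI job runs it after the Maven build and relies only on `git status --porcelain` (v1 output format); `changed_paths` and `working_tree_is_clean` are hypothetical names, not part of this PR:

```python
import subprocess


def changed_paths(porcelain_output: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each non-empty line is 'XY PATH': two status characters, a space, the path.
    Renames are reported as 'XY OLD -> NEW', in which case NEW is returned.
    Note: git quotes paths containing special characters; this sketch does not unquote them.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if not line.strip():
            continue
        path = line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def working_tree_is_clean() -> bool:
    """Run git in the current directory and report whether the build left any changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return not changed_paths(out)
```

A CI step would then exit with a non-zero status when `working_tree_is_clean()` returns False, listing the offending paths so the committer knows which built files to add.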

I am leaning toward only tracking the source files in the git repo, not the results:

I believe most users consume qudt via the linked data endpoint. I assume that not many (if any) are in a situation that will be very negatively affected by this decision.
If we get a lot of substantiated complaints, we can still reconsider. If we decide otherwise at the outset, we will never know.
The end result we deliver will be exactly the same as before, and that should be the most important goal.
We should be free in choosing how we want to create our artifacts. People should not depend on our way of doing it, as it may change in the future.

Two alternative ideas, if we must have the result files in a git repo:

create an additional branch, let's say build_output, which at all times contains the build result (i.e., the contents of the zip) of the main branch.
create an additional repo onto which the build results of the main branch are always pushed.

fkleedorfer · 2024-11-07T09:08:40Z

... But I might have misunderstood your specific problem, which is working with the source of any branch (not necessarily main or even a PR). You expect to find the final (built) version of all files on any branch upon checkout, i.e., without running a build. I don't think that will be possible (unless we add the built files to the repo as normal files). However, I think it will only be a problem for people doing QUDT admin work.

steveraysteveray · 2024-11-07T14:57:30Z

@fkleedorfer, you are raising some great issues, and some great architectural candidate solutions. We should definitely talk these through and come to a decision soon. You have captured my concerns as well.

fkleedorfer · 2024-11-07T15:15:41Z

Well, if we agree that we do not want generated files as part of the normal repo, we can just leave the build system as it is in that regard and work on its other functionality (e.g., version number replacement).

If we ever determine that we really need it, we can create the build-output branch or repo later and recreate the commits from our release tags (check out tag, build project, copy to branch/repo, commit), and add a step to our GitHub action that pushes the built files onto the branch/repo.
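The backfill steps listed above (check out tag, build project, copy to branch/repo, commit) could be scripted along these lines. This Python sketch only produces the shell commands as a dry run; it assumes the output branch is checked out in a separate worktree (the default path `../build_output` is made up), that `rsync` is available, and that each tag already contains the Maven build, which does not hold for tags older than this PR. `backfill_commands` is a hypothetical helper:

```python
import shlex


def backfill_commands(tags: list[str], worktree: str = "../build_output") -> list[str]:
    """Return (without running) the shell commands that rebuild each release tag
    and commit its target/dist contents to the output worktree, one commit per tag."""
    wt = shlex.quote(worktree)
    cmds: list[str] = []
    for tag in tags:
        t = shlex.quote(tag)
        cmds += [
            f"git checkout {t}",
            "mvn clean install",
            # Mirror this tag's build output into the worktree; --delete removes files
            # that no longer exist, and the excluded .git entry is left untouched.
            f"rsync -a --delete --exclude .git target/dist/ {wt}/",
            f"git -C {wt} add -A",
            f"git -C {wt} commit -m {shlex.quote('Build output of ' + tag)}",
        ]
    return cmds
```

Printing the list first and reviewing it before piping it to a shell keeps the process under manual control, in the same spirit as the release workflow discussed in this thread.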

fkleedorfer · 2024-11-08T08:42:28Z

Regarding the version number replacement:

I think the best solution is to put a placeholder in every position where we want the current version to be put.
Here are all occurrences of the version in our files:

$ grep -R 2\\.1\\.44 *
CHANGELOG.md:## [2.1.44] - 2024-10-27
CHANGELOG.md:[2.1.44]: https://github.com/qudt/qudt-public-repo/compare/v2.1.43...v2.1.44
README.md:The QUDT ontologies (Release 2.1.44) have been tested to load without error in Protege 5.6.4.
src/main/rdf/collections/COLLECTION_QUDT_QA_TESTS_ALL-v2.1.ttl:  rdfs:label "QUDT Collection - QA TESTS - ALL - v 2.1.44" ;
src/main/rdf/collections/COLLECTION_QUDT_USER_TESTS-v2.1.ttl:  rdfs:label "QUDT Collection - USER TESTS - v 2.1.44" ;
src/main/rdf/community/mappings/SSSOM/IFC/README.md:* Subsequent version of [QUDT 2.1.44](https://github.com/qudt/qudt-public-repo/releases/tag/v2.1.44) 
src/main/rdf/schema/SCHEMA-FACADE_QUDT-v2.1.ttl:  rdfs:label "QUDT SCHEMA Facade graph - v2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl:  vaem:graphTitle "QUDT Schema for Datatypes - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl:  vaem:title "QUDT Schema for Datatypes - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-DATATYPE-v2.1.ttl:  rdfs:label "QUDT Schema for Datatypes - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  rdfs:label "QUDT Schema - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  dcterms:title "QUDT Schema - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  vaem:graphTitle "Quantities, Units, Dimensions and Types (QUDT) Schema - Version 2.1.44" ;
src/main/rdf/schema/SCHEMA_QUDT-v2.1.ttl:  rdfs:label "QUDT Schema - Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  dcterms:title "QUDT SHACL Schema - Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  vaem:graphTitle "Quantities, Units, Dimensions and Types (QUDT) SHACL Schema - Version 2.1.44" ;
src/main/rdf/schema/shacl/SCHEMA_QUDT_NoOWL-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Metadata Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Supplement Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  dcterms:title "QUDT SHACL Schema Overlay - Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  vaem:graphTitle "Quantities, Units, Dimensions and Types (QUDT) SHACL Schema Overlay - Version 2.1.44" ;
src/main/rdf/schema/shacl/SHACL-SCHEMA-SUPPLEMENT_QUDT-v2.1.ttl:  rdfs:label "QUDT SHACL Schema Overlay Metadata Version 2.1.44" ;
src/main/rdf/vocab/constants/VOCAB_QUDT-CONSTANTS-v2.1.ttl:  rdfs:label "QUDT VOCAB Physical Constants Release 2.1.44" ;
src/main/rdf/vocab/constants/VOCAB_QUDT-CONSTANTS-v2.1.ttl:  vaem:graphTitle "QUDT Constants Version 2.1.44" ;
src/main/rdf/vocab/constants/VOCAB_QUDT-CONSTANTS-v2.1.ttl:  rdfs:label "Physical Constant Vocabulary Version 2.1.44 Metadata" ;
src/main/rdf/vocab/currency/VOCAB_QUDT-UNITS-CURRENCY-v2.1.ttl:  rdfs:label "QUDT VOCAB Currency Units Release 2.1.44" ;
src/main/rdf/vocab/currency/VOCAB_QUDT-UNITS-CURRENCY-v2.1.ttl:  vaem:graphTitle "QUDT Currency Units Version 2.1.44" ;
src/main/rdf/vocab/currency/VOCAB_QUDT-UNITS-CURRENCY-v2.1.ttl:  rdfs:label "QUDT Currency Unit Vocabulary Metadata Version 2.1.44" ;
src/main/rdf/vocab/dimensionvectors/VOCAB_QUDT-DIMENSION-VECTORS-v2.1.ttl:  rdfs:label "QUDT VOCAB Dimension Vectors Release 2.1.44" ;
src/main/rdf/vocab/dimensionvectors/VOCAB_QUDT-DIMENSION-VECTORS-v2.1.ttl:  vaem:graphTitle "QUDT Dimension Vectors Version 2.1.44" ;
src/main/rdf/vocab/dimensionvectors/VOCAB_QUDT-DIMENSION-VECTORS-v2.1.ttl:  rdfs:label "QUDT Dimension Vector Vocabulary Metadata Version 2.1.44" ;
src/main/rdf/vocab/prefixes/VOCAB_QUDT-PREFIXES-v2.1.ttl:  rdfs:label "QUDT VOCAB Decimal Prefixes Release 2.1.44" ;
src/main/rdf/vocab/prefixes/VOCAB_QUDT-PREFIXES-v2.1.ttl:  vaem:graphTitle "QUDT Prefixes Version 2.1.44" ;
src/main/rdf/vocab/prefixes/VOCAB_QUDT-PREFIXES-v2.1.ttl:  rdfs:label "QUDT Prefix Vocabulary Version Metadata 2.1.44" ;
src/main/rdf/vocab/quantitykinds/VOCAB_QUDT-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT Quantity Kind Vocabulary Version 2.1.44" ;
src/main/rdf/vocab/quantitykinds/VOCAB_QUDT-QUANTITY-KINDS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Quantity Kinds Version 2.1.44" ;
src/main/rdf/vocab/quantitykinds/VOCAB_QUDT-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT Quantity Kind Vocabulary Metadata Version 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT VOCAB Systems of Quantity Kinds Release 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  dcterms:description "QUDT Systems of Quantity Kinds Vocabulary Version 2.1.44"^^rdf:HTML ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Systems of Quantity Kinds Version 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-QUANTITY-KINDS-ALL-v2.1.ttl:  rdfs:label "QUDT System of Quantity Kinds Vocabulary Version 2.1.44 Metadata" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT VOCAB Systems of Units Release 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  dcterms:description "QUDT Systems of Units Vocabulary Version 2.1.44"^^rdf:HTML ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Systems of Units Version 2.1.44" ;
src/main/rdf/vocab/systems/VOCAB_QUDT-SYSTEM-OF-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT System of Units Vocabulary Metadata Version v2.1.44" ;
src/main/rdf/vocab/types/VOCAB_QUDT-DATATYPES-v2.1.ttl:  rdfs:label "QUDT Vocabulary of Datatypes v2.1.44" ;
src/main/rdf/vocab/types/VOCAB_QUDT-DATATYPES-v2.1.ttl:  vaem:title "QUDT Vocabulary for Datatypes - Version 2.1.44" ;
src/main/rdf/vocab/types/VOCAB_QUDT-DATATYPES-v2.1.ttl:  rdfs:label "QUDT Vocabulary for Datatypes - Version 2.1.44" ;
src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT VOCAB Units of Measure Release 2.1.44" ;
src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl:  vaem:graphTitle "QUDT Units Version 2.1.44" ;
src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl:  rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.44" ;
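For reference, here is a rough Python equivalent of the grep above. It could be reused to check that no hard-coded version remains once placeholders are introduced. `find_version_occurrences` is a hypothetical helper. Like `grep -F`, it matches the version as a fixed string, and it skips files that are not valid UTF-8 (e.g. the PDFs under docs):

```python
from pathlib import Path


def find_version_occurrences(root: str, version: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every line under
    `root` that contains `version` as a plain substring."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file, e.g. a PDF or PPTX
        for lineno, line in enumerate(text.splitlines(), start=1):
            if version in line:
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```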

All of these are RDF string literals or plain text, so putting a placeholder like ${qudt.version} there will not make the data unusable prior to replacement. That would be my preferred solution.

Example:

in the source file src/main/rdf/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl the line

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.44" ;

becomes

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version ${qudt.version}" ;

In a normal build (not a release), this would be replaced by the snapshot version, such as:

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.45-SNAPSHOT" ;

Whereas in a release build, this would be replaced by the release version, such as:

 rdfs:label "QUDT Unit of Measure Vocabulary Metadata Version 2.1.45" ;
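In Maven, this kind of substitution is typically done through resource filtering (the maven-resources-plugin with filtering enabled), and the PR may well do it that way. Purely to illustrate the effect, here is a Python sketch that copies text files and substitutes the placeholder. `fill_version`, `filter_tree` and the default file suffixes are assumptions, not the build's actual configuration:

```python
from pathlib import Path

PLACEHOLDER = "${qudt.version}"


def fill_version(text: str, version: str) -> str:
    """Replace every ${qudt.version} placeholder with the given version string."""
    return text.replace(PLACEHOLDER, version)


def filter_tree(src: str, dst: str, version: str, suffixes: tuple[str, ...] = (".ttl", ".md")) -> int:
    """Copy files with the given suffixes from src to dst (keeping relative paths),
    substituting the version; return the number of files written."""
    count = 0
    for path in Path(src).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            out = Path(dst) / path.relative_to(src)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(fill_version(path.read_text(encoding="utf-8"), version), encoding="utf-8")
            count += 1
    return count
```

Calling `filter_tree` with "2.1.45-SNAPSHOT" corresponds to a normal build, and calling it with "2.1.45" corresponds to a release build, matching the two examples above.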

steveraysteveray · 2024-11-08T14:10:10Z

Just so I understand, if we start with a Release of 2.1.44, and several merges are performed from various PRs, do we get:
Existing Release: 2.1.44
First PR merge: 2.1.45-SNAPSHOT
Second PR merge: 2.1.45-SNAPSHOT
Next Release: 2.1.45

Is that how it works, or does the 45 keep getting incremented with each merge?

steveraysteveray · 2024-11-08T14:18:54Z

From your earlier comment:

Two alternative ideas, if we must have the result files in a git repo:

create an additional branch, let's say, build_output, which at all times contains the build result (ie, the contents of the zip) of the main branch.
create an additional repo, that the build results of the main branch are always pushed onto.

I'm leaning toward your second idea. We could name the repo something like qudt-distribution, a read-only repo to support users who have git-aware applications that just want to 'git pull' the latest main branch, or even pull some earlier version. This way, people could either use qudt-public-repo with Maven, which supports contributions and edits, or use qudt-distribution, which reflects the latest release or snapshot.

fkleedorfer · 2024-11-08T15:14:39Z

Just so I understand, if we start with a Release of 2.1.44, and several merges are performed from various PRs, do we get:
Existing Release: 2.1.44
First PR merge: 2.1.45-SNAPSHOT
Second PR merge: 2.1.45-SNAPSHOT
Next Release: 2.1.45

Is that how it works, or does the 45 keep getting incremented with each merge?

Sorry, I did not explain this.

It is actually very simple because there is no magic at all, it's all manual:

The version number is in the pom.xml file, near the top:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.qudt</groupId>
    <artifactId>qudt-public-repo</artifactId>
    <version>2.1.45-SNAPSHOT</version>  <!-- there: the current version -->
    <packaging>pom</packaging>
    <!-- ... rest of the POM omitted ... -->
</project>

This is the current version. The convention, as far as I've seen it, is this:

while you work on the code and keep making commits, the version does not change and is always the version you are aiming for with '-SNAPSHOT' appended (e.g. 12.34-SNAPSHOT).
when you make a release, the maven-release-plugin is run behind the scenes, which requires as parameters the version you want to release and the next 'development' (=snapshot) version. We pass these two arguments, the only ones it requires, to the GitHub action that makes the release. It changes the version in the pom file, we package that for the release, and then it changes the version in the pom file to the next development version. (Maybe you wouldn't call that whole process 'manual' as I did above, but at least there is no automatic version increment.)

So, it's like you suspected:

existing Release: 2.1.44
First PR merge: 2.1.45-SNAPSHOT
Second PR merge: 2.1.45-SNAPSHOT
Next Release: 2.1.45 - unless you decide to release a different version, which may be warranted depending on what has changed (see semantic versioning, if we want to adhere to that). You'd pass 2.1.45 as the release version and 2.1.46-SNAPSHOT as the next development version in the github user interface to achieve that.
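The version arithmetic described above can be sketched in Python. This only illustrates the convention; it is not what the maven-release-plugin does internally, and in the actual process both versions are typed into the GitHub user interface. The `bump` parameter is an assumption added to show how a semantic-versioning decision (patch, minor, major) would change the next development version:

```python
import re

_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def release_version(snapshot: str) -> str:
    """'2.1.45-SNAPSHOT' -> '2.1.45', the default version a release would use."""
    return snapshot.removesuffix("-SNAPSHOT")


def next_development_version(release: str, bump: str = "patch") -> str:
    """Return the snapshot version to continue with after releasing `release`."""
    m = _RELEASE.match(release)
    if not m:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {release!r}")
    major, minor, patch = map(int, m.groups())
    if bump == "major":
        major, minor, patch = major + 1, 0, 0
    elif bump == "minor":
        minor, patch = minor + 1, 0
    elif bump == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown bump: {bump!r}")
    return f"{major}.{minor}.{patch}-SNAPSHOT"
```

With the defaults, releasing 2.1.45 gives 2.1.46-SNAPSHOT as the next development version, which is the example given above.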

jhodgesatmb · 2024-11-08T17:00:09Z

I am fine with this and wonder why we haven't done it before now.

-- Jack

fkleedorfer · 2024-11-08T18:06:02Z

Rebased on main and made all necessary changes.

I think we can remove DRAFT status.

Also, I think it would be a good idea to switch our settings such that the default action for PRs is to squash and rebase.

In case of a squash I would use the commit message of the first commit in this branch as the commit message.

steveraysteveray · 2024-11-08T19:59:44Z

I'm about to jump into another meeting, but I think your applicableUnits algorithm needs to ignore any deprecated units. I'm getting lots of validation errors because quantity kinds are referring to some deprecated units. Somewhere in your nested SELECT calls you need a

FILTER NOT EXISTS {?unit qudt:deprecated true}
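To make the effect of that filter concrete, here is a simplified Python sketch over plain (subject, predicate, object) tuples. It only inverts qudt:hasQuantityKind into qudt:applicableUnit and drops units marked qudt:deprecated true. The real inference is a SPARQL-based SHACL rule with nested SELECTs and may do more. The unit name unit:FT_OLD is made up for the example:

```python
Triple = tuple[str, str, str]


def infer_applicable_units(triples: list[Triple]) -> set[Triple]:
    """Derive (qk, qudt:applicableUnit, unit) from (unit, qudt:hasQuantityKind, qk),
    skipping any unit that has qudt:deprecated true."""
    # Equivalent of FILTER NOT EXISTS {?unit qudt:deprecated true}
    deprecated = {s for s, p, o in triples if p == "qudt:deprecated" and o == "true"}
    return {
        (qk, "qudt:applicableUnit", unit)
        for unit, p, qk in triples
        if p == "qudt:hasQuantityKind" and unit not in deprecated
    }
```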

fkleedorfer · 2024-11-08T20:42:51Z

Excellent catch @steveraysteveray! I had lowered the severity of the SHACL shape to warning to keep the build from failing, and had not seen the connection with the applicableUnits calculation. Fixed now.

steveraysteveray · 2024-11-09T15:56:05Z

Thanks @fkleedorfer, the target files that are in the imports closure now pass the validation cleanly!

I stepped back to take a look at where we are now, and I have some observations/questions. I used the quantitykind file to focus my observations.

1. Before running mvn clean install, there is no version of quantitykinds that is ready "out of the box" (i.e. including the applicableUnit triples).
2. After running mvn clean install, there are two places to find the usable version of quantitykinds:
   a. In target/dist/vocab/quantitykinds
   b. In target/qudt-public-repo-2.1.45-SNAPSHOT.zip
   I suppose that's OK, but people might wonder whether they are the same, or which is the "right one". We could discuss whether we want a zip file outside of releases.
3. After running a subsequent mvn clean, a git status shows the following untracked files:

	src/main/resources/docs/2020-04-28 Intro to QUDT.pdf~01e299f62ba843befaf5266de2c8cc38c32d3682
	src/main/resources/docs/2020-04-28 Intro to QUDT.pdf~HEAD
	src/main/resources/docs/2020-04-28 Intro to QUDT.pptx~01e299f62ba843befaf5266de2c8cc38c32d3682
	src/main/resources/docs/2020-04-28 Intro to QUDT.pptx~HEAD
	src/main/resources/docs/test.txt~01e299f62ba843befaf5266de2c8cc38c32d3682
	src/main/resources/docs/test.txt~HEAD

Are these remnants of a merge conflict? They should be removed, right? Or should your scripts even ignore that whole folder?

4. There is a top-level folder named vocab, with two empty subfolders, quantitykinds and unit. Are they forgotten leftovers, or are they needed for some reason?
5. I'm still undecided whether it would be better to deposit a set of "usable" files (i.e. including the quantitykind file containing applicableUnits) in some separate branch on this repo, or in a separate repo like qudt-dist. My own pros and cons:

Separate branch in qudt-public-repo
- Pro: Only one repo for users to know about and keep up to date
- Con: People who want to just use the files without running maven would go to a branch other than main, which seems counterintuitive.
Separate repo (qudt-dist)
- Pro: People who are just users and don't expect to contribute just need to know about qudt-dist, and clone or fork the main branch, or download a Release
- Con: People need to know about different repos for contribution vs. use

For item 5 above, I'm thinking we have two communities we want to support:

1. Contributors/developers: they could be expected to use Maven locally to validate their contributions, or at least look at the pipeline results when they git push
2. Users who do not contribute: they would either download a zip file or git pull from somewhere without invoking Maven

For my own work, I expect to jump back and forth between both communities. Group 1 for my own updates to QUDT, and Group 2 for my use of QUDT in committee work.
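On the question of whether the quantitykinds file in target/dist and the copy inside target/qudt-public-repo-2.1.45-SNAPSHOT.zip are the same, a small Python script can compare them byte for byte. It assumes the zip mirrors the target/dist layout under an optional `prefix` folder. The actual layout is defined by src/main/build/assembly/releaseZip.xml and may differ. `dist_zip_mismatches` is a hypothetical name:

```python
import zipfile
from pathlib import Path


def dist_zip_mismatches(dist_dir: str, zip_path: str, prefix: str = "") -> list[str]:
    """Return the relative paths of files under dist_dir that are missing from the
    zip or whose bytes differ from the corresponding zip entry."""
    mismatches = []
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        for path in sorted(Path(dist_dir).rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(dist_dir).as_posix()
            entry = prefix + rel
            if entry not in names or zf.read(entry) != path.read_bytes():
                mismatches.append(rel)
    return mismatches
```

An empty result means every file in target/dist has an identical counterpart in the zip.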

fkleedorfer · 2024-11-09T16:22:55Z

3 and 4 are accidents.

jhodgesatmb · 2024-11-09T16:31:27Z

Are people who create profile distributions in either of your groups? I agree that a large number of QUDT users (I hesitate to say ‘overwhelming’) just want to download or link to the repository and not contribute. The experience for them should be simple and transparent. Having lost track of the path of this discussion through all the threads, I would like to test out the build process myself. Where is the documentation?

steveraysteveray · 2024-11-09T16:33:35Z

@jhodgesatmb, see the qudt-board Slack channel at 2:42pm on October 28.

steveraysteveray · 2024-11-09T16:38:28Z

@fkleedorfer, I'm still impressed by your applicableUnit query. I thought I would try to improve it because instances of qudt:SystemOfQuantityKinds also use the qudt:hasQuantityKind relation, so they could be accidentally picked up in other scenarios. I realize that in your pom.xml you only included the unit and quantitykind graphs to do the inferencing, but still I thought it would be good to make it more robust.

However, I clearly messed up with my addition of

?unit a qudt:Unit

line in the query, so in the end I just put things back the way they were...

steveraysteveray · 2024-11-09T16:45:25Z

@jhodgesatmb, to your first question, yes, I would imagine people in group 1 would create profile distributions, and people in group 2 would use such distributions.

fkleedorfer · 2024-11-09T18:19:30Z

@fkleedorfer, I'm still impressed by your applicableUnit query. I thought I would try to improve it because instances of qudt:SystemOfQuantityKinds also use the qudt:hasQuantityKind relation, so they could be accidentally picked up in other scenarios. I realize that in your pom.xml you only included the unit and quantitykind graphs to do the inferencing, but still I thought it would be good to make it more robust.

I will look into it, thanks!

jhodgesatmb · 2024-11-09T18:53:04Z

Ok

fkleedorfer · 2024-11-09T22:47:18Z

@steveraysteveray another good catch - if I include the 'systems' files, the applicableUnits.ttl contains entries such as:

quantitykind:AbsoluteHumidity
  qudt:applicableUnit soqk:IMPERIAL ;
  qudt:applicableUnit soqk:USCS ;
  ...

Added the requirement for type qudt:Unit and verified that it produces the same result as the original query did without the systems files.

fkleedorfer · 2024-11-09T22:56:24Z

src/main/shacl/inferApplicableUnits.ttl

-                            ?unit qudt:hasQuantityKind ?qk
-                            ?unit a qudt:Unit
+                            ?unit qudt:hasQuantityKind ?qk .
+                            ?unit rdf:type qudt:Unit .


The same statement (?unit rdf:type qudt:Unit) is also needed in line 79. I did that in a later commit, so don't worry about it.
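Here is a Python sketch of the combined effect of the two filters discussed in this thread: the ?unit rdf:type qudt:Unit requirement and the qudt:deprecated check. As before, it works over simplified tuples rather than the actual SHACL rule, and it matches rdf:type literally without subclass reasoning. Without the type check, the soqk:IMPERIAL entry shown above would be produced:

```python
Triple = tuple[str, str, str]


def infer_applicable_units(triples: list[Triple]) -> set[Triple]:
    """Derive (qk, qudt:applicableUnit, unit) from (unit, qudt:hasQuantityKind, qk)."""
    # ?unit rdf:type qudt:Unit: excludes systems of quantity kinds (soqk:...),
    # which also use qudt:hasQuantityKind.
    units = {s for s, p, o in triples if p == "rdf:type" and o == "qudt:Unit"}
    # FILTER NOT EXISTS {?unit qudt:deprecated true}
    deprecated = {s for s, p, o in triples if p == "qudt:deprecated" and o == "true"}
    return {
        (qk, "qudt:applicableUnit", s)
        for s, p, qk in triples
        if p == "qudt:hasQuantityKind" and s in units and s not in deprecated
    }
```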

fkleedorfer · 2024-11-09T23:02:04Z

@steveraysteveray I cannot reproduce your points 3 and 4 from earlier. When you clone the repo into a new folder and check out the branch 959/maven, it looks clean to me.

fkleedorfer · 2024-11-09T23:09:34Z

To your point 2 - it is totally possible to not make the zip file upon mvn install, and only build the zip during a release build.

fkleedorfer · 2024-11-09T23:22:53Z

it is totally possible to not make the zip file upon mvn install, and only build the zip during a release build.

Done in the last commit: mvn -Pzip install now also builds the zip file.

dr-shorthair · 2024-11-10T01:07:55Z

Watching from a distance. Very pleased to see all this.

fkleedorfer · 2024-11-10T15:00:16Z

In terms of tasks left to do in this PR (copied from above):

These were the tasks:

add inferred triples to the appropriate file ( applicable units --> quantitykinds file)
[] make sure the release zip contains everything we need (and nothing else) <-- @steveraysteveray @ralphtq @jhodgesatmb pls check!
add a github action to trigger build on push to PR and make a snapshot release when a PR is merged
[] verify that build-on-PR merge release really works
add a github action for making a release
[] verify that manual release action really works and that we like the process
create a changelog and include it in the release process
decide whether we want to leave the rdf sources in the folder structure we have currently on the main branch or if moving them to src/main/rdf is acceptable
during the build, set the current version in all places where it is used (in target), either by using a variable in those places or by replacing the previous version with the current one

I've decided that:

the contents of the zip file have been reviewed multiple times, and I believe they are OK
the github actions cannot be verified within the PR. The manual one is not available, and the automatic one will not be triggered until we merge. We'll move these tasks to the 'later' section

That leaves the question of whether we should keep the directory structure the way it is now: the former top-level directories are moved to src/main/rdf (and src/main/resources/docs, respectively), leaving at the top level just the src folder and a git-ignored target folder that is generated by the build.

Some background on this structure (why "main"?): it's a Maven convention. The main source code and other files go into src/main/. Projects usually also have test sources (unit/integration tests), which go into src/test/.

I would recommend that we keep the structure as it is in this PR at the moment:

Advantages:

- It's a breaking change for everyone who uses the sources. They will have to look at what changed and adapt their code. This is good, because if we leave the top-level folders where they were previously, the changes will be subtle (e.g. no applicableUnits triples in the quantitykinds file), and their code will break in subtle ways they might not notice for years.
- We stick with the Maven convention, so those who know it will be able to work with it without problems.
- We might have actual tests in the future: for example, we may provide some modelling examples, and these could go into src/test.

Disadvantages:

- It's a change. We may have to update some scripts (but we will have to update some scripts either way).
- It may look a bit disorienting. Having said that, the current structure does not necessarily look less disorienting. (Where is the qudt file?)

Other considerations?

jhodgesatmb · 2024-11-11T01:43:06Z

I am guessing that by 2:42 pm you mean EST or EDT. If I subtract 3 hours, that would be 11:42 am PDT. That took me to the GitHub site, but I didn't see a procedure there. Jack

On Sat, Nov 9, 2024 at 8:33 AM steveraysteveray wrote: @jhodgesatmb, see the qudt-board Slack channel at 2:42pm on October 28.

fkleedorfer · 2024-11-11T08:24:19Z

@jhodgesatmb there is some documentation on how to build the project in the README.md file in this PR.

jhodgesatmb · 2024-11-11T15:54:22Z

Do I have to check out the repository in order to see the procedure? There is nothing on the landing page of this PR that says what the starting point is. I would rather not check out a branch and confuse everything I have locally, and I cannot seem to see the README just on the GitHub page. If I go to the 'Code' page and open the README.md file, it is the regular README and says nothing about this process. Jack

On Mon, Nov 11, 2024 at 12:24 AM Florian Kleedorfer wrote: @jhodgesatmb there is some documentation on how to build the project in the README.md (https://github.com/qudt/qudt-public-repo/blob/43f1193d2c937343b25e5e4fd27bd22d8cc07729/README.md) file in this PR.

## Functionality of the build system

Run the build with `mvn install`:

- check RDF source formatting
  - fail the build if there are violations
  - formatting can be fixed via `mvn spotless:apply`
- remove all `qudt:applicableUnits` triples from the quantitykinds file
- copy all relevant sources to `target/dist`
- replace the version placeholder, `$$QUDT_VERSION$$`, everywhere in `target/dist`
- infer `qudt:applicableUnits` by applying `src/build/inference/inferApplicableUnits.ttl` and add those triples to the quantitykinds file in `target/dist`
- evaluate all SHACL shapes on the build result in `target/dist` and fail the build if there are violations

Profile `zip` builds the release zip: `mvn -Pzip install`

## GitHub Actions

GitHub Actions are defined in `.github/workflows`:

- `maven.yml` runs the build upon push to a PR or when a PR is merged to `main`. In the latter case, the action makes a GitHub release `snapshot` and a tag with that name, which overwrites the previous snapshot release.
- `release.yml` is a manually invokable action that makes a release. The parameters `release_version` and `next_development_version` are required and are used for making the release and preparing the repo for the next development cycle. This action makes changes to the repo (`pom.xml`, `CHANGELOG.md`), which are committed to a new branch, and a PR to `main` is created during its execution. This PR has to be merged manually when we are happy with the results of the release.

## Changes to sources that were required

- introduce a placeholder, `$$QUDT_VERSION$$`, wherever the current version is needed
- move the rdf source folders from the root dir to `src/main/rdf` (not technically required, but makes clear what is source and what is generated)
- add CHANGELOG.md

## Other changes

- Folder structure: the `collections` folder ended up as `src/main/validation`

## Documentation

- build phases and associated plugin executions are listed in the comments in `pom.xml`
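The copy-and-replace steps above can be illustrated with a small Python sketch. It shows why the placeholder only ever lives in the sources: the substitution is applied to the copies under `target/dist`, so the files under `src/main/rdf` keep `$$QUDT_VERSION$$` untouched. This is a minimal illustration under stated assumptions, not the PR's implementation (the real build runs inside Maven, as configured in `pom.xml`); the function name `copy_and_stamp` and the `*.ttl` filter are hypothetical.

```python
from pathlib import Path

# Placeholder used in the RDF sources (name taken from the PR description)
PLACEHOLDER = "$$QUDT_VERSION$$"


def copy_and_stamp(src: Path, dist: Path, version: str, pattern: str = "*.ttl") -> int:
    """Hypothetical helper: copy files matching `pattern` from `src` to `dist`,
    keeping their relative paths, and replace the version placeholder in the
    copies only.

    Returns the number of copied files that contained the placeholder.
    """
    stamped = 0
    # sorted() materializes the file list before anything is written
    for source_file in sorted(src.rglob(pattern)):
        target_file = dist / source_file.relative_to(src)
        target_file.parent.mkdir(parents=True, exist_ok=True)
        text = source_file.read_text(encoding="utf-8")
        if PLACEHOLDER in text:
            text = text.replace(PLACEHOLDER, version)
            stamped += 1
        # The source file is never modified; only the copy carries the version
        target_file.write_text(text, encoding="utf-8")
    return stamped
```

For example, `copy_and_stamp(Path("src/main/rdf"), Path("target/dist"), "2.1.45")` would write stamped copies while leaving the sources unchanged, so the next build can stamp a different version without any manual find/replace.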

fkleedorfer · 2024-11-11T19:34:24Z

I've updated the pom.xml for the new folder structure and squashed everything into one commit with a nice commit message. I think this is ready to merge (preferably via rebase, actually).

steveraysteveray

The dist folder loads cleanly into TopBraid 7.1.1, and validates there without errors.

The src/main/rdf folder loads successfully into a separate workspace in TopBraid 7.0 that I can use for the publication.

I will pause to let others do their testing.

fkleedorfer · 2024-11-12T10:34:47Z

> The src/main/rdf folder loads successfully into a separate workspace in TopBraid 7.0 that I can use for the publication.

@steveraysteveray you'll have to change the code that replaces the version number in those files.

Would it not be better to create the release from the results in target/dist?

steveraysteveray · 2024-11-12T14:51:13Z

We will need some careful sequencing here. Publication of the web pages is distinct from creation of the GitHub Release. We need access to the src files to update the version numbers and metadata like the date of publication, for both publication and the Release. I cannot have both the src graphs and the dist graphs present in the TopBraid workspace at the same time because of base URI conflicts. So, the sequence could be something like:

1. Update the version numbers in src (just increment to 2.1.45 this time?)
2. Run the Maven build
3. Load just the dist folder into TopBraid (along with lmdoc, the private QUDT repo, mathjax and QUDT Customizations).
4. Run the publication script, which invokes many webservices from TopBraid. The problem here is that it will not be able to update the graph metadata in the src folder. It will update the metadata in the dist graphs, but those will be overwritten by the next Maven build.

So in the near term, I could do the sequence above. The published web pages would have the correct metadata, but the src files would not. Then I could manually invoke just the metadata update routines in a separate workspace, operating on the src files, and push those in a separate PR. This will be a little tricky, but doable I believe.

Then we could discuss the testing and execution of the GitHub Release.

How does this sound?

fkleedorfer · 2024-11-12T16:28:00Z

I think, for now, we don't need to release a new version. I expect some bugfixing and tweaking after we merge this PR. So we have time to get the publication workflow fixed.

I think it is a good idea to work on that after we merge. Maybe we can automate it fully, too.

However, I don't think I fully understand the problems you describe. For one, if the solution is to do a manual find/replace in src and then run the Maven build (which replaces the version placeholder with the version), why do the replacing in the first place? And why not just use the release zip (once it has been released on GitHub)?

I don't understand the problem with the metadata either... Is it that we should do a second replace for the date of publication? It would be no problem to add that to this PR! Using the placeholder approach, it's really quite easy. We can also adapt that approach for IRIs or non-string literals if needed.

jhodgesatmb · 2024-11-12T16:55:22Z

I do not understand the problem either. Perhaps the best way forward is to get the release process working under the build process, and then try to get the web pages to build the same way and compare them to the way they are currently created. There is no question that this is kind of scary, but the upside would be huge. How does all this fit in with Ralph’s project?

Jack Hodges, Ph.D., Arbor Studios

steveraysteveray · 2024-11-12T19:52:06Z

OK, here's the excerpt from the webservices call to update the metadata in each of the graphs:

    ?meta dcterms:modified ?newMod .
    ?meta vaem:latestPublishedVersion ?newLatestU .
    ?meta vaem:previousPublishedVersion ?newPreviousU .

and for the catalog:

    ?entry lmcat:publicationDate ?yearMonthDay .

So if we can do that in this PR, that could work, along with your placeholder approach for the version. Want to take a shot at that? Then I can comment out the respective calls in the script and proceed with publishing the web pages.

Happy to drive all this from either the dist folder or the zip file.
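Following the earlier suggestion that the placeholder approach could also cover the publication date, here is a minimal Python sketch of substitutions for the metadata fields listed above. Only `$$QUDT_VERSION$$` comes from this PR; the placeholder names `$$QUDT_PREV_VERSION$$` and `$$QUDT_RELEASE_DATE$$`, and the assumption that the date is written as an `xsd:date` literal (`YYYY-MM-DD`), are hypothetical.

```python
from datetime import date


def release_substitutions(version: str, previous_version: str, release_date: date) -> dict[str, str]:
    """Map each placeholder to the text that replaces it in target/dist.

    Only $$QUDT_VERSION$$ exists in the PR; the other two names are hypothetical.
    """
    return {
        "$$QUDT_VERSION$$": version,
        "$$QUDT_PREV_VERSION$$": previous_version,
        # date.isoformat() yields YYYY-MM-DD, the lexical form of xsd:date
        "$$QUDT_RELEASE_DATE$$": release_date.isoformat(),
    }


def apply_substitutions(text: str, substitutions: dict[str, str]) -> str:
    """Replace every occurrence of every placeholder in `text`."""
    for placeholder, value in substitutions.items():
        text = text.replace(placeholder, value)
    return text
```

For instance, with `date(2024, 11, 13)` a Turtle line such as `ex:meta dcterms:modified "$$QUDT_RELEASE_DATE$$"^^xsd:date .` (where `ex:meta` stands in for a graph's metadata node) would become `ex:meta dcterms:modified "2024-11-13"^^xsd:date .`. Keeping all values in one mapping lets the build stamp them together in `target/dist`, so nothing has to be written back into the src graphs.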

steveraysteveray · 2024-11-12T19:53:11Z

...or we could just merge the current PR and keep working on the publication from there.

jhodgesatmb · 2024-11-12T20:55:53Z

OK with me.

Jack Hodges, Ph.D., Arbor Studios

fkleedorfer · 2024-11-12T21:31:56Z

> ...or we could just merge the current PR and keep working on the publication from there.

That's the better approach.

Sounds like we can make these webservice calls from the GitHub action. Let's work on that in a subsequent PR.

fkleedorfer · 2024-11-12T21:38:03Z

@ralphtq can we merge?

fkleedorfer · 2024-11-13T08:24:08Z

Oops, I had forgotten to update the changelog!

fkleedorfer mentioned this pull request Nov 7, 2024

Use of qudt:hasDimensionVector in datatypes file clashes with other uses #992

Closed

fkleedorfer force-pushed the 959/maven branch from 3be408a to 01e299f November 8, 2024 17:55

fkleedorfer commented Nov 9, 2024


fkleedorfer force-pushed the 959/maven branch from 879a6d1 to d7e5cc4 November 11, 2024 19:32

steveraysteveray marked this pull request as ready for review November 11, 2024 21:54

steveraysteveray requested review from steveraysteveray, ralphtq and jhodgesatmb as code owners November 11, 2024 21:54

steveraysteveray approved these changes Nov 11, 2024


Update CHANGELOG.md

f727340

This was referenced Nov 13, 2024

Add in NUM-PER-MilliL #995

Merged

Replace hr with h in all units' symbol, expression, or description #996

Merged

steveraysteveray merged commit 332ad2c into main Nov 13, 2024
1 check passed

steveraysteveray deleted the 959/maven branch November 13, 2024 15:52


