Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable recursive XSD adding text output compare (as JavaFX is not working) #10

Open
wants to merge 39 commits into
base: develop
Choose a base branch
from

Conversation

svanteschubert
Copy link

Dear Yoep,

Thank you for your project it gave me a very good kick start on the Xerces Java API!
(BTW I was able to build and debug thourgh Xerces in conjucntion with xsd-compare in IntelliJ using JDK 8 after adding some HTML interface).

The features of JavaFX, spreadsheet and native packaging have not been working for me from the start on my Ubuntu 22.04 LTS (Jammy Jellyfish) with OpenJDK 11 (nor on Windows 10). I tried to update to all latest dependency versions, but as JavaFX is released in parallel with JDK, perhaps an update of the project to the JDK 17 LTS - as it is a stand-alone comparison tool this should be unproblematic. But as I am fine with text output for the comparison I don't follow this any longer.
Still this update caused some changes in the spreadsheet and JavaFX classes.
Also I did some renaming, mainly according to XSD spec and Xerces, sometimes due to refactoring, e.g.
Renamed Modifications.java to Change.java as it now represents either an Add, Modfication or Removal change.
A modification change might stilll consist of the change of several XSD facets but it is seen as a single change of either an element or attribute.
BTW it helped me to read the abstract XSD data model chapter in the W3C XSD spec (and jump back there from time to time)

With this PR I have enabled the feature of recursive XSDs like the on from UN/CEFACT XML:

  1. https://unece.org/trade/uncefact/mainstandards
  2. https://unece.org/trade/uncefact/xml-schemas

My test case was the compare of the cross industry invoice (CII) XML of the second release (B) of 2016 -> D16B (used by the EU norm for e-invoices EN16931 - where I am a CEN co-editor) with the second release (B) of 2022 -> D22B currently a Beta

I am returning the comparison as a list of changes, which an be serialized to text by using a different TextReporter.

I am using two different text outputs (both being serialized to txt files by JUnit tests to ./target)

You may find for both new added automated regression tests.

Note in the multi-line output, that the type change in the XSD is the first line (for modifications also the second indented lines), which I call Header Lines, as they exist once for multilple XPath.
A single type change might effect many different named nodes accross valid documents of the XSD grammar. Therefore may one header line have many XPath (a HashSet of XPath as body lines).
In the end I have the following statisik:

**** STATISTIC ****

** ELEMENTS

  • Added elements in XSD: 98

  • Added elements in XML: 1828

  • Modified elements in XSD: 18

  • Modified elements in XML: 175

  • Removed elements from XSD: 6

  • Removed elements from XML: 45

** ATTRIBUTES

  • Added attributes in XSD: 5

  • Added attributes in XML: 10

  • Modified attributes in XSD: 52

  • Modified attributes in XML: 2352

  • Removed attributes from XSD: 141

  • Removed attributes from XML: 8703

I might continue after eastern to check completness and might add tests for cases where xs:choice, xs:sequence, xs:all is being changed. But perhaps you are curious why the JavaFX is not starting any longer and might give it a quick check.
Perhaps as I started to do updates switch to JDK 17. :-)
BTW the maven-shade-plugin did no longer work (for me) so I switched to maven-assembly-plugin creating a jar-with-depencies. Still I got troubles moving version, buildtime into the manifest as I am doing at other projects:
https://github.com/tdf/odftoolkit/blob/master/odfdom/pom.xml#L136
allowing that a "java -jar XY.jar" returning version information, I even use usually the last commit hash. ;-)

Anyway, this is therefore a draft PR as I am uncertain if I broke existing features (as they never worked for me from the beginning), nevertheless the recursive grammar and the refactoring are worth a review IMHO.
If you are like to share a remote tea, I would happily explain in more detail what I intended and where I would like to have some feedback from you :-)

All the best and Happy Eastern holidays (even if we might have different religions there should be always a reason to celebrate) :-)
Svante

@svanteschubert svanteschubert marked this pull request as draft April 6, 2023 11:41
@yoep
Copy link
Owner

yoep commented Apr 7, 2023

Thanks already for the draft, seems very promising.
I'll see when I can find the time to test it myself and go again through the full code as it has been some time when I worked on this project.

@svanteschubert
Copy link
Author

Added some missing facets to comparison including test documents to compare changes (1/2)

  • xs:maxInclusive
  • xs:maxExcluisve
  • xs:minInclusive
  • xs:minExclusive
  • xs:totalDigits
  • xs:fractionDigits

and improved the reporting and added real-world test documents - the factur-x subset XSDs.
Test documents compared in one test with all reports.

My original use case was to check if the UN/CEFACT XSD D22B has added new constraints like:

  1. Removed elements/attributes that were originally used by EN16931 (esp. our Factur-X subset so called Core-Invoice-Usage-Specification (CIUS)).
  2. Added higher constraints: added mandatory node beyond existing node
  3. Limited value set, e.g. raising minimum value or lowering a maximum value or removing enumeration values
    All the above would break backwards compatibility.
    For this reason, I have added two explicit reports for these use cases (e.g. OnlyNewRestrictionReport and OnlyNewExtensionReport).
  • The OnlyNewRestrictionReport would show restricitions and new constraints, but also incompatilite changes (e.g. fixed default value changes).
  • The OnlyNewExtensionReport would show extensions. Any subset of the EU norm (EN16931) that is taking national law into account may narrow an optional to mandatory (which is useful for national VAT laws) but is not allow to extend EN16931 (unless it is explicitly an extension and not a CIUS).

What might be missing:

  1. Two facets xs:assert and xs:explicitTimezone might be added
  2. I have changed in my facet test document xs:choice to xs:sequence and this was not yet flagged

@svanteschubert
Copy link
Author

Finally, I have done some further programming of improvements:

  1. added the detection of changes of the compsitor (xs:sequence, xs:choice, xs:all)
  2. improved the output of comparisons: Stating no semantic change if a new single enumeration is equal to the previous fixed value or a removed single enumeration is still provided as @fixed attribute. Finally, moving new enumerations from extensions to restrictions reports, e.g. as a given domain (e.g. Strings) is being restricted to the subset of the enumeration of values

I could not find xs:assert nor xs:explicitTimezone in my XSDs (UN/CEFACT XML and UBL) neither could I find "explicitTimezone" with a grep within Xerces sources, therefore I am fine not embracing them now.

Guess, I am ready now :-)

@svanteschubert
Copy link
Author

I have examples for the missing features in a comment of a test file.

    <xs:complexType name="assertType">
        <xs:attribute name="min" type="xs:int"/>
        <xs:attribute name="max" type="xs:int"/>
        <xs:assert test="@min le @max"/>
    </xs:complexType>

    <xs:simpleType name="SpecificTimeType">
        <xs:restriction base="xs:time">
          <xs:explicitTimezone value="required" />
        </xs:restriction>
      </xs:simpleType>

@svanteschubert svanteschubert marked this pull request as ready for review April 17, 2023 16:29
@svanteschubert
Copy link
Author

After being finished with comparing the UN/CEFACT XML, I added a test for comparing UBL 2.1 with UBL 2.3.

Unfortunately, this test fails due to acquisition of too much Heap space (tried increasing Heap in IntelliJ without success). It seems that for UBL the extra DOM-like element model that xsd-compare is creating is too much.
I loved this abstraction for easier view on the model and was (a bit I am still) tempted to test the same Xerces-based comparison with the MultiSchemaValidator (MSV) under the hood, were I did recently some release, to check if the model is understood correctly.
(MSV is also loading DTD and RelaxNG in an internal model that abstracts from XSD, DTD and RNG).

Nevertheless, the comparision for UN/CEFACT XML worked (and this was my main goal) and I have learned a lot of the underlying Xerces API and the problems that occur when comparing XSD grammars.

Thanks again for your nice project, Yoep!

Best regards,
Svante

@svanteschubert
Copy link
Author

PS: Realized that the change of element content was not compared (added it)

@yoep
Copy link
Owner

yoep commented Apr 20, 2023

Something that I think might be better is to move the CLI functionality to a CommandLineRunner.
This allows us to switch between the GUI and CLI without having to do additional work such as setting up the logger and manually parsing the argument options given to the program. This way, we can guarantee the same functionality between the GUI and CLI as the same backend will be available.

If you want, I can do this after the PR merge and I'll foresee some new Github actions integrations to generate installers for all Operating Systems.

@svanteschubert
Copy link
Author

Yes, please do so!
I agree with you, currently I have extended XsdCompareStarter to offer command-line options.
But you are right, currently the GUI is not the default, but of course this should be switched when GUI is running!
https://github.com/svanteschubert/xsd-compare/blob/develop/src/main/java/com/compare/xsd/XsdCompareStarter.java#L19
Please refactor as you wish, I am not emotional on this - neither on naming ;-)

If I will find further problems in XSD compare in the past weeks (e.g. by the work on Factur-X e-invoice standard), I will continue to iterate/enhance the comparison!

Last but not least, I am usually using the Apache 2 License (as Xerces) as it allows to copy source code functionaltity in other Apache sources of "mine", e.g. https://github.com/tdf/odftoolkit/.
It gives me the freedom also to take parts in commercial software, but these questions are sometimes seen as religious.
Just wanted to state that I am offering the source code as well as Apache 2 to you if you like to switch the license, but no pressure!

@yoep
Copy link
Owner

yoep commented Apr 21, 2023

Thanks about the remark of the license (forgot about it), I've updated it to Apache 2.0.

I'll first make sure the packaging is working for all OS systems before I do some work on the refactoring.
For me it's fine to merge the existing PR as-is, I'll put in some effort to get it back up-and-running again in JavaFX.

src/main/resources/application.yml Show resolved Hide resolved
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>-->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it has to do with the new packaging plugin, but running the bundled jar doesn't seem to be working.

➜  xsd-compare git:(feature/recursive_xsd) ✗ java -jar ./target/xsd-compare-0.0.9-SNAPSHOT-jar-with-dependencies.jar --gui
usage: java -jar xsd-compare-jar-with-dependencies.jar <opts> <old-xsd-grammar> <new-xsd-grammar>

options:
                --ui: omits the GUI and returns only a text result for comparision (default).
                --gui: starts a JavaFX GUI front-end for comparison.
                --multi or -m: multiple lines indented per change sorted by XSD change. (default).
                --single or -s: one single line for each change with XPath in the start, harder to read but easier for compare the output with other tools.
                --extensions-only  or -e: one single line for each extension of the grammar sorted by type. Restrictions/limitations are being neglected.
                --restrictions-only  or -r: one single line for each new restrictions, requirements or limitations of the new grammar sorted by type. Extensions are being neglected.
                --help or -h: this help text.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may try to get packaging and frontend to work before my patch. At least did not work for me. Thanks for your help!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am usually testing/starting directly from the IDE and not triggering by commandline the JAR, but tested it once manually, when I was added the functionality (at least the xsd-compare-0.0.9-SNAPSHOT-jar-with-dependencies.jar)
Calling the following is showing me the options:
java -jar xsd-compare-0.0.9-SNAPSHOT-jar-with-dependencies.jar

With the former way, I had problems with the manifest...

You might considere to increase the JDK baseline to JDK 17 (or at least play around / test it ) for the JavaFX part?

Where are you hanging? Perhaps you start on a branch from scretch without any funcationality of mine from your latest commit before my PullRequest. Updating the pom.xml (not trivial - I have realized) :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are any question/remarks left to answer, please give me a ping, Yoep!
All the best,
Svante

@svanteschubert
Copy link
Author

I tried variations on log and packaging, but logging never worked for me in a file) and packaging updates can be ignored as well as I never was able to get rid of exceptions. Therefore you might ignore or questions such changes. Sorry for any confusion and thanks for the license update 🤗

@svanteschubert
Copy link
Author

I have updated an updated of the UN/CEFACT XSDs and relalized a mistake I made with the report of new restrictions and extensions of the XSD grammar in regard of enumerations.
While if you have an enumeration and add new values it is an extension of the grammar, it is in the very beginning (where you have earlier only a type like xs:string) and add a new enumeration not an extension but a restriction of all possible values (here strings of xs:string) to the new given set of the enumeration!
I likely will fix this - as you already approved the PR may I still add this fix on top of this PR?

@svanteschubert
Copy link
Author

BTW many thanks for the license change. With the commit the master branch has now appeared in public on GitHub and I might try to rebase this PR develop branch with te master branch (adding my mentioned reporter updates for enumeration fixes on top).
But all commits would have new hashes and it would be a force push, making any work of yours on the PR "difficult" so I will hold my feet still until you give a green flag! :-)

@svanteschubert
Copy link
Author

Hi yoep, I rebased my changes with your Apache licence change, which happened after my fork, so it is now shown on top of my changes. In addition, I have pushed the enumeration report update (taking concern of the edge cases of adding/removal of an extension. Where "adding of an enumeration"=="grammar restriction" and "removal of an enumeration"=="grammar extension". Aside of these edge cases a larger enumeration means always an extension and a smallar a grammar restriction.
I think I am ready now. :-) I will look through your comments and a nasty cold the past week therefore the delay.

…of choice does correctly not activate choice)
…clusive xs:totalDigits xs:fractionDigits with test cases. Fixing/Refactoring OnlyNewRestrictionReport and OnlyNewExtensionReport
…new single enumeration is equal to the previous fixed value or a removed single enumeration is still provided as @fixed attribute. Finally, moving new enumerations from extensions to restrictions reports, e.g. as a given domain (e.g. Strings) is being restricted to the subset of the enumeration of values
… it on the fly 2) Number of recursions can be set by default three. The higher the more possible XPaths and XsdElements are being created
…ry change and enumeration null vs empty disabling (for later)
…values) is a restriction and equally the removal of an enumeration is an extension. Aside of these edge cases, a larger enumeration is always an extension and a smaller a restriction of the grammar.
@svanteschubert
Copy link
Author

Dear Yoep,

I have rebased this pull request as you updated the sources last week! :-)
With this rebased version the sources compile under Linux! 👍

Could you merge them? Or is there something that still could be done?`

From my perspective: The command line worked well when comparing the quite big UN/CEFACT XML grammar version D16B with D22B! Would you like to test it with the GUI? Might be already quite big!
However, I was not able to load the UBL 2.1 XML XSD to compare with newer versions of UBL. Receiving an out-of-memory exception. A profiler might give clues, but I am lacking here the time atm...

Have a nice holiday season, Yoep!

Greetings from Germany,
Svante

@yoep
Copy link
Owner

yoep commented Dec 27, 2023

Hi Svante,

Thanks for updating the feature branch.
Last time I've tried your feature branch, the JavaFX interface didn't want to load, so I'll give it another try this time.

If it works, I'll merge it in.
Otherwise, I will also need to find some time to troubleshoot the issue.

Happy holidays for you too :)

Kind regards,
yoep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants