Reorganise the SPARQL tests area. #84

Closed
afs opened this issue Nov 20, 2022 · 12 comments · Fixed by #90

Comments

@afs
Contributor

afs commented Nov 20, 2022

This came out of PR #83.

For the SPARQL rdf-tests, I think we should aim to make most visible a conservative "latest agreed" test suite.

We can keep reference copies of the originals, but the easy-to-find tests are current.

A reorganisation causes friction, but now seems like a good time for it, in preparation for the RDF-star WG, which is also chartered for errata.

Suggestion: have a reference area for working groups for SPARQL 1.0 (DAWG) and 1.1 test suites and make rdf-tests/sparql11 a "current" area.

  • Put reference copies in archive/data-r2 and archive/data-sparql11/.
  • archive/README.md
  • Two directories tests-sparql-1/, tests-sparql-2/ which are SPARQL 1.0 updated and SPARQL 1.1 updated. (keeping them as two directories because we already have that split)
  • New README.md that says "use tests-sparql-1/ and tests-sparql-2/"
  • CHANGES.md lists briefly - not justification - the changes made.
  • Move sparql11/index.html to archive/wg-tests-sparql11-index.html
  • Update rdf-tests/index.html for the moved index file.

The current sparql11/README.md linking to index.html isn't nice!

I also hope this reorg is not a huge amount of work. We can make more changes later, after the basics are done.

More disruptive:

  • Rename rdf-tests/sparql11 as rdf-tests/sparql in preparation for the WG RDF-star which is also chartered for errata.
@rubensworks
Member

rubensworks commented Nov 21, 2022

In general, I agree with the proposed changes.
However, I would be in favor of making these changes backwards-compatible, as there may be quite a few projects that depend on this organization and would break due to these changes.

I think we can go forward with these changes and remain backwards-compatible by adding a softlink from sparql11/ to archive/data-sparql11/ or tests-sparql-2/, so that the organization in that directory remains as-is.
Optionally, we could add a deprecation note on this softlink somewhere, and say that this link will be removed at some point in the future (e.g. when the RDF-star WG is completed?).
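
A minimal sketch of the softlink idea, assuming a filesystem that supports symbolic links. The temporary directory stands in for a clone of the repository, the choice of tests-sparql-2/ as the link target is only one of the two options mentioned above, and the file name manifest.ttl is an assumption for illustration.

```python
import os
import tempfile

def add_compat_link(repo_root: str, old_name: str, new_name: str) -> str:
    """Create a relative symlink old_name -> new_name inside repo_root."""
    link_path = os.path.join(repo_root, old_name)
    # A relative target keeps the link valid wherever the repo is cloned.
    os.symlink(new_name, link_path, target_is_directory=True)
    return link_path

# Throwaway tree standing in for a clone of the repository.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "tests-sparql-2"))
with open(os.path.join(root, "tests-sparql-2", "manifest.ttl"), "w") as f:
    f.write("# placeholder manifest (hypothetical file name)\n")

link = add_compat_link(root, "sparql11", "tests-sparql-2")

# The old-style path still reaches the file through the link.
old_style_path = os.path.join(root, "sparql11", "manifest.ttl")
```

Whether such a link survives a given hosting setup (for example, static hosting over HTTP) is a separate question that this sketch does not answer.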


Two directories tests-sparql-1/, tests-sparql-2/ which are SPARQL 1.0 updated and SPARQL 1.1 updated. (keeping them as two directories because we already have that split)

Do we need them, if we have the archives, and the new rdf-tests/sparql?

@afs
Contributor Author

afs commented Nov 21, 2022

Hi @rubensworks -- thanks for the comments. Yes, there is some disruption for existing projects in order to help newcomers. Running from a top-level manifest should limit that disruption, as would linking sparql11/ or keeping a "sparql11 layout compatibility area".

It was while changing syntax-esc-05 in place that it occurred to me that I was proposing to change the copy of the SPARQL 1.0 test suite, and that its status would not be obvious. It looks like a local clone.

This balance is always hard but over time "legacy-first" builds a debt.

Do we need them [the 1.0/1.1 directories], if we have the archives, and the new rdf-tests/sparql?

We could have one area for all the tests, or a more logical split (query, update, protocol). The query syntax tests would probably need a bit of sorting out, but nothing major.

My goals are:

  • a layout that newcomers and browsing users can navigate (README and walk the git tree)
  • be clear it's "live" and current, not the original WG tests (cf. living spec)
  • prepare for more tests - RDF-star and other tests (errata) from the new WG
  • avoid a big change if possible - reorganise then do improvement work - otherwise
  • the tests can be run by cloning the repo for a local file copy and running tests offline or in a CI system.

I'll put in a PR so there is a concrete thing to discuss.

@gkellogg
Member

@afs said:

Suggestion: have a reference area for working groups for SPARQL 1.0 (DAWG) and 1.1 test suites and make rdf-tests/sparql11 a "current" area.

So, this is a version control system, and previous states of the tests can always be described on a persistent branch. Particularly for systems still using RDF 1.0 semantics, I'm not sure how much we should pander to their needs, as it becomes an ongoing maintenance burden.

  • Put reference copies in archive/data-r2 and archive/data-sparql11/.
  • archive/README.md
  • Two directories tests-sparql-1/, tests-sparql-2/ which are SPARQL 1.0 updated and SPARQL 1.1 updated. (keeping them as two directories because we already have that split)

Does this imply an eventual tests-sparql-3 directory for SPARQL 1.2? As all tests are currently under /sparql11, maybe (if we want to maintain this), a /sparql10 and a /sparql11 test, but really using something like a specVersion property in individual unit tests is more maintainable; JSON-LD used this paradigm.
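
A minimal sketch of the per-test version property idea, assuming each test entry carries a version string. The `TestEntry` class, the `spec_version` field, the two "example-" test names, and the rule "run every test whose version is not newer than the target" are all assumptions made for illustration; they are not the manifest vocabulary used in this repo.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestEntry:
    name: str
    spec_version: str  # hypothetical stand-in for a specVersion property

def parse_version(text: str):
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def applicable(entries, target: str):
    """Keep tests whose spec version is not newer than the target version."""
    limit = parse_version(target)
    return [e for e in entries if parse_version(e.spec_version) <= limit]

entries = [
    TestEntry("syntax-esc-05", "1.0"),
    TestEntry("example-update-test", "1.1"),   # hypothetical test name
    TestEntry("example-future-test", "1.2"),   # hypothetical test name
]

names_for_11 = [e.name for e in applicable(entries, "1.1")]
```

With a property like this, one directory tree can serve several specification versions, and the selection happens in the test runner rather than in the directory layout.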

  • New README.md that says "use tests-sparql-1/ and tests-sparql-2/"
  • CHANGES.md lists briefly - not justification - the changes made.
  • Move sparql11/index.html to archive/wg-tests-sparql11-index.html
  • Update rdf-tests/index.html for the moved index file.

The current sparql11/README.md linking to index.html isn't nice!

If we have both README.md and index.html in the same directory, the index.html should probably be somewhat derived from the README.md.

I also hope this reorg is not a huge amount of work. We can make more changes later after the basic are done.

More disruptive:

  • Rename rdf-tests/sparql11 as rdf-tests/sparql in preparation for the WG RDF-star which is also chartered for errata.

IMO, RDF-star should probably use this repo, and we need to decide how to support future, current, and previous versions of tests. Archive directories would probably still require maintenance; we can instead point to a branch for previous versions of the test suites, while maintaining compatibility with different specification versions using test properties.

@rubensworks said:

In general, I agree with the proposed changes. However, I would be in favor of making these changes backwards-compatible, as there may be quite a few projects that depend on this organization and would break due to these changes.

There's an ongoing cost to this. If projects are continuing to run off of the repo, or pull from it, some amount of change should be tolerated. The manifest structures are intended to allow testing without built-in dependence on the directory structure.

I think we can go forward with these changes and remain backwards-compatible by adding a softlink from sparql11/ to archive/data-sparql11/ or tests-sparql-2/, so that the organization in that directory remains as-is. Optionally, we could add a deprecation note on this softlink somewhere, and say that this link will be removed at some point in the future (e.g. when the RDF-star WG is completed?).

If we use an archive on the branch with symlinks, I think it should be clear that these are for transitional use, and point people to named branches to get earlier versions of the test suites.

@afs said:

Hi @rubensworks -- thanks for the comments. Yes, there is some disruption for existing projects in order to help newcomers. Running from a top-level manifest should limit that disruption, as would linking sparql11/ or keeping a "sparql11 layout compatibility area".

It was while changing syntax-esc-05 in place that it occurred to me that I was proposing to change the copy of the SPARQL 1.0 test suite, and that its status would not be obvious. It looks like a local clone.

This balance is always hard but over time "legacy-first" builds a debt.

My goals are:

  • a layout that newcomers and browsing users can navigate (README and walk the git tree)
  • be clear it's "live" and current, not the original WG tests (cf. living spec)
  • prepare for more tests - RDF-star and other tests (errata) from the new WG
  • avoid a big change if possible - reorganise then do improvement work - otherwise
  • the tests can be run by cloning the repo for a local file copy and running tests offline or in a CI system.

I'll put in a PR so there is a concrete thing to discuss.

This sounds like a good plan. Note that #79 is long outstanding, and attempts to decouple the protocol tests from the use of a single externally maintained tool, and should also be considered in any reorganization.

@rubensworks
Member

I definitely agree with the intended re-organization, but we have to be mindful of systems that depend on the current layout for their testing architecture; these systems will break if we make sudden non-backwards-compatible changes.
That is why I think it's really important to have this backwards-compatibility, even if it's just temporary, to allow external systems to transition to the new layout.

So, this is a version control system, and previous states of the tests can always be described on a persistent branch.

That is indeed true for systems that use the tests via git. However, since this repo is also exposed via GitHub Pages, there are also systems consuming these tests over HTTP, and they can only get the latest version (since GitHub Pages doesn't provide versioning).

For instance, I am aware of at least 36 projects that depend on these tests over HTTP, and can have their CI suddenly fail if breaking changes are made to the layout of this repo.

So my suggestion is: go forward with the repo reorganization, but temporarily (3 or 6 months?) keep the old structure (e.g. via softlinks), and communicate this deprecation to consumers to give them the time to transition.

My goals are:

Based on the above, another goal we could add is the following:

  • The tests can be run by consuming the test suites over HTTP via GitHub Pages and running tests offline or in a CI system.

@afs
Contributor Author

afs commented Nov 22, 2022

"consuming the test suites over HTTP".

Effort went into making sure the tests work both over HTTP and from local files, with the location controlled by the base URL.

What GitHub Pages means is that we don't have full control over the webserver for file shuffling.

We do have our own HTTP-enabled "symbolic links" - the manifest files are a point of indirection. We don't have to rely on git symbolic link handling and any OS interaction.
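
A minimal sketch of why the base URL makes the same manifest usable over HTTP and from local files, assuming a manifest lists its files by relative name. The host example.org, the local path, and the file names are hypothetical; only the resolution step is the point.

```python
from urllib.parse import urljoin

def resolve_entries(manifest_url: str, relative_names: list) -> list:
    """Resolve names listed in a manifest against the manifest's own URL."""
    return [urljoin(manifest_url, name) for name in relative_names]

# Hypothetical relative names as they might appear inside a manifest.
names = ["syntax-esc-05.rq", "data/input.ttl"]

# Same manifest content, two different base URLs.
over_http = resolve_entries(
    "https://example.org/rdf-tests/sparql11/manifest.ttl", names
)
from_files = resolve_entries(
    "file:///home/user/rdf-tests/sparql11/manifest.ttl", names
)
```

Because every reference is relative to the manifest that names it, moving a directory only changes the one relative name in the manifest that includes it, which is the "point of indirection" described above.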

The proposal is to do some directory renaming to be a bit clearer these are not the original WG tests and untangle them from a fixed name of "sparql-11" going forward.

@TallTed
Member

TallTed commented Jan 24, 2023

I hate tests-sparql-2 and tests-sparql-3 for SPARQL 1.1 and SPARQL 1.2, respectively. These can and will do nothing but add to the already confusing naming that omits the decimal (e.g., sparql11 and sparql12 for the same SPARQL 1.1 and SPARQL 1.2 ... are those "sparql one eleven" and "sparql one twelve"? or "sparql eleven" and "sparql twelve"?). I know, no-one wants dots in file or directory names; fine, make them all hyphens (i.e., sparql-1-1 and sparql-1-2, for the same SPARQL 1.1 and SPARQL 1.2, or even better sparql-01-01 and sparql-01-02 so we've got easy room to grow, hopefully not beyond double-digit major or minor versions).

Further, I am coming to (strongly!) believe that there should be a set of subdirectories that combine to form a matrix of RDF and SPARQL versions, such that there are tests for RDF 1.0 with SPARQL 1.2, and separate tests for RDF 1.2 with SPARQL 1.0 (because, yes, such combinations will be found in the wild [1], no matter how strongly we advise against whichever combinations are problematic, and such tests will at least hopefully decrease the numbers of implementers and deployers who choose a challenging combination without any idea of what they're in for).

[1] SPARQL-FED introduces further wrinkles, as a SPARQL 1.1 or 1.2 processor may call on a remote SPARQL SERVICE which may be SPARQL 1.0, 1.1, or 1.2, and (currently) have no way to interrogate the remote processor about that detail, nor to request that (for instance) a SPARQL 1.2 SERVICE process the request as SPARQL 1.0 (if it can)!
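
A minimal sketch comparing the naming styles mentioned above and enumerating a version matrix. The function names and the "rdf-X/sparql-Y" path shape are hypothetical, chosen only to make the comparison concrete.

```python
from itertools import product

def short_name(version: str) -> str:
    """Dot-free style, e.g. 1.1 -> sparql11."""
    return "sparql" + version.replace(".", "")

def hyphenated(version: str) -> str:
    """Hyphen style, e.g. 1.1 -> sparql-1-1."""
    return "sparql-" + version.replace(".", "-")

def padded(version: str) -> str:
    """Zero-padded hyphen style, e.g. 1.2 -> sparql-01-02."""
    parts = [part.zfill(2) for part in version.split(".")]
    return "sparql-" + "-".join(parts)

def version_matrix(rdf_versions, sparql_versions):
    """Every RDF/SPARQL pairing as a hypothetical subdirectory path."""
    return [
        f"rdf-{r.replace('.', '-')}/{hyphenated(s)}"
        for r, s in product(rdf_versions, sparql_versions)
    ]

versions = ["1.0", "1.1", "1.2"]
cells = version_matrix(versions, versions)

# The dot-free style cannot tell 1.11 from 11.1; the hyphen styles can.
ambiguous = short_name("1.11") == short_name("11.1")
distinct = hyphenated("1.11") != hyphenated("11.1")
```

Three RDF versions against three SPARQL versions already gives nine cells, which gives a sense of the maintenance cost a full matrix would carry.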

@afs
Contributor Author

afs commented Feb 5, 2023

Here is a plan for an rdf-tests repo change in the light of the discussions here.

The current situation is that the SPARQL tests are rooted at "sparql11", which isn't going to remain a suitable name. /sparql11/data-r2/ and /sparql11/data-sparql11/ are not great names either, "r2" in particular.

/rdf-tests/sparql11/data-sparql11/ has been the focus of improvement (e.g. URI resolution, casting tests, escapes) by community agreement.

/rdf-tests/sparql11/data-r2/ has had some manifest fixups.

The suggestion is to have a new area, /rdf-tests/sparql/, with a transition path from /rdf-tests/sparql-11 described below.

Objectives:

  • a clearer naming tree especially for casual/first time visitors (contributors and users)
  • prepare for future additions (RDF 1.2, SPARQL 1.2)
  • Have a maintained area that reflects the community consensus on what is "current".
  • Don't create additional work for contributors.
  • No material for previous versions is removed and is still accessible.

FYI There have been 6 contributors over the lifetime of the /rdf-test/sparql11/ and also 6 to RDF syntax tests.

Without contributors, we don't have a useful community asset.

New

Root the tests at /rdf-tests/sparql/

This allows for the possibility in the future of

  • /rdf-tests/sparql/sparql-12
  • /rdf-tests/sparql/sparql-dev now or in the future depending on how the RDF-star working group operates.

New

/rdf-tests/sparql/sparql-10/
(copy of data-r2 updated for RDF 1.1)
/rdf-tests/sparql/sparql-11/
(copy of data-sparql11, which is already RDF 1.1 and modified for community consensus)

The names reflect the origin.

(Merging into one active set is out of scope for the change described here - if anyone is interested in driving that, let's have a separate issue.)

/rdf-tests/sparql/README
to explain the layout and provide links.

Keep the old layout

Transition for /rdf-tests/sparql11/data-* to a new location.
Examples:

  • /rdf-tests/sparql/previous/data-*
  • /rdf-tests/sparql/archive/data-*
  • Part of /rdf-tests/sparql/matrix/

This could be the less-updated version currently tagged as sparql-mixed-rdf-version-tests.

Do not move without warning.

Transition

The transition is to leave /rdf-tests/sparql11/ in-place as Ruben asks and change /rdf-tests/sparql11/README.md to explain the situation.

Document links back to the WG original copies that are the authoritative test suites for each WG in /rdf-tests/sparql/README.

After some reasonable time (at least a few months), rename /rdf-tests/sparql11/ as agreed on a separate issue.

Also

If additional maintained variants are to be added, then there need to be people willing to take on that work.

@afs
Contributor Author

afs commented Feb 5, 2023

Preview: https://github.com/afs/rdf-tests/tree/reorg

@rubensworks
Member

Thanks for including the backwards-compatibility in /rdf-tests/sparql11/ @afs, looks perfect to me!

Once this is merged, it would be good to announce this on the proper mailing lists, since we may not reach all relevant people through this issue. Happy to take this up myself if desired.


The only issue I still have with this format is the confusion of major/minor version (sparql-11 vs sparql-1-1).
Since @TallTed has also raised this issue before, and this came up in the RDF-star WG as well, it might make sense to have a vote on this. Not sure if this vote should take place in this CG, or the WG though. Any thoughts?

@afs
Contributor Author

afs commented Feb 6, 2023

Re: naming: it is a hot topic with views on both sides. There are a lot of comments on the other side as well.

It could be sparql10, sparql11, sparql12 to more closely align with the short names.

It is the RDF-Star WG naming style and the style on other W3C specs (inside and outside semweb sphere).

If there is a matrix, it might make sense there. But that's a different issue.

@gkellogg
Member

gkellogg commented Feb 6, 2023

Re: naming: it is a hot topic with views on both sides. There are a lot of comments on the other side as well.

It could be sparql10, sparql11, sparql12 to more closely align with the short names.

It is the RDF-Star WG naming style and the style on other W3C specs (inside and outside semweb sphere).

If there is a matrix, it might make sense there. But that's a different issue.

Naming of W3C short-names pretty much settles this, and that's what RDF-star chose. Anything other than sparql12 would introduce even more confusion.

@afs
Contributor Author

afs commented Jul 14, 2023

After some reasonable time (at least a few months), rename /rdf-tests/sparql11/ as agreed on a separate issue.

The "rename" is actually a copy-delete (#103 is the delete) because /rdf-tests/sparql11/ was split into SPARQL 1.0 /rdf-tests/sparql/sparql10/ and SPARQL 1.1 /rdf-tests/sparql/sparql11/ areas.
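
A minimal sketch of how a consumer might rewrite old paths after the split. It assumes that data-r2 became the sparql10 area and data-sparql11 became the sparql11 area (as the earlier plan in this thread suggests), and that files keep their relative paths below each moved directory. The helper name and the example file name are hypothetical.

```python
# Old prefix -> new prefix, per the split described above.
# Assumption: files keep their relative paths below each moved directory.
OLD_TO_NEW = {
    "/rdf-tests/sparql11/data-r2/": "/rdf-tests/sparql/sparql10/",
    "/rdf-tests/sparql11/data-sparql11/": "/rdf-tests/sparql/sparql11/",
}

def migrate_path(path: str) -> str:
    """Rewrite a path under the old layout; leave other paths unchanged."""
    for old, new in OLD_TO_NEW.items():
        if path.startswith(old):
            return new + path[len(old):]
    return path

# Hypothetical file name used only to show the rewrite.
migrated = migrate_path("/rdf-tests/sparql11/data-r2/example/manifest.ttl")
untouched = migrate_path("/rdf-tests/other/manifest.ttl")
```

A consumer could apply such a rewrite to hard-coded URLs once, rather than depending on a compatibility area remaining in place.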
