Investigate additional Schematron Rules for GeekoDoc #6
Comments
I guess adding this to GeekoDoc might be the better idea for the time being... For an idea of what we could do with Schematron directly in GeekoDoc, see: openSUSE/suse-xsl#222. There are quite a number of cases associated with table markup, and you currently notice those issues only at the FO->PDF step, because FOP balks. This is also not really style checker territory, because it leads to hard errors that are not caught by current validation methods. Then again, if we have more such cases, we could move some checks from the style checker to GeekoDoc. |
DocBook >= 5.0 also ships some (ISO) Schematron files, see However, it seems oXygen is not that happy with the schema. It shows this error message:
This is the respective line:
<s:pattern name="Glossary 'firstterm' type constraint">
which should be corrected like this:
<s:pattern>
  <s:title>Glossary 'firstterm' type constraint</s:title> |
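The correction described above (moving the `name` attribute of a pattern into a `title` child element) could be applied to a whole schema file mechanically. The following is only a minimal sketch using the Python standard library; the function name `name_attr_to_title` and the sample schema are made up for illustration, and it has not been run against the actual DocBook Schematron file.

```python
import xml.etree.ElementTree as ET


def name_attr_to_title(root):
    """Turn <pattern name="..."> into <pattern><title>...</title>.

    Works on any namespace: the new title element reuses the
    namespace of the pattern element it is inserted into.
    Returns the number of patterns that were changed.
    """
    changed = 0
    # Collect the elements first so the tree is not modified while iterating.
    for el in list(root.iter()):
        prefix, sep, local = el.tag.rpartition("}")
        if local == "pattern" and "name" in el.attrib:
            title = ET.Element(prefix + sep + "title")
            title.text = el.attrib.pop("name")
            el.insert(0, title)
            changed += 1
    return changed


# Hypothetical sample input, not taken from the real schema file.
SAMPLE = (
    '<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron">'
    '<s:pattern name="Glossary \'firstterm\' type constraint">'
    '<s:rule context="*"/>'
    "</s:pattern>"
    "</s:schema>"
)

if __name__ == "__main__":
    tree_root = ET.fromstring(SAMPLE)
    print(name_attr_to_title(tree_root), "pattern(s) rewritten")
```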
The tools side of Schematron seems to be interesting ...
Websites related to Schematron are also interesting: They seem to either show lots of 404 errors (schematron.com has a working front page but all sub pages 404), lead to ad farms (Rick Jelliffe's home page with the reference implementation, Probatron) or advertise proprietary software (Oxygen, XML Buddy, Topologi). I am starting to think that investing in Schematron at this point might not be such a good idea. [edit 1, sknorr: libxml does have Schematron 1.5 support but it is not mentioned in the man page.] |
Actually, this is not quite true. There is the option I wouldn't consider this a valid alternative... |
I think the best approach would be to write a wrapper in Python using the lxml library, which supports ISO Schematron. A quick test reveals some nice features:
from lxml import isoschematron
from lxml import etree
# Create a Schematron validator from the schema file:
sch_doc = etree.parse("geekodoc5.sch")
schematron = isoschematron.Schematron(sch_doc)
# Parse our DocBook 5 source:
doc = etree.parse("foo.xml")
schematron.validate(doc)
# => False
print(schematron.error_log)
# => Prints an extensive error log (XML) which can be parsed
I think this can easily be turned into a small Python "Schematron validation script". ;-) |
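To illustrate what "can be parsed" might look like: ISO Schematron reports are usually expressed in SVRL (namespace `http://purl.oclc.org/dsdl/svrl`), where each failed assertion is a `failed-assert` element with `test` and `location` attributes and a `text` child. The sketch below extracts those with the standard library only. The helper name `failed_asserts` and the sample report are assumptions for illustration; with lxml, such a report should be available as `schematron.validation_report` when the validator is created with `store_report=True`.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"


def failed_asserts(svrl_xml):
    """Return one dict per svrl:failed-assert found in an SVRL string."""
    root = ET.fromstring(svrl_xml)
    results = []
    for fa in root.iter("{%s}failed-assert" % SVRL_NS):
        text_el = fa.find("{%s}text" % SVRL_NS)
        message = ""
        if text_el is not None:
            # Join all text, including text inside inline child elements.
            message = "".join(text_el.itertext()).strip()
        results.append({
            "test": fa.get("test"),
            "location": fa.get("location"),
            "message": message,
        })
    return results


# Hand-written sample report (hypothetical, not real tool output).
SAMPLE_REPORT = """\
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:failed-assert test="@xml:id" location="/*[1]/*[2]">
    <svrl:text>A section needs an xml:id.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""

if __name__ == "__main__":
    for item in failed_asserts(SAMPLE_REPORT):
        print("%(location)s: %(message)s (test: %(test)s)" % item)
```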
Yes, I can understand that you get this impression. I've recently discovered this 404 page as well. Not sure why this isn't available anymore. Nevertheless, I don't think it is that bad. As I've shown in my earlier post, it can be used in lxml with minimal scripting effort. All in all, I wouldn't abandon Schematron at this stage. Of course, if lxml reveals some technical problems, we will need to think again. |
Apart from my last comment, we should add specific rules depending on GeekoDoc and our style guide.
Definitions
I would suggest distinguishing between "hard" and "soft" rules:
Hard Rules
Soft Rules
I am probably missing other rules. |
toms wrote...
Both of those rules are good ways to make our "documentation updates" sections fail validation... :/ |
Ahh, right! OK, we could move these from hard to soft rules. I'm just trying to collect some examples... |
As I said somewhere above: within tables, counting the actual columns vs. the columns set up via colspec would be great. And there are more table issues that should make validation fail but don't, such as bad column name references. We could also check for spaces in ID attributes, such as in e.g. These would also give us added value, as opposed to reimplementing something that is already covered by SDSC. |
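To make the proposed checks more concrete, here is a rough sketch of them in plain Python (standard library only). It is not a Schematron rule set and not part of GeekoDoc; the function name `check_tables_and_ids` and the sample document are hypothetical. It deliberately flags only rows with more cells than `cols` declares, since that is wrong regardless of spans, plus `colname`/`namest`/`nameend` references to undeclared column names and `xml:id` values containing whitespace.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def db(name):
    """Qualify a DocBook 5 element name for ElementTree."""
    return "{%s}%s" % (DB_NS, name)


def check_tables_and_ids(root):
    """Return a list of human-readable problem descriptions."""
    problems = []
    # 1. xml:id values must not contain whitespace.
    for el in root.iter():
        value = el.get(XML_ID)
        if value is not None and any(ch.isspace() for ch in value):
            problems.append("xml:id %r contains whitespace" % value)
    # 2. Table checks per tgroup (colspec elements inside thead are ignored).
    for tgroup in root.iter(db("tgroup")):
        cols_attr = tgroup.get("cols", "")
        if not cols_attr.isdigit():
            problems.append("tgroup lacks a numeric cols attribute")
            continue
        cols = int(cols_attr)
        colnames = {c.get("colname") for c in tgroup.findall(db("colspec"))}
        for section in tgroup:
            if section.tag not in (db("thead"), db("tbody"), db("tfoot")):
                continue
            for row in section.findall(db("row")):
                cells = [c for c in row
                         if c.tag in (db("entry"), db("entrytbl"))]
                if len(cells) > cols:
                    problems.append(
                        "row has %d entries but tgroup declares cols=%d"
                        % (len(cells), cols))
                for cell in cells:
                    for attr in ("colname", "namest", "nameend"):
                        ref = cell.get(attr)
                        if ref is not None and ref not in colnames:
                            problems.append(
                                "entry references undeclared column name %r"
                                % ref)
    return problems


# Hypothetical sample with three deliberate mistakes.
SAMPLE_DOC = """\
<article xmlns="http://docbook.org/ns/docbook" xml:id="bad id">
  <table>
    <tgroup cols="2">
      <colspec colname="c1"/>
      <colspec colname="c2"/>
      <tbody>
        <row><entry>a</entry><entry>b</entry><entry>c</entry></row>
        <row><entry namest="c1" nameend="c9">x</entry></row>
      </tbody>
    </tgroup>
  </table>
</article>
"""

if __name__ == "__main__":
    for problem in check_tables_and_ids(ET.fromstring(SAMPLE_DOC)):
        print(problem)
```

Rows with fewer cells than `cols` are not reported here, because spans (`namest`/`nameend`, `morerows`) can legitimately reduce the number of cells in a row.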
Well, we could check if the value of However, tables can get complicated when cell or row spans are involved.
Great idea!
But don't we want to move these parts into the Schematron schema? |
Moved the list of checks into original description. |
From #6 (comment), I've tried to create a script which can validate against our (yet to be defined) Schematron schema. In the long run, the script can be integrated into daps (if not, it was a good exercise 😀 ). @sknorr: For a first draft, see https://github.com/openSUSE/schvalidator |
In openSUSE/suse-doc-style-checker#117, I raised the question of whether a Schematron schema could be useful for SDSC. The same question can be asked for GeekoDoc as well.
A Schematron schema can be used in two ways:
Schematron rules are embedded inside the RNG schema.
Schematron rules are collected outside in a different file (extension .sch). They are independent of the existing GeekoDoc RNG.
The validation procedure would be different:
The validation with Schematron would be an integral part. In other words, after structural validation
the rule-based validation process would be performed. Both can't be separated.
The validation with a separate Schematron schema would be step-wise. First step would be always
the structural validation with RNG. If wanted (or needed), additional validation can be performed
with Schematron. Both validation processes can be separated.
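The step-wise procedure for a separate Schematron schema could be sketched roughly as follows. This is only an illustration under assumptions: it shells out to `xmllint` with its `--relaxng` and `--schematron` options, the function names are made up, and whether `xmllint` accepts a particular Schematron version is not checked here. The `runner` parameter exists so the logic can be tried without `xmllint` installed.

```python
import subprocess


def run_command(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def validate_stepwise(xml_path, rng_path, sch_path=None, runner=run_command):
    """Validate structurally with RELAX NG first, then optionally with Schematron.

    Returns True only if every requested step succeeds. The Schematron
    step is skipped when no schema is given, and it is never reached
    when the structural validation has already failed.
    """
    steps = [["xmllint", "--noout", "--relaxng", rng_path, xml_path]]
    if sch_path is not None:
        steps.append(["xmllint", "--noout", "--schematron", sch_path, xml_path])
    for cmd in steps:
        if runner(cmd) != 0:
            return False
    return True


if __name__ == "__main__":
    # File names are placeholders; a dry-run runner just prints the commands.
    def dry_run(cmd):
        print(" ".join(cmd))
        return 0

    validate_stepwise("foo.xml", "geekodoc5.rng", "geekodoc5.sch", runner=dry_run)
```

Keeping the two steps as separate commands mirrors the point made above: the rule-based check can be left out or run on its own schedule without touching the structural validation.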
Rick Jelliffe, the inventor of Schematron, describes the language as "a feather duster to reach the parts other schema languages cannot reach". ;-)
Benefits
Schematron Versions
Currently, there are two versions of Schematron:
ISO-Schematron (published May 2006)
The de-facto standard of Schematron. The new namespace is http://purl.oclc.org/dsdl/schematron
Schematron 1.5 (published 2001)
The old reference implementation in pure XSLT. The namespace is http://www.ascc.net/xml/schematron
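Since the two versions differ in their namespace, a tool could guess which one a schema uses by looking at the element namespaces. The sketch below is an assumption-laden illustration: the function name `schematron_flavor` is invented, and it takes the 1.5 namespace URI to be `http://www.ascc.net/xml/schematron`, the value libxml2 uses for the old namespace. Because it scans all elements, it also works for rules embedded in an RNG schema.

```python
import xml.etree.ElementTree as ET

ISO_NS = "http://purl.oclc.org/dsdl/schematron"
OLD_NS = "http://www.ascc.net/xml/schematron"  # assumed Schematron 1.5 namespace


def schematron_flavor(schema_xml):
    """Return "iso", "1.5", or "unknown" for a schema given as a string."""
    root = ET.fromstring(schema_xml)
    for el in root.iter():
        if el.tag.startswith("{"):
            namespace = el.tag[1:].split("}", 1)[0]
            if namespace == ISO_NS:
                return "iso"
            if namespace == OLD_NS:
                return "1.5"
    return "unknown"


if __name__ == "__main__":
    print(schematron_flavor('<schema xmlns="%s"/>' % ISO_NS))
```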
Tools
Schematron validation is supported by:
xmllint with the option --schematron
lxml, see http://lxml.de/validation.html#id2
See also
Personal
From my perspective, I prefer the separate Schematron schema (assuming all is possible, feasible, or useful). It seems, this doesn't introduce too many changes and gives greater flexibility.
I see it more as a "conformance and consistency" check rather than a hard validation. Of course, the rules shouldn't bother our writers too much.
Maybe we should also (re?)think about our definition of "validity/validation".
--
Update: List of Checks
Hard Rules
xml:id
Soft Rules
xml:id
attributes.
@sknorr I've separated the discussion in SDSC from the GeekoDoc aspect. Feel free to comment. :)