[css-values] Automatic parsing of value definitions #2921
Comments
Wow, this is just written very badly. That's totally invalid syntax, and should instead be
This was fixed, as you noted.
This is valid; it's using the extended Value Definition Syntax for rules defined in https://drafts.csswg.org/css-syntax/#rule-defs.
Yes, this is hand-wavey. I'd be happy to work out a more precise syntax for it. Updating every single spec is more work, of course. ^_^
Correct on both counts.
They'll be single quotes in the source; curly quotes are probably coming from Bikeshed's formatting of the output.
Hm. 'fill' is just us sketching; that line wouldn't make it into a professional spec. I suppose I can move the footnote markers out of the grammar and just refer to the productions more explicitly in prose.
I think we should encourage using =, yeah. While technically not necessary for linking purposes (Bikeshed takes care of things already), it makes things a bit clearer, I think.
Ranges, yeah. Regexes unlikely to show up - what's the use-case for them?
Hmm, maybe. It seems less directly useful than the IDL index, but I'm not opposed to such a thing. |
Ah, got it! We did not pay much attention to these rules because we restricted ourselves to properties and descriptors for now. From an automated parsing perspective, it's not entirely clear how to distinguish between the two when parsing a I guess another thing that I'm wondering about is the intended scope of these definitions. For instance, that spec contains two references to
OK. FWIW, we created a list of "missing" rules with possible values for definitions that we could not extract automatically. Time permitting, we'll report these to individual specs or prepare appropriate PRs.
I guess the use cases that we had in mind for regexes were:
Those may not be compelling enough use cases to warrant the introduction of regexes.
That's a spec bug, yes. The one for the
Nice. You can just drop that into a single issue and tag all the specs that are mentioned.
Yeah, I value machine-readability here much less than I value an understandable and readable grammar definition. ^_^ The prose definitions are much more acceptable, imo, as a definition for these things, and the number of times we'd actually want to do some sophisticated matching on token representations is so small in the first place.
I'm starting to implement the changes resolved on in w3c/reffy#355, to use the bracketed range notation to indicate values with constraints on them. That's relevant to this discussion in two ways:
I don't think glyph-orientation-vertical should be switched over. It's a bizarro legacy thing, leave it as such. |
Addition to the CSS Grammar in https://drafts.csswg.org/css-values-3/#numeric-ranges See also w3c/csswg-drafts#2921 (comment)
* Addition to the CSS Grammar in https://drafts.csswg.org/css-values-3/#numeric-ranges See also w3c/csswg-drafts#2921 (comment)
* Add range to CSS Grammar Parsing output schema
* Test support for range restriction grammar parsing
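To make the bracketed range notation mentioned above concrete, here is a minimal sketch of how a tool might split a ranged type such as `<integer [0,∞]>` into a type name and two bounds. The helper names (`parse_bound`, `parse_ranged_type`) and the output shape are assumptions made for illustration; they are not taken from Reffy or from any spec tooling.

```python
import math
import re

# Hypothetical pattern for "<name [min,max]>"; not taken from Reffy.
RANGED_TYPE = re.compile(
    r"^<(?P<name>[a-z-]+)\s*\[\s*(?P<min>[^,\]]+?)\s*,\s*(?P<max>[^,\]]+?)\s*\]>$"
)


def parse_bound(text):
    """Split a bound such as '0', '-∞' or '180deg' into (number, unit)."""
    text = text.replace("\u2212", "-")  # accept the Unicode minus sign
    if text in ("∞", "+∞"):
        return (math.inf, "")
    if text == "-∞":
        return (-math.inf, "")
    m = re.fullmatch(r"([+-]?\d*\.?\d+)([a-z%]*)", text)
    if m is None:
        raise ValueError(f"unrecognized bound: {text!r}")
    return (float(m.group(1)), m.group(2))


def parse_ranged_type(definition):
    """Return a dict for a ranged type, or None if there is no bracketed range."""
    m = RANGED_TYPE.match(definition.strip())
    if m is None:
        return None
    return {
        "type": m.group("name"),
        "min": parse_bound(m.group("min")),
        "max": parse_bound(m.group("max")),
    }
```

A type without a bracketed range, such as `<number>`, returns `None` here, so a caller can fall back to its ordinary type handling.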
If you allow me to report some issues I had while implementing a library for automatic parsing of CSS values... Context: this library aims at replacing I will only report the issues that are related to automatic parsing of definitions (of types and properties, ie. the title of this issue), which is defined in a comment above as the goal of the CSS value syntax, defined in the specification of CSS values, because I think the issues related to parsing a CSS value by using a parse function generated from the corresponding property definition, deserve their own issues. I was not sure where to submit the following issues. Some are more directly related to Types written in prose and not reported by
Yes. (Tho possibly logicals with physical equivalents also have this.)
Unclear. The logicals probably are the same as shorthands, and don't have an initial value at all; this isn't consistent between css-logical and css-backgrounds currently.
No, they can't be - there are often many ways to produce a particular terminal value that are not just the literal tokens. For example,
You may have heard that CSS tokenization is greedy, which is true (for example, (Technically "greedy" doesn't apply to tokenization, because it's specified with an algorithm rather than as a grammar. But the algo is designed to implement "longest-match" greedy semantics for a theoretical equivalent grammar, because that's the semantics that CSS2 had when it did specify tokenization with a grammar.) |
The
Currently I define
Sorry. I was (confusedly) thinking that
Ok. I guess that the many ways to produce a particular terminal value arise when consuming a function or a simple block. I didn't really understand that a math function can be many possible types, such as
The most up to date definition of Thank you tidoust for giving me these links. I will certainly read everything because I'm sure to learn other things that will help me to save time. |
Re: parsing greediness, I'm not sure I understand your response. As I said, parsing is non-greedy; if the first branch that starts to match eventually fails, you just move on to the second branch and try again. There's no need to order the branches in any particular way to accommodate this, so we order them for readability usually. If you're trying to match CSS grammars against values using a greedy (non-backtracking) parser, you're gonna have a bad time. You have to be able to backtrack. |
I wanted to know the expected behavior when the first branch successfully matches the first component value in the input list, if that branch expects a single component value while other branches accept multiple component values. When parsing EDIT: oh ok I got it, it means that I should move "one step backwards" and try with another branch, if any, even if it means matching a branch of a type that (my current implementation of) the parser thought that it already had a match.
Right, that's a backtracking parser vs a greedy/first-match parser. CSS grammars are intended for use with a backtracking parser. |
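The difference can be illustrated with a toy matcher. This is only a sketch under stated assumptions: the grammar encoding (`("kw", ...)`, `("alt", ...)`, `("seq", ...)` tuples) and the function names are invented for this example and do not come from any CSS implementation. The grammar is `a | a b`, written with the shorter branch first, matched against the input `a b`.

```python
def match(node, tokens, pos):
    """Yield every position at which `node` can stop matching, starting at `pos`."""
    kind, arg = node
    if kind == "kw":
        # A keyword consumes exactly one token if it is the expected one.
        if pos < len(tokens) and tokens[pos] == arg:
            yield pos + 1
    elif kind == "alt":
        # Alternatives: offer the results of every branch, in source order.
        for branch in arg:
            yield from match(branch, tokens, pos)
    elif kind == "seq":
        # Sequence: match the head, then the rest from each possible midpoint.
        if not arg:
            yield pos
            return
        head, rest = arg[0], arg[1:]
        for mid in match(head, tokens, pos):
            yield from match(("seq", rest), tokens, mid)
    else:
        raise ValueError(f"unknown node kind: {kind!r}")


def matches_backtracking(grammar, tokens):
    """Accept if any way of matching consumes the whole input."""
    return any(end == len(tokens) for end in match(grammar, tokens, 0))


def matches_first_match(grammar, tokens):
    """Commit to the first successful branch and never reconsider it."""
    first = next(match(grammar, tokens, 0), None)
    return first == len(tokens)


# Grammar "a | a b": the single-keyword branch comes first.
GRAMMAR = ("alt", [("kw", "a"), ("seq", [("kw", "a"), ("kw", "b")])])
```

With the input `["a", "b"]`, `matches_backtracking` succeeds because it goes on to try the second branch after the first one leaves `b` unconsumed, while `matches_first_match` fails because it commits to the first branch. Reordering the branches would hide the problem for this grammar, but, as noted above, spec grammars are ordered for readability rather than to suit a particular parser.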
Could I please have a clarification on definitions marked up with I'm asking this because I see

    "<url()>": {
      "prose": "Typically, a <url> is written with the url() or src() functional notations:"
    },
    "<src()>": {
      "prose": "Typically, a <url> is written with the url() or src() functional notations:"
    },

But there is no reference to these productions in any definition, not even in the definition of They may be isolated cases, but it makes me wonder if the only difference with
I believe that the only difference with a function definition marked up with |
This is the function definition pattern I think we should be aiming for:
This is now defined by a4bfe38 |
It would fix all of the issues I have with extracting function value definitions (that are not inlined in a context value definition) from the data exposed by Actually, I do not really understand why the For example, when following the reasoning justifying the need of |
Well, I split up the CSS and IDL dfn types too finely when I started Bikeshed originally. Ideally they'd be organized by whether or not they could potentially have name clashes. (For example, in IDL the attribute and method types could/should be combined, but interface has to be separate from those.)
This definitely isn't correct, tho. The definition types aren't meant to map to any particular token type. |
I see.
When you define However, some functions are context-free but are only defined with I need to resolve |
Context for this issue is that @dontcallmedom and I spent some time integrating CSS specs in the list of specs crawled by Reffy. Our goal was to extract and parse value definitions for CSS properties and descriptors from all CSS specs, as a first step that could then perhaps be used to detect potential anomalies, automate the creation of parsing tests, or create tools that list CSS properties. Apart from the detection of a few anomalies, which led to the issues I reported yesterday and today on individual specs, we haven't had time yet to look into using the result of this parsing.
This exercise was also meant as an occasion for us to take a deeper look at how CSS specs are written. It is quite possible that we misunderstood a few things; we're much more familiar with API specs in practice. Also, as opposed to API specs, where the automatic extraction of IDL content makes it possible to create tests and actual stubs for implementation, the automatic extraction and parsing of value definitions of CSS properties may perhaps not be seen as an interesting goal or a priority, because that does not trigger major interop issues in practice.
That being said, taking for granted that the goal of the Value Definition Syntax is to ease the automatic parsing of values, we noted a few potential issues:
* Keyword values do not allow for some of the values actually used in specs. The syntax defines keyword values as identifiers which conform to the `<ident-token>` grammar. Unless we read that definition incorrectly, this does not allow keywords to start with a digit or with an `@`. Some values need that, such as `glyph-orientation-vertical`, `font-weight` (although value was replaced by `<number>` in Level 4), or `<feature-type>` (e.g. `@stylistic`).
* The syntax does not describe the use of `=` to define expansion rules of non-terminals. Most specs use `<non-terminal> = <actual-dfn>` equations, but the parsing of that equation is not defined anywhere as far as we can tell. In practice, the `<>` are sometimes omitted on the left-side of the equation as in the `inset()` definition and `content-list` definition. In other cases, the `=` is not used at all as in the `fade()` definition. Some definitions also use a final semi-colon as in CSS Display, CSS Box Alignment and CSS Counter Styles.
* Some specs extend the definition of a property with a "New value" field, which we understand must be combined with the actual definition with a `|` combinator. Unless we missed something, this is not described anywhere though.
* The syntax talks about quotes, but does not explicitly define which quotes to use. Curly quotes get used in practice in most specs. We were rather reading "single quotes" as meaning the apostrophe `'` character, still used in other specs (such as in CSS 2.2). Anything is fine, but it would be good to make that explicit in the Value Syntax Definition.
* Some specs mix actual value definitions with prose. I reported the `fill` property in [fill-stroke] Missing quotes around property ref, and not a real "value" fxtf-drafts#300 for instance. Another example is the use of dagger characters to reference footnotes in the `<an+b>` definition. These values are not ambiguous for human beings, but make automated parsing more challenging.
* From time to time, some rules that could be defined with an `=` construct are actually defined in prose. For instance, `<basic-shape>` could perhaps be defined as `<inset()> | <circle()> | <ellipse()> | <polygon()>`, or `<border-style>` as `none | hidden | dotted | dashed | solid | double | groove | ridge | inset | outset`. The generic question is whether that is something that you'd like to encourage.
* The syntax does not allow defining apparently "simple" things such as ranges or regular expressions. We noted the discussion in [css-values] Define value syntax that limits <integer>, <number>, <length>, etc. to ranges #355, so that's probably under way.
All in all, what we're wondering is whether it could be useful to end up with a CSS Value Definition Syntax that would allow the creation of a dump similar to the IDL index that appears at the end of API specs. For instance, for CSS Display Module Level 3, this dump could be:
CSS Flexbox would then complete the definition of `display` with: