Implement a common language style throughout the specifications #1509
Another style consideration is that in several places, we add requirements on schema authors, e.g. "Schema authors SHOULD NOT use pointers that cross resource boundaries." Would it be better to instead place limitations of support on implementations, e.g. "Implementations SHOULD NOT support pointers that cross resource boundaries"? By placing the requirement on the implementation, it forces schema authors to conform. |
I agree that what you call "restrictive language" is better. I advocate for this all the time, although I've never described it that way. I think the spec should be a lot like JSON Schema itself: by default anything is allowed, and the constraints put limits on what implementations can do. For example, pointers crossing resource boundaries shouldn't be allowed or disallowed, just undefined. Implementations can handle this however they want, but schema authors shouldn't rely on that behavior. I think sticking to restrictive language could help with the bloat the spec has accumulated as well. As for places where we put requirements on schema authors, that bothers me as well. We definitely shouldn't be placing requirements on schema authors. That doesn't make sense. The spec is for implementations. |
I've never written specs myself, so I'm wondering what the benefit is of leaving things open in this way. It seems to me that a lot of the benefit of JSON Schema comes when interoperability is achieved, and these gray areas tend to be where people get really confused. If we can help it, wouldn't it be better to reduce undefined behaviour? |
That's absolutely correct. We should minimize what we leave undefined. A specification defines a set of behaviors which an implementation must exhibit. That means that users of the implementation can rely on that set of behaviors. None of this is in question here. This issue is about the language used to define the behavior. I think it makes more sense for the language to define restrictions rather than give permission. Saying
is a permission, but saying
is a restriction. It's defining a "compliance box". Within the box, implementations are expected to behave a certain way. They can still operate outside of the box if they choose, but users shouldn't expect such operation to be interoperable because the spec doesn't address it. @jdesrosiers said it well, I think:
|
Sometimes it's just to avoid invalidating the behavior of existing, well-established implementations. For example, in JSON, the behavior of duplicate keys in an object is undefined. Different implementations handled that in different ways, and it's nonsense anyway, so saying it's undefined allows existing implementations to be compliant rather than insisting on a specific behavior for something that people couldn't use in a way that made sense. I personally see pointers crossing resource boundaries exactly that way. It's nonsense and people should never do it even if it happens to work, but requiring that it produce an error could require a significant change in the architecture of many existing implementations. Placing that burden on existing implementations isn't necessary for something that doesn't make sense anyway. |
Makes sense!
Do we have a list of things that are possible but that we think are nonsense and people should never do? I would love to write linter rules for these. |
The normative language could definitely be reviewed and streamlined. The relevant specification is BCP 14. Some amount of mixed language may be necessary, though. The most important purpose of the all-caps BCP 14 language is interoperability, and the selection of prescriptive vs. proscriptive language will still be mixed when defining what will make the protocol or format interoperable or forward compatible. You will often find statements in complementary pairs like "A validator MUST reject schemas that..." and "Schemas MUST NOT specify..." because normative requirements usually target one party at a time, and truly prohibiting something necessitates normative language on both parties. For example, some construct might be prohibited in schemas ("MUST NOT") because of known interoperability issues. But this doesn't impact validators; the specification might still permit validators to handle the construct, or it might require an error ("MUST reject"), because any new usage would harm interoperability. A prohibition on schemas doesn't imply what validators ought to do one way or the other. |
I completely agree, @awwright. My concern isn't with the BCP 14 keywords, but rather with how the requirements are defined. Currently there's a mix of "implementations MAY do X" and "implementations MUST do Y". In these phrases, "MUST" creates a boundary: a line that implementations can't cross (without operating outside of the spec). However, "MAY" is giving permission to do something. The way I see it, if you define a boundary of behavior, implementations MAY do whatever they want within that boundary without any kind of explicit permission to do so. The only reason to use "MAY", then, is to allow for a behavior that exists outside of the defined boundary. I'd just prefer to define the boundary correctly from the beginning.
I also think we should avoid language that places requirements on schema authors. Such requirements have no benefit unless an implementation provides behavior to enforce them. So putting a requirement on the author necessarily creates an implicit requirement for the implementation, and such requirements can be difficult to identify. Instead, we should be defining direct requirements on implementations that enforce a particular behavior from authors. (e.g. "Don't create that construct because the implementation will error.") So for this example, we'd put a requirement on the implementation to detect and disallow this construct. Schema authors will naturally fall in line. Implementations could still offer an opt-in to support the construct. But that's the key for me: such behavior needs to be opt-in. |
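To make the "disallow by default, opt in to support" idea concrete, here is a rough sketch in Python. It is not taken from any existing implementation: `resolve_pointer`, `allow_cross_resource`, and `CrossResourcePointerError` are names made up for this example, and it assumes a pointer crosses a resource boundary whenever it reaches a subschema below the root that carries `$id`.

```python
class CrossResourcePointerError(ValueError):
    """Hypothetical error: a JSON Pointer reached an embedded resource."""


def resolve_pointer(schema, pointer, allow_cross_resource=False):
    """Resolve a JSON Pointer (e.g. "/$defs/a/items") against `schema`.

    Assumption for this sketch: any object below the root that has an
    "$id" member starts a new schema resource. This is a simplification;
    for instance, it would misfire on a property that is literally named
    "$id" under "properties".
    """
    if pointer == "":
        return schema
    if not pointer.startswith("/"):
        raise ValueError("JSON Pointer must be empty or start with '/'")

    node = schema
    for raw in pointer.split("/")[1:]:
        # Unescape per RFC 6901: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
        crossed = isinstance(node, dict) and "$id" in node
        if crossed and not allow_cross_resource:
            raise CrossResourcePointerError(
                f"pointer {pointer!r} crosses a resource boundary at {raw!r}"
            )
    return node


schema = {
    "$id": "https://example.com/root",
    "$defs": {
        "plain": {"type": "integer"},
        "inner": {
            "$id": "https://example.com/inner",
            "items": {"type": "string"},
        },
    },
}

# Stays inside the root resource, so it resolves by default.
print(resolve_pointer(schema, "/$defs/plain"))

# Crosses into the embedded resource: an error unless the user opts in.
print(resolve_pointer(schema, "/$defs/inner/items", allow_cross_resource=True))
```

With the default settings the second call would raise, which is the behavior that nudges authors away from the construct, while the flag keeps it available to anyone who explicitly asks for it.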
@awwright has a good comment here where he explains what it means to place a requirement on the schema author:
Maybe we just need a clarification somewhere that explicitly states that these requirements are not necessarily something for implementations to enforce but rather a note of caution for authors. It could even just be in the site docs or something; it doesn't need to be in the spec. Still, I'm not optimistic about schema authors reading the spec. It does make sense to me for us to define behavior for the implementation that then drives authors to conform. |
I agree completely with what Austin described, but I don't think adding a clarification is the right thing. Let's remove the ambiguity altogether and not put requirements on schema authors. If there's a schema author requirement that we think should have a defined behavior, we should define that on the implementation. However, I think most things like this can either be removed entirely or phrased as informational rather than as a requirement. |
There are generally two approaches to specifying requirements:
(I do see the irony in these labels, as the "restrictive language" approach actually results in a more permissive outcome for the implementation. For this conversation, I chose to focus on the language used rather than the outcome for the implementation.)
In editing the documents, I've found that both approaches are present in our specifications. Here's an example from Core 4.3.1 where both styles exist in the same paragraph:
We're giving permission for implementations to support unknown keywords, and then requiring (softly) that unknown keyword values be collected as annotations.
We should decide on one or the other.
Personally, I prefer restrictive language as it gives implementations freedom to explore the edges of what's possible with JSON Schema while defining the bounds of what JSON Schema is; whereas permissive language requires that an implementation only ever be the prescribed thing because it can't do something if we don't give it permission. If we still want to explicitly state what an implementation MAY do, I think that should be either in an editor's note or in documentation.
I would rewrite the example above as simply the second sentence (disregarding that we're disallowing unknown keywords moving forward):
With this, unknown keywords are implicitly allowed. It doesn't make sense to collect unknown keywords as annotations if they're not allowed in the first place. The first sentence in the original adds nothing.
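For anyone who wants to see what "collect unknown keyword values as annotations" might look like in practice, here's a minimal sketch. It's only an illustration: `KNOWN_KEYWORDS` is a deliberately tiny, hypothetical vocabulary and `collect_unknown_annotations` is an invented helper, not something defined by the spec or any implementation.

```python
# Hypothetical, deliberately incomplete set of recognized keywords.
KNOWN_KEYWORDS = {"$id", "$schema", "$defs", "type", "properties", "items"}


def collect_unknown_annotations(schema):
    """Return the values of unrecognized keywords, keyed by keyword name.

    Boolean schemas (true/false) have no keywords, so they yield nothing.
    Only the top level of the schema object is inspected in this sketch.
    """
    if not isinstance(schema, dict):
        return {}
    return {
        keyword: value
        for keyword, value in schema.items()
        if keyword not in KNOWN_KEYWORDS
    }


annotations = collect_unknown_annotations(
    {"type": "string", "x-display-name": "Nickname"}
)
print(annotations)
```

Nothing here needs a separate statement saying unknown keywords are permitted. The collection rule only makes sense if they are, which is the point of dropping the first sentence.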
I'd like to take a pass before the initial stable release to go through all of the language and remove these "permissions" and just let the restrictions stand. This would move us into "the spec doesn't say I can't do this" territory, which I think is more open.
Also related to #922.