Implement a common language style throughout the specifications #1509

gregsdennis · 2024-05-22T02:07:38Z

There are generally two approaches to specifying requirements:

Permissive language, with which the specification explicitly states what an implementation is allowed to do.

The implementation can only and MUST do these things.

Restrictive language, with which the specification only states what an implementation MUST or MUST NOT do.

The implementation can do what it wants within these bounds.

(I do see the irony in these labels, as the "restrictive language" approach actually results in a more permissive outcome for the implementation. For this conversation, I chose to focus on the language used rather than the outcome for the implementation.)

In editing the documents, I've found that both approaches are present in our specifications. Here's an example from Core 4.3.1 where both styles exist in the same paragraph:

A JSON Schema MAY contain properties which are not schema keywords. Unknown keywords SHOULD be treated as annotations, where the value of the keyword is the value of the annotation.

We're giving permission for implementations to support unknown keywords, and then requiring (softly) that unknown keyword values be collected as annotations.

We should decide on one or the other.

Personally, I prefer restrictive language as it gives implementations freedom to explore the edges of what's possible with JSON Schema while defining the bounds of what JSON Schema is; whereas permissive language requires that an implementation only ever be the prescribed thing because it can't do something if we don't give it permission. If we still want to explicitly state what an implementation MAY do, I think that should be either in an editor's note or in documentation.

I would rewrite the example above as simply the second sentence (disregarding that we're disallowing unknown keywords moving forward):

Unknown keywords SHOULD be treated as annotations, where the value of the keyword is the value of the annotation.

With this, unknown keywords are implicitly allowed. It doesn't make sense to collect unknown keywords as annotations if they're not allowed in the first place. The first sentence in the original adds nothing.
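To make "unknown keywords as annotations" concrete, here is a minimal sketch in Python. It assumes a hypothetical, deliberately tiny set of known keywords and looks at a single schema object only. The names `KNOWN_KEYWORDS` and `collect_unknown_annotations` are illustrative and do not come from the spec or from any implementation.

```python
from typing import Any

# Hypothetical, deliberately tiny vocabulary. A real implementation would
# derive this set from the vocabularies it actually supports.
KNOWN_KEYWORDS = {"$id", "$schema", "$ref", "$defs", "type", "properties", "items"}


def collect_unknown_annotations(schema: dict[str, Any]) -> dict[str, Any]:
    """Map each unknown keyword of one schema object to its value.

    The keyword's value is used as the annotation's value, as in the
    sentence quoted above. Nothing is rejected, so unknown keywords are
    implicitly allowed.
    """
    return {key: value for key, value in schema.items() if key not in KNOWN_KEYWORDS}


schema = {"type": "string", "x-display-name": "Title"}
annotations = collect_unknown_annotations(schema)
```

Note that the sketch never needs a separate "unknown keywords are allowed" rule: the collection behavior already implies it.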

I'd like to take a pass before the initial stable release to go through all of the language and remove these "permissions" and just let the restrictions stand. This would move us into "the spec doesn't say I can't do this" territory, which I think is more open.

Also related to #922.

gregsdennis · 2024-05-22T20:23:52Z

Another style consideration is that in several places, we add requirements on schema authors, e.g. "Schema authors SHOULD NOT use pointers that cross resource boundaries." Would it be better to instead place limitations of support on implementations, e.g. "Implementations SHOULD NOT support pointers that cross resource boundaries"?

By placing the requirement on the implementation, it forces schema authors to conform.

jdesrosiers · 2024-05-24T13:32:43Z

I agree that what you call "restrictive language" is better. I advocate for this all the time, although I've never described it that way. I think the spec should be a lot like JSON Schema itself, by default anything is allowed and the constraints put limits on what implementations can do. For example, pointers crossing resource boundaries shouldn't be allowed or disallowed, just undefined. Implementations can handle this however they want, but schema authors shouldn't rely on that behavior. I think sticking to restrictive language could help with the bloat the spec has accumulated as well.

As for places where we put requirements on schema authors, that's another thing that bothers me as well. We definitely shouldn't be placing requirements on schema authors. That doesn't make sense. The spec is for implementations.

jviotti · 2024-06-20T02:25:23Z

For example, pointers crossing resource boundaries shouldn't be allowed or disallowed, just undefined

I never wrote specs myself, so wondering what is the benefit of leaving things open in this way? Seems to me that a lot of the benefit of JSON Schema results when interoperability is achieved, and these gray areas tend to be where people get really confused. If we can help it, wouldn't it be better to reduce undefined behaviour?

gregsdennis · 2024-06-20T02:46:27Z

a lot of the benefit of JSON Schema results when interoperability is achieved

That's absolutely correct. We should minimize what we leave undefined. A specification defines a set of behaviors which an implementation must exhibit. That means that users of the implementation can rely on that set of behaviors. None of this is in question here.

This issue is about the language used to define the behavior. I think it makes more sense for the language to define restrictions rather than give permission.

Saying

A JSON Schema MAY contain properties which are not schema keywords.

is a permission, but saying

Implementations MUST NOT error when encountering properties which are not schema keywords.

is a restriction.

It's defining a "compliance box". Within the box, implementations are expected to behave a certain way. They can still operate outside of the box if they choose, but users shouldn't expect such operation to be interoperable because the spec doesn't address it.

@jdesrosiers said it well, I think:

I think the spec should be a lot like JSON Schema itself, by default anything is allowed and the constraints put limits on what implementations can do.

jdesrosiers · 2024-06-21T02:55:30Z

I never wrote specs myself, so wondering what is the benefit of leaving things open in this way?

Sometimes it's just to not invalidate the behavior of existing, well established implementations. For example, in JSON, the behavior of duplicate keys in an object is undefined. Different implementations handled that in different ways and it's nonsense anyway, so saying it's undefined allows existing implementations to be compliant rather than insisting on a specific behavior for something that people couldn't use in a way that made sense anyway.
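As an illustration of how one implementation fills that gap, here is a small sketch using Python's standard `json` module. By default it keeps the last value for a repeated name, and `object_pairs_hook` lets the caller choose a different policy. The `reject_duplicates` helper is a hypothetical name for this example, not part of the library.

```python
import json

text = '{"a": 1, "a": 2}'

# Python's json module accepts repeated names and keeps the last value.
last_wins = json.loads(text)


def reject_duplicates(pairs):
    """Alternative policy: fail when an object repeats a name.

    `pairs` is the list of (name, value) tuples for one JSON object,
    in document order, as passed in by json.loads.
    """
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate names in object: {names}")
    return dict(pairs)


try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    strict_result = "accepted"
except ValueError:
    strict_result = "rejected"
```

Both behaviors are compatible with a format that leaves duplicate names undefined, which is the point being made here.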

Personally, I see pointers crossing resource boundaries exactly that way. It's nonsense and people should never do it even if it happens to work, but requiring that it produce an error could require a significant change in the architecture of many existing implementations. Placing that burden on existing implementations isn't necessary for something that doesn't make sense anyway.

jviotti · 2024-06-21T23:05:00Z

Makes sense!

Personally, I see pointers crossing resource boundaries exactly that way. It's nonsense and people should never do it even if it happens to work, but requiring that it produce an error could require a significant change in the architecture of many existing implementations. Placing that burden on existing implementations isn't necessary for something that doesn't make sense anyway.

Do we have a list of things that are possible but that we think are nonsense and people should never do? I would love to write linter rules for these.
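For the pointer case specifically, a lint rule might look roughly like the sketch below. It is a naive illustration under several assumptions: an embedded resource is any non-root object with a string `$id`, only same-document references of the form `#/...` are checked, percent-encoding in the fragment is ignored, and the walk visits every object in the document, so it can produce false positives inside values such as `const` or `enum`. The function names are hypothetical.

```python
from typing import Any, Iterator


def _decode(token: str) -> str:
    # JSON Pointer escapes (RFC 6901): "~1" means "/", "~0" means "~".
    return token.replace("~1", "/").replace("~0", "~")


def crosses_resource_boundary(resource_root: Any, pointer: str) -> bool:
    """True if `pointer` passes through a subschema that declares `$id`."""
    node = resource_root
    for token in pointer.split("/")[1:]:
        # About to step below `node`: if it is an embedded resource,
        # the pointer continues into a different resource.
        if (node is not resource_root and isinstance(node, dict)
                and isinstance(node.get("$id"), str)):
            return True
        key = _decode(token)
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and key.isdigit() and int(key) < len(node):
            node = node[int(key)]
        else:
            return False  # dangling pointer: a different lint, not this one
    return False


def lint(schema: Any, resource_root: Any = None, path: str = "") -> Iterator[str]:
    """Yield one message per `$ref` whose pointer crosses into another resource."""
    if resource_root is None:
        resource_root = schema
    if isinstance(schema, dict):
        if schema is not resource_root and isinstance(schema.get("$id"), str):
            resource_root = schema  # entering an embedded resource
        ref = schema.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if crosses_resource_boundary(resource_root, ref[1:]):
                yield f"{path or '/'}: {ref} crosses a resource boundary"
        for key, value in schema.items():
            yield from lint(value, resource_root, f"{path}/{key}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            yield from lint(item, resource_root, f"{path}/{index}")


schema = {
    "$id": "https://example.com/root",
    "$defs": {
        "embedded": {
            "$id": "https://example.com/embedded",
            "$defs": {"inner": {"type": "string"}},
        }
    },
    "properties": {
        "bad": {"$ref": "#/$defs/embedded/$defs/inner"},
        "ok": {"$ref": "#/$defs/embedded"},
    },
}
findings = list(lint(schema))
```

In this sketch a pointer that stops at the embedded resource itself (`ok`) is not flagged, only one that continues into it (`bad`). Whether the first case should also be reported is a judgment call for the rule author.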

awwright · 2024-07-11T21:28:24Z

The normative language could definitely be reviewed and streamlined. The relevant specification is BCP 14. Some amount of mixed language may be necessary, though. The most important purpose of the all-caps BCP 14 language is interoperability, and selection of prescriptive vs. proscriptive language will still be mixed when defining what will make the protocol or format interoperable or forward compatible. You will often find statements in complementary pairs like "A validator MUST reject schemas that..." and "Schemas MUST NOT specify..." because normative requirements usually target one party at a time, and truly prohibiting something necessitates normative language on both parties.

For example, some construct might be prohibited in schemas ("MUST NOT") because of known interoperability issues. But this doesn't impact validators; the specification might still permit validators to handle the construct, or it might require an error ("MUST reject"), because any new usage would harm interoperability. A prohibition on schemas doesn't imply what validators ought to do one way or the other.

gregsdennis · 2024-07-12T02:47:50Z

I completely agree, @awwright. My concern isn't with the BCP 14 keywords, but rather how the requirements are defined.

Currently there's a mix of "implementations MAY do X" and "implementations MUST do Y". In these phrases, "MUST" creates a boundary: a line that implementations can't cross (without operating outside of the spec). However, "MAY" is giving permission to do something.

The way I see it, if you define a boundary of behavior, implementations MAY do whatever they want within that boundary without any kind of explicit permission to do so. The only reason to use "MAY", then, is to allow for a behavior that exists outside of the defined boundary. I'd just prefer to define the boundary correctly from the beginning.

some construct might be prohibited in schemas ("MUST NOT") because of known interoperability issues

I also think we should avoid language that places requirements on schema authors. Such requirements have no benefit unless an implementation provides behavior to enforce it. So putting a requirement on the author necessarily creates an implicit requirement for the implementation, and such requirements can be difficult to identify. Instead, we should be defining direct requirements on implementations that enforce a particular behavior from authors. (e.g. "Don't create that construct because the implementation will error.")

So for this example, we'd put a requirement on the implementation to detect and disallow this construct. Schema authors will naturally fall in line. Implementations could still offer an opt-in to support the construct, though. But that's the key for me: such behavior needs to be opt-in.

gregsdennis · 2024-09-30T22:04:03Z

Another style consideration is that in several places, we add requirements on schema authors, e.g. "Schema authors SHOULD NOT use pointers that cross resource boundaries." Would it be better to instead place limitations of support on implementations, e.g. "Implementations SHOULD NOT support pointers that cross resource boundaries"?

By placing the requirement on the implementation, it forces schema authors to conform.

@gregsdennis (me)

@awwright has a good comment here where he explains what it means to place a requirement on the schema author:

Practically speaking, it says if you don't follow this requirement, then interoperability with other implementations won't necessarily be guaranteed.

Maybe we just need a clarification somewhere that explicitly states that these requirements are not necessarily something for implementations to enforce but rather a note of caution for authors. It could even just be in the site docs or something; it doesn't need to be in the spec.

Still, I'm not optimistic about schema authors reading the spec. It does make sense to me for us to define behavior for the implementation that then drives authors to conform.

jdesrosiers · 2024-10-01T22:06:23Z

I agree completely with what Austin described, but I don't think adding a clarification is the right thing. Let's remove the ambiguity altogether and not put requirements on schema authors. If there's a schema author requirement that we think should have a defined behavior, we should define that on the implementation. However, I think most things like this can either be removed entirely or phrased as informational rather than as requirement.

gregsdennis added this to Stable Release Development May 23, 2024

gregsdennis added this to the stable-release milestone May 23, 2024

gregsdennis moved this to In Discussion in Stable Release Development May 23, 2024

gregsdennis mentioned this issue Aug 14, 2024

Proposal: Make format validate by default #1520

Closed
