Implement a common language style throughout the specifications #1509

Open
gregsdennis opened this issue May 22, 2024 · 10 comments

Comments

@gregsdennis
Member

gregsdennis commented May 22, 2024

There are generally two approaches to specifying requirements:

  • Permissive language, with which the specification explicitly states what an implementation is allowed to do.

    The implementation MUST do these things and can do nothing else.

  • Restrictive language, with which the specification only states what an implementation MUST or MUST NOT do.

    The implementation can do what it wants within these bounds.

(I do see the irony in these labels, as the "restrictive language" approach actually results in a more permissive outcome for the implementation. For this conversation, I chose to focus on the language used rather than the outcome for the implementation.)

In editing the documents, I've found that both approaches are present in our specifications. Here's an example from Core 4.3.1 where both styles exist in the same paragraph:

A JSON Schema MAY contain properties which are not schema keywords. Unknown keywords SHOULD be treated as annotations, where the value of the keyword is the value of the annotation.

We're giving permission for implementations to support unknown keywords, and then requiring (softly) that unknown keyword values be collected as annotations.

We should decide on one or the other.

Personally, I prefer restrictive language because it defines the bounds of what JSON Schema is while still giving implementations freedom to explore the edges of what's possible. Permissive language, by contrast, confines an implementation to exactly what is prescribed, because it can't do anything we haven't explicitly permitted. If we still want to explicitly state what an implementation MAY do, I think that belongs in an editor's note or in documentation.

I would rewrite the example above as simply the second sentence (disregarding that we're disallowing unknown keywords moving forward):

Unknown keywords SHOULD be treated as annotations, where the value of the keyword is the value of the annotation.

With this, unknown keywords are implicitly allowed. It doesn't make sense to collect unknown keywords as annotations if they're not allowed in the first place. The first sentence in the original adds nothing.
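
To make the point concrete, here is a minimal Python sketch (not taken from any real implementation) of an evaluator that follows only the restrictive rule. The keyword set `KNOWN_KEYWORDS` and the function `collect_unknown_annotations` are hypothetical, and the sketch only inspects the top level of a schema rather than descending into subschemas. No permission check appears anywhere: unknown keywords are allowed simply because the code has a defined place to put them.

```python
from typing import Any

# Hypothetical set of keywords this sketch "knows"; a real implementation
# would derive this from the vocabularies it supports.
KNOWN_KEYWORDS = {"$schema", "$id", "$ref", "$defs", "type", "properties", "required"}


def collect_unknown_annotations(schema: dict[str, Any]) -> dict[str, Any]:
    """Return top-level unknown keywords as annotations instead of rejecting them.

    There is no "MAY contain" check here; the only rule applied is the
    restriction that an unknown keyword's value becomes an annotation.
    """
    annotations: dict[str, Any] = {}
    for keyword, value in schema.items():
        if keyword not in KNOWN_KEYWORDS:
            # The annotation value is the keyword's value, unchanged.
            annotations[keyword] = value
    return annotations


schema = {"type": "string", "x-ui-widget": "textarea"}
print(collect_unknown_annotations(schema))  # {'x-ui-widget': 'textarea'}
```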

I'd like to take a pass before the initial stable release to go through all of the language and remove these "permissions" and just let the restrictions stand. This would move us into "the spec doesn't say I can't do this" territory, which I think is more open.

Also related to #922.

@gregsdennis
Member Author

gregsdennis commented May 22, 2024

Another style consideration is that in several places, we add requirements on schema authors, e.g. "Schema authors SHOULD NOT use pointers that cross resource boundaries." Would it be better to instead place limitations of support on implementations, e.g. "Implementations SHOULD NOT support pointers that cross resource boundaries"?

By placing the requirement on the implementation, it forces schema authors to conform.

@jdesrosiers
Member

I agree that what you call "restrictive language" is better. I advocate for this all the time, although I've never described it that way. I think the spec should be a lot like JSON Schema itself, by default anything is allowed and the constraints put limits on what implementations can do. For example, pointers crossing resource boundaries shouldn't be allowed or disallowed, just undefined. Implementations can handle this however they want, but schema authors shouldn't rely on that behavior. I think sticking to restrictive language could help with the bloat the spec has accumulated as well.

As for places where we put requirements on schemas authors, that's another thing that bothers me as well. We definitely shouldn't be placing requirements on schema authors. That doesn't make sense. The spec is for implementations.

@jviotti
Member

jviotti commented Jun 20, 2024

For example, pointers crossing resource boundaries shouldn't be allowed or disallowed, just undefined

I've never written specs myself, so I'm wondering: what is the benefit of leaving things open in this way? It seems to me that a lot of the benefit of JSON Schema results when interoperability is achieved, and these gray areas tend to be where people get really confused. If we can help it, wouldn't it be better to reduce undefined behaviour?

@gregsdennis
Member Author

a lot of the benefit of JSON Schema results when interoperability is achieved

That's absolutely correct. We should minimize what we leave undefined. A specification defines a set of behaviors which an implementation must exhibit. That means that users of the implementation can rely on that set of behaviors. None of this is in question here.

This issue is about the language used to define the behavior. I think it makes more sense for the language to define restrictions rather than give permission.

Saying

A JSON Schema MAY contain properties which are not schema keywords.

is a permission, but saying

Implementations MUST NOT error when encountering properties which are not schema keywords.

is a restriction.

It's defining a "compliance box". Within the box, implementations are expected to behave a certain way. They can still operate outside of the box if they choose, but users shouldn't expect such operation to be interoperable because the spec doesn't address it.

@jdesrosiers said it well, I think:

I think the spec should be a lot like JSON Schema itself, by default anything is allowed and the constraints put limits on what implementations can do.

@jdesrosiers
Member

I've never written specs myself, so I'm wondering: what is the benefit of leaving things open in this way?

Sometimes it's simply to avoid invalidating the behavior of existing, well-established implementations. For example, in JSON, the behavior of duplicate keys in an object is undefined. Different implementations handled duplicates in different ways, and the construct is nonsense anyway, so leaving it undefined allows existing implementations to be compliant rather than insisting on a specific behavior for something people couldn't use in a meaningful way.
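
Python's standard `json` module illustrates this (it is one library's behavior, not a statement about JSON Schema): by default it accepts duplicate keys and keeps the last value, while its documented `object_pairs_hook` parameter lets a caller build a stricter parser. The `reject_duplicates` helper below is hypothetical; both behaviors are compatible with leaving duplicate keys undefined.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on duplicate keys instead of silently picking one."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


text = '{"a": 1, "a": 2}'

# Default behavior: the document is accepted and the last value wins.
print(json.loads(text))  # {'a': 2}

# Stricter behavior built on the same library: the document is rejected.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'a'
```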

I personally see the pointers crossing resource boundaries exactly that way. It's nonsense, and people should never do it even if it happens to work, but requiring that it produce an error could require a significant change in the architecture of many existing implementations. Placing that burden on existing implementations isn't necessary for something that doesn't make sense anyway.

@jviotti
Member

jviotti commented Jun 21, 2024

Makes sense!

I personally see the pointers crossing resource boundaries exactly that way. It's nonsense, and people should never do it even if it happens to work, but requiring that it produce an error could require a significant change in the architecture of many existing implementations. Placing that burden on existing implementations isn't necessary for something that doesn't make sense anyway.

Do we have a list of things that are possible but that we consider nonsense and that people should never do? I would love to write linter rules for these.

@awwright
Member

The normative language could definitely be reviewed and streamlined. The relevant specification is BCP 14, though some amount of mixed language may be necessary. The most important purpose of the all-caps BCP 14 language is interoperability, and prescriptive and proscriptive language will still be mixed when defining what makes the protocol or format interoperable or forward compatible. You will often find statements in complementary pairs like "A validator MUST reject schemas that..." and "Schemas MUST NOT specify..." because normative requirements usually target one party at a time, and truly prohibiting something requires normative language on both parties.

For example, some construct might be prohibited in schemas ("MUST NOT") because of known interoperability issues. But this doesn't impact validators; the specification might still permit validators to handle the construct, or it might require an error ("MUST reject"), because any new usage would harm interoperability. A prohibition on schemas doesn't imply what validators ought to do one way or the other.

@gregsdennis
Member Author

I completely agree, @awwright. My concern isn't with the BCP 14 keywords, but rather how the requirements are defined.

Currently there's a mix of "implementations MAY do X" and "implementations MUST do Y". In these phrases, "MUST" creates a boundary: a line that implementations can't cross (without operating outside of the spec). However, "MAY" gives permission to do something.

The way I see it, if you define a boundary of behavior, implementations MAY do whatever they want within that boundary without any kind of explicit permission to do so. The only reason to use "MAY", then, is to allow for a behavior that exists outside of the defined boundary. I'd just prefer to define the boundary correctly from the beginning.

some construct might be prohibited in schemas ("MUST NOT") because of known interoperability issues

I also think we should avoid language that places requirements on schema authors. Such requirements have no benefit unless an implementation provides behavior to enforce them. So putting a requirement on the author necessarily creates an implicit requirement for the implementation, and such requirements can be difficult to identify. Instead, we should be defining direct requirements on implementations that enforce a particular behavior from authors (e.g. "Don't create that construct because the implementation will error.").

So for this example, we'd put a requirement on the implementation to detect and disallow this construct, and schema authors will naturally fall in line. Implementations could still offer an opt-in to support the construct. But that's the key for me: such behavior needs to be opt-in.
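
A minimal Python sketch of that idea, under stated assumptions: `resolve_fragment`, `CrossResourcePointerError`, and the `allow_cross_resource` flag are hypothetical names; the pointer is assumed to be an already percent-decoded JSON Pointer fragment (RFC 6901); and any object with an `"$id"` member is treated as the root of an embedded resource, which is a simplification (a property literally named `$id` under `properties` or `$defs` would also trip it). By default the construct is rejected; supporting it requires an explicit opt-in.

```python
from typing import Any


class CrossResourcePointerError(Exception):
    """Raised when a JSON Pointer fragment walks into an embedded resource."""


def resolve_fragment(root: dict[str, Any], pointer: str,
                     allow_cross_resource: bool = False) -> Any:
    """Resolve a JSON Pointer such as "/$defs/foo" against one schema resource.

    Default: error when the walk enters a subschema declaring its own "$id".
    Callers must opt in with allow_cross_resource=True to follow such pointers.
    """
    if pointer == "":
        return root
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node: Any = root
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
        # Simplified boundary check: an object with "$id" starts a new resource.
        if isinstance(node, dict) and "$id" in node and not allow_cross_resource:
            raise CrossResourcePointerError(
                f"pointer {pointer!r} crosses into resource {node['$id']!r}")
    return node


schema = {
    "$id": "https://example.com/root",
    "$defs": {
        "local": {"type": "integer"},
        "inner": {
            "$id": "https://example.com/inner",
            "$defs": {"leaf": {"type": "string"}},
        },
    },
}

print(resolve_fragment(schema, "/$defs/local"))  # {'type': 'integer'}
print(resolve_fragment(schema, "/$defs/inner/$defs/leaf",
                       allow_cross_resource=True))  # {'type': 'string'}
try:
    resolve_fragment(schema, "/$defs/inner/$defs/leaf")  # rejected by default
except CrossResourcePointerError as err:
    print(err)
```

Making the rejection the default keeps conforming behavior interoperable, while the flag leaves room for implementations that want to support the construct outside the "compliance box".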

@gregsdennis
Member Author

Another style consideration is that in several places, we add requirements on schema authors, e.g. "Schema authors SHOULD NOT use pointers that cross resource boundaries." Would it be better to instead place limitations of support on implementations, e.g. "Implementations SHOULD NOT support pointers that cross resource boundaries"?

By placing the requirement on the implementation, it forces schema authors to conform.

@awwright has a good comment here where he explains what it means to place a requirement on the schema author:

Practically speaking, it says if you don't follow this requirement, then interoperability with other implementations won't necessarily be guaranteed.

Maybe we just need a clarification somewhere that explicitly states that these requirements are not necessarily something for implementations to enforce but rather a note of caution for authors. It could even just be in the site docs or something; it doesn't need to be in the spec.

Still, I'm not optimistic about schema authors reading the spec. It does make sense to me for us to define behavior for the implementation that then drives authors to conform.

@jdesrosiers
Member

I agree completely with what Austin described, but I don't think adding a clarification is the right thing. Let's remove the ambiguity altogether and not put requirements on schema authors. If there's a schema author requirement that we think should have a defined behavior, we should define that on the implementation. However, I think most things like this can either be removed entirely or phrased as informational rather than as requirements.

Labels
None yet
Projects
Status: In Discussion
Development

No branches or pull requests

4 participants