-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define fragment identifiers for application/yaml #38
Conversation
@eemeli Can you please add an example YAML URI with fragment identifier, so that we could find a way to provide it? |
Co-authored-by: Roberto Polli <[email protected]>
Struggling a bit to figure out just what you're looking for here. For an example of how this would work, let's presume that we have %YAML 1.2
---
one: &foo scalar
two:
- some
- sequence
- &bar items Then, Do you want this sort of example to be included in the RFC? |
@eemeli probably it could be useful to add either an example section or a normative section that explicits that. An alternative could be to add this information in the YAML spec. WDYT? I imagine something brief like https://datatracker.ietf.org/doc/html/rfc6901#section-6 which includes some considerations on percent-encoding and some examples. A couple of question, for example:
|
Let's discuss this topic in the issue #21 |
The YAML spec already includes sections on Node Anchors and Alias Nodes, which then include some examples of them in use. The intent here is to defer to that spec's definition of alias nodes.
At the point where the YAML spec defines anchors and aliases, it's treating its input as a sequence of Unicode code points, i.e. it doesn't care about their encoding. The YAML 1.2 set of acceptable characters for these is tbh far too wide, as it allows for nearly all printable Unicode code points. |
foo: &però ciao
bar: *però reading https://www.rfc-editor.org/rfc/rfc3986#section-3.5 iiuc I need to %encode the (maybe related to ingydotnet/yaml-pm#127) |
|
## Fragment identification {#application-yaml-fragment} | ||
|
||
This section describes how to use | ||
named anchors (see Section 3.2.2.2 of [YAML]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using named anchors because they can be defined even when no alias is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this works for now. The intent of the original language was to account for an expected expansion of alias functionality in the YAML spec, which would allow with a document like this:
- &foo
bar:
- 1
- 2
- 42
for an alias *foo/bar/2
to point at the value 42
.
By referring directly to anchors here, such a later change to the YAML spec would not be reflected in the mediatype's fragment id.
To be clear, pathlike aliases are not yet valid and there's no fixed schedule for when we might get a YAML 1.3 spec out, so defining the mediatype according to current reality is an entirely valid thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eemeli thanks for the insight! Some question then
- is
/
a valid character for a named anchor? If/
is used as a pathlike separator, isn't using/
in named anchors problematic? - since keys can contain non-string characters, how can I address pathlike alias node such as
*foo/bar/1
or*fizz/buzz/baz
?
- &foo
bar:
1: "integer"
"1": "string"
- baz: *foo/bar/1
- &fizz
"buzz/baz": "a"
"buzz":
"baz": "b"
- roc: *fizz/buzz/baz
If we want to achieve publication quickly, I think that using "named anchors" is easier, It is always possible to amend the media type registration and the according fragment identifiers interacting directly with IANA.
It is ok to spend some time trying to use alias nodes, provided that:
- we need to specify that the fragment identifier "should be interpreted as an alias node": this is because a named anchor might not be referenced by an alias node;
- since
/
is a valid key character, we need to encode it properly like it is done in json pointers. This is probably valid independently on the fragment identifier; - for the sake of interoperability, I suggest to at least having an idea of how to handle the behavior of the above yaml document with the following fragments:
- file.yaml#foo
- file.yaml#foo/bar
- file.yaml#foo/bar/1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is
/
a valid character for a named anchor? If/
is used as a pathlike separator, isn't using/
in named anchors problematic?
Yes, it's a valid character, and yes it's potentially problematic. Not fatally so, though, as preferentially matching the longest substring allows for a deterministic and pretty sensible resolution. I've a prototypical implementation of how this could work here: eemeli/yaml#380.
- since keys can contain non-string characters, how can I address pathlike alias node such as
*foo/bar/1
or*fizz/buzz/baz
?
Badly. The resolution algorithm can end up pretty straightforward, but with degenerate cases like this that'll mean resolving one of the possible nodes while making the other one unaddressable via a pathlike alias. But it's possible to attach an anchor to a node, which circumvents the problem. That's also the solution for addressing e.g. nodes that are in a mapping and have a non-scalar key like { [ foo, bar ]: value }
.
If we want to achieve publication quickly, I think that using "named anchors" is easier, It is always possible to amend the media type registration and the according fragment identifiers interacting directly with IANA.
Yeah, that's why I said referring to anchors directly should be ok for now. They'll need to continue working in the future as well, and any changes should just make expressions that currently fail potentially start resolving, rather than changing the meaning of anything that's currently valid.
If multiple nodes would match a fragment identifier, | ||
the first such match is selected. | ||
|
||
Users concerned with interoperability of fragment identifiers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether anchor names allows all possible UTF-32, so here we suggest an interoperale behavior.
pyyaml for example only supports [a-zA-Z0-9\-_]+
for anchor names; I didn't test other implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the above snippet looks odd. we're working on a media type registration. the above text seems to define/describe behavior that hopefully is well-defined for the format, and if it's not, then that's too bad but nothing that a media registration should attempt to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the YAML spec, the allowed characters in ns-anchor-name is literally everything up to \x10FFFF
except for:
- C0 and C1 control codes, though the Next Line character
\x85
is allowed \x20 | ',' | '[' | ']' | '{' | '}'
- Surrogates
[\xD800-\xD8FF]
- the BOM character
\xFEFF
That range is rather silly, and as @ioggstream noted, not supported by all implementations. Sticking to [\w-]+
is indeed recommended for interoperability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- do you think it's worth reducing the possible values of ns-anchor-name in a future YAML revision ? @dret 's comment is relevant.
- what happens if I have something like
*foo/bar/baz
?
- &foo
"bar/baz": "a"
"bar":
"baz": "b"
for example, json pointers encodes them in a special way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, reducing the valid space of anchor names is definitely planned. I even wrote up a proposal for it (yaml/yaml-spec#64), but the spec update progress has been a bit stop-and-go-ish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eemeli it would be a major improvement :)
|
||
~~~ example | ||
%YAML 1.2 | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a bug in kramdown: it seems it can't process ---
in code blocks. I kludged it adding a space to each line. Alternatively, we could remove the %YAML header.
A YAML named anchor can be represented in a URI fragment identifier | ||
by encoding it into octects using UTF-8 {{!UTF-8=RFC3629}}, | ||
while percent-encoding those characters not allowed by the fragment rule | ||
in {{Section 3.5 of URI}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines taken from json pointer rfc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't that read "referenced" instead of "represented"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote "represent" because the serialization is different from the YAML's one.
Merging and moving discussion in #41 |
Closes #21
This defines
application/yaml
fragment identifiers to be parsed as YAML aliases, which currently means that they must point to an explicitly defined anchor in the document, a feature natively supported by YAML.The definition intentionally allows for changes in later editions of the YAML spec to be automatically supported, e.g. as we're working towards supporting something like JSON pointers as well.
The language in the
+yaml
fragment identifier section seems a bit complex, and I'm not sure if it should be updated as well. Formatsxxx/yyy+yaml
should be allowed to define their own rules for fragment identifiers. Is this currently the case?