Define fragment identifiers for application/yaml #38

eemeli · 2022-04-05T09:32:03Z

Closes #21

This defines application/yaml fragment identifiers to be parsed as YAML aliases, which currently means that they must point to an explicitly defined anchor in the document, a feature natively supported by YAML.

The definition intentionally allows for changes in later editions of the YAML spec to be automatically supported, e.g. as we're working towards supporting something like JSON pointers as well.

The language in the +yaml fragment identifier section seems a bit complex, and I'm not sure if it should be updated as well. Formats xxx/yyy+yaml should be allowed to define their own rules for fragment identifiers. Is this currently the case?

ioggstream · 2022-04-05T16:02:27Z

@eemeli Can you please add an example YAML URI with fragment identifier, so that we could find a way to provide it?

draft-ietf-httpapi-yaml-mediatypes.md

Co-authored-by: Roberto Polli

eemeli · 2022-04-05T18:00:28Z

> Can you please add an example YAML URI with fragment identifier, so that we could find a way to provide it?

Struggling a bit to figure out just what you're looking for here. For an example of how this would work, let's presume that we have file.yaml with the following contents:

%YAML 1.2
---
one: &foo scalar
two:
  - some
  - sequence
  - &bar items

Then, path/to/file.yaml#foo would be pointing at the node with the value scalar, while path/to/file.yaml#bar would point to the node with the value items.

Do you want this sort of example to be included in the RFC?

ioggstream · 2022-04-05T18:10:14Z

@eemeli probably it could be useful to add either an example section or a normative section that makes that explicit. An alternative could be to add this information to the YAML spec. WDYT?

I imagine something brief like https://datatracker.ietf.org/doc/html/rfc6901#section-6, which includes some considerations on percent-encoding and some examples. A couple of questions, for example:

YAML can be encoded in UTF-8, UTF-16 or UTF-32. Can anchor/alias node identifiers be non-ASCII, or encoded in something other than UTF-8?

ioggstream · 2022-04-05T18:14:17Z

> The language in the +yaml fragment identifier section seems a bit complex, and I'm not sure if it should be updated as well.
> Formats xxx/yyy+yaml should be allowed to define their own rules for fragment identifiers. Is this currently the case?

Let's discuss this topic in issue #21.

eemeli · 2022-04-05T19:23:48Z

> @eemeli probably it could be useful to add either an example section or a normative section that explicits that. An alternative could be to add this information in the YAML spec. WDYT?

The YAML spec already includes sections on Node Anchors and Alias Nodes, which then include some examples of them in use. The intent here is to defer to that spec's definition of alias nodes.

> yaml can be encoded in UTF-8, 16, 32. Can anchor/alias nodes identifier be non-ascii / non-utf8 encoded ?

At the point where the YAML spec defines anchors and aliases, it's treating its input as a sequence of Unicode code points, i.e. it doesn't care about their encoding. The YAML 1.2 set of acceptable characters for these is tbh far too wide, as it allows for nearly all printable Unicode code points.

ioggstream · 2022-04-05T21:53:55Z

> it's treating its input as a sequence of Unicode code points

foo: &però ciao
bar: *però

Reading https://www.rfc-editor.org/rfc/rfc3986#section-3.5, iiuc I need to percent-encode the però string, right? In that case I am not sure how this should work with UTF-8, UTF-16 and UTF-32. Can you give some examples?

(maybe related to ingydotnet/yaml-pm#127)
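For illustration, here is a minimal Python sketch of how the `però` anchor could end up in a fragment, assuming the anchor name is first read as Unicode code points and then always serialized as UTF-8 before percent-encoding, whatever encoding the YAML stream itself uses. The helper names `anchor_to_fragment` and `fragment_to_anchor` are hypothetical, and the anchor extraction is a naive string split, not a YAML parser.

```python
from urllib.parse import quote, unquote

# Characters RFC 3986 allows unescaped in a fragment, in addition to the
# unreserved set that quote() never escapes: sub-delims, ":", "@", "/", "?".
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def anchor_to_fragment(anchor: str) -> str:
    """Encode an anchor name (a str of code points) as a URI fragment (hypothetical helper)."""
    return quote(anchor, safe=FRAGMENT_SAFE, encoding="utf-8")


def fragment_to_anchor(fragment: str) -> str:
    """Decode a URI fragment back to an anchor name (hypothetical helper)."""
    return unquote(fragment, encoding="utf-8")


# The same document serialized in three stream encodings.
text = "foo: &però ciao\nbar: *però\n"
for codec in ("utf-8", "utf-16", "utf-32"):
    decoded = text.encode(codec).decode(codec)
    # Naive extraction for the demo only; a real tool would use a YAML parser.
    anchor = decoded.split("&", 1)[1].split(" ", 1)[0]
    print(codec, anchor_to_fragment(anchor))  # per%C3%B2 in all three cases
```

Under that assumption the stream encoding does not matter: the URI would be `file.yaml#per%C3%B2` in every case.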

ioggstream · 2022-04-12T12:37:01Z

## Fragment identification {#application-yaml-fragment}

This section describes how to use
named anchors (see Section 3.2.2.2 of [YAML])
as fragment identifiers to designate a node.

A YAML named anchor can be represented in a URI fragment identifier
by encoding it into octets using UTF-8 {{!UTF-8=RFC3629}},
while percent-encoding those characters not allowed by the fragment rule
in {{Section 3.5 of URI}}.

If multiple nodes would match a fragment identifier,
the first such match is selected.

Users concerned with interoperability of fragment identifiers:

- SHOULD limit named anchors to a set of characters
  that do not require encoding 
  to be expressed as URI fragment identifiers:
  this is always possible since named anchors are a serialization
  detail;
- SHOULD NOT use a named anchor that matches multiple nodes.

In the example resource below, the URL `file.yaml#foo`
references the anchor `foo` pointing to the node with value `scalar`;
whereas
the URL `file.yaml#bar` references the anchor `bar` pointing to the node
with value `[ some, sequence, items ]`.

~~~ example
%YAML 1.2
---
one: &foo scalar
two: &bar
  - some
  - sequence
  - items
~~~

ioggstream · 2022-04-12T15:04:57Z

draft-ietf-httpapi-yaml-mediatypes.md

+## Fragment identification {#application-yaml-fragment}
+
+This section describes how to use
+named anchors (see Section 3.2.2.2 of [YAML])


Using named anchors because they can be defined even when no alias is defined.

Sure, this works for now. The intent of the original language was to account for an expected expansion of alias functionality in the YAML spec, which would allow, with a document like this:

- &foo
  bar:
    - 1
    - 2
    - 42

for an alias *foo/bar/2 to point at the value 42.

By referring directly to anchors here, such a later change to the YAML spec would not be reflected in the mediatype's fragment id.

To be clear, pathlike aliases are not yet valid and there's no fixed schedule for when we might get a YAML 1.3 spec out, so defining the mediatype according to current reality is an entirely valid thing to do.

@eemeli thanks for the insight! Some questions then:

is / a valid character for a named anchor? If / is used as a pathlike separator, isn't using / in named anchors problematic?

since keys are not necessarily strings, how can I address pathlike alias nodes such as *foo/bar/1 or *fizz/buzz/baz?

- &foo
  bar:
    1: "integer"
    "1": "string"
- baz: *foo/bar/1
- &fizz
  "buzz/baz": "a"
  "buzz":
    "baz": "b"
- roc: *fizz/buzz/baz

If we want to achieve publication quickly, I think that using "named anchors" is easier. It is always possible to amend the media type registration and the corresponding fragment identifiers later by interacting directly with IANA.

It is ok to spend some time trying to use alias nodes, provided that:

- we specify that the fragment identifier "should be interpreted as an alias node": this is because a named anchor might not be referenced by any alias node;
- since / is a valid key character, we encode it properly, as is done in JSON Pointer. This probably holds independently of the fragment identifier;
- for the sake of interoperability, we at least have an idea of how to handle the above YAML document with the following fragments:
  - file.yaml#foo
  - file.yaml#foo/bar
  - file.yaml#foo/bar/1

> is / a valid character for a named anchor? If / is used as a pathlike separator, isn't using / in named anchors problematic?

Yes, it's a valid character, and yes it's potentially problematic. Not fatally so, though, as preferentially matching the longest substring allows for a deterministic and pretty sensible resolution. I have a prototype implementation of how this could work here: eemeli/yaml#380.
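As a rough illustration of what "preferentially matching the longest substring" could mean, here is a small Python sketch. It is an assumption about the general idea, not the algorithm from eemeli/yaml#380; `split_alias` and its arguments are hypothetical names, and the set of known anchors is assumed to be collected elsewhere.

```python
from typing import List, Optional, Set, Tuple


def split_alias(alias: str, anchors: Set[str]) -> Optional[Tuple[str, List[str]]]:
    """Split a pathlike alias into (anchor name, remaining path segments).

    The longest candidate prefix is tried first, so an anchor literally
    named "foo/bar" wins over anchor "foo" followed by the key "bar".
    Returns None when no prefix of the alias is a known anchor.
    """
    segments = alias.split("/")
    for end in range(len(segments), 0, -1):
        candidate = "/".join(segments[:end])
        if candidate in anchors:
            return candidate, segments[end:]
    return None


# With only "foo" defined, the rest of the alias is treated as a path.
print(split_alias("foo/bar/2", {"foo"}))             # ('foo', ['bar', '2'])
# With both defined, the longer anchor name takes precedence.
print(split_alias("foo/bar/2", {"foo", "foo/bar"}))  # ('foo/bar', ['2'])
```

The sketch only covers the split between anchor and path; how each path segment is then matched against mapping keys or sequence indexes is a separate question.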

> since keys can contain non-string characters, how can I address pathlike alias node such as *foo/bar/1 or *fizz/buzz/baz ?

Badly. The resolution algorithm can end up pretty straightforward, but with degenerate cases like this that'll mean resolving one of the possible nodes while making the other one unaddressable via a pathlike alias. But it's possible to attach an anchor to a node, which circumvents the problem. That's also the solution for addressing e.g. nodes that are in a mapping and have a non-scalar key like { [ foo, bar ]: value }.

> If we want to achieve publication quickly, I think that using "named anchors" is easier, It is always possible to amend the media type registration and the according fragment identifiers interacting directly with IANA.

Yeah, that's why I said referring to anchors directly should be ok for now. They'll need to continue working in the future as well, and any changes should just make expressions that currently fail potentially start resolving, rather than changing the meaning of anything that's currently valid.

ioggstream · 2022-04-12T15:07:55Z

draft-ietf-httpapi-yaml-mediatypes.md

+If multiple nodes would match a fragment identifier,
+the first such match is selected.
+
+Users concerned with interoperability of fragment identifiers:


I am not sure whether anchor names allow all possible Unicode code points, so here we suggest an interoperable behavior.

pyyaml for example only supports [a-zA-Z0-9\-_]+ for anchor names; I didn't test other implementations.

the above snippet looks odd. we're working on a media type registration. the above text seems to define/describe behavior that hopefully is well-defined for the format, and if it's not, then that's too bad but nothing that a media registration should attempt to change.

According to the YAML spec, the allowed characters in ns-anchor-name are literally everything up to \x10FFFF except for:

- C0 and C1 control codes, though the Next Line character \x85 is allowed
- \x20 | ',' | '[' | ']' | '{' | '}'
- Surrogates [\xD800-\xDFFF]
- the BOM character \xFEFF

That range is rather silly, and as @ioggstream noted, not supported by all implementations. Sticking to [\w-]+ is indeed recommended for interoperability.
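A quick way to check an anchor name against that recommendation could look like the Python sketch below. The function name `is_interoperable_anchor` is hypothetical. Note that in Python `\w` also matches non-ASCII letters for `str` patterns, so the character class is spelled out here on the assumption that the ASCII-only set mentioned for pyyaml is what is intended.

```python
import re

# ASCII-only reading of [\w-]+: letters, digits, underscore and hyphen.
INTEROPERABLE_ANCHOR = re.compile(r"[A-Za-z0-9_-]+")


def is_interoperable_anchor(name: str) -> bool:
    """Return True if the whole anchor name stays within the conservative set."""
    return INTEROPERABLE_ANCHOR.fullmatch(name) is not None


for name in ("foo", "anchor-fragments_1", "però", "foo/bar"):
    print(name, is_interoperable_anchor(name))
```

Names that pass this check also need no percent-encoding when used as a URI fragment.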

@eemeli

do you think it's worth reducing the possible values of ns-anchor-name in a future YAML revision? @dret's comment is relevant.

what happens if I have something like *foo/bar/baz?

- &foo
  "bar/baz": "a"
  "bar":
    "baz": "b"

for example, JSON Pointer encodes such characters in a special way (/ becomes ~1)

Yes, reducing the valid space of anchor names is definitely planned. I even wrote up a proposal for it (yaml/yaml-spec#64), but the spec update progress has been a bit stop-and-go-ish.

@eemeli it would be a major improvement :)

ioggstream · 2022-04-12T15:09:17Z

draft-ietf-httpapi-yaml-mediatypes.md

+
+~~~ example
+ %YAML 1.2
+ ---


there's a bug in kramdown: it seems it can't process --- in code blocks. I kludged it by adding a space to each line. Alternatively, we could remove the %YAML header.

ioggstream · 2022-04-12T15:10:02Z

draft-ietf-httpapi-yaml-mediatypes.md

+A YAML named anchor can be represented in a URI fragment identifier
+by encoding it into octets using UTF-8 {{!UTF-8=RFC3629}},
+while percent-encoding those characters not allowed by the fragment rule
+in {{Section 3.5 of URI}}. 


Lines taken from json pointer rfc

shouldn't that read "referenced" instead of "represented"?

I wrote "represent" because the serialization is different from YAML's.

draft-ietf-httpapi-yaml-mediatypes.md

ioggstream · 2022-05-12T11:54:58Z

Merging and moving the discussion to #41

Define fragment identifiers for application/yaml

b1103ff

eemeli requested a review from ioggstream April 5, 2022 09:32

Use first match when alias targets are not unique

9857e8e

ioggstream reviewed Apr 5, 2022

Update draft-ietf-httpapi-yaml-mediatypes.md

8eab8ac

Co-authored-by: Roberto Polli

ioggstream added 2 commits April 12, 2022 14:42

Fix: ietf-wg-httpapi#21. Fragment identifiers.

c387b6b

typos

d712a47

ioggstream reviewed Apr 12, 2022

ioggstream requested a review from dret April 12, 2022 15:10

ioggstream added the yaml label Apr 12, 2022

dret reviewed Apr 22, 2022

Update draft-ietf-httpapi-yaml-mediatypes.md

55a95ae

ioggstream merged commit 4ff8647 into ietf-wg-httpapi:main May 12, 2022

eemeli deleted the anchor-fragments branch May 12, 2022 16:48
