Percent-decode each path fragment. #51
I think we made things way more complicated than they needed to be with the colon syntax, and we should remove it while we have the chance. See issue #52.
Looks like you found a typo in that example (and I saw at least one more). The value MUST be percent encoded, so there would be no slashes there. The normative definition for
https://www.w3.org/TR/did-core/#did-parameters In other words, using |
Not sure why we would encourage folks to include : in URL path names... that seems like a really bad idea.
See also: https://stackoverflow.com/questions/1856785/characters-allowed-in-a-url
Here is the code we use to handle this today: https://github.com/transmute-industries/verifiable-data/blob/main/packages/did-web/src/convertDidToEndpoint.ts
Ping for reviews / feedback otherwise will close in a week or 6 months. |
This is an improvement on the current spec. Note that there will be a PR to deprecate the colon syntax and prefer just straight HTTP URL translation in time.
@OR13 - any objections to this PR? |
This PR results in asymmetric handling of URL paths with percent-encoded characters. To avoid breaking those URLs, those would need double-encoding. I think we're better off scrapping the colon-delimited scheme. |
This syntax breaks interop with the PATH parameter in a DID URL and needlessly introduces breaking changes. Unless a substantial number of existing implementers come forward planning to support this, I suggest we close.
As of today, Transmute does not intend to implement this breaking change.
Similarly, mesur.io does not intend to implement this or other breaking changes |
Cross posted to did core, w3c/did-core#821 removing pending close label. |
I don't understand what this PR has to do with either the path in a DID URL, or with relativeRef. This only seems to be about the method-specific-id of the DID, not anything else in a DID URL. I think the PR is fine. |
I don't think addressing this issue requires a breaking change to the current did:web method. IMO it could be addressed by simply adding some language saying something like "Colon characters MUST NOT be used in path elements for the target HTTPS URL". |
Percent-decode each path fragment. (Note that although URI paths are technically | ||
allowed to contain <code>:</code> characters, doing so is not recommended, and | ||
may hinder interoperability.) |
Wouldn't percent-decoding each path part mean that percent-encoded non-URL-safe characters in the DID would end up not being representable in the HTTPS URL (because they need to be percent-encoded)? These would then have to be double-percent-encoded in the DID.
e.g.
did:example:example.org:dir:space%20bar
→ https://example.org/dir/space bar.json
did:example:example.org:dir:space%2520bar
→ https://example.org/dir/space%20bar.json
while I would think the desired case would be:
did:example:example.org:dir:space%20bar
→ https://example.org/dir/space%20bar.json
Would it make more sense to just say percent-decode the colon character in each path part, similar to how that is done for the domain part in the previous step? i.e. replace occurrences of %3A with : in each path part.
Maybe saying "path part" instead of "path fragment" could be more clear, so that the term fragment could be used more consistently?
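To make the difference between the two readings concrete, here is a minimal Python sketch. The helper names `decode_all` and `decode_colon_only` are hypothetical and appear in neither the spec nor the PR. The first applies a full percent-decode to a path part, as the proposed wording suggests; the second decodes only the escaped colon.

```python
from urllib.parse import unquote


def decode_all(path_part: str) -> str:
    """Percent-decode every escape sequence in a path part."""
    return unquote(path_part)


def decode_colon_only(path_part: str) -> str:
    """Decode only the escaped colon (%3A or %3a); leave other escapes alone."""
    return path_part.replace("%3A", ":").replace("%3a", ":")


# Compare the two strategies on a few sample path parts.
for part in ["space%20bar", "space%2520bar", "a%3Ab%20c"]:
    print(part, "->", repr(decode_all(part)), "vs", repr(decode_colon_only(part)))
```

With a full decode, `space%20bar` becomes a raw space that is not valid in a URL, so the DID would need the double-encoded `space%2520bar` to produce `space%20bar`. The colon-only variant leaves `%20` untouched.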
@peacekeeper wrote:
DID:WEB rewrites https URL paths to fit them into the It would be preferable to handle DID URL The same logic applies to |
For the most part, the logic should align with RFC3986, and special behavior does not need to be defined. Instead, you would be better off referencing the relevant parts of that RFC and defining test vectors. E.g. on Percent-Encoding:
Typically a percent-encoded character has no distinct semantic meaning if the encoded characters did not meet this requirement and could have been representable without encoding. However, this determination is typically done as part of the interpretation of the URL, possibly with some language library help. For example, the use of As such, I would recommend the following behavior for converting a
For the most convoluted example I can manufacture
|
No it doesn't. Do you have a reference where you found this information?
That's actually how it is. A DID URL with a path can be used to locate any type of resource, even an image, arbitrary JSON data, PDF, etc. And the fragment is used to reference a secondary resource that is part of, or related to, the primary resource. |
https://hyperledger.github.io/indy-did-method/#did-urls-for-indy-object-identifiers |
I'm in favor of implementing @dwaite 's suggestion, I would approve a PR that implements it and provides test vectors. I don't think we should address each encoding issue that deviates from RFC 3986 in a one-off section of text in a CCG draft; that will quickly lead to a painful spec that does not make good use of existing normative references or provide concrete test vectors for proving interop. |
Or we could replace that complex algorithm (which, don't get me wrong is impressive, @dwaite) with this: Convert to did:web:
Convert to https:
... and get rid of all of the unnecessary complexity and deviation from RFC 3986. :) |
I never really understood why people invented did:web in the first place instead of just using https:// |
Because we needed:
In theory, we could:
I expect trying those things will be messier than just getting did:web right. :) |
Perhaps you should register did https? |
DID URLs are not supported in the |
As @msporny pointed out, the
DID-URL What I think is causing some confusion is that the resource involved in a DID URL is NOT the DID Document (as implied by @gribneau). For example,
The meaning of fragment, path, and query parts is up to the DID Method to define, as long as their representation of those parts in the DID-URL itself is consistent with RFC3986. To wit, with did:cosmos DID URLs for IID resources are "locating the resource", which you might think of that as "in the DID (namespace)", in the same way that normal web resources are "in the website's namespace". Note that "normal" URLs point to different resources by having different paths. DID URLs, especially how they are used in did:cosmos (and other IIDs) match this behavior exactly. Each DID-URL with a path part can separately point to different resources, just like regular URLs. I believe Orie is just fixing a round-trip encoding algorithm problem with did:web in particular, which has nothing to do with did-core. So let's not conflate the resource of a DID-URL with the Subject of the DID nor the DID Document. The resource of a DID-URL is defined within the context of the DID and likely declared or presented within the DID Document, but they can refer to ANY resource, not just the DID Document. |
I am not confused @jandrieu, I simply disagree with the core specification's interpretation of RFC 3986. @msporny wrote:
This cannot currently happen because the DID Subjects of both of these would be
The limitation is imposed by section 5.1.1. In the presence of that limitation, the handling of
In contrast, both path and query "serve to identify a resource", while the authority preceding them does not identify a resource at all. It is unfortunate that the |
@gribneau wrote:
Yep, @gribneau is correct... several technical issues:
In other words, use the rules in the did:web spec to transform these HTTPS URLs into did:web DID URLs and back out again:
I suggest that there are no rules in the did:web spec that tell you how to round trip those URLs. If someone knows of any, please point them out to me 'cause I can't find them in the spec. One interpretation is that
... but it seems like some are saying (without this current PR), no, the proper encoding of those URLs is actually this:
which, when going back through round tripping (per the rules in the current did:web specification) would be turned into these HTTPS URLs:
The former approach seems to work (and is undocumented); the latter approach is broken (note all the crazy slashes that exist in the URL that shouldn't be there... this seems to be what some in this thread are suggesting the current spec states). Or multiple variations of different interpretations in between. What am I missing? Can someone round-trip those URLs for me in a way that is consistent? |
IMHO, the problem is that the usage of DID path is not restrictive enough - there isn't currently a way to differentiate a DID URL path that conforms to the described behavior of the method on DID "CRUD" operations. The resource might instead be hosted content or a service, and may support additional interactions outside the DID resolution definition. |
@msporny wrote:
I agree. This is an easy fix. It was, however, discussed and rejected prompting some of what we have now.
The The create section is clearly inadequate as well. Specific steps should be provided in addition to the handful of examples provided.
I don't disagree. It is, at best, an 80% solution today. |
Hrm, rejected by whom? Speaking as the lead DID Core spec editor, I don't recall that we've made any consensus decisions of the sort. That idea is still very much alive and well, IMHO. The best way to address the problem in DID Core, however, is to get folks to admit it's a problem here and then bring that problem back to DID Core (with a fairly simple errata fix to the spec noting that we plan to expand
Agreed.
You're being generous. :) Shipping specs with known bugs, especially ones as big as this, is a standards anti-pattern. @dwaite wrote:
Can you explain this a bit more, @dwaite?
... and a bit more detail on this one as well, please? |
@msporny wrote:
Is that a reference to did:web or the core? The only required elements for a URI are Given that a URI identifies a resource, and given that |
I think it would be okay to change this, i.e. also allow DID URLs for "id", but I wouldn't go as far as calling it "errata". We did have some discussions about this in the WG, and I think there were also legitimate opinions for not allowing it. Found the following older issues that could be relevant: |
I think this notion would fundamentally break the semantic architecture of DIDs. The DID Document represents the verification relationships (and methods) and service endpoints for an identifier, the DID. The DID represents the authority part of the RFC3986 standard URL syntax. As such, the DID Document is the metadata that represents assertions by the authority for interacting with that DID, including identifiers within that DID namespace as delineated by DID URLs. It makes for a straightforward resolution process that parallels DNS quite nicely. You have a DID or DID URL
If you allow a DID Document's ID to be for a particular DID URL, then how do you look up the DID Document for the DID part of that DID URL? In other words, if It may be that the way you are hoping to use these path parts is an instance of 'turtles all the way down'. The fact is, the turtles have to stop somewhere. That somewhere is the authority part. That's where the buck stops. That authority part presents the necessary metadata as the DID Document for that authority. It is NOT the metadata for the DID URL resource. Just the metadata for the DID. The way we (with IIDs) use DID URLs to reference IID Resources and IID References, as defined in the DID Document, allows us to add per-resource meta-data as appropriate. This may be a better way for you to think through your solution. http://w3id.org/earth/identifiers IMO, it would be a colossal error to violate this separation of responsibilities and allow DID Documents to have IDs that are DID URLs. Can you describe the use case that requires this feature? What's the value-adding interaction a user would get out of this feature? |
The Web. :)
The ability to serve two different resources from the same authority -- which is what the Web does.
You use a resolver and use whatever it gives back to you. Remember, DID Methods are what determine what you get back when you resolve something. Let's take an example: RESOLVE Plug that into a resolver and you might get a DID Document that looks like this:
That's one subject... but try this and you might get nothing ( RESOLVE ... but try this and you might get the authority (aka DNS domain) DID Document: RESOLVE
DID Core does not allow for that fairly sane thing to happen today... that's the error we made in the DID WG. |
That's not a use case. That's a platform. Which already works great. We aren't recreating the web. We are creating something different, or at least expanding the web in new directions. Again, what's the value-added use case? What user does what to get what value?
Why on earth would resolving that give you a DID Document with that as the id? It wouldn't. It would give you a DID Document with DID URLs do not and never have resolved to DID Documents. They dereference to resources. Now, I can, as I described in a different answer, define DIDs resolve to DID Documents. Full stop. |
If I were to accept your interpretation of DID Core, then we have created something that is incompatible with large portions of the Web. :) At this point, I expect that you haven't actually read the algorithms in the DID Web Method spec... knowing you (at some level), I expect you'd be just as confused as I am if you were to read the text in the method specific id and the Read section of the spec. You would probably see that the method specific id is defined incorrectly (as it allows two different
The ability to publish multiple DID Documents on a single DNS domain without using this weird/broken colon-path syntax that did:web uses (that is clearly broken and not round-trippable in the spec, per the comments above).
Weird, I have never thought that that's where we were going with DID Core. :P
Where does DID Core state that DID URLs can never resolve to DID Documents?
... and those resources might be DID Documents themselves. :)
Citation required. :) Here are the citations that back up my point, which is that a DID URL can be resolved to a DID Document, like Section 7.2 DID URL Dereferencing:
So, the DID Core spec is either internally inconsistent or tragically limiting -- the "tragically limiting" perspective says that you can use a DID URL to get a DID Document, but when you get that DID Document, the identifier isn't going to be for the resource you fetched! |
Interestingly, there is a path forward here without changing anything. 5.1.1 requires the DID Subject to conform with 3.1, which in turn asserts that RFC3986 controls. RFC3986 3.3 provides that:
It seems, then, that these are equivalent:
|
Unfortunately, that's a hyperbolic and disingenuous response. On the one hand, of course DIDs are incompatible with large portions of the web: not a single browser supports them. On the other hand, you provide no explanation of what this means. It is an empty attack without foundation. What parts of the web are now broken?
That's funny. I would expect the opposite in that none of the examples you've used use the current syntax for did:web. There are not two different I understand the round-trip problems Orie is attempting to fix and to my initial analysis, he is correct that %encoding colons is the simple fix. None of that has anything to do with did-core. Yes. We should fix did:web. But did:core has a particular and distinct differentiation between DIDs and DID URLs. You may recall that I warned you, @talltree, and @peacekeeper that the term DID URL is going to confuse people. People will see DID URLs and expect them to be DIDs. I don't have a better term for DID URLs, but I believe your argument is an excellent example of the problem I raised back then: even an editor of the DID Core Specification is confusing the two.
I'm sorry, but that still isn't a use case. It's just a broken round-trip algorithm for a particular DID method. Once did:web fixes it with encoding, we are good to go. You may be frustrated to have to encode your colons in did:web, but that's no more relevant than the frustration I've had debugging web apps and having to figure out when to use URL encoding and when not to, especially when the different parts of URLs have different encoding rules. It's sometimes complicated. But "not doing the rigorous thing you need to do to make it work" is not itself a use case. If you percent encode your colons, did:web works just fine.
Weird, I would have thought you understood DID Core.
DID Core never states that DID URLs resolve to anything.
Yes. But they are not the DID documents of the DID in the DID URL. They could refer to anything, any resource. But that basically doesn't mean anything in this context. What we care about is the DID document that is returned from resolution of a DID. There is no resolution defined for a DID URL.
Here you go:
Note that DIDs are resolvable. In contrast, DID URLs locate particular resources. Other statements about DID URLs:
Note that fragment, path, and query are ONLY defined as part of the DID URL. Not part of the DID.
Section 3.2 DID URL syntax https://www.w3.org/TR/did-core/#did-url-syntax
Section 3.2.1 DID Parameters
Note that the parameter affects the resource identifier, the DID URL, not the DID. TL;DR: After an exhaustive search through the spec, only DIDs "resolve". DID URLs are "dereferenced".
This is not a statement about resolution. This is a statement about dereferencing. I think we are all agreed that a DID URL dereferences to a specific resource. It doesn't resolve to that resource. Rather, the DID part of the DID URL is resolved to a DID Document which can then be used to dereference to the actual resource:
It's the DID that is resolved. Not the DID URL
It is neither. The DID Core spec is exceptionally consistent on this issue. It's the unfortunate conflation of DIDs & DID URLs on the one hand and resolving & dereferencing on the other. Fortunately, the specification is consistent on this, but it is, understandably, a challenge to keep all of this straight. To return to your initial example That DID URL has the following parts {
scheme : "did",
method : "web",
method-specific-id : "subject.example",
path : "/people/jane"
} The DID for this DID URL is Which will resolve to a DID document at However, what I think you probably wanted to do was to resolve to a DID document at See Example 4 https://w3c-ccg.github.io/did-method-web/#example-creating-the-did-with-optional-path as well as Section 2.5.4 Optional Path Considerations https://w3c-ccg.github.io/did-method-web/#optional-path-considerations The resource referred to by My advocacy is to use linkedResource property from IIDs and did:cosmos. Paths in did:cosmos DID URLs refer to resources defined in a linkedResource section. However, this is an exceptionally new property. I've been hoping to get a demonstrable implementation in place before adding it to the DID Spec Registries, but it is in use in the IID spec and did:cosmos and I think the IID Reference and IID Resource approach taken by the IID spec is a superior pattern for avoiding the type of confusion this Github issue has illuminated. It is also worth noting that colons are already escaped in the did:web method-specific-id for port specification in the authority part of the encoded web URL. See Example 5 https://w3c-ccg.github.io/did-method-web/#example-creating-the-did-with-optional-path-and-port So, Orie's solution is minimal, effective, and in-line with existing processes for other restricted characters. It's unfortunate that the percent encoding was restricted to the colon used for port specification, but it's an easy fix. Which this PR does. No changes needed to did-core. Just some upskilling on the distinctions between DIDs / DID URLs and resolving / dereferencing. |
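The decomposition described in this comment can be sketched in Python. The function name `split_did_url` is hypothetical, and the input string `did:web:subject.example/people/jane` is reconstructed from the parts listed in the comment, since the original inline example did not survive. The sketch assumes the method-specific-id ends at the first `/`, `?`, or `#`, and it ignores query and fragment for brevity.

```python
import re


def split_did_url(did_url: str) -> dict:
    """Split a DID URL into scheme, method, method-specific-id, and path."""
    scheme, method, rest = did_url.split(":", 2)
    # The method-specific-id runs up to the first "/", "?" or "#".
    match = re.match(r"([^/?#]*)(/[^?#]*)?", rest)
    msid = match.group(1)
    return {
        "scheme": scheme,
        "method": method,
        "method-specific-id": msid,
        "path": match.group(2) or "",
        # The DID is the part that gets resolved; the path is only dereferenced.
        "did": f"{scheme}:{method}:{msid}",
    }


parts = split_did_url("did:web:subject.example/people/jane")
print(parts["did"], parts["path"])
```

Under this reading, only `parts["did"]` is handed to the resolver, and the path is applied afterwards when dereferencing.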
Replace ":" with "/" in the method specific identifier to obtain the fully | ||
qualified domain name and optional path. | ||
</li> | ||
<li> | ||
If the domain contains a port percent decode the colon. | ||
If the domain contains a port, percent decode the colon. |
wonder if we can keep the port part, and scratch the rest of this...
Fwiw, the spec already mentions ports ("A port MAY be included and the colon MUST be percent encoded to prevent a conflict with paths.") So this PR is mainly about the path part.
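For orientation, the resolution steps under review (replace `:` with `/`, then percent-decode the port colon) can be sketched as follows. This is an illustrative sketch, not the spec's reference algorithm. The function name `did_web_to_url` is hypothetical, and the `/.well-known` and `/did.json` suffixes are taken from the did:web spec as I understand it rather than from the diff quoted here.

```python
import re


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    # The first colon-delimited part is the domain; the rest are path parts.
    domain, *path_parts = did[len(prefix):].split(":")
    # If the domain contains a port, percent-decode the colon.
    host = re.sub(r"%3[Aa]", ":", domain)
    if not path_parts:
        return f"https://{host}/.well-known/did.json"
    return f"https://{host}/" + "/".join(path_parts) + "/did.json"


print(did_web_to_url("did:web:example.com%3A3000:user:alice"))
```

Note that the path parts are passed through unchanged here, which is the behavior the PR proposes to alter.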
If the domain contains a port, percent decode the colon. | ||
</li> | ||
<li> | ||
Percent-decode each path fragment. (Note that although URI paths are technically |
I suggest removing this change suggestion (move to an issue / tackle in a subsequent pr).
This is the main point of the PR tho..
@dmitrizagidulin can you split the "percent encoding of the port" from the "percent encoding of the path" parts.... I think this will be easier to merge in 2 steps.... also please ping me on anything that sits this long.
Please move "changes to resolution discussions" to issues, and keep this PR focused on adding percent encoding. |
Is this moving forward? The reason we need to interpolate the path into the method specific identifier with colons, which then requires percent-encoding in some cases, is because core violates RFC3986 by asserting that a URI path cannot be used to identify the primary resource. We would be better off recognizing that as errata and leaving the handling of path under the control of the method, including the decision of whether to use it to identify the primary resource. |
@gribneau This PR has not moved forward... that's why I am trying to help it along. Let's not conflate rule change to percent encoding of Let's continue to discuss the I suggest you comment on DID Core repo regarding errata for that spec, feel free to cross link here. |
Co-authored-by: Charles E. Lehner <[email protected]>
Looking at the amount of pushback to this PR (with regards to percent-decoding the path part), and the fact that it's addressing a very niche case (not actually a problem, in other words), I'd like to close it. |
Extracted from the discussion in PR #47.
The current spec contains an edge case that prevents round-trip conversion between did:web URIs and HTTPS URIs.
Currently:
With this PR:
Note that using : characters in your web URL paths is not at all recommended by the authors of this spec. However, since it is allowed by current URL rules, it's important that we address this corner case.
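The round-trip edge case can be illustrated with a small Python sketch. It implements the colon-only variant suggested during review (encode `:` as `%3A` in each path segment, and decode only `%3A` on the way back), not necessarily the exact wording of this PR. The function names and the sample URL are hypothetical, and the `/.well-known` and `/did.json` conventions are assumed from the did:web spec.

```python
import re
from urllib.parse import urlsplit

SUFFIX = "/did.json"


def url_to_did_web(url: str) -> str:
    """Convert a DID document HTTPS URL to a did:web identifier."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.path.endswith(SUFFIX):
        raise ValueError("expected an https URL ending in /did.json")
    path = parts.path[: -len(SUFFIX)]
    if path == "/.well-known":
        path = ""
    # Escape literal colons so they cannot be mistaken for delimiters.
    host = parts.netloc.replace(":", "%3A")
    segments = [seg.replace(":", "%3A") for seg in path.split("/") if seg]
    return ":".join(["did:web", host, *segments])


def did_web_to_url(did: str) -> str:
    """Inverse mapping: decode only the escaped colon in each part."""
    host, *segments = did[len("did:web:"):].split(":")
    decoded = [re.sub(r"%3[Aa]", ":", part) for part in (host, *segments)]
    if not segments:
        return f"https://{decoded[0]}/.well-known/did.json"
    return f"https://{decoded[0]}/" + "/".join(decoded[1:]) + SUFFIX


url = "https://example.com:3000/user:alice/did.json"
did = url_to_did_web(url)
print(did, did_web_to_url(did) == url)
```

One limitation of this sketch: a URL path that already contains a literal `%3A` would come back as `:`, which is the asymmetry raised earlier in the thread.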