Clean up herd privacy issues #539

Open
jandrieu opened this issue Jan 11, 2021 · 19 comments
Labels: class 2 (Changes that do not functionally affect interpretation of the document), discuss (Needs further discussion before a pull request can be created), p1 (high priority)

Comments

@jandrieu
Contributor

Continuing a conversation from PR #480

Section 10.5 Herd Privacy https://w3c.github.io/did-core/#herd-privacy says

When a DID subject is indistinguishable from others in the herd, privacy is available. When the act of engaging privately with another party is by itself a recognizable flag, privacy is greatly diminished. DIDs and DID methods need to work to improve herd privacy, particularly for those who legitimately need it most. Choose technologies and human interfaces that default to preserving anonymity and pseudonymity. To reduce digital fingerprints, share common settings across requesting party implementations, keep negotiated options to a minimum on wire protocols, use encrypted transport layers, and pad messages to standard lengths.

However, there is some debate about what herd privacy means and how to apply it to DIDs.

This issue is for developing a consensus definition, after which one or more PRs are expected to be proposed to provide corrections to the DID specifications.

Herd privacy works exactly to the extent that a given feature applies to all DIDs and DID Subjects. That's the herd. When a feature applies to just certain DID Subjects, that enables privacy penetration that would not be possible if true herd privacy exists. When the herd is separable into distinguishable segments, it is possible to discern unintended details about each segment, causing privacy leaks.
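A minimal sketch of the segmentation argument, in Python. The herd, the `did:example` identifiers, and the optional `type` property are all hypothetical, chosen only to show how one observable feature that applies to some Subjects but not others splits a single large anonymity set into smaller ones.

```python
from collections import Counter


def anonymity_sets(documents, visible_keys):
    """Group documents by the values an observer can see and count each group."""
    def fingerprint(doc):
        # Everything the observer can distinguish about this document.
        return tuple((key, doc.get(key)) for key in visible_keys)

    return Counter(fingerprint(doc) for doc in documents)


# Hypothetical herd of 1000 DID documents.
herd = [{"id": f"did:example:{i}"} for i in range(1000)]

# Assumption for illustration: three documents carry an optional 'type' property.
for doc in herd[:3]:
    doc["type"] = "Person"

uniform = anonymity_sets(herd, visible_keys=[])
split = anonymity_sets(herd, visible_keys=["type"])

smallest_uniform = min(uniform.values())  # 1000: everyone hides in one herd
smallest_split = min(split.values())      # 3: the flagged Subjects stand out
```

In this toy model the three Subjects that expose the optional property drop from an anonymity set of 1000 to one of 3, while the feature tells an observer something about both segments.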

For comparison, consider IPv4 addresses, which have excellent herd privacy. Every public IP address is structured and interpreted the same way and reveals nothing about who owns the machine responding to that IP address, what applications are running on that IP, or what type of hardware or network is running at the endpoint. In fact, you can't even tell if that IP address is just a first hop in a forwarding process that ends up with some other machine, with a different IP address, actually doing the work of composing a response.

The exceptions to herd privacy with IP addresses are informative.

First, there are a few special, functionally unique addresses which do behave differently. There are the private subnets like 192.168.*, link-local addresses like 169.254.*, and localhost at 127.0.0.1. If the IP address is one of those exceptions, you know something about the initial destination (it's on a private subnet, it's likely on a network without DHCP or an assigned IP, or it's running on the same machine), but NOTHING else. You still don't know who owns it, what machine is running there, or what applications might be running on it.

Second, the process of assigning IP addresses has a historical legacy that makes it possible to guess, with some level of accuracy, who owns a given IP, or at least who secured that IP address from IANA, and often what region of the world that party is from. This was originally done to simplify IP-based routing and the bureaucratic overhead of issuing IP numbers. These optimizations are not part of the IP spec, but rather a management decision that has ongoing consequences.

Third, the inevitable visibility of IP addresses on the network (you can't route an IP packet without looking at the destination header) means that network analysis can, to some level of accuracy, identify the geographic destination of IP addresses. As a result, there are numerous directory services that provide exactly this functionality. They don't always get it right, especially when Network Address Translation is used by large aggregators, but it is a privacy leak in the design.

VPNs, Tor, NATs, and other approaches help mitigate these problems. However, IP addresses can be, and ARE, used to de-anonymize parties and, in some cases, must be treated as PII or personal data.
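The first exception above, the special-purpose address ranges, can be checked with Python's standard `ipaddress` module. This is only a sketch: the `classify` helper and the sample addresses are illustrative assumptions, and the point is that the address class is the only thing the address itself gives away.

```python
import ipaddress


def classify(addr: str) -> str:
    """Return the special-purpose class of an IP address, or 'public'."""
    ip = ipaddress.ip_address(addr)
    # Loopback and link-local ranges also count as private in the
    # ipaddress module, so the more specific checks come first.
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "public"


# Sample addresses chosen for illustration only.
samples = ["127.0.0.1", "169.254.10.20", "192.168.1.10", "8.8.8.8"]
classes = {addr: classify(addr) for addr in samples}
```

Nothing in the result says who owns the machine, what hardware it is, or which applications it runs; for a public address the classification carries no information at all.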

How does this apply to DIDs?

For DIDs to be a privacy-respecting technology, it is imperative that DIDs for different Subjects remain indistinguishable from each other, modulo the functional mechanisms necessary to establish proof of control over the unique identifiers themselves. You should not be able to tell, by looking at a DID, a DID-URL, or a DID Document, that the Subject of that DID is a particular person or organization, nor what type of entity it is: a human, a corporation, or an inanimate object.

The community behind DIDs already established a privacy-respecting way for assertions about Subjects to be made and managed, Verifiable Credentials (VCs). VCs allow anyone to say anything about any Subject and, thanks to Verifiable Presentations (VPs), there is, in the specification, a concrete mechanism to ensure that reliance upon a given assertion is consented to by the holder. When the holder signs a VP, it establishes that a party who controls the identifier of the Subject consents to its use for some purpose under some terms.

DIDs by their nature don't have that. DID resolution, like IP addresses, requires that DIDs, DID-URLs, and DID Documents be visible and resolvable by anyone. Yes, you can, in theory, add a privacy layer for "private" DIDs whose DID Document is only retrievable after some form of authentication, authorization, or consent protocol. However, those very mechanisms cannot be validated externally; they create a dependency on a trusted endpoint for deciding who does or does not get the cryptographic material associated with a given identifier. In order for DIDs to realize their goal of decentralization, it MUST be possible to retrieve the cryptographic material that secures subsequent interactions WITHOUT reliance on a trusted third party.

Yes, one could look at did:peer, did:key, and did:schema as counterexamples, but those are all mechanisms that are either not publicly resolvable (and therefore unsuitable for cross-context verification like VCs), not updatable (without explicit communication to all parties using the identifier, which may be impossible), or they don't provide a means to demonstrate proof-of-control at all. Like the exceptions to public IP addresses, these methods demonstrate ways that you can work within the DID spec to recreate existing paradigms like PGP, public/private keypairs, and CIDs, rather than innate capabilities that are necessary and applicable to all DID Methods.

Which is fine. If you want your DID Method to be for Subjects of a particular nature, such as a DID Method that ALWAYS represents cars, go for it. DID Methods are free to violate herd privacy in this manner. But DID Core should not.

Returning to the IP example, there are services that will, based on publicly available information, map an IP address to a specific geographic location with some level of accuracy and precision.
These services highlight a flaw in IP addresses' herd privacy that could have been avoided. This is a lesson we must take to heart.

What can be scraped, will be.

What can be observed on the network, will be.

These threat vectors MUST be included in the privacy analysis of everything that goes into DID Core.

What can be provided at another layer, should be.

We have a moral obligation to avoid these sorts of privacy problems whenever possible. Herd privacy is how we do that. Herd privacy is violated directly proportionally to the variability that can be used to separate the herd.

As such, we must bring the DID Spec into alignment with principles of herd privacy, while allowing DID Methods to innovate and extend.

DID Core is a meta-standard that enables unbounded extensibility. The burden is on us to make sure that the foundation is privacy respecting. Only those properties and features which are necessary and appropriate should be enshrined in DID Core. Because DID Methods have the freedom to innovate, THAT is where potentially risky or harmful ideas are best tried out. In contrast, properties in DID Core literally define commonality and best practices, which, even when "optional" encourage new DID Methods to use those features. This amplification effect is why it is so imperative that we minimize the potential harms from features of DID Core. If we do not, we will actively encourage DID Method designers to adopt harmful practices.

Finally, I want to directly refute a claim made by one of the DID Core spec editors @talltree:
#480 (comment)

insisting that herd privacy must apply to all DIDs is going to far

This reflects a fundamental misunderstanding of how herd privacy works. Herd privacy ONLY works when it applies to all DIDs equally. Any differentiation between classes of DIDs directly from features defined in DID Core undermines herd privacy.

Discussion welcome.

@agropper
Contributor

I agree with @jandrieu's analysis and suggestions, but there's an even bigger risk than herd privacy if DID Core allows things that could be done in DID methods.

Governments, employers, and manufacturers have the power to decide which DID methods they will accept. They are the sovereigns by virtue of their asymmetric power in most cases. We are familiar with US services that ask for your social security number or with Indian Aadhaar identifiers being required for access to non-government services. When we weaken the privacy protections in DID core we are guaranteeing that methods will use that feature to serve the business interests associated with those methods. I believe we are already seeing this as an unspoken substrate to this issue.

If we allow this to happen, it threatens the very foundation of our work on self-sovereign identity and it risks wasting the huge investment we have all made in this work.

DID core must not tacitly encourage privacy compromises. The differentiating factors among DID methods must be highly visible and clearly linked to the DID method.

Consider, for example, the way renters are protected by standardized rental agreements in most large cities. Nothing prevents the landlord from amending the standard rental contract but those amendments stand out when they are added in longhand and that triggers discussion. Loan agreements have similar protections that require certain standard features like the effective interest rate to be in unmistakably bold type.

I think of DID core as the equivalent of that standard rental agreement. For consumer protection and the very principles of self-sovereignty that we're working towards, every deviation from privacy by default (as nicely described by @jandrieu above and with respect to various issues) must stand out as prominently as we can make it. If we don't, our good work could turn out to be overtaken by unintended consequences.

@ChristopherA
Contributor

Speaking up on Herd Privacy

I've been meaning for some months to speak up and express my serious concerns with the approaching CR (Candidate Recommendation) version of the DID specification because of a lack of focus on some privacy and security fundamentals, of which this issue of herd privacy is only part of the story.

I believe supporting herd privacy is critical for DIDs. Unfortunately, it is now endangered by features added to DID Core that make DID Documents slightly different, and thus more correlatable. Instead, I need to reiterate my strong support for herd privacy by placing features of this type into DID Methods or into "off-DID" approaches leveraging Verifiable Credentials and/or zcaps or other protocols, rather than having them as part of DID Core. By doing so we can ensure that DIDs can serve vulnerable populations for whom that privacy can literally be a matter of life and death.

The Definition of Herd Privacy

Defining herd privacy is simple. It can be found in section 10.5 of the specification, which says that "When a DID subject is indistinguishable from others in the herd, privacy is available. When the act of engaging privately with another party is by itself a recognizable flag, privacy is greatly diminished."

That same section further states the importance of herd privacy, stating: "DIDs and DID methods need to work to improve herd privacy, particularly for those who legitimately need it most. Choose technologies and human interfaces that default to preserving anonymity and pseudonymity." This is not a new element of the specification. It was part of the DID Implementers Draft 0.1, which I co-authored at Rebooting the Web of Trust 3, in late 2016, which was effectively the first public version of the DID specification.

The Importance of Herd Privacy

I believe that herd privacy continues to be a critical element of the Core specification that we need to respect accordingly. This is not a philosophical objection. Although DIDs will serve many purposes, some commercial and some not, one of the defining commitments that came out of our work with ID2020 at RWOT in early 2016 that led to DIDs was the need to serve vulnerable populations. As I said, privacy can be a matter of life and death to them, and herd privacy is how the DID spec achieves that. These might be people in China who are in a disagreement with the government; they might be whistleblowers in the United States who are speaking out against abuses of the government or a corporation; or they might be citizens in Africa who could be vulnerable to warlords or other extra-legal forces. DIDs give them a way to authenticate themselves with verifiable credentials, allowing them to prove who or what they are for various networked activities. But if they are not protected with a cloak of privacy, they will either be forced out of the electronic networking of the twenty-first century or else endangered by becoming a part of it.

Herd privacy isn't only a W3C requirement. As IETF RFC 6973 (Privacy Considerations for Internet Protocols) states:

The size of the anonymity set has a direct impact on identity confidentiality, since the smaller the set is, the easier it is to identify the initiator. Identity confidentiality aims to provide a protection against eavesdroppers and intermediaries rather than against the intended communication endpoints.

My History with Herd Privacy

This also isn't a new topic for me. I've actively advocated for herd privacy in the past and seen the success of including it in a major internet spec. One of my main concerns when co-authoring the SSL/TLS standard was to ensure that all traffic was indistinguishable: that "herd privacy" could be achieved if sufficient numbers of people used it correctly. Other early competing protocols to SSL/TLS were specific to the web or didn't protect metadata, but our SSL/TLS architecture was agnostic. We were chartered to focus on the web, but I instead focused on an architecture to secure all transports (which sometimes got me in trouble). This architecture helped make it the world's most widely deployed security standard. One of my proudest moments (and the inspiration for our first Rebooting Web of Trust meeting) was when I heard in 2015 that more than 50% of email to and from Google was being secured by SSL/TLS.

The architecture worked! It also showed that herd privacy wasn't solely a question of serving a vulnerable population, but also something that could lead to the overall success of a specification.

Unfortunately, SSL/TLS has also shown us how privacy protections of this sort can be eroded over time. Across two decades of practical use, a variety of identifier and correlation attacks, as well as architectural challenges such as the dependence on DNS and certificates, have made the herd privacy of SSL/TLS less powerful. We will face those same challenges as DID grows in usage and popularity in the decades to come. We thus need to ensure that our initial release of the specification serves the needs of herd privacy to the greatest degree possible, without introducing susceptibilities such as identifiable DID Documents. These cracks in our privacy model would grow over time.

Current Issues with Herd Privacy

I believe that the privacy vulnerabilities now being considered as part of the upcoming CR are the result of us becoming too focused on the financial opportunities of LESS (Legally Enabled Self-Sovereign) Identity, and not spending enough time working on the trust-minimized version, which we describe in our own spec as the support for "those who legitimately need it most". We need to stop being a big tent for all purposes and instead first focus on fit-for-purpose, in particular for oppressed groups, who were one of the main communities for whom we created DIDs in the first place. Making features in the core DID spec "optional" is not sufficient: we "SHOULD" offer herd privacy and better defaults at the DID Core level for version 1.0. I might even argue for "MUST". I know that normative language was pulled from the 2016 DID Implementers' Draft 0.1 since we can't "prove" conformance, but I do still believe that it is a requirement.

Alternate Solutions for Those Other Needs

This is not to say that we can't serve those financial opportunities of LESS. We certainly want DIDs to become commercially successful to ensure widespread adoption, just like my work with SSL/TLS. However, we already addressed that in our preliminary work on DIDs way back at Rebooting Web of Trust 2, when we first linked up with the ID2020 community. That's when we created the compromise that split DIDs up between DID Core and DID Methods.

DID Core should be conservative, especially in regard to potentially existential dangers such as impairing herd privacy. A trust-minimized version as a minimal architectural specification fits in with that conservative view. Innovation in DIDs is welcome, but I believe it must occur at the DID Method level.

My Final Thoughts

I have demonstrated my commitment to the completion of the DID standard, for which I'm credited as a co-author. I am the founder of Rebooting the Web of Trust, where DIDs were first incubated and iterated through their first primordial requirements and specification. I was co-chair of the W3C Credentials CG for 4 years, where the DID spec continued to be incubated toward a draft that could become the basis for a W3C Working Group. Now I continue to contribute as an invited expert to both the W3C DID and VC Working Groups. My views on the formal specification of DIDs are colored by my co-authorship of the TLS specification, one of the most successful and widely adopted security protocols on the internet, and more recently by Bitcoin, one of the most secure modern cryptographic protocols.

Unfortunately, my experience (based on the original goals of DIDs, the needs of a new internet standard, and the evolutionary growth that occurs following adoption) suggests that some of the current issues with DIDs could invalidate all of our hard work. We must reemphasize the need for herd privacy, and privacy in general, in the CR of our DID specification, and we must do so by removing elements that endanger it, such as correlatable parts of DID Documents. This can be done by minimizing the elements in the DID Core spec itself, with the expectation that they can be accommodated in DID Methods or other protocols.

I am available for discussing this topic with others, in order to support a more minimal core spec.

-- Christopher Allen, Principal Architect & Executive Director, Blockchain Commons

@iherman
Member

iherman commented Jan 13, 2021

Looking at the three comments above (from @jandrieu, @ChristopherA, and @agropper), I find all these arguments compelling.

But I am not a privacy expert, so I am looking for something more tangible. What are the features that should not be in the specification? Can we have some list of features that we should remove either from the normative part of the specification or from the specification altogether? This is the only way to move forward at this stage.

If we were in the early stage of design, I would opt to remove the concept of DID URLs altogether. I feel that it would resolve most of the problems listed in this issue, and the only thing we would have to formally define instead is how to identify a specific verification method within a DID Document to be able to cross-reference it. (That is the only internal reliance on DID URLs I can see.)

But I realize that this is a nuclear option, and it is probably not viable at this point: too many implementations and uses of DID URLs out there already. I just raise it to motivate the creation of a more specific list of features to be removed in order to move ahead...

@agropper
Contributor

@iherman, the privacy issue is not DID dereferencing. It's the expectation that a DID document has added stuff that I cannot remove because the verifiers expect it to be there as a convenience to them.

The "papers please" problem arises when we combine the features that enable control and verification of a DID Document with features that enable linkage of the identifier with attributes of a subject. We wisely decided that the linkage of attributes to an identifier would be standardized as a VC. We realized the importance of separating the bundles of attributes so that an identity we call a 'holder' could choose which attributes to present in which circumstance. Every single SSI wallet I have ever seen stresses this ability for the holder to choose which attributes about themselves to present in which context.

If DID-core specifies the introduction of one or more subject attributes in the DID Document we are turning that document into a certificate (of identity) that does not have an obvious means of control over which attributes are presented when asked for "papers please".

Let's say I use DIDs to sign-in to Twitter by proving to Twitter control over some key material. At the border with Elbonia, I am asked to unlock my Twitter account for inspection. I have taken the trouble to keep two separate Twitter identities under two separate DIDs. Or, maybe Twitter has taken the trouble to allow me to choose which DMs are displayed to the border guard or not depending on which of two DIDs I hand over to the Elbonian verifier.

Before looking through my Twitter persona, the verifier looks through the DID document I use to control that persona. How does the verifier know that I am the subject of that Twitter account? Is there anything in the DID document that links me to the contents of the Twitter account other than the DID itself? Maybe yes, maybe no. Whether or not the DID document contains some such attributes depends on the DID method.

Who decided what DID method would be associated with a Twitter account? Does Twitter say I must not use did:key or did:peer because they have a Real Name policy? Or do I decide which DID method to use to open a Twitter account and then, if I choose, I post that DID in the account itself along with some selfies for all to see?

I don't see DID dereferencing as the real issue. If the Elbonian border guard can scan a DID I present and see my Twitter feed, I can still have two Twitter accounts.

I can't prevent any sovereign from saying that all Twitter accounts must be "verified" or otherwise attached to a real name. They do that by specifying the acceptable DID method. But I can make it obvious that they are forcing me to use a verified DID in which case I would self-censor what I post on Twitter and take my real communications to somewhere else. If the method has something in the DID document that I want to remove, how do I do that?

The privacy issue is not DID dereferencing. It's the expectation that a DID document has added stuff that I cannot remove because the verifiers expect it to be there as a convenience to them. And now, that DID document is no longer under my control.

@iherman
Member

iherman commented Jan 13, 2021

@agropper, forgive me, but I try to be very much down-to-Earth here, with an eye on our plan to publish a CR soon…

@iherman, the privacy issue is not DID dereferencing. It's the expectation that a DID document has added stuff that I cannot remove because the verifiers expect it to be there as a convenience to them.

The "papers please" problem arises when we combine the features that enable control and verification of a DID Document with features that enable linkage of the identifier with attributes of a subject. We wisely decided that the linkage of attributes to an identifier would be standardized as a VC. We realized the importance of separating the bundles of attributes so that an identity we call a 'holder' could choose which attributes to present in which circumstance. Every single SSI wallet I have ever seen stresses this ability for the holder to choose which attributes about themselves to present in which context.

If DID-core specifies the introduction of one or more subject attributes in the DID Document we are turning that document into a certificate (of identity) that does not have an obvious means of control over which attributes are presented when asked for "papers please".

You use conditionals here. Is there any specific DID core term that is allowed on a DID Document that violates this? Do we have to add some specific extra constraints to the specification to avoid these issues? Because if the answer to both of these questions is 'no', then I am not sure what we are discussing here.

Let's say I use DIDs to sign-in to Twitter by proving to Twitter control over some key material. At the border with Elbonia, I am asked to unlock my Twitter account for inspection. I have taken the trouble to keep two separate Twitter identities under two separate DIDs. Or, maybe Twitter has taken the trouble to allow me to choose which DMs are displayed to the border guard or not depending on which of two DIDs I hand over to the Elbonian verifier.

Before looking through my Twitter persona, the verifier looks through the DID document I use to control that persona. How does the verifier know that I am the subject of that Twitter account? Is there anything in the DID document that links me to the contents of the Twitter account other than the DID itself? Maybe yes, maybe no. Whether or not the DID document contains some such attributes depends on the DID method.

Right. And the DID Core specification does not say too much about the DID method in this respect. Should we formulate a more restrictive view on methods in the Core specification? Should we have criteria that affect what methods we accept in the registry and what methods are not? Do we miss something that should be added to the DID Rubric document?

Who decided what DID method would be associated with a Twitter account? Does Twitter say I must not use did:key or did:peer because they have a Real Name policy? Or do I decide which DID method to use to open a Twitter account and then, if I choose, I post that DID in the account itself along with some selfies for all to see?

I don't see DID dereferencing as the real issue. If the Elbonian border guard can scan a DID I present and see my Twitter feed, I can still have two Twitter accounts.

I can't prevent any sovereign from saying that all Twitter accounts must be "verified" or otherwise attached to a real name. They do that by specifying the acceptable DID method. But I can make it obvious that they are forcing me to use a verified DID in which case I would self-censor what I post on Twitter and take my real communications to somewhere else. If the method has something in the DID document that I want to remove, how do I do that?

The privacy issue is not DID dereferencing. It's the expectation that a DID document has added stuff that I cannot remove because the verifiers expect it to be there as a convenience to them. And now, that DID document is no longer under my control.

Again, I am not saying what we discuss here is not important. Obviously it is, very much so. But we should concentrate on the DID Core specification at this point, hence my questions.

@jandrieu
Contributor Author

@iherman This issue, and the special topic call, are for clarifying the scope and intention of the group with regard to herd privacy. "Privacy" has been mentioned in 41 different issues, including this one. Arguments in many of those are based, essentially, on herd privacy. Seven issues explicitly mention herd privacy, and in the most recent discussion, both on voice calls and on GitHub, @talltree has dismissed privacy concerns, asserting that herd privacy doesn't apply to all DIDs.

I'd like to establish a consensus notion of herd privacy that will provide guidance for specific text changes. Once we have that, I will recommend specific PRs that address this consensus (although if the consensus is to remove the language of herd privacy, I'll defer that to someone else).

It would be premature in this conversation to propose specific text changes, as the problem is that in multiple issues and PRs, there is a significant disconnect on this topic that has led more to arguments than productive debate. If we can establish a common framework, we can raise PRs that will bring the spec into alignment.

@talltree
Contributor

Seven issues explicitly mention herd privacy, and in the most recent discussion, both on voice calls and on GitHub, @talltree has dismissed privacy concerns, asserting that herd privacy doesn't apply to all DIDs.

@jandrieu, I know this is an emotional issue for you and others on this thread, but do you honestly believe it is fair to characterize me as "dismissing privacy concerns" when I have been the #1 advocate for Privacy by Design with DIDs since the very first line of this spec was ever written?

I have never "dismissed privacy concerns" with regard to DIDs of any kind.

What I disagree with is the thesis that the "herd" when it comes to herd privacy for DIDs must be 100% of all DIDs. I have summarized the rationale in this short Google Slides deck that I prepared for the special topic call tomorrow and also attached as a PDF for those who cannot access Google Slides.

It is not a complex argument. To argue that all DIDs must be designed to support herd privacy is to argue that all DIDs must appear in a context that supports anonymous or pseudonymous relationships. That ignores all the contexts where exactly the opposite is true: the DID must be well-known. This is summarized in this diagram from my slides:

[Image: diagram from the slides, "Screen Shot 2021-01-13 at 12 38 16 PM"]

I look forward to discussing this in the special topic call tomorrow (note that I have developed a conflict for the early part of the call, so unfortunately I may be late).

Herd Privacy and DIDs.pdf

@ChristopherA
Contributor

I am extremely uncomfortable with this image. It is entirely out of proportion. In deployments in the near future, these public DID ovals will fill almost the entire box, with just a little white space for the "all other DIDs". But these are the ones that need protection by being indistinguishable and non-correlatable.

This would be like saying in SSL/TLS "we'll only secure payments", which means many parties could censor your shopping, versus having the entire website secured so that no one can know if you are getting news, looking at a catalog, price shopping, or making a payment.

I was similarly uncomfortable with the latest TLS 1.3 and how it supports traffic monitoring and ESNI. Early proposals for TLS 1.3 eliminated support for traffic monitoring entirely for herd privacy reasons, but it was added back in because of enterprise demands to monitor traffic. The result is as I predicted: China is censoring TLS 1.3 that doesn't support traffic monitoring or that does support ESNI.

True herd privacy demands that the private be indistinguishable from the public.

@jandrieu
Contributor Author

jandrieu commented Jan 13, 2021

@talltree I think it is completely fair to say that you dismissed privacy concerns. In multiple threads, including the latest with "resource" and definitely in the debate on "type", you dismissed my concerns over herd privacy, claiming it doesn't apply to all DIDs.

Hence, this issue and tomorrow's special topic call.

@talltree
Contributor

talltree commented Jan 14, 2021

@jandrieu Saying that "herd privacy does not apply to all DIDs"—which I am—and "dismissing privacy concerns" are two different things. But I guess we'll just have to agree to disagree about that.

However, since this topic is about privacy concerns, what I find most concerning is this statement from @ChristopherA:

In near-future deployments, these public DID ovals will fill almost the entire box, with just a little white space for the "all other DIDs".

There is a specific reason the diagram shows peer DIDs taking up so much room. To my knowledge, the vast majority of actual production deployments of verifiable credentials (VCs) where the subject is a human being will NOT use public DIDs for the subject due to—you guessed it—privacy concerns. Not only does using a public DID for the subject make it trivial to correlate across all presentations, but so does the signature over the credential presentation.

In short, public DIDs for individuals in VCs are perfect tracking beacons.

What's worse, writing a public DID whose subject is a human being to an immutable blockchain is such a clear challenge for the GDPR right of erasure ("right to be forgotten") that the practice is still not allowed under the Sovrin Governance Framework. I don't say that lightly. The Sovrin Foundation spent a year working with attorneys and GDPR experts trying to find a clear path for individuals to register public DIDs on the Sovrin ledger without creating irresolvable GDPR conflicts. You can read the resulting analysis in this paper. We were never able to find a satisfactory answer.

Therefore every implementation of verifiable credentials that uses the Hyperledger Indy/Ursa/Aries stack and involves issuing VCs for human subjects uses peer DIDs and ZKP credential formats.

So I find it ironic that you are contending that all DIDs need herd privacy when in fact the strongest privacy is provided by using peer DIDs which do not need to be public at all.

@jandrieu
Contributor Author

@talltree Your fundamental argument, as illustrated in your diagram, dismisses the privacy concerns of DIDs for individuals because, to you, Peer DIDs and "public" DIDs matter more.

A more collaborative engagement would be to say "I recognize your concerns. Let's find a way to address them."

@TallTed
Member

TallTed commented Jan 14, 2021

@jandrieu -- In your initial post, creating this issue, you said --

When the holder signs a VP, it establishes that a party who controls the identifier of the Subject consents to its use for some purpose under some terms.

I'm pretty sure you meant, "...a party who controls an identifier of the Subject..." because there's no such thing as the identifier of a Subject, because anyone can create a new identifier of any Subject at any time for their own purposes -- especially in the universe of DIDs -- and this is vitally important for any semblance of the kind of privacy you're trying to build.

But even with that correction, your assertion is incorrect.

The Holder signing a VP only has such control over the VC they're packaging as a VP that a Holder might have -- which doesn't necessarily include control over any identifier of the Subject. That Holder is consenting to the use of the VP for some purpose under some terms. This does not necessarily extend to use of any identifier of any entity, VC Subject or otherwise.

Even the Issuer of a VC doesn't necessarily have control over any identifier of the Subject! The Issuer has control over some identifier(s) of the VC itself -- but that's it!

Running into these basic inaccuracies in the first post in this thread does not bode well for the rest of it holding together.


I submit that "privacy of DID Subjects" might be considered as one of the many axes of consideration in the DID Rubric, but it should not (I daresay, can not) be thought of as something which should (or can!) be built in and universally true, to any depth between zero and perfect, for all DID Subjects of all DIDs in all DID Methods.

Existing in the world is imperfectly private. Perfect privacy is impossible.

Regrettably, by including a number of absolutes, the GDPR is an over-reaching piece of legislation which will eventually be shown to have forbidden many technologies which would have improved the situations the GDPR was meant to address, and instead the GDPR is solidifying those aspects of the extremely un-private world at their current levels, and may even be making them worse.

@jandrieu
Contributor Author

@TallTed I agree with your nuanced critique of the notion of "the" identifier, although I would clarify by saying the VP only provides proof for the identifier that is in the subject (assuming a single-subject VP and that the VP is signed with cryptographic material provably linked to the VC Subject). That is the "the" I was referring to.

However, your second point misses the best practice of using proof-of-control both before issuance of a VC--which proves to the issuer that the soon-to-be holder is, in fact, in control of that identifier--AND proof-of-control of the identifier in that VC upon presentation, in the form of signing a challenge string in the VP using the same cryptography as that of the cryptographic identifier that is the Subject of the VC. Together, these two proofs demonstrate that the presenter has access to the same cryptographic secret(s) that the initial recipient did. This practice depends on proof that the presenter does, in fact, control the identifier in that VC.

My other clarification: because of our extensibility model, we can't prevent DID Methods from violating herd privacy, and I'm not arguing for that. Rather, I'm arguing that DID Core should not define features that promote violations of herd privacy. DID Methods are the proper place for such innovations.
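
A minimal sketch of the two proof-of-control steps described above, assuming a toy Schnorr-style signature built only from the Python standard library. The group parameters, function names, and challenge sizes are illustrative assumptions, not anything defined by DID Core or the VC data model, and the parameters are far too weak for real use; an actual deployment would rely on a vetted signature suite.

```python
import hashlib
import secrets

# Toy parameters, NOT secure: a Mersenne prime modulus and a small base.
P = 2**127 - 1      # prime modulus (assumed for illustration)
N = P - 1           # exponents are reduced modulo the group order
G = 3               # base element (assumed for illustration)


def _challenge_hash(commitment, message):
    """Hash the commitment and the message into an exponent."""
    data = commitment.to_bytes(16, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def generate_keypair():
    """Return (secret, public); the public value stands in for the identifier's key."""
    secret = secrets.randbelow(N - 1) + 1
    return secret, pow(G, secret, P)


def sign(secret, message):
    """Sign a challenge string with the secret behind the identifier."""
    nonce = secrets.randbelow(N - 1) + 1
    commitment = pow(G, nonce, P)
    e = _challenge_hash(commitment, message)
    s = (nonce + secret * e) % N
    return e, s


def verify(public, message, signature):
    """Check that the signature was made with the secret matching `public`."""
    e, s = signature
    # G^s * public^(-e) recovers the signer's commitment when the key matches.
    commitment = (pow(G, s, P) * pow(public, -e, P)) % P
    return _challenge_hash(commitment, message) == e


# Step 1: before issuance, the soon-to-be holder answers the issuer's challenge.
holder_secret, subject_key = generate_keypair()
issuer_challenge = secrets.token_bytes(16)
issuance_ok = verify(subject_key, issuer_challenge,
                     sign(holder_secret, issuer_challenge))

# Step 2: at presentation, the presenter signs a fresh challenge in the VP.
verifier_challenge = secrets.token_bytes(16)
presentation_ok = verify(subject_key, verifier_challenge,
                         sign(holder_secret, verifier_challenge))

# A party without the secret cannot answer for that identifier.
other_secret, _ = generate_keypair()
impostor_ok = verify(subject_key, verifier_challenge,
                     sign(other_secret, verifier_challenge))
```

Both checks passing shows only that the presenter holds the same secret as the original recipient; it says nothing further about who or what the Subject is.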

@iherman
Member

iherman commented Jan 15, 2021

The issue was discussed in a meeting on 2021-01-14

List of resolutions:

  • Resolution No. 1: Move resource=true from DID Core to the DID Spec Registries (as an extension).
View the transcript

1. Herd Privacy

See github issue #539.

Ivan Herman: See Drummond's slides.

Manu Sporny: we can review joe's position, then discuss a fallback consensus position.

Christopher Allen: I thought i can recap history or talk about pre-DIDs while we're waiting for Drummond.
… this is not just a DID thing.
… this is an ongoing problem that the entire internet design community is dealing with.

Daniel Hardman: I'm aware of the herd privacy topic and I'm pretty sure I'd say anything drummond would say.

Joe Andrieu: I think herd privacy is vital to the DID Core spec, and we need to update it to address herd privacy while ensuring that DID method implementers are free to innovate.
… herd privacy means you should not be able to discern the nature of the subject by comparing did core features of dids, did urls and did documents.
… Herd privacy is vital to the DID Core specification. DID Core shall be updated to better address herd privacy while attempting to ensure DID Method implementers are free to innovate. Herd privacy means that one should not be able to discern the nature of the Subject by comparing the DID Core features of DIDs, DID-URLs, or DID Documents. DID Core features that impact herd privacy MUST justify their functionality through exceptional ben[CUT].

Dave Longley: +1 generally to Joe.

Daniel Hardman: the first part feels comfortable, the second part needs some nuance.
… herd privacy ought to mean you can't tell anything about the DID subject.
… I think it would be fine to know a particular DID relates to a document or an IoT device.
… herd privacy as it relates to things other than individuals is a little bit iffy.
… I'm not convinced that it's vital for nonhumans.

Orie Steele: to note two points related to the concept of herd privacy being fundamental.
… I'm aware of an intention to use the DID identifier idchar string to identify software packages as a distinct subset from people.
… this is work that microsoft folks are interested in, a registry for different types of identifiers.
… the first use case they've talked about is software packages as distinct from did subjects that might be users.
… microsoft is talking about that, they are already working on things related to that.
… you should know about that use case.
… It'd be awesome to hear from them about their intention for registries of types of dids for ion.
… they've thought about this a good amount.
… I've argued a lot about this topic.
… One thing about registries we should be aware of is you don't get to control how the registry is used in the future.
… especially in social media, culture changes over time.
… eg. master for the main branch in git was acceptable before, is no longer acceptable now.
… we should be careful about making unenforceable statements in did core.
… we are not in control of the future or english language.

Dave Longley: I think it's important to note that any DID Document that contains "type" information, etc. -- it does not contribute to "the herd" -- because it can be separated from it.

Manu Sporny: I feel like I understand mostly where people are coming from.
… What changes to the spec are we talking about?
… the current thing under consideration is suggesting that we move resource dereferencing out of the spec into a resolver parameter.
… I think Joe is saying let's take resource=true out of the core spec and if people want that they can register it in the registries and the way to do it is through a DID resolver parameter of some kind.
… that's one concrete spec change I can think of.
… everything else feels non-normative.
… any other changes contemplating suggesting anyone?

Christopher Allen: I'm coming from a history of great intentions that result in regrets and there's a lot of people in ietf that have turned around and said we need to be radical.
… we didn't do it good enough.
… and consider myself in that category.
… a thing I love about DIDs is that it really truly abstracted a part of the bootstrap.
… given I need an identifier that isn't controlled by others, but I need to be able to prove control.
… anything beyond that ought to be a different level.
… yes I might ask the resolver to get me more information, it ought to be going to the endpoint and doing a protocol but that's a different layer of the stack than DID is.
… DID fundamentally is about being able to prove control over an identifier.
… I'm more radical, I'd probably rip more out of DID core.
… but i really am trying to protect people.
… I'm doing DID onion and you should know with did onion whether there's a person behind it, securing git archives and stuff and things that aren't people, but when I'm done you won't know at the DID level what's what or who's what or if it is a who.
… you'll just be able to know that the identifier has a controller.
… and everything else will be another layer.

Adrian Gropper: to daniel's point, and I think drummond's, I don't believe that DIDs for things and documents where those things and documents are associated with people are not part of herd privacy.
… I disagree with what I think I heard drummond and daniel say.
… And to the point just made, I would put everything behind service endpoints.
… I would see that DID core should not introduce any properties or parameters other than those involved in controlling verification, and everything else would have to go behind a service endpoint of one kind or another.

Drummond Reed: I have never said that DIDs for certain types of things do not need herd privacy.

Joe Andrieu: one of the reasons for this call is that I'm fed up of arguing with drummond and having concern for the privacy of individuals dismissed because of features for things that aren't individuals.
… it's premature to itemise all of the things in the spec that need adjustment. resource=true is one of them. It's on me to go through and do a deep dive.
… but that's not worth doing if the group doesn't support a shared definition of herd privacy.

Drummond Reed: I am upset with Joe using the term that I've been dismissing privacy.
… I do not want this to be about this reaction.
… I have a very straightforward position on this issue which I've shared.

Manu Sporny: Agree; we should stay away from tying people to positions, and discuss the ideas on their own.

Drummond Reed: which is I'm a huge supporter of herd privacy for the context in which it's needed.
… I know the contention on the table is that it's needed for all DIDs.
… and I want to point out that that is not true.
… if that were true we would have to agree that there's only one context for all DIDs, which is anonymity or pseudonymity, and that would ignore use cases that don't involve those.
… so it's counter productive for the spec and for our goals to insist that herd privacy must apply to 100% of DIDs. It may be 99% of DIDs, that's fine.
… it's going to be a mistake for us to say it's going to apply to 100%.
… and we can talk about the specific cases.

Daniel Buchner: Microsoft must herd with other corporations to ensure that no one knows our main DID is related to Microsoft the company.
We wouldn't want it getting out that we have a lot of trusted things tied to our DID! clutches pearls.

Ted Thibodeau Jr.: If the herd is all DIDs and DID Subjects, across all DID Methods, then the herd is large, and some herd privacy might be possible.
If each DID Method has its own herd (separating, perhaps, IoT devices from documents/information-resources from humans; and possibly those humans who care about herd privacy from those who don't [yet, now, etc.]), then some herds will be large and may bestow some inherent herd privacy, but other herds will be small and bestow less, if any, inherent herd privacy.
If the only DID Subjects who do whatever heavy lifting is needed to use a DID Method that tries to preserve herd privacy are the ones who need that privacy, they have already lost it.
Anonymity and pseudonymity must be the default, and still allow for nymity.

Orie Steele: "each did method" is a herd!
"obviously did:github" is only for users of github.... "did:onion" for tor hidden services....

Manu Sporny: +1 to what Ted just said.

Amy Guy: +1 TallTed.

Ted Thibodeau Jr.: names can be used in the anonymous sphere without disrupting the anonymity of those who prefer not to assert some known identity.
… we have to do the anonymising, it's the only way that gets us there.

Dave Longley: I was about to say something very similar to what ted said.
… every time we have more information added to a DID doc that information is not random, you're creating new herds.
… every time you add more information like that, you separate an existing herd into the set of information that have this new non random piece of data and the set that does not.
… it's important for people to understand that you keep reducing the size of the herd whenever you do this.
… you create separate herds that may not be large enough to provide the protection necessary.
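
The herd-splitting effect described above can be sketched as a simple partition count. This is a hypothetical illustration: the `type` and `resource` keys, the sample population, and the `herd_sizes` helper are assumptions made for the example, not DID Core features or real deployment data.

```python
from collections import Counter


def herd_sizes(documents, visible_keys):
    """Group documents by the values an observer can see and count each group."""
    return Counter(
        tuple(doc.get(key) for key in visible_keys) for doc in documents
    )


# Hypothetical population of 100 DID documents.
docs = (
    [{"type": None, "resource": None}] * 90
    + [{"type": "SoftwarePackage", "resource": "true"}] * 8
    + [{"type": "Person", "resource": None}] * 2
)

# With no distinguishing data visible, every document sits in one herd.
single_herd = herd_sizes(docs, [])

# Exposing one non-random value splits the herd into smaller groups.
split_by_type = herd_sizes(docs, ["type"])

# The smallest group bounds the protection its members get.
smallest_herd = min(split_by_type.values())
```

In this made-up population, the two documents that carry a distinguishing value end up in a herd of two, while the remaining herd shrinks from 100 to 90.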

Manu Sporny: a concrete question I have for joe and christopher, it feels like we're all on the same page with respect to trying to protect people when it comes to herd privacy.
… the main point of discussion or disagreement is what endangers herd privacy.
… I've heard that this resource=true param endangers herd privacy.
… I don't necessarily understand how.
… what's different wrt this feature that other features don't have.
… so for example, I think we could say any extra piece of information that you add to a DID doc besides the base key materials stuff endangers herd privacy.
… we may be able to have consensus to say that in the spec.
… but what specifically about resource=true is the issue?
… is it this is just one example among many and we're nipping it in the bud and asserting it here?
… or is there any inherent thing that's different about resource=true from the key material and the service endpoints?

Orie Steele: if adding properties other than key material to a did document endangers herd privacy... why did we spend months adding arbitrary extensibility to did documents?

Joe Andrieu: to manu's question, it reveals the nature of the subject.
… the cryptographic material does not do that.
… to drummond's point that this is all about me trying to get herd privacy to apply to all dids.
… that's not true. I want it to apply to all DID Core features.
… DID methods must be able to innovate, and of course they will do things that are going to create new herds.
… but that's their choice.
… we have an extensibility method that embraces that.
… please don't say it's all about trying to make all did methods comply with my ideas.

Drummond Reed: I acknowledge what Joe said, thanks.

Dave Longley: Orie, I think Joe just responded to that -- we can't stop extensibility, we say how to do it, in fact, so there is interoperability. However, Joe's argument is to not build certain things into DID core so it's not something every method needs to support or is coerced to support.

Christopher Allen: I have no problem with there being some type of feature where you can request is this a resource=true. I feel like resource=true is a verifiable credential.
… a self defined, I sign this, I'm saying I'm a resource.
… all these things are at another layer.
… right after united nations id2020 when we did grand compromises on these architectures, we really said we need to separate out identifiers from names and all the other things.
… if we can let that one layer be as safe as possible and allow for methods and other types of things to be able to extend it.
… then that's the great compromise.
… I would say I should have pushed more back then that even method type stuff ought to be at a slightly different layer, and I didn't push that hard for that.
… I'm a little more radical, I'd like to cut out more.
… I'm not saying we don't want to offer these features, I just want to make it at a different layer.
… Names and other identification things that are beyond the control of the identifier.
… this really is a decentralized identifier standard.

Daniel Buchner: use cases [use cases] [[use cases]].
… I don't know the nuances of resource=true.
… one thing we're dogfooding right now.
… you could put a nonhuman type inside of the method, because that was something that didn't exist in the DID core spec, you can do that within the method so you can query by type.
… we're doing software packages.
… today we own your github, npm, we do want to disrupt ourselves, one idea is to have a DID to stand for a software package, and to be able to index those in ion.
… we don't have to echo the types out any further.
… I can index by type, find that type of thing, get that URL, and present the same interface that npm would but you don't have to be owned by us!
… as long as the use cases are efficient and don't need a second VDR, these are expensive to run. We need a ledger to deal with these sorts of things.
… That's one use case example.
… I want to make sure it's not affected.
… service endpoints and typing can be internal to the method, but we need those sorts of things.
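
A rough sketch of the lookup pattern described above, where typing stays internal to the method and consumers follow a typed service endpoint. Everything here is hypothetical: the `did:example` identifiers, the `method_type` field, the `PackageSource` service type, and the URLs are invented for illustration and do not describe ION or any real registry. Only the `id`, `type`, and `serviceEndpoint` keys of a service entry follow the shape used in DID documents.

```python
from collections import defaultdict

# Hypothetical method-internal records: the type lives beside the DID
# document in the method's own index, not inside the document itself.
records = {
    "did:example:pkg1": {
        "method_type": "software-package",
        "document": {
            "id": "did:example:pkg1",
            "service": [{
                "id": "did:example:pkg1#source",
                "type": "PackageSource",
                "serviceEndpoint": "https://packages.example/pkg1",
            }],
        },
    },
    "did:example:abc": {
        "method_type": None,
        "document": {"id": "did:example:abc"},
    },
}


def build_type_index(all_records):
    """Map each method-internal type to the DIDs registered under it."""
    index = defaultdict(list)
    for did, record in all_records.items():
        index[record["method_type"]].append(did)
    return index


def endpoints_for_type(all_records, index, method_type, service_type):
    """Follow typed service endpoints for every DID of the given type."""
    return [
        service["serviceEndpoint"]
        for did in index.get(method_type, [])
        for service in all_records[did]["document"].get("service", [])
        if service["type"] == service_type
    ]


type_index = build_type_index(records)
package_urls = endpoints_for_type(
    records, type_index, "software-package", "PackageSource"
)
```

In this arrangement the resolved DID document carries no subject type; only a party querying the method's own index learns which DIDs are software packages.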

Markus Sabadello: comment about DID methods.
… each method is its own herd.
… methods are primarily about how control is established.
… how DIDs are created.
… DID methods should not be about what the DID subject is and what the DID identifies.

Christopher Allen: +1 to methods about how control is established!

Markus Sabadello: I think it would be a big mistake to have a did method that can only identify things, or only identify github users.
… and extensions are not necessarily done only by DID methods.
… if someone wanted to invent a type property or parameter, that's not necessarily limited to a method.

Manu Sporny: questions for each side.
… I don't see how resource=true violates herd privacy, I've tried to understand and I don't get it.
… you still don't know if it's an iot device or a person or a chair.
… the reason I'm making a point of this is we need to understand where the line is in the spec.
… I don't think it's being well defined.
… if we can talk about why resource=true is an issue and separate that from just adding any other params.
… it could be that what joe and christopher are arguing for is anything beyond how to establish control shouldn't go in the core spec except for services, that would be something.
… there's nothing for me to latch onto as an editor to make a judgement call.
… to ask folks on the other side of the point:
… I don't think anyone is saying we're not going to address the use case.
… there will be something to the effect of resource=true, the only question is is it in the core spec or the registry, and what language will we write around that.
… if it comes to it we would move it to the registry, would that trigger a formal objection from anyone?

Joe Andrieu: q.

Ted Thibodeau Jr.: If I resolve & dereference every DID I encounter and can see "these DIDs say resource=true, and those don't" then I can focus further observations and correlative efforts on the latter.

Daniel Buchner: I need at least Service Endpoints, for them to be type-extendable. If you leave the type of the DID to Method-land, that's fine, but you will then have even more clear 'winners' even faster when it comes to Methods.
the Method that embraces the best typing structure will have a HUGE advantage.
And if you want to hand that advantage to those of us who want to do that, so be it.
but do understand the game theoretical result of that decision.

Drummond Reed: christopher said something I find helpful. We're talking about layering.
… it's important for us to distinguish between DIDs and DID URLs.

Ivan Herman: oh yes.

Drummond Reed: that distinction is so important.
… we may not remember when we wondered what to call URLs.
… my point is DID URLs, it's a little bit like we're talking about domain names and we're going to cast requirements on all the URIs on which domain names are just roots.
… we can say herd privacy does not apply to everything that has a URI, there are many things you need to use a URI for that doesn't require herd privacy.
… the resource=true parameter turns a DID into a DID URL.
… that DID URL has a specific use.
… the indy community has an immediate use for it: they're planning on using that param in their DID method. it's not a layer violation because the resource will be at layer 1, the resource will be in the VDR.
… there is no higher layer involved there.
… all you're doing is taking the DID, which all by itself the did doc wouldn't indicate anything by itself, but if you've now turned it into a DID URL because you want to do something specific, you're going to know that that DID URL if you ... if I use a DID for an individual, even a DID that has herd privacy by itself from the DID Doc, the usage of the system. many will be violating herd privacy.
… the case for the resource=true param is that it is a particular kind of DID URL which is when the resource is an information resource stored on the VDR, that should be available to any DID method that wants to use it.
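
To make the DID versus DID URL distinction concrete, here is a small sketch that separates the bare DID from any query parameters. It assumes the general DID URL shape of a DID followed by an optional path, query, and fragment; the `split_did_url` helper and the `did:example:123456` identifier are illustrative assumptions, and the sketch does not validate the DID syntax itself.

```python
from urllib.parse import parse_qs


def split_did_url(did_url):
    """Split a DID URL into the bare DID and its query parameters."""
    # Drop any fragment first, then separate the query string.
    without_fragment, _, _fragment = did_url.partition("#")
    before_query, _, query = without_fragment.partition("?")
    # The DID itself ends at the first path separator, if there is one.
    did, _, _path = before_query.partition("/")
    return did, parse_qs(query)


# A bare DID carries no parameters at all.
bare_did, bare_params = split_did_url("did:example:123456")

# Adding a parameter produces a DID URL rooted in the same DID.
url_did, url_params = split_did_url("did:example:123456?resource=true")
```

Both inputs share the same root DID; only the second carries the extra parameter that an observer of the request could see.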

Orie Steele: In ION, resource=true is implemented as leading byte prefix.... for "resource type".

Manu Sporny: Orie, is it a multicodec value?

Daniel Buchner: DPM FTW!

Daniel Buchner: NPM, too centralized, boooo!

Daniel Buchner: Sure, and you can have the underlying DID use a blinded authority scheme for its keys and commitments.

Drummond Reed: I think it is a wonderful idea to have DID methods that are designed explicitly to maximize herd privacy.

Drummond Reed: @manu - I'm happy to explain why I believe resource = true should be in DID Core.

Manu Sporny: @drummond -- not the question :) -- would you object to it being in did spec registries?

Daniel Buchner: I don't care if consumers of DID Docs have to go hit a SE inside the DID Doc.

Christopher Allen: I want to respond to daniel's thing of having this universal way of differentiating this is a software file.
… I happen to be also working on securing software packages use case.
… the problem is people who are authors, contributors to code, you have organisations that are contributors, you have the actual code itself which evolves over time, based on the authority and verification of the various parties.
… I'm very explicitly with the did:onion, noncorrelation, at the DID level you don't know anything at that point.
… there's real concern, you shouldn't know that this is a person, org or thing at that level.
… you need to go to the endpoint for that thing and request the verifiable claims that is where it says "I'm a resource" or "organisation" or "contributor" which could be denied.
… because at that point, if it happens too soon it can be a denial of service.
… I can go i'm gonna deny that particular level.
… I have knowledge that i don't want these other people to share.
… I don't think it's required to be at that level.
… there is some risk.
… this goes back to an early rwot use case which was the amira use case where you have a software engineer who was working on anti violence software.
… and feels like they could become a target.
… of people that object.
… in DID onion and the code around it we're keeping that very explicit.
… the whole DID URL thing has been controversial for a while.
… if it's separate from an identifier in a VC there are additional params that tell a resolver to do a shortcut to get these things so I just make one request.
… I end up with this as a resource.
… I don't have a problem with that, the resolver can do more than just prove control.
… it can walk through the steps on behalf of the user.
… but that means we've made some poor choices by having so much of the DID URL spec in the Core.
… more of it should be deferred to the resolver spec.
… where I draw the line is when there is something in the DID Doc.
… I'm okay if resolvers can offer other services.
… but if it's in the DID doc that's where I draw my line.

Daniel Buchner: is this just about automating that at the Resolver layer?
if so, for me, that's not a deal breaker.

Joe Andrieu: slightly different answer to dbuc.
… I think methods are where you should be innovating these things.
… if microsoft wants to do this, more power to them.
… the notion that certain groups are already expecting this feature is exactly why it's important that we vet all the features in DID core to make sure they are useful for every method.
… they will be adopted by people who aren't here for these conversations.
… how resource=true violates herd privacy, the only use case presented are when dids represent a resource.
… it is for discerning human resources from digital resources.

Adrian Gropper: [??] from my perspective the efficiency of the process is not the issue at the DID Core level.
… moving everything behind the service endpoint should be where DID Core puts the demarcation point.
… and then we should be working on more efficient auth methods if that is where the bottleneck shows up.

Dave Longley: to say why allowing resource=true causes trouble for herd privacy: it separates one herd into two... one that is known to be for digital resources, and the other that is unknown (but smaller in size, not including those things that are known to be digital resources).

Manu Sporny: on resource=true, I'm not hearing it. I heard joe when he said that the use case that was presented was to distinguish human resources from nonhuman resources; yes, that was presented, and that clearly cuts it between two things we don't want to cut it between.
… but I don't think that's actually the use case.
… you can have a human actor put resource=true just as well as you can a digital resource or an iot resource.
… I don't think the actual use cases make that distinction.
… i still don't see the difference.
… I didn't hear an answer to my question that would anyone object if we put resource=true in the Registries.
… the people that want it would still be able to use it.
… that seems to be the compromise position.
… if we go on the route we're on right now we'll see objections.
… we're going to have to decide whether to take it out.
… it'll be clear we don't have consensus to keep it in.
… the only way it ends up staying in the spec is if we get consensus and it doesn't seem like we are converging.
… who is going to object if we put it in the Registries?
… if there are objections from that side, we'll have to see what has the fewest objections.
… I don't see people changing their position on the call.
… I see people continuing the current position.
… we're headed towards what's going to get the least number of objections.
… It's important to hear clear answers from joe as to why this is any different from any other feature, and from drummond or anyone are you going to formally object if it's in the registries and then why.

Ted Thibodeau Jr.:

  1. All DID Methods must include resource=true on the surface of all DID Documents, eliminating its utility, but preserving herd privacy.
  2. If I resolve & dereference every DID I encounter and can see without decrypting anything "these DIDs say resource=true, and those don't", then I can focus further observations and correlative efforts on the sub-group I care about.
  3. If everything that doesn't care about herd privacy turns it off, they radically shrink the size of the "privacy" herd, radically reducing that herd privacy, potentially to zero.
  4. Bad actors (including national entities working for their good, which may not be universal) will request all sorts of things through all sorts of disguises and anything that is on the "visible" surface changes the shape available for attack.
  5. Pieces of this are similarly possible with TLS/SSL re TCP/IP traffic, which (lacking VPN) does expose the endpoints of those connections which can be an exposure point. See the analysis of American Revolutionary postal correspondence that discovered the network of conspirators for an example.

Daniel Buchner: types being generalized in the sense of there being a concept in the top level DID spec is good. There are drawbacks that come with it. If it's generalized across the methods they can interoperate. If it doesn't happen that's cool, but different methods might pick different types.
… with service endpoints in the DID doc, I can still make what i want to make.
… my question about this resource=true, what would this being there or not being there imperil about a use case like that?
… is there something it does?
… we might only need service endpoints and typing at some layer? am I wrong?

Grant Noble: the DID method I'm working on is great for long term DIDs that should not be controlled by other entities, and humans who want maximum privacy.
… we pick options that maximize herd privacy.
… if it didn't break anything we'd set resource=true and if that screws up anyone else's library, tell me how it's going to break my experience.
… if it doesn't help herd privacy, I'm going to pick the most private option.

Ted Thibodeau Jr.: Camouflage is great for herd privacy.

Joe Andrieu: I feel like this was ramrodded in and doesn't solve the underlying use case.
… there's a willingness on behalf of the chairs and editors to put any feature in because somebody wants it.
… if there isn't a good use case that isn't privacy impacting it shouldn't be in the document.
… we don't need it.
… and one of the reasons is there was a conflation by Brent between resolution and dereferencing.
… method specs are already able to define how we dereference.

Manu Sporny: Nope, disagree with what Joe is saying -- this wasn't ramrodded -- we merge things per our process.

Daniel Buchner: Drummond: can you tell me where this being in/out breaks the use case of finding a typed DID in a registry, then following a typed service endpoint to a resource?

Joe Andrieu: you can take a DID, resolve to a DID doc, and dereference according to the method, and get a resource.
… because it addresses resources in a VDR.
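
As a rough illustration of the syntax under discussion, the sketch below splits a DID URL into its DID, path, query parameters and fragment, and checks for the `resource=true` parameter. It covers string handling only. Resolution and dereferencing are method-specific and are not shown. The helper names and the `did:example:123` identifier are hypothetical.

```python
from urllib.parse import parse_qs


def split_did_url(did_url):
    """Split a DID URL into the bare DID, path, query parameters and fragment."""
    rest, _, fragment = did_url.partition("#")
    rest, _, query = rest.partition("?")
    did, slash, path = rest.partition("/")
    # A DID has at least three colon-separated parts: "did", method, method-specific id.
    if not did.startswith("did:") or did.count(":") < 2:
        raise ValueError(f"not a DID URL: {did_url!r}")
    return {
        "did": did,
        "path": slash + path,
        "params": parse_qs(query),
        "fragment": fragment,
    }


def wants_resource(did_url):
    """True when the DID URL carries the resource=true parameter."""
    return split_did_url(did_url)["params"].get("resource") == ["true"]


print(wants_resource("did:example:123?resource=true"))
print(wants_resource("did:example:123"))
```

Note that the bare DID is identical in both calls. Only the DID URL differs, which is where the disagreement about visible distinctions comes in.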

Daniel Buchner: I feel like I am agreeing with Joe, and it kinda feels weird.

Daniel Buchner: ;).

Drummond Reed: I strongly disagree with what joe just said.
… there are obviously ways of doing it. The number one reason to have resource=true in DID core is so that any method that wants to make resources available directly out of VDR has a way of doing it.
… is Evernym going to object if that ends up in the registries? probably not.
… but I would consider it a disservice to the adoption of DIDs and DID URLs if we don't point out a standard way to return a resource and a standard param that any DID method can use to do that.
… the DID itself and associated DID doc for herd privacy, when you add the parameter resource=true and any DID method can add anything they want, DIDs are the root of DID URLs which will have millions of uses.
… all we're saying is there's one use of a parameter that will allow you to return a resource from a VDR.
… I believe the universality of the value of that parameter or any kind of resource you might want to return, is the reason for utility's sake to put that in DID Core.

Daniel Buchner: Drummond: could this be standardized via the Resolution spec?
Perhaps I just don't get it, I might plead ignorance here.

Ivan Herman: when I looked through the slides, there was the reference to this long appendix which seems to be also hanging on the same discussion.

Drummond Reed: @Ivan - I think the appendix topic is mostly about a separate issue, not herd privacy, so it may be that it warrants another special topic call.

Manu Sporny: thank you Drummond, it is helpful to know Evernym is not going to object.
… I'm just stating my opinion, not as an editor.
… It's heartening to hear everyone is more or less on the same page, just disagreeing about details. I know some of you don't think they are details.
… I don't know if we have enough to write a definition of herd privacy right now, but feels like we're all going for the same thing.

Daniel Buchner: Type makes sense, but I don't get this prop, to be honest.

Manu Sporny: The challenge I have with this entire discussion, and the same thing with type, this is a criticism of the arguments to not put this in did core, if these are useful features people are going to use them anyway.
… I think we should write about it in the spec, but that this keeps coming up and things like type and resource=true and these other things get kicked out of the spec is not going to do anything to prevent people from using it.
… the only thing that is going to prevent that is if there are compelling arguments made for why you should not use these features.
… I still don't know as a person needing to implement the spec what a dangerous feature is and what isn't when it comes to herd privacy.
… all I know is that adding more things hurts herd privacy.
… next steps here are calling for consensus to keep it in the spec.
… I expect people will push back on that.
… and the fallback position is to put it in DID spec registries, and it's not going to change any implementer decision.
… the people that wanted it are going to use it anyway.
… so we haven't accomplished anything other than move the text from one place to another.
… that's my concern.

Daniel Buchner: This feels like DID Outer Solar System.

Manu Sporny: dbuc, the Kuiper Belt of DIDs.

Daniel Buchner: DID Sun Core: Keys, Endpoints - DID Rocky Planet Methods: Typing, indexing - DID Outer Rim Resolution Middleware: resource fetching and advanced transforms.

Drummond Reed: I agree with Manu. I believe we've actually weakened the DID Core Specification by removing a universally useful feature.

Dave Longley: all of this comes down to philosophy around what goes into DID core and what doesn't.
… a lot of what manu said is true about no matter what we do we're not going to prevent behaviour. but nobody is arguing that.
… no spec can stop people from doing this.
… what this is about is whatever features we put into DID core we're announcing that this is something your DID method should support, and this is the right way to do it.
… anything we put in DID core is encouraging certain sets of features to be implemented by people who want to participate in this ecosystem.

Dave Longley: If there are sets of things that people feel like will cause problems with privacy and security, we have a place to put those sorts of features, and that is in the registries.
… it doesn't have to go in core to encourage or coerce all did method authors to support these sorts of features.
… we have the mechanisms we need, we can push things to the registries when they aren't common features that everyone feels comfortable with.
… if we can adopt that policy we can get through this and have a solution.

Ivan Herman: how do the editors feel? do they have something to go away with, or not?

Christopher Allen: I think it is more than resource=true. We need to define the principle.

Joe Andrieu: this call was not to be about resource=true.
… there's a fundamental question to get agreement on. What does it mean to have herd privacy? I'm sad we didn't get that.

Jonathan Holt: +1 JoeAndrieu, I think this needs a lot more discussion.

Manu Sporny: two proposals for each point of view. I'm trying to focus on resource=true as something concrete.
… any modifications to either of those?

Ted Thibodeau Jr.: looks more like straw poll than proposals?

Grant Noble: Digital Contract Design has applied for W3C membership, and if a vote is taken would like to vote.

Christopher Allen: I think resource=true is a red herring of a larger problem.

Daniel Buchner: 0 on both - I am too ignorant on this topic to vote.

Proposed resolution: Keep resource=true in the DID Core specification. (Manu Sporny)

Drummond Reed: +1.

Daniel Buchner: 0 - I am too ignorant on this topic.

Grant Noble: -1.

Dave Longley: -1.

Ivan Herman: 0.

Orie Steele: -1.

Joe Andrieu: -1.

Manu Sporny: -0.25.

Amy Guy: -0.5.

Ted Thibodeau Jr.: -0.5.

Markus Sabadello: 0.

Jonathan Holt: -0.5 I too am ignorant.

Ivan Herman: I have the impression that this will not make it.

Proposed resolution: Move resource=true from DID Core to the DID Spec Registries (as an extension). (Manu Sporny)

Daniel Buchner: 0.

Dave Longley: +1.

Grant Noble: 0.

Orie Steele: +1.

Drummond Reed: 0.

Ivan Herman: 1.

Manu Sporny: +0.75.

Joe Andrieu: 0.

Amy Guy: +0.5.

Christopher Allen: +0.5.

Ted Thibodeau Jr.: +0.5.

Markus Sabadello: +0.5.

Jonathan Holt: 0 I too am ignorant.

Resolution #1: Move resource=true from DID Core to the DID Spec Registries (as an extension).

Manu Sporny: we'll need to talk again, editors.
… in the Registries folks that want it can do what they want.

Markus Sabadello: Does this solve JoeAndrieu's, agropper___'s and ChristopherA's concerns?

Christopher Allen: Markus, it does not.
I like your earlier definition, that it is about control.

Dave Longley: I think if there were more decentralized registry technologies out there ... a lot of these issues would go away ... because people could post what they wanted over there using VCs ... and DIDs could be left alone to do what they do best.

Christopher Allen: +1 to dlongley.

Ivan Herman: closing remark: I would like to see where this full discussion leads as far as the whole spec is concerned. I have heard several times that resource=true is one specific thing that triggers this discussion.
… We have to drive the document towards CR and that worries me.
… I would like to see specific change proposals if we have one to move ahead.
… thank you everyone for coming.


@jandrieu
Contributor Author

@talltree Your argument continues to dismiss the concerns of what you see as a minority of usage. We can speculate all we want about which DIDs are going to be used more; that's a distraction. The DID Core specification MUST be seen through the light of all DIDs. To do anything less is to dismiss the concerns of those public DID Methods that are used for referring to individuals.

Just because your systems have made assumptions about which DIDs to use for which situations does not mean that your design choices should be imposed on DID Core and thereby encouraged for use by DID Methods. On the contrary, when we realize that Methods are choosing potentially dangerous features, they should be highlighted in the section on privacy and security concerns and in the DID Spec Registries if added there. They most certainly should not be enshrined in DID Core.

I'll reiterate my position, as expressed on that special topic call:

Herd privacy is vital to the DID Core specification. DID Core should be updated to better address herd privacy while attempting to ensure DID Method implementers are free to innovate. Herd privacy means that one should not be able to discern the nature of the Subject by comparing the DID Core features of DIDs, DID-URLs, or DID Documents. DID Core features that impact herd privacy MUST justify their functionality through exceptional benefits, and MUST be thoroughly documented.

@rhiaro
Member

rhiaro commented Jan 17, 2021

I disagree with the notion that "herd privacy doesn't need to apply to all DIDs". The very definition of herd privacy is that it applies to everything in the herd. The moment anything in the herd is excluded, privacy diminishes for everything in the herd.

Responding to @ChristopherA:

normative language was pulled from the 2016's DID Implementers' Draft 0.1 since we can't "prove" conformance, but I do still believe that it is a requirement

and @jandrieu:

they should be highlighted in the section on privacy and security concerns and in the DID Spec Registries

What we could potentially do is to add normative language to Methods > Privacy Requirements to say something like:

DID method specifications MUST include a section on herd privacy. Specifically, DID method specifications MUST explicitly note any properties or metadata properties used beyond those in the DID Core specification, and MUST state to what extent such additional properties differentiate any DIDs (or subsets thereof) generated by this DID method from each other, or from DIDs generated by other DID methods.

(And, I assume only conforming DID method specs are (or will be eventually) admitted to the Registries. So any method specs without these sections would get immediately flagged on review.)
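
To make the proposed review step concrete, here is a minimal sketch of how a reviewer might list the properties a DID document uses beyond DID Core. The property set below is an assumption based on the core properties as commonly listed and should be checked against the specification text. The sample document and its `type` property are hypothetical.

```python
# Assumed set of DID Core top-level properties; verify against the spec before relying on it.
DID_CORE_PROPERTIES = frozenset({
    "@context",
    "id",
    "alsoKnownAs",
    "controller",
    "verificationMethod",
    "authentication",
    "assertionMethod",
    "keyAgreement",
    "capabilityInvocation",
    "capabilityDelegation",
    "service",
})


def non_core_properties(did_document):
    """Return the top-level property names that are not in the assumed core set."""
    return sorted(key for key in did_document if key not in DID_CORE_PROPERTIES)


# Hypothetical DID document carrying one extension property.
doc = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123",
    "authentication": [],
    "type": "Organization",
}

print(non_core_properties(doc))  # ['type']
```

A method specification's herd privacy section could then state, for each name this reports, how far it distinguishes that method's DIDs from each other or from other methods' DIDs.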

@jandrieu
Contributor Author

Two specific PRs have been merged for this issue (including #616 created just now). Please mark the issue as deferred.

FWIW, there is still some confusion among implementers about how best to handle herd privacy. It may be appropriate to add some language to an implementation guide. For now, I understand we won't have an opportunity to get additional PRs in for the imminent Candidate Recommendation, but maybe the implementation guide is the right vehicle for further discussion.

@decentralgabe decentralgabe added discuss Needs further discussion before a pull request can be created and removed defer-v2 labels Jun 26, 2024
@msporny msporny added the class 2 Changes that do not functionally affect interpretation of the document label Jul 1, 2024
@decentralgabe
Contributor

@jandrieu can this be closed? Happy to discuss on an upcoming call if it has not been fully addressed.

@decentralgabe decentralgabe added the p1 high priority label Dec 6, 2024
@w3cbot

w3cbot commented Dec 6, 2024

This was discussed during the #did meeting on 06 December 2024.


w3c/did-core#539

manu: +1 to closing, the current process is to raise an issue, discuss in WG, then Pull Request, then merge if consensus.

<denkeni> +1 for it

decentralgabe: The section on herd privacy, what it means, how it applies to DIDs, good discussion here...

decentralgabe: Seems like we should discuss with Joe. Let's ask Joe if this could be closed.

