Did-core: Service Endpoints in the DID Doc might be an anti-pattern

Created on 28 Aug 2020  ·  149 Comments  ·  Source: w3c/did-core

TL;DR: We don't need service endpoints in the DID Document... it's an overly-complicated anti-pattern that has a lot of downsides when we already have patterns that are implemented today that would work for all use cases.

It has been asserted that Service Endpoints in the DID Document might be an anti-pattern because, at worst, they can be used to express PII in DID Documents, and in every use case that we know of to date, they can be discovered through other means that are already employed today.

Ultimately, the problem is that developers need to be educated about the dangers of placing PII in service endpoints... many won't read the spec in detail... we have over 70 DID Methods now and the number is only increasing.

What are the chances that a non-trivial subset of them implement unwisely? My guess is the chances are pretty high, and that weakens the ecosystem.

We do have the option of not handing developers foot guns... and we should try very hard to avoid handing them out. Non-normative documentation is better than nothing, but I'm afraid it is not good enough.

Here's what the group resolved yesterday (pending 7 days for objections to the resolutions):

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.

RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.

RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

I wish we would do more than that... there are alternatives that the group should consider in order to discover service endpoints:

  • Go to an entity's website, which would have a DID Auth button, which you could then use to send them your service endpoints privately using VCs.
  • Find an entity like we do today -- using a search engine of some kind... schema.org markup can be used to express public endpoints using VCs.

Both of those solutions allow us to 1) Use what we already have today, and 2) address all of the use cases that we know of.

PR exists pre-cr-p3

Most helpful comment

Whew! What a thread. I just read the whole thing because on this morning's DID WG call this issue was pointed to as "the outcome of the special topic call on service endpoints". I had no idea it had grown to this length.

I will keep this short. I hate to say it, but so far the content in this thread almost completely misses the two main reasons for keeping service endpoints in the spec (provided, of course, that we include all appropriate privacy flags and warnings).

Reason Number 1: Public DID documents for public entities (like corporations, governments, NGOs, universities, churches, websites) who want to publicly advertise not just their DID and keys but their service endpoints. For these entities:

  1. We want to make it really easy.
  2. We want to make it one-stop/one-hop to post the original DID document and make updates.
  3. There are no GDPR or other privacy concerns.

Reason Number 2. Innovation. Who are we to say that we know all the right and wrong ways to safely use a service endpoint? DIDs are just entering the world. DIDComm is still an infant at the crawling stage. Yes, we should provide all the privacy warnings and guidance that we can. But to suggest that we remove the feature or artificially restrict a DID document to a single instance of a service endpoint seems a little like TBL predicting all the things URLs will be used for back in 1994.

Unless someone can decry these two motivations for service endpoints—which is why they have existed in the spec since the first draft four years ago—I suggest we move forward with Orie's suggestion:

  1. we should define an abstract data model for services like we have for verification methods
  2. we should warn about them in the privacy and security considerations
  3. we should warn about them in an implementation guide (if one ever gets created... if not... good thing we are committed to 2).

All 149 comments

Add a serviceEndpoint just in time, without updating the verifiable data registry using signed-ietf-json-patch.

However ^ this solution still requires us to define a data model.... and I would argue that so does "getting service endpoints" in credentials... unless you want every vendor to construct them differently, which will harm interoperability.

in other words... there is no solution to this problem that does not include a data model... but there are proposals for how that data model should be communicated, which have privacy, security and usability tradeoffs :)
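To make the data-model point concrete, here is a minimal sketch, assuming a service entry carries the `id`, `type`, and `serviceEndpoint` properties and that a just-in-time addition arrives as a single JSON Patch "add" operation. The `Service` class, the `apply_add_service` helper, and the example DID and URL are hypothetical, and signature verification of the patch is omitted entirely.

```python
import copy
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Service:
    """Hypothetical minimal service entry: id, type, serviceEndpoint."""
    id: str
    type: str
    serviceEndpoint: str


def apply_add_service(did_doc: dict, patch_op: dict) -> dict:
    """Apply one JSON Patch style 'add' op that appends a service entry.

    Only {'op': 'add', 'path': '/service/-'} is handled. Unlike strict
    JSON Patch, this sketch creates the 'service' array if it is missing.
    A real implementation would verify the patch signature first.
    """
    if patch_op.get("op") != "add" or patch_op.get("path") != "/service/-":
        raise ValueError("this sketch only handles add at /service/-")
    patched = copy.deepcopy(did_doc)  # leave the resolved document untouched
    patched.setdefault("service", []).append(patch_op["value"])
    return patched


# The document as resolved from the registry, with no service entries.
resolved = {"id": "did:example:123"}

# The service entry delivered out of band, expressed as a patch.
inbox = Service(
    id="did:example:123#inbox",
    type="ExampleMessagingService",
    serviceEndpoint="https://example.com/inbox",
)
patch = {"op": "add", "path": "/service/-", "value": asdict(inbox)}

patched = apply_add_service(resolved, patch)
```

Whichever channel carries the entry (the registry, a patch, or a credential), both sides still have to agree on the shape of `Service`, which is the interoperability concern raised above.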

Only using DID Documents on-ledger for well-known public identities, and using private off-ledger peer-wise DIDs for all personal identifiers mitigates the described issue as well. Personal service end-points would be shared only via the peer-wise connection, and public service endpoints are by definition meant to be public.

...there are alternatives that the group should consider in order to discover service endpoints:

  • Go to an entity's website, which would have a DID Auth button, which you could then use to send them your service endpoints privately using VCs.
  • Find an entity like we do today -- using a search engine of some kind... schema.org markup can be used to express public endpoints using VCs.

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document. Anyone who disagrees implicitly (whether they are aware or not) takes one of the two positions below, there simply is no third:

  1. All services should require centralized parties for location/distribution.
  2. Entities should not be able to share their intended-public data with others without participating in an explicit, out-of-band, DID Doc-external activity.

If you fall under Position 2 above, please do the following to ensure you are abiding by your own beliefs, if you have not already:

  • Delete your Twitter account
  • Delete your public blog domain
  • Delete your resume wherever you post it
  • Delete your images and videos from other social media sites
  • Turn all forms of openly accessible sharing and connections off in all interaction-based apps, such that no one can read your posts, messages, or communications without somehow contacting you and exchanging permissions through another channel.

If you do the above things in response to the implicit Position 2 that many seem to be taking, that is a first step in building credibility for the case that we should deprive people, companies, IoT devices and other entities from a more direct, decentralized mechanism of expressing themselves in fulfillment of application and service use cases.

we already have patterns that are implemented today that would work for all use cases...that we know of

I think a comparison will help explain why this argument falls flat for me.

The reason we need DIDs isn't because use cases aren't addressable, exactly -- it's because the nature of a use case's guarantees and semantics changes if we don't root them in DIDs. We could do VCs with SSH keys instead of DIDs, but we don't, because SSH keys don't have the same properties (decentralization, discovery, rotation, potential for multisig...) that DIDs do.

Similarly, the nature of a service endpoint's guarantees and semantics changes if we don't put them in DID docs. This is the essence of @csuwildcat 's comment above, which I agree with -- sure, you can do discovery with existing mechanisms, but you can't do it in a decentralized way unless you either use DID docs or invent an entirely new mechanism with the same characteristics as DID docs. Yes, there are alternative ways to communicate an endpoint. The DID controller may or may not control those alternative mechanisms. Therefore, by removing the service endpoint from the DID doc, we are allowing someone other than the DID controller to frame any conversations associated with that DID. You could say, "No big deal; the non-DID-controller can't lie about controlling the DID when a digital signature or encryption is required." I answer: "True, but that's not the full requirement, because just controlling the endpoint value itself allows a malicious party to simulate the silence, uncooperativeness, or flakiness of a DID owner they want to harass."

The recent Twitter hack of accounts belonging to Obama, Biden, Elon Musk, and others is exactly the sort of thing we enable if we communicate service endpoints outside the DID doc. That was an existing communication mechanism that could communicate endpoints, and its security properties are different from those of a DID doc itself. The claim that leaving service endpoints in the spec is an invitation for disaster is only half a story. Yes, doing service endpoints right is hard, and doing it wrong could be obnoxious. But taking them out is just as problematic, and I don't think developers write code that guards against ordinary cybersecurity risks any better than they write code that guards against service endpoint abuse. The difference is that service endpoints are a new field of knowledge where developers will be open to guidance, rather than familiar territory where developers will casually assume they already know best practice.

I agree that service endpoints certainly will reduce privacy in their (mis)use, and this is an important consideration to make.

However, I think that if we excluded them from the DID spec then the new risk we incur is one related to standards adoption--the standards will become far less useful without an ascribed way to do service discovery. +1 to @dhh1128's points about "what makes DIDs different and more useful than SSH keys?", with this being a core reason. Consider the impact of this on DIDComm, which in my mind is a major use case for DIDs. I believe we will need service endpoints to enable the discovery portion of DIDComm, though I'm not certain. cc @telegramsam @awoie

Also agreed to the point that if we punt service endpoints into another standard, then the problem still doesn't go away. In fact it might be solved in a lot less decentralized way than with DID documents, such as state-owned BigCo saying that they are the #1 DID Broker that's easiest to use for everyone because they can direct a slush fund towards winning the market in this way--and everyone would likely use the most convenient and free thing around, as we've seen for the past 10 years on the Internet.

So in summary, I recommend we keep service endpoints while acknowledging they _will_ bring privacy problems, with the understanding that having their functionality provided somewhere else could cause (1) significant adoption risks and (2) even larger systemic privacy risks. Perhaps if we agree on these logic inputs but disagree on the specific risk measures, we can make them part of the calculus from which the decision is made.

Finally, wanted to mention that resolution of this would unblock our work with the W3C privacy self-assessment here: https://github.com/w3c/did-core/issues/291#issuecomment-681172870

I propose a compromise solution based on my privacy-inspired perspective in https://github.com/w3c/did-core/issues/370#issuecomment-683075977

Relative to yesterday's pending resolutions:

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.

Treat the PDP serviceEndpoint as normative, if present.

RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.

Define an abstract data model for the PDP serviceEndpoint based on standard UMA2 and pending GNAP practices.

RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

Yes.

there are alternatives that the group should consider in order to discover service endpoints

I have some sympathies for this view; it seems to align with what Sam Smith has been trying to tell us since the Amsterdam F2F, which is that DID documents should only be about establishing control authority over the identifier, and that everything else (including service endpoints) should happen on a different layer.

But as others have pointed out in this thread, I also believe that alternatives (such as sending service endpoints together with the DID via the original channel, or using a search engine, or using a special refresh/notification/etc. service) will usually not provide the same guarantees that DID resolution and DID methods are supposed to provide, i.e. decentralization, control, cryptographic verifiability.

DIDs should enable service and data portability in the same way as they enable key rotation. Services are not comparable to VCs, they are much more foundational. DIDs are an indirection layer on top of both verification methods and services, since those are the fundamental constructs that enable trustable interaction associated with the subject.

Building on @Markus' "fundamental constructs that enable trustable interaction associated with the subject", consider private keys and authorization policies as the two things I should never be asked to put on the wire. Relative to my private keys, all anyone ever sees is a useful derivation. Relative to my policies, all anyone ever sees is a capability derived from my policies. These are the two key functions of the "indirection layer" enabled by DIDs. This is why I suggest that a PDP be the first, maybe the only, normative data model associated with a DID.


I have a procedural objection to this approach. The proposals that we agreed to were an attempt to communicate consensus among the participants in a special topic call and as such are non-binding. As Ivan @iherman pointed out in the minutes, these "resolutions" would be brought back to the rest of the group for broader discussion. Placing a 7 day window doesn't seem fair to such an important topic and itself is an "anti-pattern" to the standards development process.

Placing a 7 day window doesn't seem fair to such an important topic and itself is an "anti-pattern" to the standards development process.

The 7 day window is for the RESOLUTIONs we made, not for the topic at hand. This 7 day window is the process the group agreed to for the special topic calls. It provides an opportunity for people to object on the main topic call while ensuring that there is closure to resolutions so the group can build upon them.

/cc @brentzundel @burnburn -- we may want to remind the group of this process during the next call.

@jonnycrunch -- are you objecting to any of the RESOLUTIONS made during the last call? I note that you didn't object at the time: https://www.w3.org/2019/did-wg/Meetings/Minutes/2020-08-27-did-topic#res

@csuwildcat,

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document.

This may actually be more likely to happen as a result of exposing service endpoints in DID Documents. Especially if herd privacy is desirable -- it may result in a limited number of centralized parties providing service endpoint routers that can adequately provide that feature. You may end up having to choose from this limited selection in the same way we have to choose to "login with X" today.

You may say: But for cases where I don't care about unwanted correlation, I don't need herd privacy! Ok, I get it. You don't care about the privacy cases -- you've made that very clear. Please note, however, that it may be very challenging (or impossible) for a VDR (Verifiable Data Registry, aka DID ledger) to determine whether a service endpoint is "public" or not.

There's an implicit "typing" of service endpoints relative to whether or not people care about correlation here. If a VDR needs to accept service endpoints of "type" A and reject service endpoints of "type" B, but the VDR can't tell the difference, how would you resolve this problem? You may also say you don't care, you just want to use a DID Doc from a VDR. Well, there may not be such a VDR without solving this problem -- or the VDR you've chosen may get sued into the ground after you started using it and you'll be quite grumpy.

I want to see a solution here that addresses these issues. Ignoring them or saying they can't be discussed unless you delete your Twitter account -- while entertaining -- is missing the point. I also don't want to see a solution that furthers the kind of centralization problems we've seen in the past. Of course, this may mean leveraging more places to express service endpoints, not fewer. Note that that's a decentralized mechanism for solving this problem, not a centralized one.

@dhh1128 -- Can you provide a link to how the DIDComm community is considering how "GDPR-compliant service endpoints" might be implemented and how a VDR might differentiate them from non-compliant ones?

All: I think it would be most helpful to go through a number of concrete use cases around service endpoints to determine how they might be solved using service endpoints expressed in VDR-backed DID Documents vs. alternative approaches.

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document.

This may actually be more likely to happen as a result of exposing service endpoints in DID Documents. Especially if herd privacy is desirable -- it may result in a limited number of centralized parties providing service endpoint routers that can adequately provide that feature. You may end up having to choose from this limited selection in the same way we have to choose to "login with X" today.

I don't buy this argument at all - a Service Endpoint can contain a decentralized protocol URI.

You may say: But for cases where I don't care about unwanted correlation, I don't need herd privacy! Ok, I get it. You don't care about the privacy cases -- you've made that very clear. Please note, however, that it may be very challenging (or impossible) for a VDR (Verifiable Data Registry, aka DID ledger) to determine whether a service endpoint is "public" or not.

The owner of the DID determines this, not the DID ledger (nor should it, I would argue), so I don't find this line of argument persuasive.

There's an implicit "typing" of service endpoints relative to whether or not people care about correlation here. If a VDR needs to accept service endpoints of "type" A and reject service endpoints of "type" B, but the VDR can't tell the difference, how would you resolve this problem? You may also say you don't care, you just want to use a DID Doc from a VDR. Well, there may not be such a VDR without solving this problem -- or the VDR you've chosen may get sued into the ground after you started using it and you'll be quite grumpy.

A ledger is not the place where adjudication of purported types is resolved, that is always going to be in a less resource constrained system that has more latitude to evaluate assertions based on evidence that can be computed ad hoc. The ledger is the place for key awareness, routing, and type declaration - on the latter point, it's about efficient global sorting in the aggregate sense, not assertion validity evaluation, which is not a singular, universal, globally shared test anyway.

I want to see a solution here that addresses these issues. Ignoring them or saying they can't be discussed unless you delete your Twitter account -- while entertaining -- is missing the point. I also don't want to see a solution that furthers the kind of centralization problems we've seen in the past. Of course, this may mean leveraging _more_ places to express service endpoints, not fewer. Note that that's a _decentralized_ mechanism for solving this problem, not a _centralized_ one.

I am not reacting in this way to oppose any entity/implementer deciding to not use Service Endpoints, my opposition is strictly contained to spec changes and normative language that negatively impacts these features such that it hinders other entities/implementations from utilizing them.

All: I think it would be most helpful to go through a number of concrete use cases around service endpoints to determine how they might be solved using service endpoints expressed in VDR-backed DID Documents vs. alternative approaches.

Use cases: decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

@dlongley : providing a link is a bit challenging, because knowledge about the question exactly as you framed it is scattered through numerous documents. The best single doc I can offer is here. This covers about 70% of your question. I will attempt a summary here that is partly redundant with that doc, and that fills in some gaps.

First, it's important to understand that, because DIDComm is not API-centric, it doesn't need a different endpoint for every service or protocol it exposes. The DIDComm community is assuming that a party usually needs only one DIDComm endpoint (per transport) no matter how many services they intend to offer. (The "per transport" note is just to acknowledge that if you want to speak DIDComm over http, smtp, AMQP, BlueTooth, and sneakernet, those may be different endpoints -- but you don't need different ones for credential issuance, verification, and so forth. Those are all just protocols running over a single endpoint.)

Now, a DIDComm endpoint has baked into it the potential (but not the requirement) for routing. Routing is done by a mostly untrusted mediator that has its own encryption keys. If Alice is talking to Bob, and Bob is using a mediator, then Bob's service endpoint will be hosted by the mediator. Thousands or millions of other parties can (should) have exactly the same service endpoint. The URI for the endpoint has no query string and nothing in its domain name that identifies Bob in any way. Alice places her plaintext message (let's call this M[0]) inside an encryption envelope that only Bob can open. Let's call the encrypted result M[1]. Then Alice places M[1] inside an encryption envelope that only the mediator can open. Let's call that encrypted result M[2]. The encrypted header of M[2] tells the mediator what Bob's DID is. Bob and the mediator have previously arranged for the mediator to forward messages for Bob's DID to Bob. (There's a DIDComm protocol they can use, if they want -- or they can do it any proprietary way they like, since it doesn't have to be interoperable.)

When the mediator receives the message M[2], it opens the outer encryption envelope and peers inside. It sees that the encrypted inner message is intended for Bob's DID. It then forwards M[1] to Bob. How it does this is never publicly known; it is a private arrangement between Bob and the mediator.

In order for Alice to know that she must do the double wrapping required by Bob's mediator, the service endpoint for Bob needs to contain an ordered list of the keys (or DIDs that let her look up keys) that she has to use when encrypting for Bob's route. Thus we have a serviceEndpoint declaration with a routingKeys field that might contain: [<DID or key of Bob's mediator>]. A route that uses one mediator will have one entry in this array; a route that uses two mediators will have two, etc. (Why you'd want two mediators is beyond scope here; suffice it to say that either one or two might be common, but anything more than two will not be.)
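The wrapping order described above can be sketched as follows. This is an illustration only: `Envelope` is a stand-in for an encrypted envelope and performs no cryptography, the function names are hypothetical, and the assumption that the first `routingKeys` entry is the outermost hop (the mediator Alice contacts directly) is mine, not something the comment specifies.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class Envelope:
    """Stand-in for an encrypted envelope. NO real cryptography here."""
    openable_by: str           # key or DID of the only party able to open it
    forward_to: Optional[str]  # next hop, visible only after opening
    payload: Any               # plaintext or a nested Envelope


def wrap_for_route(plaintext: str, recipient: str,
                   routing_keys: List[str]) -> Envelope:
    """Wrap a message once for the recipient, then once per mediator."""
    # M[1]: the envelope only the recipient can open.
    message = Envelope(openable_by=recipient, forward_to=None,
                       payload=plaintext)
    next_hop = recipient
    # Wrap from the innermost mediator outward, so that the first entry
    # in routing_keys ends up as the outermost envelope (assumption).
    for mediator in reversed(routing_keys):
        message = Envelope(openable_by=mediator, forward_to=next_hop,
                           payload=message)
        next_hop = mediator
    return message


def mediator_open(envelope: Envelope, my_key: str) -> Tuple[str, Envelope]:
    """What a mediator does: open its layer, learn only the next hop."""
    if envelope.openable_by != my_key:
        raise PermissionError("envelope is not addressed to this key")
    return envelope.forward_to, envelope.payload


# Alice sends to Bob through one mediator: this builds M[2] around M[1].
m2 = wrap_for_route("hello Bob", "did:example:bob", ["did:example:mediator"])
next_hop, m1 = mediator_open(m2, "did:example:mediator")
```

After opening its layer the mediator holds Bob's DID and the still-sealed inner envelope, and nothing else, which matches the properties listed next.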

Now, note the properties I've just described:

  1. There is no identifier for a recipient embedded in the service endpoint, and it is not transmitted as plaintext anywhere (in HTTP headers, in a POST body...) either. No eavesdroppers can learn anything.

  2. The serviceEndpoint section of Bob's DID doc #1 would be identical to that section in Bob's DID doc #2...N, and to the endpoints of all customers of the same mediator.

  3. The mediator knows that they have a message to give to Bob's DID, but they don't necessarily know who it's from, and they don't know anything about the message except the size of the encrypted BLOB. The mediator does not know the content of Bob's DID doc. Bob's DID doc can be pairwise; it doesn't have to be on a ledger.

  4. There are two abuses that a mediator could perpetrate: they could record all the times and the sizes or encrypted content of all inbound messages for Bob, and they could fail to forward messages (selective or total delete).

Given this, we believe the requirements for GDPR compliance of the endpoint are:

  • If the endpoint is directly owned/maintained by the DID controller, no requirements (there is no separate processor of data; all control resides with the DID controller, so GDPR is irrelevant). This is not the case I described above, but I mention it just for completeness. We know this condition obtains when the endpoint has no routing keys.

  • If the endpoint is mediated (which we can detect because there are one or more routing keys for the endpoint), then the mediator becomes a data processor, and their duty is to A) faithfully deliver messages; and B) delete all data and metadata about messages after they are delivered. In cases where duty B is nuanced in some way, this should be clearly specified in the terms and conditions that were worked out when Bob and his mediator negotiated services. (The DIDComm protocol that does this has a place for that.)
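The two cases above reduce to a check on whether the endpoint declares any routing keys. A minimal sketch, assuming the service entry is a plain dictionary with an optional `routingKeys` list; the function name and the duty strings are hypothetical and simply restate the comment's own reading of the requirements.

```python
from typing import List


def mediator_duties(service: dict) -> List[str]:
    """Return the duties a mediator takes on, per the two cases above."""
    routing_keys = service.get("routingKeys") or []
    if not routing_keys:
        # Endpoint run directly by the DID controller: no separate
        # data processor, so no duties arise under this reading.
        return []
    # One or more routing keys means the endpoint is mediated.
    return [
        "faithfully deliver messages",
        "delete all data and metadata about messages after delivery",
    ]


# Hypothetical service entries for the two cases.
direct = {"serviceEndpoint": "https://bob.example/didcomm"}
mediated = {
    "serviceEndpoint": "https://mediator.example/inbox",
    "routingKeys": ["did:example:mediator#key-1"],
}
```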

Now, you asked how the outside world can know that Bob's endpoint is GDPR-compliant. I would like to point out that this is far less interesting than how BOB knows that his service is GDPR-compliant; in fact, I'm not even sure the outside world's question is legitimate. We send one another emails all day long without knowing whether the email service used by the recipient is GDPR compliant. It's none of our business; all we need to know is that the person we're attempting to contact has asked us to hand off the data to a particular mail transfer agent, and is apparently satisfied that that MTA will do the right thing.

But if we really have to have a way for the outside world to know an endpoint has this property, we could add it by simply adding a gdpr-compliant property inside the serviceEndpoint data model. This would be self-attested by the DID controller, and I think that's both clear and plenty good.

I would like to point out a fundamental misalignment that permeates this thread. @msporny is approaching service endpoints from the standpoint that the goal of putting them into a DID document is to communicate a place to talk. I don't agree that this is an accurate summary of the goal. I would say that the goal of putting them into a DID document is to communicate a place to talk such that the communication is known to emanate from the DID controller, and such that the key material in the DID doc is known to apply to the associated endpoint in crisp, indivisible version evolution. That is, I want to be able to say that DID doc version X bundles a key state + an endpoint state, and version Y bundles a different key+endpoint state; I don't want them to be able to evolve independently. The "such thats" are very important to me, and I haven't yet seen any proposal that accomplishes these goals other than one of putting the endpoint in the DID doc. Manu has suggested that we need to explore alternatives. I'm totally fine with that -- but I'm only interested in alternatives that include my "such thats." Everything else is abandoning a vital security and control requirement of the system, IMO.
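One way to picture the "such thats" is a DID doc version as an immutable bundle of key state and endpoint state, where any change to either produces a new version of the whole. This is only a sketch of that idea under my own assumptions: the `DidDocVersion` class, the `next_version` helper, and the example values are hypothetical and do not come from any DID method.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DidDocVersion:
    """One version of a DID doc: keys and endpoint bundled together."""
    version: int
    keys: Tuple[str, ...]
    endpoint: str


def next_version(current: DidDocVersion,
                 keys: Optional[Tuple[str, ...]] = None,
                 endpoint: Optional[str] = None) -> DidDocVersion:
    """Produce the next version; unchanged fields carry over as-is.

    There is no way to change the endpoint without also producing a
    new version number that pins the key state it belongs with.
    """
    return DidDocVersion(
        version=current.version + 1,
        keys=current.keys if keys is None else keys,
        endpoint=current.endpoint if endpoint is None else endpoint,
    )


version_x = DidDocVersion(1, ("key-1",), "https://mediator.example/inbox")
version_y = next_version(version_x, keys=("key-2",))
```

A verifier looking at `version_y` sees the rotated key and the endpoint that goes with it as one unit, rather than fetching the endpoint from a channel that evolves on its own schedule.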

@dhh1128, is your "such that" framing for an optional but normative notification serviceEndpoint type the same idea as what I proposed above https://github.com/w3c/did-core/issues/382#issuecomment-683085862 except we substitute "notification" where I had "PDP" for the type and substitute "DIDComm" where I had "UMA2 and pending GNAP practices" for the data model?

As for @csuwildcat's use case:

decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

I'm confused by the inclusion of "intended-public info" in the same use case as "or ad hoc encrypted direct sends...". Can we deal with these separately?

The intendedPublic serviceEndpoint type does not benefit from access control but may benefit from checks on authenticity. We should be able to craft a normative data model for this optional serviceEndpoint.

The "ad hoc encrypted" serviceEndpoint type will require something like the "PDP" serviceEndpoint type where the other "entity" can provide some claims, endpoints, and encryption keys.

@csuwildcat,

I don't buy this argument at all - a Service Endpoint can contain a decentralized protocol URI.

Which one? Which one(s) will the VDR permit? Will there be a centralized allow list for the ones that are permitted? How will the URI handle herd privacy? After all of these questions are answered, could it be that you should have just asked that other decentralized network directly for a VC signed by one of the DID's keys?

The owner of the DID determines this, not the DID ledger (nor should it, I would argue), so I don't find this line of argument persuasive.

Then you don't understand the core problem I'm trying to highlight. The VDR/DID method gets to decide what will be accepted in a DID Document. This is related to the GDPR/privacy problem of what kind of information is allowed onto an immutable ledger.

I am not reacting in this way to oppose any entity/implementer deciding to not use Service Endpoints, my opposition is strictly contained to spec changes and normative language that negatively impacts these features such that it hinders other entities/implementations from utilizing them.

I also want to make sure we have a healthy ecosystem that can leverage service endpoints. All of these issues are interrelated.

Use cases: decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

Please describe a single user story that is specific and concrete for people in this thread to talk about. I think the above is too abstract to help move the needle.

@dhh1128,

Thank you for your response, there's a lot of good information in it. I'm going to try and focus down to the specific problem with expressing information on an immutable VDR.

But if we really have to have a way for the outside world to know an endpoint has this property, we could add it by simply adding a gdpr-compliant property inside the serviceEndpoint data model. This would be self-attested by the DID controller, and I think that's both clear and plenty good.

I think my question was unclear because it was interpreted to be talking about whether or not the service behind the endpoint itself was GDPR-compliant. Rather, I'm looking for a way to know whether or not the service endpoint itself, the URL, has PII in it. And I'm not talking about incidental PII or information that is intentionally encoded in some abusive way to circumvent the feature of expressing non-PII information in a DID Document.

As an example, how does a VDR distinguish this:

https://danielhardman.com/my-personal-handle

From something like this:

https://public-company.com/foo

From something like this:

ipfs://fl3hf4kjh4fk3f/fhjl2fjlk23f32f/23423

The first URL has implicitly human-meaningful identifiers baked into it for a private party, the second has implicitly human-meaningful identifiers baked into it for a public party, and the last has no human-meaningful identifiers baked directly into the URL.

Could you provide an example DIDComm herd-privacy mediator URL? What would it look like? From your linked article I found this: http://agents-r-us.com/inbox. Is that a good example?

If the endpoint is directly owned/maintained by the DID controller, no requirements (there is no separate processor of data; all control resides with the DID controller, so GDPR is irrelevant).

Does this statement mean that you also believe that an immutable VDR can permit a DID controller to put any PII they want to into a DID Document -- and there would be no "right to be forgotten" issues?

Another side question:

The serviceEndpoint section of Bob's DID doc #1 would be identical to that section in Bob's DID doc #2...N, and to the endpoints of all customers of the same mediator.

How many of these mediators do you expect to exist in the ecosystem?

Rather, I'm looking for a way to know whether or not the service endpoint itself, the URL, has PII in it.

Ah. Yes, you're right; I misinterpreted the question.

I know of no way to inspect a raw URL and conclude with certainty that it does or doesn't contain PII.
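As a rough illustration of that limit, here is a minimal sketch of a purely syntactic check applied to the three example URLs from the question. The rule it uses (flag any query-string value that starts with `did:`) and the function name are assumptions made up for this example, not something any VDR is known to implement.

```python
from urllib.parse import urlparse, parse_qs


def has_obvious_did_in_query(url: str) -> bool:
    """Return True if any query-string value looks like a DID (starts with 'did:').

    Hypothetical rule for illustration only; it catches one obvious pattern
    and says nothing about names or handles embedded in the host or path.
    """
    query = parse_qs(urlparse(url).query)
    return any(
        value.startswith("did:")
        for values in query.values()
        for value in values
    )


# The three URLs from the question above.
examples = [
    "https://danielhardman.com/my-personal-handle",
    "https://public-company.com/foo",
    "ipfs://fl3hf4kjh4fk3f/fhjl2fjlk23f32f/23423",
]

# None of them is flagged, even though the first embeds a personal name.
flags = [has_obvious_did_in_query(url) for url in examples]

# An obviously problematic (made-up) URL is flagged.
obvious = has_obvious_did_in_query("https://example.com/inbox?owner=did:example:123")
```

A check like this treats all three example URLs the same way, which is the point: syntax alone does not tell you whether the first one identifies a private person.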

You seem to be poking at whether putting into a DID doc a service endpoint with PII in it alters the GDPR analysis, as if the service endpoint is the locus of the risk. I don't think this implication is correct, because a DID value on its own is PII. If you can write a personal DID doc to a ledger at all, you have a GDPR problem, whether or not you include a PII-containing service endpoint in it. This is why I assumed the other interpretation of your question.

Could you provide an example DIDComm herd-privacy mediator URL? What would it look like? From your linked article I found this: http://agents-r-us.com/inbox. Is that a good example?

Yes, that's a reasonable example. It could also be https://myisp.com/didcomm or https://myuniversity.edu/students or whatever. (HTTPS is not strictly required for security properties, but there are some benefits to it, such as the fact that mobile apps will pass review by app stores if they only make HTTPS calls.)
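To make the herd-privacy idea concrete, here is a minimal sketch in which several DID docs share one mediator endpoint. The DID values, the `DIDCommMessaging` type label, and the simplified shape of the service entry are assumptions for illustration; only the mediator URL comes from the example above.

```python
MEDIATOR_URL = "http://agents-r-us.com/inbox"  # example URL from the thread


def make_did_doc(did: str, mediator_url: str) -> dict:
    """Build a simplified DID doc whose only service is a shared mediator."""
    return {
        "id": did,
        "service": [
            {
                "type": "DIDCommMessaging",  # assumed type label
                "serviceEndpoint": mediator_url,
            }
        ],
    }


# Hypothetical DIDs: two for Bob, one for another customer of the same mediator.
bob_1 = make_did_doc("did:example:bob1", MEDIATOR_URL)
bob_2 = make_did_doc("did:example:bob2", MEDIATOR_URL)
alice = make_did_doc("did:example:alice", MEDIATOR_URL)

# The service sections are identical, so the endpoint does not distinguish
# Bob's DIDs from each other or from Alice's.
sections_identical = bob_1["service"] == bob_2["service"] == alice["service"]
distinct_endpoints = {
    doc["service"][0]["serviceEndpoint"] for doc in (bob_1, bob_2, alice)
}
```

Under these assumptions the endpoint reveals only which mediator is used, and the size of that mediator's customer base determines how large the herd is.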

Does this statement mean that you also believe that an immutable VDR can permit a DID controller to put any PII they want to into a DID Document -- and there would be no "right to be forgotten" issues?

No. Each VDR has to solve this problem. The first Indy/Aries solution to this problem is to use peer DIDs, which are never written to a ledger in the first place, and to use ZKPs for VCs, which don't require a binding to a public DID. Building on that, Sovrin's next solution to this problem is to decompose full DID docs into individual sections that have more specialized data models and their own transaction types. This makes them amenable to careful validation. That may filter some obvious stuff (query strings with DID values in them), but it will not fix the deeper problem in your 3-part example. Its next proximate solution to this problem is to require each writer to the ledger to include with their write a signature over a Transaction Author Agreement (essentially terms and conditions that clarify that no PII is allowed, and that by writing the data, any claim to a right to be forgotten is explicitly forfeited). That probably will limit problems significantly, but it may not be enough in the end. Sovrin's final solution is to support a tombstoning mechanism that can be applied on a per-node, per-jurisdiction basis, such that read requests of a tombstoned record cause the semantic equivalent of an HTTP 451 error, yet the ledger's integrity, and the ability to forge consensus by nodes in different jurisdictions, are maintained.
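The tombstoning idea can be sketched roughly as follows. This is a toy model under stated assumptions, not Sovrin's implementation: the `Node` class, the transaction id, and the record contents are hypothetical, and the shared dictionary merely stands in for a replicated ledger.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy ledger node; 'ledger' stands in for the shared, unmodified record set."""

    jurisdiction: str
    ledger: dict  # txn_id -> record; never mutated by tombstoning
    tombstones: set = field(default_factory=set)  # local, per-node suppression list

    def tombstone(self, txn_id: str) -> None:
        """Suppress reads of a record on this node only."""
        self.tombstones.add(txn_id)

    def read(self, txn_id: str):
        """Return (status, record) using HTTP-like status codes."""
        if txn_id not in self.ledger:
            return 404, None
        if txn_id in self.tombstones:
            return 451, None  # unavailable for legal reasons
        return 200, self.ledger[txn_id]


# Hypothetical record shared by two nodes in different jurisdictions.
shared_ledger = {"txn-1": {"did": "did:example:123"}}
eu_node = Node("EU", shared_ledger)
us_node = Node("US", shared_ledger)

eu_node.tombstone("txn-1")

eu_result = eu_node.read("txn-1")  # suppressed on the EU node
us_result = us_node.read("txn-1")  # still readable on the US node
```

The record itself is never removed, so both nodes still hold the same ledger contents; only the read path differs per node.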

Note that I deliberately said "Sovrin" in the preceding paragraph. Other Indy ledgers may choose to layer their own solutions on top of the peer DID strategy (or a different DID strategy), according to the governance they choose. Non-Indy ledgers each have to solve it, also. I'm not aware of a good solution yet for Bitcoin and Ethereum.

How many of these mediators do you expect to exist in the ecosystem?

The answer here will vary by time. Aries includes an Apache-2-licensed implementation of one, and there are currently several SaaS vendors in production, who've got an interoperable wallet scheme to prevent vendor lock-in... In the youth of the ecosystem, dozens or hundreds? Eventually, I'd say they will be offered by a meaningful % of ISPs or email providers and will have a long-tail distribution of customer counts like mail transfer agents -- so maybe tens of thousands, with a small handful supporting herd sizes in the billions or millions?

@agropper :

is your "such that" framing for an optional but normative notification serviceEndpoint type the same idea as what I proposed above #382 (comment) except we substitute "notification" where I had "PDP" for the type and substitute "DIDComm" where I had "UMA2 and pending GNAP practices" for the data model?

I'm not sure. I don't think DIDComm is a "notification" service endpoint type; I think it's a service endpoint type all its own. It can be used for anything that DIDComm can be used for, which is any message-based interaction (protocol) that wants to inherit DIDComm's security and privacy guarantees and processing model. I also don't know enough about UMA2 and GNAP to feel confident about the analog.

@dhh1128,

You seem to be poking at whether putting into a DID doc a service endpoint with PII in it alters the GDPR analysis, as if the service endpoint is the locus of the risk. I don't think this implication is correct, because a DID value on its own is PII. If you can write a personal DID doc to a ledger at all, you have a GDPR problem, whether or not you include a PII-containing service endpoint in it. This is why I assumed the other interpretation of your question.

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.

@csuwildcat -- Please take a look at @dhh1128's comment. He covers Sovrin's view of putting PII onto a VDR and all of the problems there. This is the sort of thing I've been trying to highlight as a problem for the case you want supported.

@dlongley :

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.

A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. (Or perhaps more precisely, experts I've talked to say that they believe legal rulings will eventually formalize this legal conclusion.) The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

@csuwildcat,

I'm pretty sure @dhh1128 is mostly advocating in this issue for service endpoints in did:peer DID Documents, which is a separate case from putting service endpoints directly on a VDR -- which, I believe, is what you want.

I'm pretty sure @dhh1128 is mostly advocating in this issue for service endpoints in did:peer DID Documents, which is a separate case from putting service endpoints directly on a VDR -- which, I believe, is what you want.

True. Well, sort of. I want service endpoints in the spec because A) I want institutions to publish their endpoints in their DID docs; and B) I want private individuals to put their endpoints in peer DID docs.

Daniel B's case of individuals publishing an endpoint for a public DID on a public ledger for discovery purposes is one I've thought less about. I do believe in individuals having public DIDs, and putting endpoints in the associated DID docs -- but I don't believe that requires a ledger. Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, [[edit: and in lots of other places]]). They still have all the characteristics of security and control you need, but they don't incur any right-to-be-forgotten issues if the individuals publish them in places they control.

@dhh1128,

A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

I understand that this is your position. It's not settled yet -- and until it is, there are possible interpretations that split the information into two separate classes. There are also a number of exceptions to the "right to be forgotten" for which this difference might be important either on its own or in conjunction with the function of or governance/authority structures for a particular VDR. So, there remain open questions. It's harder to make the case for any difference, however, when your full legal name is explicitly called out in a DID Doc as merely additional information. This is in contrast to other "authoritative data" in the DID Doc including the DID itself and public key material that can be more readily linked to legal purposes and public interest, etc.

@dhh1128,

I do believe in individuals having public DIDs, and putting endpoints in the associated DID docs -- but I don't believe that requires a ledger. Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, etc). They still have all the characteristics of security and control you need, but they don't incur any right-to-be-forgotten issues if the individuals publish them in places they control.

Yes, but this approach is what @csuwildcat is railing against as being insufficient for his use case (which we still need to get more concrete about).

Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, etc).

Guys, please reread this and consider how it is explicitly failing to solve for the needs I have. To help, I will restate the comment above in the scope of the use case: "Dan, you can create decentralized social networks, decentralized secondhand sales networks, decentralized gig economy exchanges, etc. that don't require centralized intermediary services, like Twitter, Craigslist, and Uber, by creating unregistered, uncrawlable DIDs, and simply attaching them to your Twitter, LinkedIn, and Uber accounts"

GDPR aside, putting a DID that refers to a person on a public registry is
problematic for the same reason putting their Social Security Number or
their facial biometric on a public registry is problematic. In all three
cases, "rotating" the identifier is difficult, or in the case of
biometrics, impossible. That means that we need to consider how DIDs, SSN,
and biometrics are used rather than just worrying about the right to be
forgotten.

DID documents are meant to be updated with rotations but the whole point is
that the DID itself is forever whether it's on a public registry or held
among peers. Otherwise, did:key works for cases where updates are not
needed.

  • Adrian

On Sun, Aug 30, 2020 at 12:49 PM Daniel Buchner notifications@github.com
wrote:

To me: "The way you eliminate centralized platforms as the bottleneck for
censorship, interdiction, and gatekeepery is by making it so people still
have to crawl, index, and triangulate intended-public DIDs and
intended-public data through those same centralized bottlenecks of
censorship, interdiction, and gatekeepery"

Me:

[image: external-content duckduckgo]
https://user-images.githubusercontent.com/131786/91664717-deff5400-eaa5-11ea-8620-6ffd37e8bea9.gif


@csuwildcat : I don't think your sarcastic memes are appropriate for this community.

My comment that a DID could be published on Twitter or Facebook is quite different from saying that DIDs must be published on those platforms. It goes without saying that what can be published on social media can also be published in any other convenient way: by putting it on slides at a conference, by attaching it to your github profile, by putting it on your physical or digital business card, by listing it next to your name in your professional publications, etc. My point was that there are plenty of ways of publishing a DID besides ledgers; I'm sorry that my examples threw you for a loop.

Now, you clearly believe that the specific other ways I cited (Twitter, FB) are undesirable because they're centralized. But supposedly decentralized ledgers do not have magical pixie dust that makes everything they touch decentralized, and supposedly centralized, proprietary systems do not have magical demon dust that makes everything they touch centralized. A single global ledger where all discovery is conducted is centralized. Its permissioning and method for accepting commits may be decentralized, but it is not decentralized in the patterns of its reads. Likewise, a system that includes optional recording of DIDs in centralized platforms is not centralized if it also records the data in many other places. I was not proposing that we give an exclusive franchise to Twitter and Facebook to record public DIDs; that's the only distortion of what I said that deserves the scorn you projected.

Without a decentralized substrate for globally iterable ID/routing info that can be assembled without third-party reliance, you cannot deterministically locate all entities who wish to be included in exchanges without requiring specialized setups, third-party interdiction points, or ad hoc out-of-band coordination. I have no issue with entities who want to rely on centralized parties or resign themselves to sharing their information through indirect mechanisms, I'm simply opposed to anything that would force more barriers, centralized intermediaries, or friction on entities who desire to participate in an open broadcast substrate that is as decentralized as possible.

@csuwildcat I may have missed it, but could you respond to this question about your use-case? https://github.com/w3c/did-core/issues/382#issuecomment-683330079

@csuwildcat :

Without a decentralized substrate...

I largely agree with this statement, but:

  1. I disagree that perfect enumeration is a requirement from the people who want to use public DIDs. They are fine with ad hoc coordination, because it's easy to get their data visible to the parties they want to connect with (just as it's easy to publicize your email address if you want to). The problem arises when external data consumers want to index/crawl/access all of this chaotic data universe of personal data as if it were a coherent corpus that they can mine. That is NOT a requirement of the person who wants a public DID; it's a requirement of those who want to consume those DIDs. And I don't buy it.

  2. Even if we disagree about #1, and accept your proposed requirement that there must be a single place where global discovery can occur, I disagree with your assumption that this means we need a ledger. I shared a way to do privacy-preserving, decentralized discovery/enumeration without a ledger in a previous CCG discussion. It doesn't have the GDPR problem, it is more private than a public ledger, and whatever data you choose to publish is discoverable as long as you want it to be, and not one instant longer. See this doc.

@csuwildcat I may have missed it, but could you respond to this question about your use-case? #382 (comment)

I honestly feel like we need a special topic call just to go over the decentralized app concepts in general, because I feel like this recurring question is a symptom of folks who may not yet see past the 1% of identity that is credentials. But if we can't do that, here is the 'use case' (aka: entire world of use cases it represents): We need a system capable of decentralizing the vast majority of apps you have on your phone today. Part of doing that is having an open, decentralized, direct, uninterdictable crawl substrate, whereby any developer can write app code that iterates the substrate and asks the DID entities on it, whatever they are, for certain types of data they may want to share. You can do this by iterating the global DID registry, finding personal datastore endpoints, and sending a request for whatever data you would like from the entity that owns the DID, for example:

  • Want to have a vibrant, open substrate for ingesting a firehose of all the social posts everyone on the planet intentionally wants to make public for anyone to see? Easy peasy lemon squeezy! --> Crawl all DIDs on the decentralized DID substrate and ask for any SocialMediaPosting objects they would like to share. Boom, no longer need to go through social media company silos to access the world's social media feeds!
  • Want to find all the code packages you could possibly use for your next Node.js project without having to go through a centralized registry? No problem, we gotchu fam! --> Just iterate the DID substrate for all the IDs typed as software projects, and contact their personal datastore to find all their signed SoftwareSourceCode packages. Can you smell the sweet, sweet code package liberation? I can.
  • What's that? Find all the public resumes and other career-related posts in the world? Get a load of this --> Crawl that DID substrate and ask all the personal datastores of all the DIDs if they have any career data they'd like to share. No more silo for career data, booyah!
  • Want to get the product catalogs for all the companies on the planet? No sweat, you can do it with your eyes closed --> just contact all the personal datastores of the IDs typed as companies of some kind and ask them for any public GS1 Product objects they'd like to share. It's time to make Google's centralized product index look like a cute little toy.

Basically, for any app type that features a need for an open substrate of intended-public data that participants want anyone to be able to find, the exact same recipe holds. With this we can radically, fundamentally change the entire app and open data ecosystem, putting control back in the hands of individuals, while empowering developers by eliminating many of the barriers that are present in today's world of walled content gardens and information network silos.
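The recipe described above (iterate the registry, find personal datastore endpoints, ask each one for a type of object) might be sketched roughly like this. Everything concrete here is an assumption for illustration: the in-memory list standing in for a DID registry, the `PersonalDataStore` service type, the example DIDs and URLs, and the request shape. No network calls are made.

```python
# Hypothetical in-memory stand-in for an iterable DID registry.
REGISTRY = [
    {
        "id": "did:example:alice",
        "service": [
            {"type": "PersonalDataStore", "serviceEndpoint": "https://store.example/alice"}
        ],
    },
    {
        "id": "did:example:acme",
        "service": [
            {"type": "PersonalDataStore", "serviceEndpoint": "https://store.example/acme"},
            {"type": "OtherService", "serviceEndpoint": "https://acme.example/other"},
        ],
    },
    {"id": "did:example:no-services"},  # a DID doc with no service entries
]


def find_datastore_endpoints(did_doc: dict, service_type: str = "PersonalDataStore") -> list:
    """Return the endpoints in a DID doc whose service type matches."""
    return [
        service["serviceEndpoint"]
        for service in did_doc.get("service", [])
        if service.get("type") == service_type
    ]


def build_requests(registry: list, object_type: str):
    """Yield one request description per datastore endpoint found in the registry."""
    for did_doc in registry:
        for endpoint in find_datastore_endpoints(did_doc):
            yield {"to": endpoint, "did": did_doc["id"], "query": {"type": object_type}}


# Ask every discovered datastore for public SocialMediaPosting objects.
requests_to_send = list(build_requests(REGISTRY, "SocialMediaPosting"))
```

Each datastore would then decide what, if anything, to return for the requested object type; that part is outside this sketch.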

Thanks. So, based on the example you give, what is the nature, if any, of
access control to this public information?

In the same vein, what prevents centralized actors from collecting all of
the public information they can get and then adding even more information,
leaked under the common exemptions for supposedly de-identified personal
data that we find in HIPAA, CCPA and pretty much every other regulation. In
healthcare, this kind of involuntary but legal surveillance even has a
name: “referential matching”.

A further problem is that many so-called privacy laws explicitly avoid
regulating “public information” as a restriction of 1st amendment rights.

  • Adrian


In the same vein, what prevents centralized actors from collecting all of the public information they can get and then adding even more information

Anyone can access a public dataset that the entities are intentionally putting out to the world. This question is like asking "What prevents Google, DuckDuckGo, RSS feed viewers, web browsers, etc. from collecting/rendering your openly published blog posts to anyone who happens to hit a URL?" - because that's the entire point of some data: to be broadcast openly as widely as humanly possible, and encourage anyone interested to come read it, shape it, display it, etc.

The DID substrate is a decentralized, interdiction-resistant, tamper-evasive secure routing system that allows peers to connect to whatever semantic information broadcasts an entity wants them to see - it's a world engine for decentralized information networks that can power a new class of decentralized apps and services.

Few Understand This™

@csuwildcat and everyone in this discussion would do well to read https://blog.apnic.net/2020/08/31/rfc-8890-the-internet-is-for-end-users/ and consider the consequences of giving even more access to surveillance capitalists. What we need is technology that forces more transparency and control over personal data flows, not less.

@agropper I continue to be baffled that people are arguing I should not publicly, openly post blog entries, tweets, resumes, etc. to anyone who may want to read them. These use cases are fundamentally ones where you want anyone to be able to openly access the content, and do so as freely as possible, without interrogation or gatekeeping, but you're suggesting this is wrong, which defeats the entire point of these use cases. I can't seem to understand what you are presenting as the alternative? We have everyone go through an authorization server before they can read my blog posts, which I don't want or need anyone to be authorized to read? Or is it that you are somehow confusing all data with this type of data I am talking about? Surely you can see where blog posts are different than my private medical data, and that I would gate one, but not the other, surely?

@csuwildcat What I'm proposing is illustrated by the quandary public website operators face with Google Analytics. As a publisher, I want to know as much as possible about who is interested and, if I could, why. A few privacy-respecting website operators pay good money to avoid Google Analytics. Others just publish blind rather than expose the "requesting parties" to surveillance.

So, yes, as an individual, I want my self-sovereign authorization server to play the role of Google Analytics, without the centralization. That's the essence of SSI as far as I'm concerned.

@agropper I was just able to access your public posts and tweets (http://healthurl.com/www/Blogs_+.html, https://twitter.com/agropper) without any authorization required. These appear to be publicly available resources you want anyone to be able to openly view, is that correct?

@csuwildcat : I feel like I can see both your perspective and that of @agropper . I get that people want to publish stuff and operate publicly. But the other fundamental requirement is that people also want to be able to change their minds. You were operating publicly when you posted a cranky meme in a comment above; you later deleted that meme when I complained about it. This was possible because github provides an "edit" feature -- something that an immutable blockchain does not support, and which you are demanding we de-prioritize even as you use the feature yourself in a public forum.

As long as "operating publicly" is centralized, there's a way to enforce the possibility of "change their minds" (mainly through legal threats). But as soon as you decentralize, the path for protecting this second requirement becomes cloudy. I proposed one possible answer above, but it doesn't have the same crawlability as yours.

Now, I don't advocate a centralization strait-jacket. My suggestion that people could publish something on Twitter if they want wasn't a suggestion that we ignore other alternatives that are better. Where you and I disagree is on the relative value of a centralized crawl. I say it's not very important to individuals; I'd be content if I could create peer DIDs off ledger (the ultimate decentralization) and publish/unpublish in whatever systems I like (centralized or not). That way, I am responsible for the GDPR implications, and I can choose whatever tradeoffs I like.

It seems that what you are hoping for is a single solution that works for everybody. I don't think there is such a thing, and I think you're prioritizing the needs of corporate data consumers over the needs of individual data producers. All of the coolness you touted in your bulleted list above ("ingesting a firehose of all the social posts everyone on the planet intentionally wants to make public"... "code packages"...) is coolness for an indexing/crawling service. Individuals may benefit from such services, but unevenly and imperfectly; it's great to be able to post publicly on FB, until you want to not have your future employer look up your immature behavior in college. The one-size-fits-all-and-once-public-we-get-to-index-it-forever approach necessitated by global crawlers perceives individual customizations as friction to be ground down and eliminated.

Individuals also don't typically run such crawling services directly. I have better things to do with my time and resources than discover the world's social media posts. So who will do it? Answer = an institution that becomes a new point of centralization. A list of all nodejs packages available worldwide, built by crawling a decentralized landscape, is still a centralized list; effectively it's not much different from npmjs, and as a developer, I'd rather consume the curated version.

@csuwildcat : I feel like I can see both your perspective and that of @agropper . I get that people want to publish stuff and operate publicly. But the other fundamental requirement is that people also want to be able to change their minds. You were operating publicly when you posted a cranky meme in a comment above; you later deleted that meme when I complained about it. This was possible because github provides an "edit" feature -- something that an immutable blockchain does not support, and which you are demanding we de-prioritize even as you use the feature yourself in a public forum.

Why are folks trying to tell me about how blockchains are immutable and you can't delete things, while websites have edit features to delete things from databases? Are folks somehow getting confused and thinking I am saying this data should/will be present on a blockchain or within some immutable infra layer in a DID Method? If so, I am not doing that at all, and I am struggling to see how I can make this much more clear, given I have consistently said the DID Document would only route to personal datastore endpoints. Your personal datastore will allow for the exact flow as you described with Github: if I delete any portion of data that was once exposed publicly, it is gone.

@dhh1128 nails it when proposing that individuals should control where our public information is indexed. In the secure data stores context, I advocate for indexes to be treated like any other data resource and kept separate from the documents and streaming interfaces whose metadata is being aggregated and indexed. I would treat both indexes and storage as policy enforcement points (PEP) to be told, in a self-sovereign and un-censorable way where the data subject keeps their PDP.

@csuwildcat, my website you posted displays exactly the problem. Privacy Badger and Disconnect plugins to my Firefox each display only 1 tracker. Guess what, it's Google Analytics. Given that I am still lamely using a web publishing editor that was discontinued by Apple 11 years ago, it's "too hard" for me to get rid of that last tracker. It's up to us in W3C to fix this.

We absolutely must have a topic call on this, because what I thought was a relatively straightforward thing has not been understood as I thought it would. It's clear that the vast majority of folks on it are focused on the 1% of identity that is core ID/credential type data/claims - which is fine, while others are looking to tackle things outside of that. I don't particularly care what folks work on, so long as they don't hinder the work/needs of others. We need to take the time to better understand each other over a topic call because this is such an important thing to get right, else we will be left with a Web that looks a lot like it does today, which would be a tragic lost opportunity.

@dhh1128 nails it when proposing that individuals should control where our public information is indexed. In the secure data stores context, I advocate for indexes to be treated like any other data resource and kept separate from the documents and streaming interfaces whose metadata is being aggregated and indexed. I would treat both indexes and storage as policy enforcement points (PEP) to be told, in a self-sovereign and un-censorable way where the data subject keeps their PDP.

@csuwildcat, my website you posted displays exactly the problem. Privacy Badger and Disconnect plugins to my Firefox each display only 1 tracker. Guess what, it's Google Analytics. Given that I am still lamely using a web publishing editor that was discontinued by Apple 11 years ago, it's "too hard" for me to get rid of that last tracker. It's up to us in W3C to fix this.

None of this makes any sense. You can run your personal datastore wherever you want, and I am not sure why you would install a tracker on your own PDS. You seem to be talking about something that does not apply to the service endpoint routing layer.

@csuwildcat It's not about my PDS. It's about my ability to decide on access to any data store, even public ones, like LinkedIn 2.0 where, along with my user authentication, I could register my PDP, so that LinkedIn MUST refer you, or anyone else who wants to see my posts, to my UMA or GNAP Authorization Server to get an authorization token first. Right now, I have to depend on LinkedIn to interpret and implement my policies. This is exactly what I'm trying to decentralize, make self-sovereign, and fix.

Why wouldn't LinkedIn offer me this feature?

@csuwildcat It's not about my PDS. It's about my ability to decide on access to any data store, even public ones

You encrypt data and set PDS permissions to accomplish exactly this, and you don't need to contact another server, you just get permission/encrypted access to things from the owner of the PDS themselves. I feel like I am in the Twilight Zone right now.

@csuwildcat How would this work with LinkedIn without damaging any of its current significant value propositions? Can you explain the steps that LinkedIn and I would implement?

Why are folks trying to tell me about how blockchains are immutable and you can't delete things, while websites have edit features to delete things from databases? Are folks somehow getting confused and thinking I am saying this data should/will be present on a blockchain or within some immutable infra layer in a DID Method? If so, I am not doing that at all, and I am struggling to see how I can make this much more clear, given I have consistently said the DID Document would only route to personal datastore endpoints. Your personal datastore will allow for the exact flow as you described with Github: if I delete any portion of data that was once exposed publicly, it is gone.

The URI of your personal datastore, if PII (e.g., https://mydatastore.com/csuwildcat), cannot be erased from the immutable ledger. What it exposes can be changed -- but if the ledger stores that URI and supports a historical view, there is no way to delete that identifier for you. It is like carving your phone number in granite; sure, what you say on that phone line can change, but the number itself never can. And all data that you once published for it can be linked to the current version, even if that data is no longer visible. You have no "please delete this" recourse.

@agropper apps like LinkedIn can be designed to crawl for resumes and other public info people/companies intentionally expose to the world, and present that information in whatever UI they believe people will like to use. They could also build features in to 'follow' the DIDs of people/companies you are specifically interested in, which, inside the app, would just mean that the app goes to check that DID's PDS more often, and presents the data more prominently to you as the user. The beauty of this is that now any decent app developer can write their own LinkedIn-style application and create a different type of career experience, without being blocked by some centralized network silo.

The URI of your personal datastore, if PII (e.g., https://mydatastore.com/csuwildcat), cannot be erased from the immutable ledger. What it exposes can be changed -- but if the ledger stores that URI and supports a historical view, there is no way to delete that identifier for you.

First off, I never said the endpoints need to include identity data or human-friendly info, so I regard the first part about csuwildcat in the URL as a strawman that I will leave to whomever was arguing for that - because it certainly wasn't me. No personal info goes on the ledger, we all agree on this, and I really hope we don't have to constantly rehash this every time we talk about the 99% of the DID use cases that decentralized apps and services represent.

The notion that somehow a GUID appearing in a substrate === it is still active and represents you is one I simply reject. You can deactivate it, and it is now completely insecure and unreliable to assert that it represents someone. To illustrate, let's use the example provided:

It is like carving your phone number in granite; sure, what you say on that phone line can change, but the number itself never can.

A phone number != a Subject, a phone number is an ID that links you, the caller, to some entity who is supposedly the owner of a device it routes to. You can carve that number into stones on as many mountains as you'd like, but if I deactivate my cellphone and stop using that medium of communication, I don't know what the number connects to, but it sure as heck isn't me. Furthermore, in this DID case, I am also issuing an enforced directive to the phone network (DID Method) that ensures anyone who resolves that number is explicitly informed that it is no longer connected to me, and you'd be crazy to assume it still represents me.

The phone number == a subject when the element of time is included (it is not possible for a phone number to be allocated to more than one person at a single time). For the period of 2018-2019, the phone did represent you and this is something that cannot be undone. While in 2020 the phone number may no longer represent you, it does not change the fact that the phone number DID represent you and cannot be removed from the granite stone.
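The two positions in this exchange can be put side by side in a minimal sketch, assuming a toy append-only ledger. The class and method names here are hypothetical and are not any DID method's API. Deactivation changes what a resolver reports now, while a historical view still returns every endpoint that was ever recorded.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLedger:
    """Toy stand-in for an immutable VDR: events are appended, never removed."""
    events: list = field(default_factory=list)

    def update(self, did, endpoint):
        self.events.append({"did": did, "op": "update", "endpoint": endpoint})

    def deactivate(self, did):
        self.events.append({"did": did, "op": "deactivate"})

    def resolve(self, did):
        """Current view: replay this DID's events in order."""
        endpoint, deactivated = None, False
        for event in self.events:
            if event["did"] != did:
                continue
            if event["op"] == "update":
                endpoint = event["endpoint"]
            elif event["op"] == "deactivate":
                endpoint, deactivated = None, True
        return {"endpoint": endpoint, "deactivated": deactivated}

    def history(self, did):
        """Historical view: every endpoint ever recorded for this DID."""
        return [e["endpoint"] for e in self.events
                if e["did"] == did and e["op"] == "update"]


ledger = AppendOnlyLedger()
ledger.update("did:example:123", "https://mydatastore.com/csuwildcat")
ledger.deactivate("did:example:123")

current = ledger.resolve("did:example:123")   # resolver reports deactivation
past = ledger.history("did:example:123")      # the old URI is still on record
```

Whether the entry in `past` counts as PII is exactly what the thread disagrees on; the sketch only shows that both views exist at once.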

The phone number == a subject when the element of time is included (it is not possible for a phone number to be allocated to more than one person at a single time). For the period of 2018-2019, the phone did represent you and this is something that cannot be undone. While in 2020 the phone number may no longer represent you, it does not change the fact that the phone number DID represent you and cannot be removed from the granite stone.

The Subject the 'phone number' (DID) represents at any point is something that cannot be determined by the ledger/DID Method, it's a claim that another party makes about an interaction they took part in. This is a complete goalpost move of what this entire thread is about, because the fact remains that the DID Document, service endpoints or not, is not what will contain identity data and real world associations.

@msporny as you point out I did not formally object to the resolutions. I voted neutral (0) and simply emphasized that more discussion is needed. Given the amount of discussion documented in this thread, I don't seem to be far off base. I defer to the Chairs regarding how resolutions in special topic calls are handled and brought to the larger group for discussion. My understanding was these "resolutions" are non-binding and simply a straw man to document a broad view of consensus on the special topic call and that discussion would be summarized and conclusions brought back to the larger group for broader discussion. NOT that it served as a binding agreement that interested parties were required to object to under a specific time window. In this sense I think you overstepped your authority in making such an ultimatum. Quoting @iherman: "We should remember that any resolution taken on this call is not binding, it's not the WG call; the WG call must decide."

The DIF Glossary Group would like to report out on some directly-relevant recent work. As the endpoint olympics were just warming up, the WG sent out a survey via various community channels about the nature of endpoints, their relative value, and their standardization. These are the results, in the form of a google spreadsheet open for comments.

The responses fell into 3 classes, based on how many (0, 1, 5+) endpoints they seemed to want standardized. We found different conclusions could be drawn from the divergence:

  1. “Notification” endpoints are deemed important by many. It would seem the community needs more clarity on the DIDComm v2 notification regime; there is also a persistent confusion around DIDComm v1 versus v2, routing and redirection, the scope of Aries, etc.
  2. We were not being clear about what protocols a given service can be assumed to speak or refuse to speak, what transports would be allowed.
  3. The authorization, mediator, and secure data store endpoints are seen as generally desirable.
  4. The No Endpoint option is confusing and led to a nice discussion here <-- (you are here)

Potential next steps:

  • Reach consensus on a few normative data models to be included in the did-core spec.
  • Consider this post or other such meta-analyses of public/private distinctions at the DID-core level, as a potential consensus-structuring perspective on agents and wallets, particularly as related to the normative storage and authorization endpoints.

Thanks, and we hope this perspective is helpful!

@dhh1128,

The URI of your personal datastore, if PII (e.g., https://mydatastore.com/csuwildcat), cannot be erased from the immutable ledger. What it exposes can be changed -- but if the ledger stores that URI and supports a historical view, there is no way to delete that identifier for you. It is like carving your phone number in granite; sure, what you say on that phone line can change, but the number itself never can. And all data that you once published for it can be linked to the current version, even if that data is no longer visible. You have no "please delete this" recourse.

Yes, thank you. This.

@csuwildcat,

First off, I never said the endpoints need to include identity data or human-friendly info, so I regard the first part about csuwildcat in the URL as a strawman that I will leave to whomever was arguing for that - because it certainly wasn't me. No personal info goes on the ledger, we all agree on this, and I really hope we don't have to constantly rehash this every time we talk about the 99% of the DID use cases that decentralized apps and services represent.

I tried to make the problem clear above. This isn't about whether you're arguing for a human-meaningful URL -- that's great that you're not. It's about whether or not the immutable VDR can tell the difference between a human-meaningful URL and one that is not. Because the VDR needs to prevent the human-meaningful ones from being recorded. Please re-read that sentence -- it's the crux here.

The Subject the 'phone number' (DID) represents at any point is something that cannot be determined by the ledger/DID Method, it's a claim that another party makes about an interaction they took part in.

I agree that a random identifier that can only be made meaningful by connecting it to other pieces of information is different from one that implicitly contains human-meaningful data, such as your full legal name. One important difference is in who is in control of those other pieces of information, and whether the burden could potentially shift to them. A GDPR related ruling found that an IP address recorded by a German government website can be considered PII because the German government had the ability to demand an ISP reveal the person behind it. DID Methods generally don't have this sort of authority or capability, and this changes the calculus for certain identifiers, IMO.

Consider a DID method's immutable VDR that represents the canonical registry of DIDs for that method, and that only allows you to store DIDs and cryptographic material -- and no human-meaningful identifiers. Here, all other potential linkage happens externally from that system. This may be considered acceptable under certain legal regimes and, given a public interest in the need for the persistence of such identifiers, the fact that the only way to ensure this is to prevent their deletion may allow yet another avenue for a "right to be forgotten" exception.

However, these arguments are hard to make for identifiers that are not merely meaningful because of "linkage" and that do not originate from the DID method themselves, but are identifiers that themselves contain PII. Without some other line of argumentation justifying their presence, these should stay off of immutable VDRs.

Again, this means that an immutable VDR needs to either be able to determine the difference between a human-meaningful service endpoint URL or disallow all service endpoints. This is the problem I'm trying to highlight to you and you seem to be ignoring/overlooking/dismissing it.

One solution to this problem is to turn to a different decentralized registry (or protocol) that does not have the same immutability requirements. Such a registry could be a common place to look for VCs from DID controllers that express service endpoints. Such a registry could also support multiple DID methods (or potentially even be DID method agnostic). I know you have stated that you don't want "yet another registry" -- but I keep trying to highlight that the requirements are different for the different registries. That's why we may need "yet another (decentralized) registry/protocol" if the crawling use case is important. I think the non-crawling use cases can be handled in other ways as previously mentioned.

One solution to this problem is to turn to a different decentralized registry (or protocol) that does not have the same immutability requirements.

The use cases themselves demand the convergence of what you may view as competing requirements (I don't), thus moving the endpoints doesn't change anything for me. There are also a host of game theoretical issues with doing this, but I don't think it will profit us to go into them on this thread.

this means that an immutable VDR needs to either be able to determine the difference between a human-meaningful service endpoint URL or disallow all service endpoints. This is the problem I'm trying to highlight to you and you seem to be ignoring/overlooking/dismissing it.

No, it means I reject the assertion that this is what a DID Method must do and I reject the assertion that what is being prescribed here is the right course. The tech should remain open, flexible, and generative - we can haggle with other parties about how it is used in the meatspace venues where those debates belong.

I am fine if we just agree to disagree. Some folks here can use features others choose not to, and that's perfectly fine.

There will always be another registry... the question is, how much information in a decentralized public registry should be usable to correlate with private ones....

The more stuff we put in a VDR that is not public key material and deterministic transformations (hashes) of it.... the easier it is to create de-anonymized registries... which have value proportional to the use of the "pseudonymous" ones... consider that DHS would never have funded the monero tracing work if monero had no value.

I'm not opposed to people having bitcoin based dids with service endpoints, and I'm not opposed to people tweeting their home router IP address... I would not recommend it.... but maybe you are running a honeypot and want to catch the baddies that crawl the public registry....

end of the day, we can't prevent people from having facebook accounts, tweeting private keys, or connecting to their dark market web server without TOR and VPN... sometimes we like when people make security mistakes, it makes them easier to hunt :)

I don't consider a data model a security mistake... describing what a private key is, does not create a vulnerability... in fact, security through obscurity (generally considered to be a bad thing) relies on NOT telling the attacker exactly how the system works...

However, there is a difference between security through obscurity and least privilege... if someone doesn't need to know a service endpoint, they shouldn't.... and if the whole world needs to know a service endpoint, that's ok too, but that's probably never true / lazy security engineering.... just make sure you know how to secure it, or pay a big cloud provider who understands security, enough money to make sure it is secured and available.

If you can't secure firearms, you cannot have them.... you would be a danger to yourself and everyone around you. The same applies to a did document with a service endpoint on public immutable VDRs.

my conclusions from this thread:

  1. we should define an abstract data model for services like we have for verification methods
  2. we should warn about them in the privacy and security considerations
  3. we should warn about them in an implementation guide (if one ever gets created... if not... good thing we are committed to 2).

I've taken some time to read through this. I was hoping there might be some movement towards a consensus--and I don't think I'm in the larger consensus on what I'm about to say--but I don't see one emerging. So, apologies for taking this even further afield...

First, I agree with Manu that service endpoints in the DID Doc are an anti-pattern. I have raised my privacy concerns before, with limited result.

Service endpoints impact privacy not just because people will inadvertently, or unwillingly be forced to, put PII in service endpoints. They will, and they will be.

They are a problem because they correlate the Subject with specific, concrete services. Even a single service endpoint will inevitably lead to unintended discoveries. However, multiple service endpoints are even worse: we are not only providing a correlation between the Subject and these services, we are correlating these previously unrelated service endpoints with each other when we place them in the same DID Document.
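To make the cross-endpoint correlation concrete, here is a small sketch of what a crawler could derive from publicly resolvable DID Documents. The `service` entries follow the `id` / `type` / `serviceEndpoint` shape from did-core; the service type names, URLs, and the `linked_endpoints` helper are made up for illustration.

```python
from itertools import combinations

# Two hypothetical, publicly resolvable DID Documents.
did_docs = [
    {
        "id": "did:example:alice",
        "service": [
            {"id": "#store", "type": "ExampleDataStore",
             "serviceEndpoint": "https://store.example/u/8f3a"},
            {"id": "#chat", "type": "ExampleMessaging",
             "serviceEndpoint": "https://chat.example/inbox/77c1"},
        ],
    },
    {
        "id": "did:example:bob",
        "service": [
            {"id": "#store", "type": "ExampleDataStore",
             "serviceEndpoint": "https://store.example/u/02bd"},
        ],
    },
]


def linked_endpoints(docs):
    """Return every pair of endpoints tied together by sharing one DID Document."""
    links = set()
    for doc in docs:
        endpoints = sorted(s["serviceEndpoint"] for s in doc.get("service", []))
        links.update(combinations(endpoints, 2))
    return links


links = linked_endpoints(did_docs)
```

With one endpoint per document (as for `did:example:bob`) no endpoint-to-endpoint link is created; with two, the data store account and the messaging inbox become linked for anyone who resolves the DID.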

While this is fine in a context where the user has explicit management over access authority, it is NOT fine when the architecture itself is designed to be publicly accessible. DIDs are useless if you can't resolve the DID, so there is a natural demand for DID Documents to be publicly resolvable. However, encouraging the publication of service endpoints through DID Documents is, by definition, encouraging systematic publication of correlatable information.

More importantly, it is a trivial matter to separate the identifier proofing mechanism of DIDs from the service discovery. To be clear, I totally get the value of the directory that handles service discovery. There is a reason that Google has a higher market cap than the entire DNS infrastructure industry combined. That value proposition--the directory--is dangerous both because it will encourage rent-seeking DID Method operators to force all of their DID Documents to support directory functionality out of the box and because it is an unsolved problem to do decentralized discovery. So, I get the perceived value in shoehorning directory capabilities into the DID Document. But it is 100% possible to separate the layer of DIDs, DID Documents, & the resulting provable control of an identifier without reliance on a trusted third party, from the directory capability that is so desired.

The most privacy preserving approach would be to create a twice-wise unique DID for every accessor to every service endpoint. You want my phone #, I mint a DID that will let you reach me at a particular endpoint, which I can later disable at any time. You want my email address, I mint a different DID that lets you reach me at a different endpoint. There is no reason that you need to use the same DID that is on your driver's license to share your physical address.
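A rough sketch of that bookkeeping, assuming a hypothetical in-memory wallet that mints one identifier per (accessor, endpoint) pair. It models only identifier allocation and revocation; key material, DID methods, and resolution are out of scope, and `PairwiseDidWallet` is an invented name.

```python
import secrets


class PairwiseDidWallet:
    """Mints a distinct DID for every (accessor, endpoint) pair."""

    def __init__(self):
        self._by_pair = {}  # (accessor, endpoint) -> DID
        self._active = {}   # DID -> endpoint, present only while enabled

    def did_for(self, accessor, endpoint):
        key = (accessor, endpoint)
        if key not in self._by_pair:
            did = "did:example:" + secrets.token_hex(16)  # random, not meaningful
            self._by_pair[key] = did
            self._active[did] = endpoint
        return self._by_pair[key]

    def disable(self, did):
        """Stop routing for one relationship without touching the others."""
        self._active.pop(did, None)

    def resolve(self, did):
        return self._active.get(did)


wallet = PairwiseDidWallet()
phone_for_carol = wallet.did_for("carol", "tel:+15550100")
email_for_carol = wallet.did_for("carol", "mailto:me@example.org")
phone_for_dave = wallet.did_for("dave", "tel:+15550100")

wallet.disable(phone_for_carol)  # Carol loses the phone route; Dave keeps his
```

Because the three identifiers are unrelated random strings, Carol and Dave cannot tell from the DIDs alone that they were given the same phone number's owner.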

If we don't build this architecture out to not only encourage that level of privacy, but to engineer the appropriate UX and technical safeguards, then DID Method operators and DID users are going to take the easy way out, jam everything into the same DID Document and completely undermine the privacy that could have been possible.

The only use case that has been able to withstand this scrutiny is that of the portable data hierarchy. That is, the ability to have a DID that acts as the root of a hierarchical set of resources which can be moved from service endpoint to service endpoint without breaking URLs based on those DIDs.

THAT use case demonstrates a compelling argument for a single service endpoint which enables dereferencing to a portable, yet arbitrarily complex hierarchy of resources. But it does not support the requirement for MULTIPLE service endpoints. The only use cases that seem to support that have been directories or directory-like services. (Please correct me if I'm wrong).

So, I can support a compromise of a single service endpoint which itself serves as the point of authorization and consent for accessing actual end resources. That would allow this sort of portable resource hierarchy without undue burden.

DIDs can work without directories. They can also work with directories. There is no need to bake the directories into DIDs. Since this can be separated, I argue that they should be.

I'll go even more afield from consensus and point out that many of the privacy problems we have with DIDs are a result of our languaging and mental model that DIDs refer to a Subject.

Yes. The mistake we are making is framing DIDs as referring to a Subject.

DIDs are symbols. Labels. Identifiers. As such, they can be used to label anything and the meaning of those labels can change over time, intentionally and unintentionally. This is the nature of language itself, as explored exhaustively by Shannon, Goedel, and Chomsky. You can see this in modern discourse today where certain names and labels are "dog whistles" that a subset of the audience interprets in very specific ways--but which appear to have benign meaning to outsiders.

As such, it is fundamentally unknowable by an outsider what a DID actually refers to.

What you can do is use DIDs for a proof-of-control ceremony that mathematically proves that a party has access to presumably secret key material. This proof-of-control is the hook on which DID-based verifications, authentications, and authorizations are based. NOT on the proof that the candidate-in-question is a particular Subject, but rather whether or not they have access to intentionally secret information.
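As a minimal sketch of such a proof-of-control ceremony: the verifier sends a fresh random challenge, the controller signs it with the secret key, and the verifier checks the response against the public key published for the DID. A Lamport one-time signature is used here only because it needs nothing beyond `hashlib`; it is an assumption of this sketch, a real DID method would more likely use something like Ed25519, and a Lamport key must never sign more than one message. The `controlKey` field is a simplified, hypothetical stand-in for a verification method.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Lamport key pair: two random secrets per digest bit, public = their hashes."""
    secret = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in secret]
    return secret, public


def sign(secret, message: bytes):
    """Reveal, for each digest bit, the secret matching that bit's value."""
    return [secret[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(
        _h(part) == public[i][bit]
        for i, (bit, part) in enumerate(zip(_bits(message), signature))
    )


# Controller: holds the secret key; only the public half is published.
secret_key, public_key = keygen()
did_document = {"id": "did:example:123", "controlKey": public_key}

# Verifier: issues a fresh challenge and checks the signed response.
challenge = secrets.token_bytes(32)
response = sign(secret_key, challenge)
controller_ok = verify(did_document["controlKey"], challenge, response)

# A party without the secret key cannot produce a valid response.
other_secret, _ = keygen()
impostor_ok = verify(did_document["controlKey"], challenge,
                     sign(other_secret, challenge))
```

Nothing in the exchange says who or what the DID refers to; the verifier learns only that the responder holds the secret material.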

THIS proof-of-control ceremony is absolutely required for DIDs to fulfill the vision we have for them. Directories are not. I don't need to be in a directory to prove control over a cryptographic identifier. That's the whole point. Centralizing directories should be options at a different layer, just like DNS is a different layer than IP. Encouraging DID Methods to set themselves up as directories is the real anti-pattern, one that conflates layers in the architecture that should remain separate.

So, yeah, Service Endpoints in the DID Doc is an anti-pattern. If we have to have them, we should restrict it to a single service that MUST be capable of handling access in such a manner as to support privacy, both regulatorily and ethically. And I will support any work people are doing to solve that problem of decentralized, privacy-respecting discovery.

I'd rather there be NO service endpoints in a DID Document, but I can live with just one. I oppose encouraging multiple service endpoints and, in particular, will continue to advocate against that practice on privacy grounds, both within and outside the working group.

On a more pragmatic level, I don't believe we can resolve the privacy issues of service endpoints in the timeframe we have to publish a DID spec. MAYBE what some of this group wants could be accomplished through means I have yet to discover, without additional privacy burden. MAYBE. But we can provide a cryptographically secure way to prove control over an identifier without reliance on a third party in the time available. I'd much rather we focus our attention on the work that is actually tractable given our timeframe.

Perhaps we should have a topical call where we work through those particular use cases where people believe service endpoints are required to be in the DID Document. I have yet to see anything that is actually required, but have seen much that makes certain business models convenient. Let's work through it--like we did with the portable hierarchy use case--and see if the actual desired value, in fact, depends on service endpoints in a way that can't be realized by a separate directory layer.

@jandrieu That is a really helpful comment. Thank you. It has stimulated some neurons in my brain, I think.

I have been arguing that correctly implemented service endpoints are necessary because they allow cryptographic control of the DID to extend to control over metadata about that DID. Essentially, I want it to be possible to perform the rough equivalent of a database transaction, where keys and metadata are changed together, atomically, or are not changed at all -- or at least I want a strong guarantee of ordering between key updates and metadata updates, such that I always know with 100% certainty which keys are in control when metadata is changed (thus allowing those keys to reliably authorize the metadata update). It is intolerable, IMO, to have a system where a company's public DID keys are guarded in a vault behind 9 layers of protection, but the webpage that announces how you can talk to the company using that DID can be hijacked by DNS, a CDN operator, or the admin of your load balancer or web server.

It is this feeling that made me reject @msporny 's assertion that there are perfectly good ways to communicate endpoints already. However, what you (Joe) said about a single endpoint is resonating partly for me. This is exactly how DIDComm works: it is a single endpoint that takes its security from DID key control and that allows you to discover all other services and run all other protocols. Privately.

Now, maybe DIDComm is too politically encumbered to be the single endpoint you said you could tolerate. That's not the thrust of my comment; I'm just mentioning that what you described is pretty close to something with a concrete impl today. And that got me thinking...

What if the real problem here is that we need to split the construct we currently call a "DID doc" into two pieces: a "DID control doc" (pure control key data) and a "DID descriptor doc" (metadata), and what if we stipulated that:

  1. DID control docs are exclusively for the keys used to prove control of the DID (authentication). No other verification methods are allowed.
  2. All other verification methods, service endpoints, and whatever else someone wants needs to go into the DID descriptor doc.
  3. Changes to the descriptor doc need to be justified (authorized) by a specific version of the state in the DID control doc.

Because of item 3, it now becomes possible to publish DID descriptor docs anywhere, anyhow -- as long as the published data is accompanied by a signature. Ledgers can provide this feature, but so can websites, private chat channels, etc. Also, descriptor docs can be crawled, if people feel like putting them in a public place. Discovery is never a use case for DID control docs, however.
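One way to picture the split, as a sketch only: the descriptor doc names the hash of the control doc version that authorizes it, and it is signed by the key in that version. Every field name here (`controlVersion`, `authentication` as a bare key, `body`) and both helper functions are assumptions for illustration, not a proposed format. A Lamport one-time signature stands in for a real signature scheme purely to stay within the standard library, so each control doc version may sign only one descriptor in this toy.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    secret = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [[_h(a).hex(), _h(b).hex()] for a, b in secret]  # JSON-friendly
    return secret, public


def sign(secret, message: bytes):
    return [secret[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        _h(part).hex() == public[i][bit]
        for i, (bit, part) in enumerate(zip(_bits(message), signature))
    )


def canonical(doc) -> bytes:
    return json.dumps(doc, sort_keys=True).encode()


def publish_descriptor(control_doc, secret, body):
    """Bind metadata to one exact control doc version, then sign it (item 3)."""
    descriptor = {
        "did": control_doc["id"],
        "controlVersion": _h(canonical(control_doc)).hex(),
        "body": body,
    }
    return descriptor, sign(secret, canonical(descriptor))


def accept_descriptor(control_doc, descriptor, signature) -> bool:
    """Accept only if bound to this control doc version and signed by its key."""
    bound = descriptor["controlVersion"] == _h(canonical(control_doc)).hex()
    return bound and verify(control_doc["authentication"],
                            canonical(descriptor), signature)


sk1, pk1 = keygen()
control_v1 = {"id": "did:example:123", "version": 1, "authentication": pk1}
descriptor, sig = publish_descriptor(control_v1, sk1, {
    "service": [{"type": "ExampleMessaging",
                 "serviceEndpoint": "https://agent.example/inbox"}],
})

sk2, pk2 = keygen()  # key rotation produces a new control doc version
control_v2 = {"id": "did:example:123", "version": 2, "authentication": pk2}

accepted_v1 = accept_descriptor(control_v1, descriptor, sig)
accepted_v2 = accept_descriptor(control_v2, descriptor, sig)
tampered = dict(descriptor, body={"service": []})
accepted_tampered = accept_descriptor(control_v1, tampered, sig)
```

The descriptor and its signature can then travel over any channel, since a reader needs only the named control doc version to check them.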

I think if we gave metadata some reliable place to go, rather than just declaring it an antipattern, the resistance to eliminating service endpoints from a DID doc would evaporate. At least, that's true for me. But I haven't thought this through deeply, so I'm not advocating this answer strongly--just thinking out loud. What other ideas does that spark among readers of the thread?

tagging @SmithSamuelM and @kdenhartog and @peacekeeper re. the comment above. ^^ You can see how this relates to KERI principles and to some stuff we just discussed about network-of-networks interop.

I really appreciate @jandrieu's review, and maybe I understand @dhh1128. I would be happy to converge on a single, optional serviceEndpoint that points to a policy decision point. That way, when I use a DID to authenticate using my private keys, I am also able to provide a place for the requesting party (RqP) to try and continue the "first contact".

In no way should this one service endpoint be a directory. The directory, if one is involved, is how the RqP discovered my DID in the first place. The RqP can present their credentials, desires, and protocols if they want based on whatever public information I choose to post or not at the serviceEndpoint.

I don't see how I can get this ability to keep my private policies as secret and as reliably bound to my private keys any other way.

In addition, we might normatively allow the one service endpoint to be a Mediator where I can decide if the First Contact looks like a communication, an authorization, or maybe a data store. I don't see the harm in allowing "turtles all the way down" but it could lead to the kind of privacy problems that @jandrieu is describing to us so I'm happy to say that the only service endpoint is a PDP.

For me the privacy question comes down to how you are disclosing PII. If I am using KERI and I want to disclose PII in a private way then I would never use a Public Mechanism to do so. If a DID is resolvable through a public DID resolver and that provides a DID:DOC then I am hosed. I already made my private information unprivate. It's too late.

If on the other hand I am using public infrastructure (DID Ledgers, DID resolvers) to disclose public information that later by other means becomes correlated to PII then all I am doing by restricting what I get to put in the public infrastructure is attenuating the correlation coefficient. I don't prevent it, I just slow it down. And once it becomes correlated I want to de-correlate it. So if I want to de-correlate it then I have to be able to erase either the correlated PII data or the correlating data in my public infrastructure. If the correlating data is on an immutable ledger that cannot be erased then I have to erase the PII data, which I can't because, well, it's PII and erasing it means erasing me. So ultimately we have to be able to erase correlating data, which, absent some more clarity from the Data Rights Privacy regulators, means we shouldn't be using ledgers because, well, they are immutable. So this is the cognitive dissonance of ledgers. They are inherently un-de-correlatable PII privacy violations. It's not if but when. So I understand the tension. Anyone using a ledger for their DID is faced with the problem of attenuating as much as possible correlating data. But it's not a solution, it's just a half-measure. Eventually all data on an immutable ledger will be correlated. It's just the correlation time constant that you are arguing about.

Because KERI consists of un-intertwined hash chained data structures, (i.e. not ledgers in the conventional sense of the word) the correlating information can be erased. You can de-correlate.

To @dhh1128's point, there is no point in having DIDs if we cannot use them to securely bootstrap the exchange of information. Given that the internet security model (DNS/CA) is broken, we will only ever be able to fix it by replacing it with a better security model. So saying service endpoints can be had by other means is a dissimulation. It presupposes that service endpoints can be had by other means that are equivalently secure to the DID mechanism. It's punting the problem without resolving it. It's saying we pretend we can get there by some other means, so let's not address the issue head on. So we either need to provide a secure layer that provides communication parameters, or we have just given up on the problem. It's like we have been saying all these years that DIDs are more secure, decentralized, private, etc., but then we go: nope, the internet is all we really needed all along. This is the same type of reasoning that resulted in multiple DID:methods. Punting the hard problem so we can focus on the easy one.

@dhh1128,

I have been arguing that correctly implemented service endpoints are necessary because they allow cryptographic control of the DID to extend to control over metadata about that DID. Essentially, I want it to be possible to perform the rough equivalent of a database transaction, where keys and metadata are changed together, atomically, or are not changed at all -- or at least I want a strong guarantee of ordering between key updates and metadata updates, such that I always know with 100% certainty which keys are in control when metadata is changed (thus allowing those keys to reliably authorize the metadata update). It is intolerable, IMO, to have a system where a company's public DID keys are guarded in a vault behind 9 layers of protection, but the webpage that announces how you can talk to the company using that DID can be hijacked by DNS, a CDN operator, or the admin of your load balancer or web server.

It is this feeling that made me reject @msporny 's assertion that there are perfectly good ways to communicate endpoints already.

If this wasn't clear before, the assertion was to use VCs to express any information (such as service endpoints) beyond the DID itself, verification methods, and verification relationships. So we already have a mechanism to do what you discuss thereafter -- and we don't have to change anything, I don't think. All we need to do is encourage people to express service endpoints using VCs -- and proofs on those VCs can be checked against assertionMethod verification methods from the DID document. The abstract data model for service endpoints allows for them to be expressed in VCs in a supported syntax.
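To make the check described above a little more concrete, here is a minimal sketch of the structural half of it: confirming that a VC carrying a service endpoint names the DID as issuer and that its proof points at a key listed under `assertionMethod`. The DID, the `ServiceEndpointCredential` type, and the shape of the `credentialSubject` are assumptions made up for illustration, not anything defined in the spec, and actual signature verification is deliberately left out.

```python
def proof_key_is_authorized(did_doc: dict, vc: dict) -> bool:
    """Return True if the VC is issued by the DID and its proof names a
    verification method listed under assertionMethod in the DID document.
    Cryptographic signature checking is out of scope for this sketch."""
    if vc.get("issuer") != did_doc.get("id"):
        return False
    method_id = vc.get("proof", {}).get("verificationMethod")
    authorized = set()
    for entry in did_doc.get("assertionMethod", []):
        # Entries may be references (strings) or embedded method objects.
        authorized.add(entry if isinstance(entry, str) else entry.get("id"))
    return method_id is not None and method_id in authorized


# Hypothetical example data (not taken from any real DID method).
DID_DOC = {
    "id": "did:example:123",
    "assertionMethod": ["did:example:123#key-1"],
}
ENDPOINT_VC = {
    "type": ["VerifiableCredential", "ServiceEndpointCredential"],  # hypothetical type
    "issuer": "did:example:123",
    "credentialSubject": {
        "id": "did:example:123",
        "service": [
            {"type": "ExampleService", "serviceEndpoint": "https://example.com/agent"}
        ],
    },
    "proof": {"verificationMethod": "did:example:123#key-1"},
}

print(proof_key_is_authorized(DID_DOC, ENDPOINT_VC))
```

A real verifier would go on to check the proof value itself against the key material behind that verification method.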

That does not magically resolve my concern Dave. The issue is that I don't
believe individuals as issuers is a good idea from a privacy perspective
with any of the solutions we have today. So we have a loss of functionality.

@dhh1128,

The issue is that I don't believe individuals as issuers is a good idea from a privacy perspective with any of the solutions we have today.

I don't understand how using a VC that states the DID as the issuer and includes only the service endpoint is meaningfully different from publishing a service endpoint in a DID document directly -- from a privacy perspective. Other than that the VC doesn't have to be on the immutable VDR, of course. What are the privacy concerns?

I might be able to get around my concern if every single interaction where
an individual wants to give such a VC is a new VC, rather than reused. But
I think in some ways that's exactly the same privacy temptation that
disclosing PII on a ledger constitutes: I think people are likely to do it
wrong.

Setting aside the privacy question, I feel funny about creating a
technical dependency where it is reciprocal. DIDs depend on VC's, and VC's
depend on DIDs...

@dhh1128,

VC's depend on DIDs...

VCs do not depend on DIDs.

I might be able to get around my concern if every single interaction where an individual wants to give such a VC is a new VC, rather than reused.

I would expect this to be common in a lot of cases, actually, to turn over a brand new, short-lived VC through a communication channel that has been established to transfer the DID itself as well. If the VC is going to be longer lived, then that's the sort of VC that would live on these decoupled registries that have been discussed in this thread anyway. It would carry the same sort of properties that putting a service endpoint directly into a DID document would, except that it could expire, and be deleted, and so on.

@SmithSamuelM I'm not sure I understand this assertion:

So saying service endpoints can be had by other means is a dissimulation.

If I can use the DID Document to verify the authenticity of an affirmative statement about endpoints (e.g., through signature or encryption), wouldn't that make the security independent of the transport? And hence it is perfectly reasonable to assert that "service endpoints can be had by other means"?

I share that goal of securely bootstrapping communications. But that doesn't mean my service endpoints need to be in the DID Document. It just means that the ability to verify the authenticity of an initial assertion about service endpoints must be possible with information in the DID Document.

In fact, when we encourage service endpoints to be in the DID Document--and that DID Document is unsigned per the spec--then we are implicitly deferring trust in the authenticity of that document to the DID Method. So let's be clear: any DID Method could insert a service endpoint without the controller's intention. So... putting endpoints in the document expands the authority question rather than removes it. Which could be removed if we added signatures to the DID Document, but that seems like a bigger change than we can muster at this stage.

I'd suggest that we minimize the dependency we place on the DID Method, and rely on them only for the MINIMUM amount of data required to securely bootstrap communications, which to my mind is the ability to cryptographically verify a single piece of communication. HOW that piece of communication gets communicated is a protocol for a different spec. I could write it on a wall. I could put it on a web site. I could use smoke signals. HOW you find my service endpoint, whether I give it to you or someone else does, is a different problem from verifying that the service endpoint is, in fact, approved/intended/open for use for that DID.

Which raises one potentially interesting value point for some DID Methods: they provide a way to know that at a given definition of NOW, what the authoritative state is for a given DID. You could timestamp a given assertion of endpoints (such as by putting a hash on a ledger), but that would only state a point in time after which we can know that the assertion existed. We cannot know that the assertion has been superseded by a subsequent assertion, which is the equivalent of not knowing if there is a newer DID Document.

So, for ledger-based DIDs, there might be value in providing in a DID Document a hash of a VC containing "current endpoint specifications". Then, if you get ahold of that VC, you can verify that it is current.
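As a rough illustration of the hash idea above, the sketch below compares a digest of a received endpoint VC against a value published in the DID Document. The property name `endpointCredentialHash` is purely hypothetical, and sorted-key compact JSON stands in for whatever canonicalization a real design would have to specify.

```python
import hashlib
import json


def vc_digest(vc: dict) -> str:
    # Assumption: sorted-key, compact JSON is a stand-in for a real
    # canonicalization scheme, which this sketch does not attempt to define.
    canonical = json.dumps(vc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def is_current(did_doc: dict, vc: dict) -> bool:
    """True if the VC matches the hash currently published in the DID Document.
    'endpointCredentialHash' is a hypothetical property name."""
    return did_doc.get("endpointCredentialHash") == vc_digest(vc)


# Hypothetical data for illustration only.
endpoint_vc = {
    "issuer": "did:example:123",
    "credentialSubject": {"serviceEndpoint": "https://example.com/agent"},
}
did_doc = {
    "id": "did:example:123",
    "endpointCredentialHash": vc_digest(endpoint_vc),
}

print(is_current(did_doc, endpoint_vc))
```

Any later endpoint VC would hash differently, so a verifier holding a superseded VC would see a mismatch once the DID Document is updated.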

Anyway, I'm curious, Sam, if I'm understanding what you meant about getting service endpoints from somewhere other than the DID Document.

@dlongley :

VCs do not depend on DIDs.

I agree that this is technically true, as the VC spec allows identifiers for issuers (required) and holders (optional) to be URIs or identifiers having the same properties as DIDs. But AFAIK, that distinction only exists in theory, not in practice. How many members of this group have built VC handling stacks that are DID-free? My concern is practical.

DIDs are low-level plumbing, only one step above cryptography. VCs are a higher-level construct with (IMO) a much stronger affinity for JSON-LD, extensible schemas, rich semantics, and so forth. If I need to be able to issue and verify credentials to talk to someone who has a DID, it feels to me like we have a complexity and dependency inversion problem.

I'm lost.

  • I want to invite contact by "appropriate" requesting parties. (e.g. I
    have depression and am worried)
  • I want to keep the policies for what is appropriate secret, just like I
    keep my private keys secret. (e.g. I will only respond to males in my ZIP
    code)
  • I want to decide how I'm discovered by entities that may or may not have
    any idea of my policies. (e.g. I choose three dating services, some open,
    others with access restrictions that may or may not imply something about
    me or my policies)
  • I post "something" in directories, along with some metadata. (e.g.: the
    metadata says I'm a late 60's female, not yet retired)
  • The something is a DID that I control.
  • Bob gets the DID from somewhere and wants to communicate his attributes.
    (e.g. Bob has age, sex, and address credentials derived from his DMV)

What does Bob do with my DID?

- Adrian

Drawing some examples to further this line of reasoning from @dhh1128

What if the real problem here is that we need to split the construct we currently call a "DID doc" into two pieces: a "DID control doc" (pure control key data) and a "DID descriptor doc" (metadata)...

For discussion purposes only, _not_ a recommendation from me at this time.

DID Control Document

{
  "@context": "https://www.w3.org/ns/did/v1",
  "id": "did:publicchain:0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e9",
  "authentication": [ ... ],
  "srv": "didsrv:gdprcompliantchain:0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e9" // optional
}

DID Service Document

{
  "@context": "https://www.w3.org/ns/didsrv/v1",
  "id": "didsrv:gdprcompliantchain:0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e9",
  "service": [
    {
      "id": "didsrv:gdprcompliantchain:0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e9#openid",
      "type": "OpenIdConnectVersion1.0Service",
      "serviceEndpoint": "https://openid.example.com/"
    }
  ],
  "proof": {...} // OR whole thing as JWT
}

Pros

  • DID Method impls should be able to recycle much of DID Control Doc Resolution for DID Service Doc Resolution, therefore have theoretically the same immutability guarantees.
  • Can segregate data persistence characteristics for authentication vs. capabilities as per @SmithSamuelM
  • didsrv can implement additional authorization or even be "shot over" like a signed JSON patch, either through some communication channel or hacked into some URL.
  • Keep 1 "service endpoint" for discoverability in the main DID Control Doc that cannot have a path, query, or fragment to prevent PII spillage.

Cons

  • Standard becomes a lot more complex if we include didsrv scheme, a lot less useful if we don't. This point is pretty significant and probably can't be overstated.
  • Two schemes instead of one.
  • New can of worms related to DID SRV resolution and introduced correlation risks

Other thoughts

  • This seems to go in the same direction as JSON patches and using VCs to define service endpoints (all we're missing is a valid issuer property).
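The last item under Pros (a single discovery endpoint with no path, query, or fragment) is easy to check mechanically. A minimal sketch, assuming the endpoint is a URL and that a bare trailing "/" counts as "no path"; the function name is made up for illustration:

```python
from urllib.parse import urlsplit


def is_bare_endpoint(url: str) -> bool:
    """True if the URL has a scheme and host but no path, query, or fragment.
    Assumption: a lone trailing slash is treated the same as an empty path."""
    parts = urlsplit(url)
    return (
        bool(parts.scheme)
        and bool(parts.netloc)
        and parts.path in ("", "/")
        and not parts.query
        and not parts.fragment
    )


for candidate in (
    "https://openid.example.com/",
    "https://example.com/users/alice",
    "https://example.com/?user=alice",
    "https://example.com/#alice",
):
    print(candidate, is_bare_endpoint(candidate))
```

Note that this only covers the path, query, and fragment; a hostname can still carry identifying information, which such a check does not catch.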

I don't understand how using a VC that states the DID as the issuer and includes only the service endpoint is meaningfully different from publishing a service endpoint in a DID document directly -- from a privacy perspective. Other than that the VC doesn't have to be on the immutable VDR, of course. What are the privacy concerns?

One is repudiability. VCs are not repudiable (unless you use ZKPs, and even then, it's only an option, not the default). Unsigned DID docs (e.g., as shared with peer DIDs in DIDComm) are. What this means, in practical terms, is that when Alice receives Bob's DID doc, she can't prove to Carol with the DID doc itself that it's truly Bob's DID doc; unauthorized sharing loses assurance unless Alice archives a transcript of her interaction with Bob and plays it back to Carol. That transcript might include safeguarding terms and conditions, for example. But when she receives a VC from Bob about his endpoint, she can prove to Carol, without Bob's permission, that it's Bob's endpoint. She just has to display the VC in isolation.

Another issue is revocation. When a key on a DID is rotated, it does not retroactively invalidate all the signatures that the key created. Thus, rotating the key doesn't invalidate VCs that claimed the service endpoint was X. To me, that suggests that we have a feature gap that, AFAIK, can only be plugged with real VC revocation. I'd like to believe that we don't need revocation of VCs that convey service endpoints -- but if we do, then we are asking individuals who want DIDs to manage revocation lists, or to expose a phone home service. Either has privacy implications (not to mention logistical problems).

I don't know if either of these matter a lot. I'm not claiming to have done a deep analysis of this problem. I am just challenging the easy claim that VCs-with-endpoints are a direct equivalent of endpoints-in-did-docs. I think privacy is likely to be a dimension where differences surface. Maybe I'll do some more thinking about this, and come to different conclusions.

@dhh1128 you say

When a key on a DID is rotated, it does not retroactively invalidate all the signatures that the key created. Thus, rotating the key doesn't invalidate VCs that claimed the service endpoint was X.

Why do you say this? As in, is there spec text I missed that actually states or implies this?

To my understanding if I get a VC created by an Issuer using a DID, signed by something that does not have a matching entry in that Issuer's CURRENT DID Document, it will fail verification.

My expectation is exactly the opposite of what you said. Did I misunderstand what you meant?

Whew! What a thread. I just read the whole thing because on this morning's DID WG call this issue was pointed to as "the outcome of the special topic call on service endpoints". I had no idea it had grown to this length.

I will keep this short. I hate to say it, but so far the content in this thread almost completely misses the two main reasons for keeping service endpoints in the spec, provided that we include all appropriate privacy flags and warnings, of course.

Reason Number 1: Public DID documents for public entities (like corporations, governments, NGOs, universities, churches, websites) who want to publicly advertise not just their DID and keys but their service endpoints. For these entities:

  1. We want to make it really easy.
  2. We want to make it one-stop/one-hop to post the original DID document and make updates.
  3. There are no GDPR or other privacy concerns.

Reason Number 2. Innovation. Who are we to say that we know all the right and wrong ways to safely use a service endpoint? DIDs are just entering the world. DIDComm is still an infant at the crawling stage. Yes, we should provide all the privacy warnings and guidance that we can. But to suggest that we remove the feature or artificially restrict a DID document to a single instance of a service endpoint seems a little like TBL predicting all the things URLs will be used for back in 1994.

Unless someone can decry these two motivations for service endpoints—which is why they have existed in the spec since the first draft four years ago—I suggest we move forward with Orie's suggestion:

  1. we should define an abstract data model for services like we have for verification methods
  2. we should warn about them in the privacy and security considerations
  3. we should warn about them in an implementation guide (if one ever gets created... if not... good thing we are committed to 2).

@talltree:

[image: ezgif-7-09a6b9b3daf1]

@jandrieu

So saying service endpoints can be had by other means is a dissimulation.

If I can use the DID Document to verify the authenticity of an affirmative statement about endpoints (e.g., through signature or encryption), wouldn't that make the security independent of the transport? And hence it is perfectly reasonable to assert that "service endpoints can be had by other means"?

But those other means must be defined as part of some standard for interop's sake, not merely hypothetically referenced. That’s my point about punting.

I share that goal of securely bootstrapping communications. But that doesn't mean my service endpoints need to be _in_ the DID Document. It just means that the ability to verify the authenticity of an initial assertion about service endpoints must be possible with information in the DID Document.

I agree but the DID spec is useless for DIDs if we don’t do that someplace. Indeed in other places I have suggested the tension with DID Docs is that instead of a one place fits all model we should be using a layered approach.

Layer 0: Control Establishment (authoritative signing keys for DID)
Layer 1: Authorizations of communication parameters (routing and encryption) and service endpoints
Layer 2: Other stuff aka DID Doc or Ersatz verifiable credential. (Because layer 0 and 1 have bootstrapped us)

So I would wholeheartedly support a layered approach. But if we are just punting the problem then we are not doing anybody any favors. It’s an abrogation of our responsibility to the concept of DIDs to punt communication parameters and service endpoints.

In fact, when we encourage service endpoints to be in the DID Document--and that DID Document is unsigned per the spec--then we are implicitly deferring trust in the authenticity of that document to the DID Method. So let's be clear: any DID Method could insert a service endpoint without the controller's intention. So... putting endpoints in the document expands the authority question rather than removes it. Which could be removed if we added signatures to the DID Document, but that seems like a bigger change than we can muster at this stage.

Whoa! DID documents are unsigned. When did that happen?
If so then they would be useless.

@jandrieu @dhh1128

My expectation is exactly the opposite of what you said. Did I misunderstand what you meant?

This is a very revealing comment. If the DID Doc spec is so ambiguous that there is not a clear understanding of how authoritative statements work w.r.t. rotating keys then we have messed up big time.

One of the reasons I wrote KERI was to precisely define control establishment in a rigorous way. Control establishment must include key rotation and what that means in terms of control establishment. A reasonable rule which KERI employs is:
1) a signed statement using the current authoritative set of keys at the time of the signature is valid until revoked or rescinded.
This means that merely rotating keys does not revoke or rescind the validity of prior signed statements. Otherwise every time you rotate keys you would have to reaffirm (reissue) every prior statement signed with the now obsolete keys. This is an impractical rule. So in general rule 1) is the most reasonable rule but certainly not the only possible rule.

The alternative is as follows:
2) All statements issued/signed with a given set of keys are automatically revoked when the authoritative keys are rotated.

As a side note I had a discussion with a security practitioner who asserted that one could not use rotatable keys to issue verifiable credentials because then one would have to reissue every verifiable credential issued with the old keys. This person was assuming rule 2)

This view (rule 2) of mandatory revocation (reissuance) is a common rule in token based security approaches where all tokens issued under a given set of keys are automatically revoked when you rotate keys.

In order for Rule 1) to be practical one needs to maintain a log of statements signed with a given set of keys or at least a cryptographic commitment to the hashes of the log of statements (merkle tree or hash chained data structure) so that one can verify that a statement was signed with the then authoritative set of keys.

So if one is not using a log (ledger, etc) of signed statements then Rule 1) is unworkable and Rule 2) is the reasonable one.

A hybrid would be:

Rule 3) Only logged signed statements use rule 1) and all other signed statements use rule 2). Any presentation of a signed statement includes a reference to its location in the log to determine the authoritative keys at the time (location) in the log. If the log reference is absent then one checks the current authoritative keys and if they differ then the signed statement is stale (invalid).
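The three rules can be sketched with a toy model. Everything here is an assumption for illustration: keys are represented by plain string ids, the log is an in-memory object rather than a verifiable data structure, and the signature itself is taken to have been checked elsewhere. This is not KERI's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class KeyLog:
    """Toy log: ordered history of authoritative key ids, plus digests of
    statements anchored while a given key was authoritative."""
    key_history: list = field(default_factory=list)  # key ids, oldest first
    anchored: dict = field(default_factory=dict)     # statement digest -> key id

    def rotate(self, new_key: str) -> None:
        self.key_history.append(new_key)

    @property
    def current_key(self) -> str:
        return self.key_history[-1]

    def anchor(self, digest: str) -> None:
        # Record that this statement was signed under the current key.
        self.anchored[digest] = self.current_key


def is_valid(log: KeyLog, digest: str, signing_key: str,
             revoked=frozenset(), rule: int = 3) -> bool:
    """Decide whether a (already signature-checked) statement is still valid."""
    if digest in revoked:
        return False  # explicit revocation wins under every rule
    logged = log.anchored.get(digest) == signing_key
    fresh = signing_key == log.current_key
    if rule == 1:
        return logged  # valid if signed by the then-authoritative key
    if rule == 2:
        return fresh   # rotation implicitly revokes everything older
    # Rule 3: logged statements follow rule 1, unlogged ones follow rule 2.
    if digest in log.anchored:
        return logged
    return fresh


log = KeyLog()
log.rotate("key-0")
log.anchor("digest-of-statement-A")
log.rotate("key-1")
for rule in (1, 2, 3):
    print(rule, is_valid(log, "digest-of-statement-A", "key-0", rule=rule))
```

The point of the sketch is only that rules 1) and 3) need to consult the log, while rule 2) needs nothing beyond the current key.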

Clearly the issuer-holder-verifier model of VCs is problematic with rule 2), especially for at-scale issuance of large numbers of credentials, and especially for credentials that expire, because you have coupled your key management (rotation, recovery) to the expiration rules for your VCs.

Thus the rubric for a DID method would include which rule, 1), 2), or 3), is to be used to verify signed statements associated with the keys for that DID, including the DID:Doc.

But in order to support 1) or 3) a verifiable log is required.

Each DID method should explicitly define which rule, 1), 2), or 3), is to be used when verifying signed statements. As far as I know no DID method explicitly does this. It's implied.

@dlongley :

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.
@dhh1128
A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. (Or perhaps more precisely, experts I've talked to say that they believe legal rulings will eventually formalize this legal conclusion.) The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

+1. GDPR considers cryptographic hashes and identifiers PII when they are correlated to PII. This is a severe problem for all ledgers that use DIDs. To arbitrarily consider service endpoints as potential PII but ignore the fact that the DIDs themselves are potential PII is a problematic view.

@dhh1128
Key rotation versus signed statement revocation. The authority of a signed statement is imbued to it by its signature and the keys used to create the signature. Is a signed statement authoritative/authorized after the keys used to sign it have been rotated? If not, then the statement is effectively revoked as no longer being an authoritative/authorized statement. If the statement is still authoritative/authorized after the keys used to sign it have been rotated, then it is not effectively revoked by the rotation itself but requires a separate signed revocation statement that rescinds/revokes its authoritative/authorized status. This revocation statement is signed by the current set of authoritative keys, which may be different from the keys used to sign the statement being revoked.

Authorization tokens, which are a form of signed statement, often employ rule 2): when the keys used to sign the token have been rotated, this implies that the token’s authorization is revoked. Effectively the token is always verified by the current set of signing keys, so it will fail verification after rotation. Whereas in rule 1) the verification is w.r.t. the set of signing keys used to create the signature at the time the statement was issued and signed. This means the verifier has to have a way of determining what the history or lineage of control authority was, via a log or ledger, to know that a statement was signed with the authoritative set of keys at the time. This means that the log or ledger must log not only the lineage of keys (key rotation history) but also the statements signed by those keys (a digest of each statement is sufficient). Otherwise a compromise of the current signing keys (which rotation protects from) would allow an exploit to create verifiable, supposedly authorized statements after the keys have been rotated. So it must be either rule 1, 2, or 3. And non-automatic revocation of signed statements requires a log of both the key rotation history and the signed statement history.

Obviously if keys are not rotatable, then a signed statement cannot be revoked by merely rotating keys; instead a revocation registry may be used to determine if a signed statement has been revoked explicitly by a revocation statement. So non-rotatable keys may use a modified rule 4) where there is no key rotation history log or signed statement log but merely a revoked statement log. Typically, though, non-rotatable keys are used for ephemeral identifiers, in which case a revocation log is not used. Instead of rotating keys for ephemeral identifiers you just rotate the identifier (make a new one with a new set of keys) and abandon the old identifier and all its signed statements.

For anyone reading this thread who becomes interested in the side-topic of rotation vs. revocation (what I, Joe, and Sam mentioned), I created a separate issue to move the discussion into its own context: https://github.com/w3c/did-core/issues/386

@SmithSamuelM,

...to arbitrarily consider service endpoints as potential PII but ignore the fact that the DIDs themselves are potential PII is a problematic view.

Emphasis mine. I agree with your statement here, but the distinction written about above is not arbitrary.

I created an issue in the did-spec-registries which argues that there should not be a centralized registry of services types, I think that's loosely related to this issue here: https://github.com/w3c/did-spec-registries/issues/125

@SmithSamuelM said:

GDPR considers cryptographic hashes and identifiers PII when they are correlated to PII.

This is exactly why we should NOT discuss DIDs as identifying particular subjects. Every framing that does so risks compliance complications, because DIDs do their magic without anyone needing to know the actual subject in any other context (we don't need to know the physical or legal person it refers to). When we think about and advocate DID uses that persistently, permanently refer to an individual, as in this note in section 3.1 https://www.w3.org/TR/did-core/#did-syntax :

That is, a DID is bound exclusively and permanently to its one and only subject. Even after a DID is deactivated, it is intended that it never be repurposed.

When we frame it this way, we are begging for DIDs to be treated as PII.

We would do well to avoid that mistake.

DIDs enable demonstrable proof-of-control over an identifier without reliance on a third party. What that identifier is ABOUT is entirely a construct of the statements made about that DID and how those statements are interpreted by recipients. This is fundamental to language. A DID is a signal, which only has meaning in so far as the signaller intends AND the receiver understands. Note that this is NOT about the Controller: the Controller also doesn't get to decide what a DID is about. People using the DID do.

Consider this hypothetical. Consider did:joe:SuperThing, which starts out referring to a weekend project. That grows and becomes a business, a sole proprietorship. I later add a partner and did:joe:SuperThing now refers to a partnership. Later we turn the partnership into an LLC. Then, as we lay the groundwork for an IPO, it becomes a C corporation. In each of these stages, did:joe:SuperThing is a different, legally distinct entity, even though there is a sense in which the meaning of did:joe:SuperThing is consistent across that lifecycle: it's this Super Thing I created. But if you were to incorrectly assume that did:joe:SuperThing referred to any one of those specific legal entities, you wouldn't necessarily be wrong: it did refer to those specific legal entities at different times. You just have to use additional cues to figure out that did:joe:SuperThing is NOT actually the specific legal entity, but rather the conceptual notion of a project with a life of its own. THEN you have to apply that knowledge to interpret statements that may be made about that DID in different stages.

Just as HODL doesn't mean what its coiner intended.

Just as MTV no longer means what it used to.

Just as the People's House no longer means what it used to.

Just as "Karen" no longer means what it used to. Or Lincoln. Or Christ. Or ANY identifier that has any temporal staying power.

Semantic shift is a fact of human language. So, while some of us deeply crave the illusory certainty that a DID in fact refers to a specific Subject, the fact is that signals refer to whatever the signaller meant them to refer to, and then only when the recipient shares some notion of that same meaning.

What DIDs do allow is demonstration of proof of control, which can be used, in the context of true "secrets", as a form of identity assurance that the entity performing the proof-of-control is the same entity that it was the last time proof-of-control happened.

The nice thing is that this still works with did:joe:SuperThing. The semantic shift doesn't happen when you accept that all proof-of-control means is that the current party controls the proof secrets of did:joe:SuperThing. Which implies the current party is acting as did:joe:SuperThing, but that's it. It can't even be assumed that what was asserted as true of did:joe:SuperThing at some point in the past (perhaps embodied as a VC) applies to the current did:joe:SuperThing. Consider the business permit issued to the sole proprietorship using that DID. That permit does NOT apply to the partnership nor the LLC nor the corporation.

I'm boggled by why so many people are literally advocating for features that will make DIDs effectively unusable for individuals due to privacy concerns. Treating DIDs as identifying particular individuals and using the DID Document as a correlation point for information about individuals are problems we should be working to avoid, not "features" to defend.

@jandrieu -- Could you put together a PR with some concrete text changes to the section of the spec you quoted to address the above issue? I think it would be helpful for the group to debate concrete changes to the particular problem you raised in a PR as you have raised good points. It would be good for us to try and separate that concern off from the rest of what is happening here, we may be able to reach consensus on it more quickly than the issue here, and it may help create a foundation for finding consensus here.

(Trying out the anti-statement strategy that seems to be working in the SDS authorization discussion).

PROPOSED: A DID is just for authentication and related control and security issues. A DID SHOULD NOT raise privacy issues.

The SHOULD means that if there is any way to avoid Service Endpoints in the DID Document we should do that. We know from the Glossary group work that there are at least a few service endpoints of interest, notification and authorization among them.

If it makes sense to decouple notification and authorization from the DID Document resolution process then maybe we should. That would mean that a DID controller would authenticate to some service provider, (e.g. a secure data store or the car rental company in https://github.com/w3c/did-use-cases/issues/101) and CRUD their notification and/or authorization service endpoint without changing the DID Document.

Is this reasonable? Will it solve our privacy issues? Will it help adoption of SSI?

If it makes sense to decouple notification and authorization from the DID Document resolution process then maybe we should. That would mean that a DID controller would authenticate to some service provider, (e.g. a secure data store or the car rental company in w3c/did-use-cases#101) and CRUD their notification and/or authorization service endpoint without changing the DID Document.
Is this reasonable? Will it solve our privacy issues? Will it help adoption of SSI?

Yes, this is what some in the thread are arguing -- it is reasonable and will solve a variety of privacy issues and will help adoption of SSI.

I'm warming up to the idea.

Based on my proposed Alice Rents a Car use-case, Alice's agent might have a DID of its own. Her service providers (in this context, defined as anyone that has agreed to let her authenticate with a DID), would then post publicly what agent protocol they support (e.g. OAuth3 / GNAP) and allow Alice to either:
a) register her agent service endpoint itself, or
b) register the DID for her service endpoint,
Either way, Alice would assume that any service provider (DMV, insurer, bank) asserting Gold Button would be compatible.

In case of a) the service provider would need to verify the capability that was being presented (by the rent-a-car service) was delegated by Alice's authentication DID. How do they do that?

In the case b) The service provider needs to verify the association of Alice's authentication DID with Alice's agent DID in order to verify the capability being presented by the rent-a-car service. How do they do that?

Here's a sequence diagram that separates authentication from authorization. Alice creates a did:key for authentication with a bank service provider. She then registers a semi-autonomous payment agent that supports _a mutually acceptable authorization protocol_. Alice's agent has been pre-programmed with policies that say that any company in the Fortune 500 can be paid $200 or less automatically because Alice is sure she can get her money back in case of dispute.

Later, Alice registers with a rent-a-car service provider using a different did:key. Alice also registers her agent, the same one she registered with the bank. Alice's agent issues a capability to the rent-a-car company that results in payment by the bank and Alice receives a capability to access the car.

There are some privacy issues with this simple sequence. The bank gets to know that Alice is renting a car and Alice's agent endpoint is a correlation risk. However, Alice deems these to be acceptable under the circumstances. The bank's tracking could be mitigated if Alice's agent has access to (digital) cash. The agent correlation risk can be mitigated if Alice uses a mediator to hide her agent endpoint from the rent-a-car.

In the general case, the bank and the rent-a-car are just somebody's secure data stores.

Can we fit a (standard) authorization protocol to complement DID as a pure authentication method?

@agropper and @msporny :

The SHOULD means that if there is any way to avoid Service Endpoints in the DID Document we should do that.

Here is where I diverge. I believe this statement of Adrian's makes a logical leap that is unwarranted. Yes, we should avoid privacy problems with DIDs. But it does not therefore follow that we should take Service Endpoints out of the DID document. Rather, it follows that we should: A) describe service endpoints in a way that preserves privacy, OR B) we should take them out. You are short-circuiting by ignoring the first (A) branch of the ORed statement.

Peer DIDs with service endpoints do not have a privacy problem. They take branch A.

Any DIDs with herd privacy endpoints do not have a privacy problem. They take branch A.

Manu seems to be arguing that there are equally good ways to communicate service endpoints outside a DID doc. I disagree. As far as I can tell, all the ways Manu has proposed so far lose the characteristic that I want, which is the ability to strongly associate a service endpoint value with a particular key state, updating them together or not at all, with a DB-transaction-like atomicity. I claim that without this, hackers can drive a truck through system security.

I agree with @dhh1128 ... I think that if we don't describe branch A, DID Method Authors will just make up their own way of doing it, which will not be standard, and might not address the security concerns raised by the group.

But I also agree with the SHOULD.... if you don't need an arbitrary property in a DID document... it SHOULD NOT be there.

@dhh1128 I totally agree with you. That's why I suggested that proponents of removing them "should" fill out the sequence diagram for how we associate a DID without a service endpoint, like did:key with either an authorization, notification, or mediation service.

@dhh1128,

As far as I can tell, all the ways Manu has proposed so far lose the characteristic that I want, which is the ability to strongly associate a service endpoint value with a particular key state, updating them together or not at all, with a DB-transaction-like atomicity. I claim that without this, hackers can drive a truck through system security.

You seem to be suggesting that if some information X is not atomically bound with a particular key state via the DID Document then there is an insurmountable system security problem.

I've intentionally called this information X here instead of "service endpoint" to highlight that what you're arguing is that all Xs must be in the DID Document. Forget about using VCs, for example -- unless you stuff them into the DID Document. So, I disagree -- and I think, instead, that we've got fairly useless technology if everything we care about from a security perspective has to live in the DID Document. We have to support partitioning or this system will collapse under its own weight.

It's also worth noting that the identity behind a particular DID is inexorably partitioned from the DID itself already.

@agropper,

...how we associate a DID without a service endpoint, like did:key with either an authorization, notification, or mediation service.

For example, send someone this VC:

{
  "@context": ["https://www.w3.org/2018/credentials/v1", "some-context-that-defines-service-endpoint-terms"],
  "id": "urn:vc:12321345",
  "type": ["VerifiableCredential", "ServiceEndpointCredential"],
  "issuer": "did:key:z6mczx79123...4234",
  "issuanceDate": "2010-01-01T19:23:24Z",
  "expirationDate": "2010-02-01T19:23:24Z",
  "credentialSubject": {
    "id": "did:key:z6mczx79123...4234",
    "service": [{
      "id": "did:key:z6mczx79123...4234#service-x",
      "type": "NotificationService",
      "serviceEndpoint": "https://example.com/something"
    }]
  },
  "proof": {
    "proofPurpose": "assertionMethod",
    "verificationMethod": "did:key:z6mczx79123...4234#z6mczx79123...4234",
    "..."
  }
}

You can also send them a zCap to access the service endpoint at the same time:

{
  "@context": "https://w3id.org/security/v2",
  "id": "urn:uuid:14931-24982-23342-423-234342",
  "parentCapability": "https://example.com/zcaps/something",
  "invocationTarget": "https://example.com/something",
  "invoker": "did:key:zm239823432...35423523",
  "allowedAction": ["read"],
  "expires": "2010-01-02T19:23:24Z",
  "proof": {
    "proofPurpose": "capabilityDelegation",
    "verificationMethod": "did:key:z6mczx79123...4234#z6mczx79123...4234",
    "..."
  }
}
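As a rough illustration of how a recipient might consume the two documents above, here is a minimal Python sketch. It assumes the proofs have already been verified by some other component (no signature checking happens here). The function names `endpoints_from_credential` and `zcap_covers` are hypothetical and not part of any VC or zCap library, and the `did:example:alice` identifier is a placeholder standing in for the elided did:key values.

```python
from datetime import datetime, timezone

def endpoints_from_credential(vc):
    """Return the service entries of a self-issued endpoint credential.

    Assumes the proof was verified elsewhere; this only checks that the
    issuer is making a statement about itself.
    """
    subject = vc["credentialSubject"]
    if vc["issuer"] != subject["id"]:
        raise ValueError("credential is not self-issued")
    return list(subject.get("service", []))

def zcap_covers(zcap, endpoint, action, now):
    """Check that a zCap targets the endpoint, allows the action, and is unexpired."""
    expires = datetime.fromisoformat(zcap["expires"].replace("Z", "+00:00"))
    return (
        zcap["invocationTarget"] == endpoint
        and action in zcap["allowedAction"]
        and now < expires
    )

# Placeholder documents, shaped like the examples above (proofs omitted).
vc = {
    "issuer": "did:example:alice",
    "credentialSubject": {
        "id": "did:example:alice",
        "service": [{
            "id": "did:example:alice#service-x",
            "type": "NotificationService",
            "serviceEndpoint": "https://example.com/something",
        }],
    },
}
zcap = {
    "invocationTarget": "https://example.com/something",
    "allowedAction": ["read"],
    "expires": "2010-01-02T19:23:24Z",
}

endpoint = endpoints_from_credential(vc)[0]["serviceEndpoint"]
now = datetime(2010, 1, 2, 12, 0, tzinfo=timezone.utc)
print(zcap_covers(zcap, endpoint, "read", now))
print(zcap_covers(zcap, endpoint, "write", now))
```

In this sketch the endpoint never appears in a DID Document; the recipient learns it from the credential and learns what it may do there from the capability.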

@dlongley that's nice, but I want to be able to crawl all the DIDs in the VDR and build a database correlating them to other websites, people, devices and data sets.... so your use of did:key and privacy preserving approach to this problem gets in the way of my business model... can't we just mandate my ability to make money selling correlation data?

/s

that's nice, but I want to be able to crawl all the DIDs in the VDR and build a database correlating them to other websites, people, devices and data sets.... so your use of did:key and privacy preserving approach to this problem gets in the way of my business model... can't we just mandate my ability to make money selling correlation data?

Yes, this is a fun joke -- but I also don't want people to think we're being dismissive of their concerns (or lumping them into a group/use case where they don't belong or that they don't support). We're all listening here and trying to find the best way forward (including @OR13 who is an excellent collaborator).

@agropper wrote:

@dhh1128 I totally agree with you. That's why I suggested that proponents of removing them "should" fill out the sequence diagram for how we associate a DID without a service endpoint, like did:key with either an authorization, notification, or mediation service.

You simply sign a VC that states the endpoints and make that VC available to those you wish to use those endpoints.

There is no sequence diagram necessary.

From what I can tell, it seems like some of us have a hidden requirement to automatically be able to do all sorts of magic with DIDs, whether that's a directory service or a resource delivery mechanism. These efforts tend to violate the layered architecture that gives DIDs their privacy-enabling features.

Once you have a viable root authority from a DID Document, it is trivial to secure (in-band) any communications you might have with those acting on behalf of the Subject. In particular, you can secure the content independent of the communications channel.

If you want to look up someone's communication channels, use a directory, with appropriate controls for compliance and privacy.

I, for one, don't want every DID that gets created to advertise service endpoints. The only thing that does, IMO, is give data aggregators and bad actors a means to scrape data without my permission.

Automated discovery is the problem, not a feature.

You want to know how to reach me, ask me.

@jandrieu There is an assumption being made that any DID Doc is by default discoverable. It's only discoverable if the DID method makes it discoverable. A perfectly good way to "tell" someone when they ask about one's data is to deliver to them a DID Doc. DID resolvers do not have any way of discovering a DID Doc unless the controller of the DID Doc publishes to the DID resolver or the DID method pulls it from a public verifiable data registry or ledger. DID resolvers should not cache DID Docs unless they have signed consent from the controller of the DID Doc or unless the DID method makes that implicit.

This is a case where the right to be forgotten would be enforceable against someone hosting a DID resolver. The verifiable controller of a DID Doc could request that a DID resolver that inadvertently or maliciously cached a DID Doc without consent delete it. The resolver would be liable under GDPR for not deleting it upon request.

@jandrieu But I do agree that, given resolver metadata provides proof of control authority, a DID Doc could be replaced by a verifiable credential to provide the same information.

The assumption isn't that it is discoverable, but that it is resolvable (either directly from the DID itself, from a registry, or directly from a peer).

I would argue a different distinction than your comment suggests.

The DID Document SHOULD only be that which provides proof of control authority. And the current, authoritative DID Document is only attainable through the means defined in the DID Method.

Any other transmittal of a DID Document is, by definition, non-authoritative.

@dhh1128 @jandrieu I think Daniel Hardman should respond as I believe your proposed rule would be a problem for did:peer

As long as did:peer provides a way to definitively get the authoritative DID Document from the peer, we're good. Because did:peer is useless outside that peer relationship, whatever the current document, as provided by the peer, is, I believe, definitive.

Of course, that begs the question about caching... but at least did:peer avoids the complication of different DID Documents existing that could each be "definitive" because whatever the most recent DID Document communicated is definitive and only has validity in the context with that peer. DID Documents for that DID given to a different peer are not in the current context, so there is no conflict.

@dlongley and @jandrieu - I would hope to avoid using VCs at this level, it
just seems too heavy as compared to zCap or GNAP but I could be wrong.

The sequence diagram is a huge help to my understanding what's going on as
we cross from authentication with did:key to authorization in a specific
use-case. Could you please fill out the flow steps using the zCap
consistent with the precedent authorization steps?

Thank you

@gropper I looked at your sequence diagrams, but I'm not understanding why you have an agent in the loop. Or, for that matter, what any of this has to do with a DID Document and service endpoints as did:keys don't have service endpoints and the DID Document isn't present in your flow.

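That said, I put together a flow (https://www.websequencediagrams.com/cgi-bin/cdraw?lz=dGl0bGUgU2VwYXJhdGluZyBBdXRoZW50aWNhdGlvbiBmcm9tAA8Fb3JpegANBih6Q2FwcykKCnBhcnRpY2lwYW50IEFsaWNlIGFzIEEACg1CYW5rAAMOb2IncyBSZW50XG5hIENhciBhcyBCCiMAOxEnc1xuQWdlbnQAUAVBCgoKbm90ZSBvdmVyAGcGLABYBToAcwdpcyBhbHJlYWR5IGEAgTgKZWQgaW50byB0aGUgYmFuaydzIHN5c3RlbQoAgSsFLT4APQZHZXQgY2FwYWJpbGl0eSAKQmFuay0-AIFOBTogUmVxdWVzdCBESUQgZm9yIG5ldwAhDCh3LyBjaGFsbGVuZ2UpAEoORElEIChkaWQ6a2V5Onh5eikgd2l0aCBzaWduZWQALAoAYg5kZWxlZ2F0YWJsZSB6Q2FwIChBKQCBaRQ6IExhdGVyLACCdAdyZW50cyBhIGNhcgpCLT5BOgCBMAhwYXltAIJCBQCDOQwKQS0-QgCBShIAFxUAgVMQQgCCEgkAgVMNZGVmAIFIGACCZgcALggAgVsGAIFXB0EgdG8gADkLIACCHgYkMTAwIGxpbWl0IChBKwCCTgoANQpkLACCGRIiQSsiAIIWFlRoZW4AgiEKdHVybnMAgiMIAIRBBkludm9rZQCDewxmb3IAgQEGAII7BwCEEAdCOgCBGAUKCg&s=default) for how I would do what I think you are asking... with the exception of the Agent and the last interchange with the car key as that doesn't seem to have any bearing on the payment authorization.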

Simply get a zCap from the bank and delegate it to the rental agency with an appropriate caveat.

No need to register anything with the bank other than the initial DID for issuing the zCap from the bank.

There is a LOT more complexity that one could expect in this kind of scenario: for example, the rental agency probably needs a way to verify that the capability, in fact, is what Alice says it is before giving her the rental. I don't believe there is yet a standard way to do that.

There are also tons of ways you could integrate an agent, but I'm not sure what the agent is doing in this case. The delegated zCap gives the rental agency everything they need to retrieve payment. What is the agent doing in the process?

Arguably, an even better way to achieve this use case would be to use a lightning channel with $100 in it that the rental agency can close out when Alice is done with the car. That would need neither an agent nor a bank, but it would be dependent on bitcoin.

However, if you want to delegate banking operations AND your bank is willing to use a delegatable zCap for that, then there is no need for Alice's agent (or anyone else) to get involved. Alice just uses her keys (and wallet software) to delegate the zCap from the bank with appropriate caveats.

I'm also not seeing how any of this applies to service endpoints.

With one exception, if you imagine a payment/invoice service endpoint that the rental agency might use, then that idea is understandable, but leaves out how you authorize the rental agency to use that endpoint for a specific amount. Or the agent, for that matter. Are you imagining that the agent has full authority to authorize any transaction for you? This seems like an unnecessary risk.

zCaps fundamentally include the invocation target, defined at the point of issuance. So, from one perspective, the service endpoint is embedded in the zCap. No need at all to publicly list some sort of invoicing service endpoint.
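To make the delegation step concrete, here is a minimal Python sketch of the bookkeeping a bank might do when the rental agency invokes a delegated capability. It is not the zCap data model or any real library: the `Capability` class, the `max_amount` caveat, the `delegate` and `may_invoke` functions, and the bank URL are all hypothetical, and proof verification is omitted entirely. It only shows that the invocation target travels with the capability and that a delegated capability can narrow, but not widen, its parent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Capability:
    invocation_target: str                 # the endpoint is carried in the capability
    invoker: str                           # DID allowed to invoke it
    max_amount: Optional[int] = None       # hypothetical caveat: dollar limit
    parent: Optional["Capability"] = None  # link back up the delegation chain

def delegate(parent, invoker, max_amount):
    """Derive a narrower capability for another invoker."""
    if parent.max_amount is not None and max_amount > parent.max_amount:
        raise ValueError("a delegated capability cannot exceed its parent")
    return Capability(parent.invocation_target, invoker, max_amount, parent)

def may_invoke(cap, invoker, target, amount):
    """Check invoker, target, and every caveat along the delegation chain."""
    if cap.invoker != invoker or cap.invocation_target != target:
        return False
    link = cap
    while link is not None:
        if link.max_amount is not None and amount > link.max_amount:
            return False
        link = link.parent
    return True

bank_target = "https://bank.example/accounts/alice/pay"   # assumed URL
zcap_a = Capability(bank_target, "did:key:xyz")            # bank -> Alice
zcap_a_plus = delegate(zcap_a, "did:key:def", 100)         # Alice -> rental agency

print(may_invoke(zcap_a_plus, "did:key:def", bank_target, 100))
print(may_invoke(zcap_a_plus, "did:key:def", bank_target, 250))
```

Nothing in this flow requires the rental agency to look up an endpoint in Alice's DID Document; the target is fixed when the capability is issued.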

@jandrieu Thank you for engaging and for framing the discussion in terms of
zCaps.

If I understand your proposed flow, Alice uses two did:key DIDs to interact
with the Bank: did:key:1...3 to authenticate and did:key:xyz to control zCap
(A). Alice uses some other DID to authenticate to Bob's Rent a Car, and
Bob's creates did:key:def in order to control its ability to cash in zCap
(A) at the Bank.

In terms of the general Alice to Bob authorization use-case, Bob as
Requesting Party is approaching Alice with a triplet:

  • Purpose: to rent car X
  • Bob's credentials: I'm a Fortune 500 company
  • Data Request: a $100 capability at a reputable bank or a secure data
    store with $100 cash

In the general Alice to Bob case, Alice has to do a lot of work here. (1)
Evaluate the purpose for the request relative to her policies. (2) Verify
Bob's credentials and compare to her policies, and (3) Authenticate and get
a $100 voucher from her bank.
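The three steps Alice has to perform can be sketched as a policy check that an agent might run on her behalf. This is a minimal Python illustration only: the `Request` fields mirror the triplet above, while the policy values (the allowed purpose prefix, the Fortune 500 flag, and the $200 ceiling mentioned earlier in the thread) and the `evaluate` function are assumptions made for the example. Real credential verification and the interaction with the bank are stubbed out.

```python
from dataclasses import dataclass

@dataclass
class Request:
    purpose: str                    # e.g. "rent car X"
    requester_is_fortune_500: bool  # result of verifying Bob's credentials (stubbed)
    amount: int                     # dollars requested

AUTO_APPROVE_LIMIT = 200            # assumed policy ceiling in dollars
ALLOWED_PURPOSE_PREFIXES = ("rent car",)  # assumed policy

def evaluate(req):
    """Return 'approve', 'ask Alice', or 'deny' for a requesting party."""
    # (1) Evaluate the purpose against Alice's policies.
    if not req.purpose.startswith(ALLOWED_PURPOSE_PREFIXES):
        return "deny"
    # (2) Compare the requester's verified credentials to her policies.
    if not req.requester_is_fortune_500:
        return "ask Alice"
    # (3) Only fetch a voucher automatically below the ceiling.
    if req.amount > AUTO_APPROVE_LIMIT:
        return "ask Alice"
    return "approve"

print(evaluate(Request("rent car X", True, 100)))
print(evaluate(Request("rent car X", False, 100)))
print(evaluate(Request("buy yacht", True, 100)))
```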

The reason for the agent in the loop is to keep Alice self-sovereign.
Self-sovereign identity is only the beginning. Today's sad reality is a
vast asymmetry of power between the individual and service providers,
platforms, and other data brokers. We live in an "attention economy" where
almost all of the technology is controlled by others and used to manipulate
us. I tend to look at complex and important systems from a medical
perspective. When faced with illness (or a legal dispute) we don't assume
that the patient (or the defendant) will face the institutions directly. We
introduce an agent (doctor, defense lawyer) chosen by the patient, and with
expertise they can put at the patient's disposal in a fiduciary capacity.
Alice needs technology she chooses to deal with the Bank and with Bobs.

The sequence diagram
https://www.websequencediagrams.com/?lz=dGl0bGUgU2VwYXJhdGluZyBBdXRoZW50aWNhdGlvbiBmcm9tAA8Fb3JpegAOBQoKcGFydGljaXBhbnQgQWxpY2UgYXMgQQAKDUJhbmsAAw5vYidzIFJlbnRcbmEgQ2FyIGFzIEIAORInc1xuQWdlbnQATwVBCgoKAF0FLT5CYW5rOiBSZWdpc3RlcnMgZGlkOmtleToxLi4uMwpub3RlIG92ZXIAgQsGLAB8BTogTGF0ZXIsAIEeB2dldHMgYW4gYWdlbnQASQ5TaWduLWluIGEASBBCYW5rLT4AgWAFOiBjaGFsbGVuZ2UAgQYOc2lnbmVkIHdpdGgAgQcPAIEmFiBteQB-BiBhc1xuaHR0cHM6Ly9hbGljZS5leGFtcGxlLmNvbQoAgUITAIFDD3JlbnRzIGEgY2FyAIIeCQCCEBQyLi4udyBmb3IgYXV0aCduABgVAHAZADQFcGF5bWVudApCLT5BQTogcGF5ICQxMDAKQUEAbwVvayB0bwAMCkIAgygIAAgKAIJEBUI6ABcJQUE6IGNhciBrZXkgY2FwYWJpbGl0eQBLBQCCaAdrZXkgZm9yd2FyZGVkIHRvAIRlB3ZpYSBlbWFpbAo&s=default
is a simplification of the Alice Rents a Car use-case that I'm hoping to
add to our Use Case document. See
https://github.com/w3c/did-use-cases/issues/101 It's an attempt to
understand interoperability in human terms.

My take-away at this point is that zCaps could work in the general
Alice-to-Bob authorization case even if Alice has an agent but that we
might prefer GNAP because it will promote human-centered interoperability.
Either way, with or without an agent and with zCaps or GNAP, authorization
does not seem to require a service endpoint in an authentication DID. How'm
I doing?

@dlongley

You seem to be suggesting that if some information X is not atomically bound with a particular key state via the DID Document then there is an insurmountable system security problem. I've intentionally called this information X here instead of "service endpoint" to highlight that what you're arguing is that all Xs must be in the DID Document.

No. "Service endpoint" is not a stand-in for "any kind of data about the DID" in my thinking. I mean exactly and only service endpoints, not a generic x. Characterizing it generically turns my assertion into a straw man. Knowing where to talk to a DID controller (inbound: the endpoint) and knowing how to authenticate the DID controller (outbound: the keys) are the two pieces of info that I claim must have mutual integrity to ensure security. Not other stuff. The reason I am concerned is because I believe the lack of synchronization between these two particular pieces of data can be exploited by creating race conditions in a way that is not risky for other data.

To understand why, consider this thought experiment.

Alice and Bob are acquaintances who wish to carry on conversations with high security. They exchange phone numbers (endpoints) and they also agree on passwords (keys) that will authenticate one to the other. Important to the thought experiment, Alice and Bob might also interact in other ways besides official endpoints (e.g., Alice could meet Bob at a conference and hand him a love letter, encrypting it with her password). The security guarantees and sequencing in communication mediums other than phone calls are undefined -- sometimes they may be good and fast, other times not. These other communication mediums represent Manu's posited alternative methods for communicating endpoints. They are also implied by Joe A's comment that if we want to communicate an endpoint, we just give the other party a VC. They are one of the ways, besides phone calls, that this VC could be shared. We don't know anything about them or their security and timing properties except that they exist and are not the same as the endpoints. (Such channels will always exist; it's impossible to design a system that prevents them.)

So Alice wants to change her phone number from A.endpoint[1] to A.endpoint[2]. Great. She calls Bob at B.endpoint[1] and gives him her existing password, A.pass[1]. Now that he knows it's her, she says, "I'd like you to call me on a new number, A.endpoint[2]". Everything is great. No ambiguity, no security problem. This represents the simple model that Manu and Joe are advocating. (BTW, notice that it doesn't use the alternative communication channel. That's why it's so clean and safe -- and so unsatisfying to me.)

The problem is, reality is messier than that. What if Alice has multiple devices, and so does Bob? What if each has multiple endpoints and multiple keys (as most orgs do)? And what if Alice and Bob are software, not human beings, and they're carrying on multiple conversations in parallel, at a mixture of machine and human speeds, at the same time?

Now you can have race conditions:

Conversation 1, step 6

Alice and Bob are in the middle of negotiating a mortgage that began when Bob reached out to A.endpoint[1]. The negotiation is driven over http endpoints by software, at machine speed. Bob is waiting at B.endpoint[1] for Alice's message where she commits to pay him back $1M over the course of 30 years.

Conversation 2, step 1

Alice emails Bob to switch to A.endpoint[2], signing the message with A.key[1].

Conversation 3, step 1

Alice changes her DID doc such that A.key[1] is replaced by A.key[2]. Suppose she writes this change to a ledger that typically has 10-60 seconds of global latency.

Conversation 1, step 7

Alice sends to Bob a nonrepudiable commitment, signed by A.key[1], that she will pay the mortgage back.

Can you see the problem? Bob could choose to believe the mortgage commitment is valid (imputing an order where Conversation 1, step 7 precedes Conversation 3, step 1). Depending on latency of the ledger and Bob's ledger cache, this might be quite rational. If he's worried about the sequencing, he could contact Alice to confirm -- but does he do that at A.endpoint[1] or A.endpoint[2]? That depends on the relative order he imagines for Conversation 1, step 7 and Conversation 2, step 1 (and maybe the relative order between 2.1 and 3.1, too). And note how easy it is to make this problem worse if Bob and Alice each have multiple endpoints and multiple keys rather than an assumed single state apiece. And don't even get me started on N-party conversations... (Don't tell me that all Bobs and Alices will centralize so they've reconciled all internal views of their own and everybody else's state. I'm not opposed to someone doing that, but I'm opposed to an imagined universe that requires that centralization. We have to allow the decentralization or what's the point of DIDs?)
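The ambiguity can be made concrete with a small enumeration. The Python sketch below is an illustration, not a protocol: it takes the three events from the scenario (the endpoint change sent by email, the key rotation written to the ledger, and the signed mortgage commitment) and lists, for every order in which Bob might observe them, whether he accepts the commitment signed with A.key[1] and which endpoint he would use to confirm. The event names, the `interpret` function, and the rule that Bob ignores messages signed with a key he already knows to be rotated are all assumptions of the sketch.

```python
from itertools import permutations

EVENTS = ("endpoint_change", "key_rotation", "commitment")

def interpret(order):
    """Return (accepts_commitment, endpoint_for_confirmation) for one observed order.

    Assumption: Bob honors a message signed with A.key[1] only if he has
    not yet seen the rotation to A.key[2].
    """
    key, endpoint, accepted = "A.key[1]", "A.endpoint[1]", False
    for event in order:
        if event == "key_rotation":
            key = "A.key[2]"
        elif event == "endpoint_change" and key == "A.key[1]":
            endpoint = "A.endpoint[2]"
        elif event == "commitment":
            accepted = key == "A.key[1]"
    return accepted, endpoint

outcomes = {order: interpret(order) for order in permutations(EVENTS)}
for order, result in outcomes.items():
    print(" -> ".join(order), "=>", result)
print(len(set(outcomes.values())), "distinct interpretations")
```

Under these assumptions the six possible orderings collapse into four different conclusions, which is the room an attacker who can only delay or reorder messages has to work with.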

Now, you can say, "It doesn't matter. Such decisions are out of scope for the spec. Bob will make whatever decisions he wants to make, and either decide he's satisfied that the mortgage is valid, or it's not. Nobody else cares, and the spec shouldn't, either." But I disagree. Can you see how a malicious attacker that can't see or alter the plaintext of any of these messages can still influence Bob's interpretation of reality by delaying or dropping some messages, or by monkeying with Bob's cache timing, and how Alice could deny a reality Bob believes in? (Alice to the judge: "No, judge. I rotated my key precisely because I was worried that a hacker who had co-opted my key [and, optionally, take your pick: endpoint[1] or endpoint[2]] would agree to that mortgage. And I told Bob so by updating my DID doc on the global ledger.") The attacker can't forge a signature, but (s)he can certainly cause messages to be seen in a different order. At the very least, this can be used for denial of service or faked misbehavior, and depending on message content, the stakes could be higher ("launch missile A", "launch missile B", "belay that order"... WHICH order?). For that matter, Alice herself could be malicious and influence Bob's interpretation. A mortgage is something that needs to be litigatable in a court of law, and if Bob's basis for accepting Alice's commitment is indeterminate, we've built a system on a foundation of shifting sand.

As long as we keep explaining our use cases with simplistic assumptions about a single conversation between two humans, at human speed, with no nefarious actors, we will continue to come to the erroneous conclusion that communicating service endpoints out of band to key changes is fine. But the only way I know to resolve this problem when we face the true complexity is to force ambiguity out of the relative order of changes between service endpoints and keys. This doesn't drive all ambiguity out of the system -- message order is still a bit unpredictable -- but it's no longer possible to play games based on different lines of control for the endpoint and the keys. (Those different lines of control are the essence of the problem: we use keys to control the state of a DID doc, but if we use something else to communicate about endpoints -- even if we sign our communications with keys -- we've created a split brain scenario that's exploitable.)

Look at how the interpretation of the parallel conversations changes if service endpoints are in the DID doc. Conversation 2 (changed endpoint) and Conversation 3 (changed key) are no longer independent, because they share the same source of truth. A malicious attacker (or decentralized chaos) can still wreak havoc with order of delivery between the mortgage conversation and the DID doc update. But now a well designed protocol can require Alice to sign not just a commitment to pay the mortgage, but a hash of the DID doc at the time of the signing. That hash now includes a service endpoint. And Bob can contact Alice at that service endpoint to get a confirmation, and Alice can't deny that she agreed to pay the mortgage back. There's no split brained Alice.
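One way to picture the binding described above is the following Python sketch. It is only an illustration under stated assumptions: canonicalization is approximated with sorted-key JSON, SHA-256 is an arbitrary choice of hash, the document contents and the helper names `did_doc_hash`, `build_commitment`, and `signed_under` are hypothetical, and the actual signature over the commitment is left out. The point is that the commitment names one specific DID doc state, which includes both the key and the endpoint.

```python
import hashlib
import json

def did_doc_hash(did_doc):
    """Hash a DID doc; sorted-key JSON stands in for real canonicalization."""
    canonical = json.dumps(did_doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_commitment(terms, did_doc):
    """Payload Alice would sign: the terms plus the DID doc state she signed under."""
    return {"terms": terms, "didDocHash": did_doc_hash(did_doc)}

def signed_under(commitment, did_doc):
    """True if the commitment was bound to exactly this DID doc state."""
    return commitment["didDocHash"] == did_doc_hash(did_doc)

# Hypothetical DID doc states before and after Alice's update.
doc_v1 = {
    "id": "did:example:alice",
    "verificationMethod": [{"id": "did:example:alice#key-1"}],
    "service": [{"id": "did:example:alice#agent",
                 "serviceEndpoint": "https://a1.example/inbox"}],
}
doc_v2 = {
    "id": "did:example:alice",
    "verificationMethod": [{"id": "did:example:alice#key-2"}],
    "service": [{"id": "did:example:alice#agent",
                 "serviceEndpoint": "https://a2.example/inbox"}],
}

commitment = build_commitment("repay $1M over 30 years", doc_v1)
print(signed_under(commitment, doc_v1))
print(signed_under(commitment, doc_v2))
```

Because the key and the endpoint live in the same hashed document, Bob can tell which endpoint went with the key that signed, rather than inferring it from message arrival order.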

@agropper said:

@dlongley and @jandrieu - I would hope to avoid using VCs at this level, it
just seems too heavy as compared to zCap or GNAP but I could be wrong.

I believe this is similar to my other basic concern, which is that communicating a service endpoint by VC basically makes DIDs dependent on VCs, which in turn depend on DIDs. It's a circular dependency.

@dlongley replied by saying that VCs don't depend on DIDs. This is technically true but practically false. VCs allow any kind of URI as the identifier for a credential subject and the identifier for an issuer -- but I don't believe we have an abundance of production stacks where these identifiers are anything other than DIDs.

I assert that DIDs are a (much) lower level construct than VCs. Communicating the one piece of metadata about DIDs that is likely to be ubiquitous -- how to talk to the controller -- using VCs doesn't make DID docs totally useless on their own. But it means that any meaningful impl of DIDs must also have support for VC validation. That's like putting hostnames rather than naked IP addresses in an IP packet header, making IP depend on DNS. It's a BAD idea from a software architecture perspective.

I actually agree with the general sentiment behind Joe's and Manu's and Dave's comment -- that DID docs should be as simple as possible. I am fine taking out lots of things. Indeed, this is why I have never believed in JSON-LD-style extensibility for them. However, if taking out service endpoints introduces race conditions or obnoxious dependencies, we've gone too far.

I find @dhh1128's arguments for having essential service endpoints in the DID
Doc convincing.

I'm reminded of how we write contracts in general, not just mortgages. The
contract binds together the non-repudiable identities of the parties
(typically through a notary), the points of notification, the terms, and
the jurisdiction where the contract will be interpreted in case of dispute.
All four of these components are necessary.

In our DID case:

  • Alice is identified by a DID and the associated non-repudiable signature
  • the jurisdiction is in the method and resolution
  • the point of notification to Alice is a service endpoint
  • the terms are delegated by Alice to an authorization server via a service
    endpoint

The question then becomes: Can we collapse Alice's notification endpoint
and her authorization endpoint into a single endpoint and if so, is that a
good idea?

I have found none of the comments arguing for removal of service endpoints compelling or sufficient to address the valid use cases that require them. While the conversation is interesting, what, at this point, is the intended outcome of this Issue in the absence of anything approaching consensus for removal of this feature?

I completely agree with @csuwildcat. I was amazed to see this thread had grown so long I had to invoke a search to find the comment I posted 21 days ago.

I was even more surprised to find that, after 3 more weeks of discussion, every point I made in that comment remains: a) true (it has six thumbs up), and b) unaddressed by any subsequent discussion.

Folks, we have a spec to finish. At the very start of the special topic call on service endpoints this Thursday (noon ET), I am going to make the following proposal (originally made by @OR13):

PROPOSAL: In DID Core we shall define an abstract data model for service endpoints the same way we have for verification methods. In that section we shall include a special warning about privacy considerations. In the Privacy Considerations section we shall include a more extensive warning. Lastly, in the Implementation Guide we shall also cover this topic in depth.

The scope of the abstract data model for a verification method allows a
controller to:

  • authenticate somewhere
  • sign something
  • rotate or recover cryptographic materials
  • assert a service endpoint(s)

What will be the scope for the abstract data model we’re defining for a
service endpoint?

PROPOSAL: In DID Core we shall define an abstract data model for service endpoints the same way we have for verification methods. In that section we shall include a special warning about privacy considerations. In the Privacy Considerations section we shall include a more extensive warning. Lastly, in the Implementation Guide we shall also cover this topic in depth.

You don't need to make that proposal... we already have that proposal agreed to via the initial resolutions we've made:

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.
RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.
RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

It's a no-op, and it's not the point of contention.

I'll also point out that the following sorts of statements are not helpful for those arguing for service endpoints in DID Documents:

@csuwildcat wrote:

I have found none of the comments arguing for removal of service endpoints compelling or sufficient to address the valid use cases that require them. While the conversation is interesting, what, at this point, is the intended outcome of this Issue in the absence of anything approaching consensus for removal of this feature?

To be crystal clear -- the point of contention is if service endpoints should be in DID Documents at all... if we lack consensus around that, the default is to remove the feature... not keep it.

Let me repeat that, because it seems like folks are missing the point wrt. consensus: If we can't agree on a feature being a good idea -- it gets removed. This is true for all features in the specification. Yes, formal objections can be overruled, but we really don't want to go down that road.

Also, to be clear, I'm not arguing for the removal of service endpoints (I'm pointing out that others might)... I'd just rather we specify at least one that we think would work well for GDPR and CCPA.

For example (I'm using bad but descriptive names on purpose -- we can bikeshed those later):

  "service": {
    "id": "#seeAlsoService",
    "type": "PrivacyProtectingCredentialService",
    "serviceEndpoint": "https://example.com/"
  }

The protocol for the PrivacyProtectingCredentialService allows one to tack the DID on to the serviceEndpoint and get more information related to the DID, expressed as Verifiable Credentials. So, doing this:

GET https://example.com/dids/did:example:123abc

will give you all the public VCs associated with did:example:123abc (including self-issued ones). The endpoint will only allow POSTs from did:example:123abc (authz'd via DIDAuth, or similar). The serviceEndpoint value must be only a domain (to ensure that PII isn't written to the ledger). DID Methods may further enforce restrictions (like, only a handful of PrivacyProtectingCredentialService domains known to pass privacy tests are allowed by a particular DID Method).
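A rough sketch of the two rules just described: checking that a serviceEndpoint is only a domain, and tacking the DID onto it. The function names are hypothetical, the `/dids/` path simply follows the GET example above, and requiring `https` is an assumption rather than something the proposal states.

```python
from urllib.parse import quote, urlsplit


def is_domain_only(endpoint: str) -> bool:
    """True if the endpoint is just scheme + host: no path, query, fragment, or userinfo."""
    parts = urlsplit(endpoint)
    return (
        parts.scheme == "https"  # assumption: only https endpoints are accepted
        and bool(parts.hostname)
        and parts.path in ("", "/")
        and not parts.query
        and not parts.fragment
        and parts.username is None
        and parts.password is None
    )


def credential_lookup_url(endpoint: str, did: str) -> str:
    """Tack the DID onto the endpoint, following the /dids/ path in the GET example."""
    if not is_domain_only(endpoint):
        raise ValueError("serviceEndpoint must be a bare domain")
    return endpoint.rstrip("/") + "/dids/" + quote(did, safe=":")
```

For instance, `credential_lookup_url("https://example.com/", "did:example:123abc")` yields the URL used in the GET example, while an endpoint such as `https://example.com/users/alice` is rejected because its path could carry PII.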

Defining this would enable a mechanism compliant with GDPR/CCPA today and provide a concrete example of the type of design that would pass muster from a privacy perspective. If you don't like this PrivacyProtectingCredentialService, you don't have to use it... you can use something else that you feel is better.

So, the counter-proposal that could allow us to keep service and close this issue could be:

PROPOSAL: Define a privacy-protecting service endpoint in the Service section of the DID Core specification.

That would allow us to start work on a PR that would address this issue and work out the details in that PR.

@msporny are you proposing we provide documentation for this term https://github.com/w3c/did-spec-registries/blob/master/contexts/did-v1.jsonld#L19

is the documentation essentially?:

"Use one or more proxies which strip HTTP headers, timing data, any fingerprinting vectors and provide denial of service mitigation in front of any service exposed via the did document".

"Recommend having a single service endpoint which is used to grant access to additional services in an automatic manner"

"Recommend not adding lots of services to a did document, similar to browser extensions, multiple services will be used to correlate users and attack their privacy".

"Recommend not exposing any services that do not have DNS privacy protection, or that expose an IP address in proximity to the DID subject".

"Recommend TOR....".... etc

@msporny are you proposing we provide documentation for this term https://github.com/w3c/did-spec-registries/blob/master/contexts/did-v1.jsonld#L19

Yes, or something like it. That term is a placeholder for this discussion.

is the documentation essentially? ...

Yes, that is one set of rules we could suggest. We'd work out which ones to include in the PR, probably starting with a small set and seeing how restrictive we can get. Just my $0.02 below:

"Use one or more proxies which strip HTTP headers, timing data, any fingerprinting vectors and provide denial of service mitigation in front of any service exposed via the did document".

Yep... as a SHOULD/RECOMMENDED.

"Recommend having a single service endpoint which is used to grant access to additional services in an automatic manner"

I'm not quite sure what you mean by this one... I need more information before I can express an opinion on it.

"Recommend not adding lots of services to a did document, similar to browser extensions, multiple services will be used to correlate users and attack their privacy".

Yep.

"Recommend not exposing any services that do not have DNS privacy protection, or that expose an IP address in proximity to the DID subject".

Yep.

"Recommend TOR....".... etc

Yep.

Again, the smaller the SHOULD set we start with, the easier it'll be to get to something better than what we have today (which is effectively no example on what a good, privacy-preserving service endpoint looks like).

I'll also note that much of this doesn't apply to peer-wise DID Methods or other non-DLT methods.

If we can't agree on a feature being a good idea -- it gets removed.

If you're asserting that a feature that was added via consensus years ago, is used by implementers (in production-level implementations), and is relied on by other specs gets removed by default because a subset of folks take issue with how it could be used in a subset of cases literally years afterward, I just don't agree with that on any level.

I was also under the impression we had decided on a past call that this wasn't a question of whether SEs were in or out, but how we message certain privacy considerations. It's ridiculous to me that the assertion now appears to be that those were all basically tangential proposals with no bearing or implication on the retention of SEs.

I just don't agree with that on any level.

Then you disagree with the W3C Process, which is fine, but I can't do anything about that.

Please focus on putting forward concrete proposals that can achieve consensus. I've put one forward in https://github.com/w3c/did-core/issues/382#issuecomment-697362108 -- if you'd be ok with that, we can resolve this issue and move on.

I cannot support the addition of ANY specific Service Endpoints in the spec that cast legal/political shadows on others by inference. We should not codify ANY techno-legal formulations of specific features and their applied external uses/impacts. This very bad idea I believe falls outside the bounds of what we should be doing as a technical specification body.

I cannot support the addition of ANY specific Service Endpoints in the spec that cast legal/political shadows on others by inference.

Please put forward a proposal that will achieve consensus.

Proposal: Along with codification of the general data model/format of Service Endpoints, add to an appendix or usage guide the considerations one should take into account when constructing Service Endpoint entries that must account for varying levels of privacy, as illustrated through examples.

^ To clarify what I mean in the last part of the text above: some uses of Service Endpoints do not involve the same privacy considerations that others do. An example of that would be a company with a DID that wants to note the Web domain origins it also controls. In other cases, where a DID owner is trying to remain as anonymous as possible, you may face an entirely different set of considerations, for which we can provide examples of what to do and what not to do.

Proposal: Along with codification of the general data model/format of Service Endpoints, add to an appendix or usage guide the considerations one should take into account when constructing Service Endpoint entries that must account for varying levels of privacy, as illustrated through examples.

To break that proposal down:

Along with codification of the general data model/format of Service Endpoints

We resolved to do this a few weeks ago -- https://github.com/w3c/did-core/issues/382#issue-688206221

add to an appendix or usage guide the considerations one should take into account when constructing Service Endpoint entries that must account for varying levels of privacy,

We resolved to do this a few weeks ago -- https://github.com/w3c/did-core/issues/382#issue-688206221

as illustrated through examples.

It's arguable that we've resolved to already do this as well, but if we assume we didn't yet agree to do that: I'll take the only remaining part of your proposal that we haven't already agreed to as "PROPOSAL: Add concrete examples to demonstrate how service endpoints might account for varying levels of privacy requirements."

Is that aligned with your mental model?

It would be best if an abstract data model for service endpoints did not dilute the privacy features of DIDs.

So, I ask the experts again https://github.com/w3c/did-core/issues/382#issuecomment-697283182.

@dhh1128,

Can you see the problem? Bob could choose to believe the mortgage commitment is valid (imputing an order where Conversation 1, step 7 precedes Conversation 3, step 1).

...

Can you see how a malicious attacker that can't see or alter the plaintext of any of these messages can still influence Bob's interpretation of reality by delaying or dropping some messages, or by monkeying with Bob's cache timing, and how Alice could deny a reality Bob believes in? (Alice to the judge: "No, judge. I rotated my key precisely because I was worried that a hacker who had co-opted my key [and, optionally, take your pick: endpoint[1] or endpoint[2]] would agree to that mortgage. And I told Bob so by updating my DID doc on the global ledger.") The attacker can't forge a signature, but (s)he can certainly cause messages to be seen in a different order. At the very least, this can be used for denial of service or faked misbehavior, and depending on message content, the stakes could be higher ("launch missile A", "launch missile B", "belay that order"... WHICH order?). For that matter, Alice herself could be malicious and influence Bob's interpretation. A mortgage is something that needs to be litigatable in a court of law, and if Bob's basis for accepting Alice's commitment is indeterminate, we've built a system on a foundation of shifting sand.

...

A malicious attacker (or decentralized chaos) can still wreak havoc with order of delivery between the mortgage conversation and the DID doc update. But now a well designed protocol can require Alice to sign not just a commitment to pay the mortgage, but a hash of the DID doc at the time of the signing. That hash now includes a service endpoint. And Bob can contact Alice at that service endpoint to get a confirmation, and Alice can't deny that she agreed to pay the mortgage back. There's no split brained Alice.

I think there are multiple threats being conflated above. One is from an attacker that can only prevent, delay, or reorder messages. Another is from an attacker that can sign with Alice's keys. These are very different threats. It's hard to analyze the entire scenario as written because it doesn't get down to the primitives and keep the threats clear. Either Alice signed that mortgage commitment with A.key[1] or she didn't -- either Bob has to consider it could have been Carol instead or not.

TL;DR: In the given scenario, someone signed that mortgage commitment and Bob always has to consider that it could have been Carol, within some risk profile, not Alice. If Alice signed it, then she intended to make the commitment and there's no problem.

There should be no way for an attacker that is limited to preventing, delaying, or reordering messages to cause Bob to think Alice agreed to a contract that she didn't agree to. I, of course, agree with this. Here we are assuming that Alice's keys have not been compromised but that she can also change those keys at will. The idea that the attacker here could make Alice agree to a contract she didn't want is a problem with the semantics of the messages she signs, not their order. It doesn't matter where the messages come from -- as the trust is in the key used to sign and that it is under Alice's control -- which it is here, by assumption.

If Bob trusts the key in the DID Document, then he trusts whatever it signs as authentically coming from Alice. If signing an endpoint is important, include that in what gets signed. It doesn't have to be in the DID Document to do that. This trust is rooted in the use of the key, not in whatever additional data is present in the DID Document. If it were any different, then it would be as I said above, any X that Bob wants to trust as being bound to a key must be put into the DID Document. If Alice can simply say "Well, that endpoint wasn't in my DID Document at the same time as the key I signed with" then what about the other things she signed? The contract said she agreed to a 30 year term but that wasn't in her DID Document at the time, it was only in the signed contract, so it doesn't count?

No, you may say -- this doesn't apply because it isn't about the communication channel. I think that's an arbitrary distinction. It's getting inserted in here because we're considering an attacker that can only prevent, delay, or reorder her messages -- but a protocol should not allow that to change the semantics of her messages. Either Alice signed that mortgage commitment or she didn't. This is a protocol and data modeling problem. It is not solved by binding an endpoint to the DID Document. Remember, we're assuming Alice's keys are not compromised in this case. We can introduce that next so that Bob may not know whether or not Alice is Alice.

So the other threat here is from actual key compromise. No amount of binding will solve the problem here. Because this problem cuts at the core assumption for trust in the system (the security/control of the key). Everything rests on that, so throwing out the assumption that the key is not compromised creates an unsolvable problem.

Suppose Alice's keys and endpoint were compromised before the "DID Document hash" was signed. Carol is actually the one that signed that hash and responded on the same service endpoint in the affirmative. This all happens before Alice notices. Carol completes everything with Bob, fully impersonating her. Only then Alice updates her DID Document with a new key and service endpoint because she suspects there could have been a breach. Alice will later deny that she agreed to pay the mortgage back. Whether or not the service endpoint was in the DID Document is irrelevant. There are other variants here as well. You can just keep shuffling these events around and creating problems not because the service endpoint is or isn't in the DID Document, or any other X for that matter, but because the core guarantee of the integrity of the system got its knees knocked out -- the key was compromised. Alice can claim that at any time -- a judgement would need to be made on other factors to determine the legitimacy of her claim.

There's no way to ensure a scenario in which Alice cannot possibly say her key was stolen -- there are no protections against it, precisely because it is an assumption in the system. Alice isn't her key -- and there's no brand of atomicity you can add to the system to make her the same as her key. Carol can use it too if she gets access to it.

So, I don't believe these sorts of problems can be avoided by simply having a "single state apiece". It also seems to ignore what would happen when the endpoint itself is a privacy-preserving router/proxy/negotiation mechanism for arriving at some other endpoint for communications. This stuff is all asynchronous and other measures should be taken to avoid confusing the semantics of certain messages to address the first threat.

In short, I don't buy this line of argumentation for requiring service endpoints in DID Documents.

It's arguable that we've resolved to already do this as well, but if we assume we didn't yet agree to do that: I'll take the only remaining part of your proposal that we haven't already agreed to as "PROPOSAL: Add concrete examples to demonstrate how service endpoints might account for varying levels of privacy requirements."

@msporny Thanks for the clarifications above. I support this proposal. If we can see if there is consensus on this proposal at the start of tomorrow's special call on this topic, then we can spend the majority of the call agreeing on what a PR (or PRs) need to include—and who is going to do them. How's that sound?

Okay, Dave. This is fascinating. I think that another assumption might be getting in our way. Let me pare back all the cruft and see if I can expose it. And I apologize for adding another tome to this incredibly long thread, but this is important stuff to understand, even if we retain diverged perspectives when it's done.

In my worldview:

An endpoint isn't guaranteed to provide duplex (two-way) communication.

This is quite different from the mental model in classic web services. You call an endpoint, and the server communicates back to you over the same socket. Messages/payloads/data can flow either direction (either as a request from the caller, or as a response from the server).

I want to support endpoints that are one-way. Imagine an endpoint that's a sink for a message queue, for example: amqp://192.168.4.25:2948. Such an endpoint is a sink where listening occurs, but it's not a way to talk back. Talking and listening are two different activities, sometimes but not always combined. I think endpoints MUST be a place to listen but only MAY be a place to talk back. (Simplex support is required for highly asynchronous HTTP, for partial tunnels and mixed-transport usage, for some onion routing, for passive modes, and for lots of transports that aren't HTTP.)

Anyway, the reason this is relevant is that it seems to me (please correct me if I'm wrong) that commenters on this thread may be assuming different things about the relative capabilities of participants in the system. You, Joe, Manu, and maybe others seem to be assuming that if Alice can say something to Bob, then she can also receive Bob's responses, and vice versa.

On the other hand, I'm assuming that if Alice can say something to Bob, that doesn't imply anything about whether Bob can say something back. The thing that turns a simplex channel into a duplex channel is having the service endpoint of the other party, not the fact that the active endpoint's URI begins with "https". Alice can say something to Bob over Bob.endpoint; Bob is only guaranteed to be able to say something back to Alice if he knows Alice.endpoint.
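The simplex model in the paragraph above can be sketched as a tiny data structure. This is only an illustration under assumed names (`Party`, `learn_endpoint`, `is_duplex` are hypothetical): a party can send to a peer only if it knows that peer's endpoint, and a conversation is duplex only when both directions are known.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Party:
    name: str
    endpoint: Optional[str]  # where this party listens; None means it publishes no endpoint
    known_endpoints: Dict[str, str] = field(default_factory=dict)  # peer name -> peer endpoint

    def learn_endpoint(self, peer: "Party") -> None:
        """Record the peer's listening endpoint, if it has one."""
        if peer.endpoint is not None:
            self.known_endpoints[peer.name] = peer.endpoint

    def can_send_to(self, peer: "Party") -> bool:
        """Sending requires knowing where the peer listens, nothing more."""
        return peer.name in self.known_endpoints


def is_duplex(a: Party, b: Party) -> bool:
    """Two simplex channels, one in each direction, make a duplex conversation."""
    return a.can_send_to(b) and b.can_send_to(a)
```

If Alice learns Bob's endpoint but Bob never learns Alice's, `alice.can_send_to(bob)` is true while `is_duplex(alice, bob)` stays false, regardless of which transport either endpoint URI names.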

In such a model, control over keys is only helpful when saying things (outbound). It doesn't help when listening (inbound). Yet a DID controller needs to exercise control over how they listen IN ADDITION to how they speak. And -- THIS IS THE CRUCIAL POINT -- it's really problematic to exercise control over how you listen by controlling how you speak, if communication isn't guaranteed to be duplex.

Think about this in the physical world. You control how you speak by impulses to your vocal cords. How do you control what you hear? Well, you could also do that via vocal cords: "Alexa, play 'Kind of Blue' by Miles Davis." This is the analog to publishing a VC (speaking) to change your endpoint (listening). But this doesn't always work in the physical world. If the environment is noisy, or you don't speak the right language, or you have an uncooperative guitar soloist in the next room, what do you do? You exert control in a different way, by clapping your hands over your ears, getting headphones, unplugging the amp, or taking a walk.

It doesn't always work in the virtual world, either. How do you deliver a VC announcing your new endpoint, when the only channel that exists is incoming simplex? Answer: use another channel that's simplex the other way. But...

Often, it's desirable to coordinate the timing or sequencing of talking and listening. Hard to do when there's no mutexing mechanism between the two channels. My extended scenario above tried to explain why NOT coordinating could have security consequences in the context of DID docs. You are right that I'm combining multiple threats, but I'm not conflating them. Most exploits take advantage of how multiple weaknesses come together to expose a composite vulnerability. Just because they're not individually dangerous does not mean they're harmless in the aggregate.

You say that the entire system we've built rests on the assumption that keys are not compromised. Which of the following do you mean?

The entire system rests on the assumption that keys are not compromised BEFORE THEY ARE ROTATED.

Or the more ambitious variant:

The entire system rests on the assumption that keys are not compromised EVEN AFTER THEY ARE ROTATED.

I am assuming the former, because why would we ever rotate keys otherwise? But it seems you might be claiming the latter, because that's the only model that justifies a claim that whether Alice's key signed -- not when it signed -- is the only relevant question. If you buy the first perspective, then relative timing matters. (Yes, there's always the possibility that disputes can arise, even if you can prove relative timing. Alice can claim her child grabbed her phone and signed the mortgage. But it's now possible to constrain a signer's accountability to a time range. This is quite valuable -- again, otherwise why would we rotate keys?)
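To make "constrain a signer's accountability to a time range" concrete, here is a small sketch. It assumes some trustworthy logical clock (for example, DID doc version numbers) from which `signed_at` can be established, which is precisely the hard part under debate; `KeyWindow` and `accountable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyWindow:
    key_id: str
    active_from: int            # logical time at which the key entered the DID doc
    rotated_at: Optional[int]   # logical time at which it was rotated out; None if still active


def accountable(window: KeyWindow, signed_at: int) -> bool:
    """The signer answers only for signatures made while the key was active."""
    if signed_at < window.active_from:
        return False
    return window.rotated_at is None or signed_at < window.rotated_at
```

Under the first assumption (keys are trusted only before rotation), a signature by A.key[1] dated after its rotation is not binding, so the relative order of signing and rotation matters and not just whose key signed.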

And there's the nub. How do you announce either rotation or compromise? By emitting a signed statement of some kind. In other words, you talk (vocal cords, outbound comms). How do you confirm the order of a signed statement relative to some interesting event, when inevitable complications arrive in a decentralized world? By responding to a question -- which requires you to listen. But you can't respond to a question in a determinate sequence if saying something and exercising control over how you listen are bifurcated. In a duplex channel, you get a natural ordering. But that's the very thing that gets lost if endpoints are removed from a DID doc (given my claim that you can't assume duplex for arbitrary channels that might be used).

The simplex - duplex perspective is useful. As a DID-controller, I control if and how my DID Doc will be discovered. I might send my DID to Bob or I might post my DID in a public directory along with metadata that, for instance might allow me to get targeted offers based on my Zip-code. Neither of these is a privacy problem in themselves because the intention is clear.

The next thing that happens is either Bob resolves my DID document or Bob looks up my DID in some well-known place, like Google, the way we Google for UPS Tracking Numbers rather than bothering to prepend http://ups.com/. Alice can't intentionally control this step without creating a new DID at every opportunity, because Alice has absolutely no control over what metadata anyone out there attaches to any DID.

This means that the only thing Alice can do intentionally is to control a service endpoint in the DID document. This does not solve the problem of Bob or anyone else posting her DID along with some metadata but at least it decreases the incentive for people to Google DIDs because it's easier than resolving them. In many cases, I prefer to Google an identifier because I hope to find out negative or unintended associations that contribute to the reputation of the subject of that identifier.

Either way, once Bob has either resolved a DID or Googled it, Bob has an endpoint to use. Bob now has to formulate a message or a request. Alice's service endpoint bears the cost of dealing with the spam.

When the spam is too much, Alice decides to deal with her endpoint for incoming messages the way she would deal with a compromised private key - rotate it. How does Alice convince Google and the other thousand data brokers to rotate or forget the service endpoint they have on file?

My point is that it's not enough for Alice to control her DID document. She must also be able to run a very powerful spam filter, one that imposes significant costs on Bob to produce credentials, a data scope, and a purpose for any incoming communication to any endpoint that Alice might be listening to.

The default service endpoint in a DID document needs to be an authorization server or a mediator because those are the only two types that actually give Alice a prayer for filtering the spam.

Resolutions from https://www.w3.org/2019/did-wg/Meetings/Minutes/2020-09-24-did-topic

Resolution 1: The ability for a controller to optionally state at least one service endpoint in the DID Document increases their control and agency

Resolution 2: Add concrete examples to the Privacy Considerations section to demonstrate how service endpoints might account for varying levels of privacy requirements.

Resolution 3: Add privacy guidance that establishes that there is a privacy spectrum and publication strategies along that privacy spectrum of how service endpoints might be published.

Resolution 4: Add privacy guidance that discourages services from being expressed in DID Documents that are published to Verifiable Data Registries unless a DID Method specification have given specific guidance about how privacy concerns are addressed.

These resolutions help us get to closure on this issue by:

  • Establishing that DID Core will specify the service property.
  • Establishing that we will document the arguments for and against publishing service endpoints in DID Documents on VDR registries, as well as their privacy implications in DID Core.

This issue can be resolved by writing a PR that addresses the resolutions raised by the group. It is waiting for an editorial PR to be written, so it is a low priority to get done before CR.

Wow, what a thread and subsequent meeting on 09/24. I learned a lot, so thanks to everyone for their great contributions and thoughtful discussion.

At risk of opening a can of worms that has seemingly been shut, I see value in introducing an optional hiddenService (or unpublishedService?) core property that defines a single service endpoint for external users to request access to a set of hidden serviceEndpoints that must be requested at a point in time.

This hiddenService endpoint would only support a subset of common protocols (HTTP...) and auth methods (TBD). The endpoint would be expected to return a signed DID document listing all the service endpoints visible to the requester.

This allows the spec to:

  • be explicit about distinguishing between visible / hidden service endpoints
  • provide a means to "discover" hidden service endpoints if you only have a user's DID. If the endpoint is hit without authorization, a list of public endpoints can be returned, but they are never published, so the controller can remove them at any time
  • enable dynamic endpoints to exist, whereby a requester's credentials could determine a subset of private service endpoints to be returned (i.e., I'm okay with letting Google contact me via Twitter, but not via my PornHub account). As discussed in this thread, I could alternatively provide many VCs to Google. However, imagine if I have 50 different accounts: that's a lot of data to send (you can't embed all that in an onboarding URL!), plus any time that information changes I need to resend every serviceEndpoint to Google (and others). I'm better off providing Google with an auth token to access my hidden service endpoints, dynamically controlling which endpoints are returned for Google's auth token (or equivalent).
  • avoid breaking the existing services property, so the spec can clearly state that services is a list of public endpoints and should be used with extreme caution.
  • state that any serviceEndpoints returned from the hiddenService should not be published / indexed and doing so could put the publisher in breach of various laws due to PII information.
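To make the proposal concrete, here is a minimal sketch of the filtering step such a hiddenService endpoint might perform. Everything in it is hypothetical: the `audience` field, the example DIDs, and the URLs are invented for illustration, and authentication of the requester and signing of the returned document are left out.

```python
# Hypothetical data: ids, URLs, and audiences are illustrative only.
HIDDEN_ENDPOINTS = [
    # audience None means "public but unpublished": returned to anyone.
    {"id": "#inbox", "serviceEndpoint": "https://example.com/inbox",
     "audience": None},
    {"id": "#twitter", "serviceEndpoint": "https://example.com/twitter",
     "audience": {"did:example:google"}},
    {"id": "#private", "serviceEndpoint": "https://example.com/private",
     "audience": {"did:example:friend"}},
]


def visible_endpoints(requester_did=None):
    """Return the service entries this requester is allowed to see.

    An unauthenticated caller (requester_did is None) only receives the
    entries whose audience is None.
    """
    result = []
    for entry in HIDDEN_ENDPOINTS:
        audience = entry["audience"]
        if audience is None or requester_did in audience:
            # Strip the internal audience field before returning the entry.
            result.append({"id": entry["id"],
                           "serviceEndpoint": entry["serviceEndpoint"]})
    return result
```

Because the list is computed per request, the controller can change or remove an entry at any time without republishing anything.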

How would hiddenService be different from a GNAP-protected resource?

As far as I understand GNAP, hiddenService could well be a GNAP-protected resource with the spec defining the structure of the returned resource.

I don't see the value of hiddenService as a separate field, but I do see the value of a service type that accomplishes the same functionality. Also related to hidden services: https://github.com/BlockchainCommons/did-method-onion

DIDComm supports the disclosure of supported protocols at the discretion of the DID owner for all the reasons stated above, without a separate hiddenService field. Given that disclosure is likely to be interactive in some way, specifying the method to do so in the spec seems limiting to the development of new and improved ways to accomplish that disclosure.

@TelegramSam agree, I would personally like to see a service type of DIDComm or something similar... so that you can start interrogating the service directly...

@OR13 I thought DIDComm was transport agnostic. How would you know what transport to use? Or is there a standard http API that provides a DIDComm binding for URLs?

@jandrieu assuming the serviceEndpoint=https://example.com/... you would know it's HTTP-ready. AFAIK the DID Core spec has no examples other than HTTP, but I assume there might be service definitions that express type=DIDComm transport=bluetooth... DID Core would be responsible for defining services sufficiently to support transport agnosticism; IMO it's not doing a great job of that today.

@OR13 Ok. That matches my expectation. The addition of a transport property might do the trick, but I'll leave that to DIDComm folks.

> assuming the serviceEndpoint=https://example.com/... you would know it's HTTP-ready

Yes.

Or you could have...

  • type=DIDComm and endpoint=mailto:[email protected] OR
  • type=DIDComm and endpoint=kafka:kafka.agentsrus.com/DIDComm OR
  • type=DIDComm and endpoint=bluetooth:mydeviceid OR
  • type=DIDComm and endpoint=post:123+main+street+Anywhere+USA+12345 OR
  • type=DIDComm and endpoint=s3:bucketid OR
  • type=DIDComm and endpoint=tor:foo.onion/xyz ...etc

In all cases, the encryption/packaging/security guarantees are identical. The logical bytes of the messages are also identical, although they may be MIME-encoded for email or use transfer chunk encoding with HTTP POST. This is what is meant when we say that DIDComm is transport agnostic.
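One way to read the list above is that the URI scheme of the endpoint is the only thing that changes between transports. A small sketch of that idea follows, assuming each service entry is a dictionary with `type` and `serviceEndpoint` keys. The `transport_of` helper is hypothetical, and schemes such as `bluetooth:` and `tor:` are taken from the examples in this thread rather than from any scheme registry.

```python
def transport_of(service: dict) -> str:
    """Infer the transport from the scheme of a service entry's endpoint URI."""
    endpoint = service["serviceEndpoint"]
    # Everything before the first ":" is treated as the URI scheme.
    scheme, sep, _rest = endpoint.partition(":")
    if not sep or not scheme:
        raise ValueError(f"endpoint has no URI scheme: {endpoint!r}")
    return scheme.lower()


# Same service type in every entry; only the endpoint scheme differs.
services = [
    {"type": "DIDComm", "serviceEndpoint": "https://example.com/didcomm"},
    {"type": "DIDComm", "serviceEndpoint": "bluetooth:mydeviceid"},
    {"type": "DIDComm", "serviceEndpoint": "tor:foo.onion/xyz"},
]
transports = [transport_of(s) for s in services]
```

Under this reading, no separate transport property is needed as long as every endpoint carries a scheme the client recognizes.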

DIDComm runs arbitrary protocols, so you never need more than one endpoint. One of the protocols you can run is a feature discovery protocol that lets you discover what other protocols the other party supports/is willing to engage in. A hidden service endpoint is thus unnecessary; any agent gets to decide what services it wants to expose to each party that contacts it there.
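The per-party disclosure described here can be sketched as a query handler that answers with only the protocols the agent is willing to reveal to the asker. This is an illustration, not the DIDComm feature discovery protocol itself: the protocol URIs, the policy table, and the `discover` function are all hypothetical, and wildcard matching via `fnmatch` is a stand-in for whatever query syntax an agent actually supports.

```python
from fnmatch import fnmatchcase

# Hypothetical protocol identifiers the agent supports.
SUPPORTED = [
    "https://example.org/protocols/chat/1.0",
    "https://example.org/protocols/payments/1.0",
    "https://example.org/protocols/chat/2.0",
]

# Hypothetical per-party disclosure policy: patterns each party may learn about.
POLICY = {
    "did:example:friend": ["*"],
    "did:example:shop": ["https://example.org/protocols/payments/*"],
}
DEFAULT_POLICY = []  # unknown parties learn nothing


def discover(requester_did: str, query: str) -> list:
    """Return supported protocols matching the query that policy allows."""
    allowed = POLICY.get(requester_did, DEFAULT_POLICY)
    return [
        protocol
        for protocol in SUPPORTED
        if fnmatchcase(protocol, query)
        and any(fnmatchcase(protocol, pattern) for pattern in allowed)
    ]
```

The same single endpoint serves every party; what differs is the answer each one gets back.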

> A hidden service endpoint is thus unnecessary; any agent gets to decide what services it wants to expose to each party that contacts it there.

That makes sense, so the "feature discovery protocol" could be used via DIDComm to expose additional services.

However, the definition of those services would differ from the serviceEndpoint spec within DID Core. Is that inconsistency acceptable? Is the dependency on DIDComm for discovery of these additional services acceptable?

> However, the definition of those services would differ from the serviceEndpoint spec within DID Core. Is that inconsistency acceptable?

I'm not sure. Could you say more about what inconsistency you're noticing?

> Is the reliance on DIDComm for discovery of these additional services acceptable?

Probably not for everyone. I wasn't arguing that everyone should adopt DIDComm; I was just explaining why, if you assume DIDComm, you don't need a solution for this additional challenge.

> Probably not for everyone. I wasn't arguing that everyone should adopt DIDComm; I was just explaining why, if you assume DIDComm, you don't need a solution for this additional challenge.

I think that's the heart of what I'm trying to say.

While it's technically possible to use DIDComm (or another type), we can't assume everyone is going to use DIDComm to communicate a "hidden serviceEndpoint".

As it seems very useful to support the concept of a hidden serviceEndpoint (or similar), I would prefer to see such capability explicitly defined in the spec.

It’s a ‘chicken and egg’ type of problem for interoperability.

  • Does the transport come first (to an inbox, for an authorization request)? or
  • Does the publication come first (to reveal a resource somewhere that you may or may not be authorized to access)?

The chicken arrow goes in one direction and the egg arrow goes in the other direction relative to the DID controller subject. That’s the “hub” of the problem.

I have authored PR #511 to address the resolutions the WG made here: https://github.com/w3c/did-core/issues/382#issuecomment-703119593

This issue can be closed once PR #511 is merged.

Made suggested changes to https://github.com/w3c/did-core/pull/511#pullrequestreview-556071256 that I believe are consistent with the four resolutions.

I turned Adrian's latest suggestions into PR #515, which is an alternative embodiment of the resolutions made here. If accepted, this would supersede PR #511.

This issue will be closed once PR #515 is merged.

PR #515 has been merged, closing.
