Did-core: DID Doc Encoding: Abstract Data Model in JSON

Created on 26 Nov 2019  ·  84 Comments  ·  Source: w3c/did-core

This is a proposal to simplify DID-Docs by defining a simple abstract data model in JSON and then permitting other encodings such as JSON-LD, CBOR, etc. This would eliminate an explicit dependency on the RDF data model.

Universal Adoptability

For universal interoperability, DIDs and DID-Docs need to follow standard representations. One goal of the DID specification is to achieve universal adoption. Broad adoption is fostered by using familiar representations or encodings for the DID and DID Doc. The DID syntax itself is derived from the widely adopted and highly familiar URI/URL identifier syntax. This takes advantage not only of familiarity but also of the tooling built up around that syntax. Likewise, greater adoption is fostered to the degree that the DID Doc representation or encoding uses a familiar, widely adopted representation with extant tooling.

The only reason not to use a highly familiar representation is if the requirements for representation demand or greatly benefit from a less familiar representation. The appendix at the end of this document provides some detail about the main purposes of a DID Doc. This shows that a complex representation is not required and may not be beneficial.

In addition, having only a single representation or encoding, albeit highly familiar and widely adopted, may be insufficient to achieve universal adoption. It may require multiple representations or encodings.

Multiple encodings require a standard base encoding from which they may be derived; in other words, a least common denominator shared by all of them.

One way to accomplish this is to use an abstract data model as the standard encoding and then allow for other encodings. This was proposed in the following issue:
https://github.com/w3c/did-core/issues/103#issuecomment-553532359

The problem with an abstract data model is that the syntax is expressed in some abstract modeling language, typically a kind of pseudo code. Pseudo code is usually less familiar than real code. This means that even in the major case the spec is written in a language that is unfamiliar. This runs counter to fostering broader adoption. A solution to this problem is to pick a real language encoding for the abstract data model that then provides both an abstracted standard encoding that other encodings can more easily be derived from and also provides the lowest common denominator standard encoding.

Clearly, given the web roots of the DID syntax itself as a derivation of URL syntax, JSON's web roots would make it the ideal candidate for an abstract data model language. Of any encoding available, JSON is the closest to a universally adopted encoding. JSON is simple but has sufficient expressive power to model the important data elements needed. It is therefore a sufficient encoding. Annotated JSON could be used to model additional data types such as an ordered mapping (in the event that they are needed). Many of the related standards popular among implementors such as the JWT standards are based on JSON. Casual conversations with many others in the community seem to suggest that a supermajority of implementors would support JSON as the standard encoding for the combined abstract data model and default encoding.

Given JSON's rampant familiarity, it should not pose a barrier to implementors of other optional encodings such as JSON-LD or CBOR. Compared to pseudo-code, it should be just as easy, if not easier, to translate JSON to another encoding.

The Elephant in the Room

The result of this proposal would be to make JSON the standard encoding for the DID Doc specification and demote JSON-LD to be an optional encoding. The current DID spec uses JSON-LD as the preferred encoding but does not prevent the use of naive JSON as an encoding. However the DID spec mandates JSON-LD elements that show up as artifacts when using JSON that a JSON implementer must handle specially. Moreover, the semantics of JSON-LD are much more restrictive than JSON. This results in a lot of time being expended unproductively in community meetings discussing the often highly arcane and non-obvious details of JSON-LD syntax and semantics. The community is largely unfamiliar with JSON-LD. It is clear that JSON is sufficient to accomplish the main purposes of the DID Doc. Although JSON-LD may provide some advantages in some cases, its extra complexity runs counter to the goal of fostering more universal adoption. This proposal does not exclude JSON-LD but would encapsulate and isolate discussion about the esoteric syntax and semantics of JSON-LD to that subset of the community that really wants JSON-LD. Each optional encoding including JSON-LD would have a companion specification to the DID spec that defines how to implement that encoding. This structure will make it easier to implement other encodings in the future because JSON is much closer to a lowest common denominator data model than JSON-LD.

The relevant questions up for decision are:

  • Is JSON a sufficient encoding for the purpose of DID Docs?
  • Would JSON foster greater adoption than some other encoding such as JSON-LD?
  • Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD?

The purpose of this proposal is not to debate the general good and bad of JSON-LD and RDF. There is much good in JSON-LD for many applications. But, relevant here is that JSON-LD is not as well aligned as JSON with the goal of fostering universal adoption. More specifically the RDF model employed by JSON-LD complicates the implementation of other encodings that do not share the RDF data model and RDF semantics. JSON does not suffer from this complication. This complication has the deleterious effect of slowing adoption.

Appendix

Purpose of DID-Doc

The current DID specification includes a specification for a DID Document (DID-Doc). The main purpose of the DID-Doc is to provide information needed to use the associated DID in an authoritative way.

A distinguishing feature of a DID (Decentralized Identifier) is that the controller (entity) of the DID obtains and maintains its control authority over that DID using a decentralized root of trust. Typically this is self-derived from the entropy in a random number (expressed as collision resistance) that is then used to create a cryptographic public/private key pair. When the identifier is universally uniquely derived from this entropy then the identifier has the property of self-certifiability. Another somewhat less decentralized root of trust for an identifier is a public ledger or registry with decentralized governance.

In any event, a more-or-less decentralized root of trust only has value if other entities recognize and respect that root of trust. Hence portable interoperable decentralized identifiers must be based on an interoperable standard representation. Hence the DID standard.

In contrast, "administrative" identifiers obtain and maintain their control authority from a centralized administrative entity. This control authority is not derived from the entropy in a random number. This statement may be confusing to some because administrative identifiers often use cryptographic public/private key pairs. To explain, PKI with public/private key pairs and cryptographic digital signatures enables the conveyance of control authority via signed non-repudiable attestations. But the source of that control authority may or may not be decentralized. Thus an administrative entity may convey trust via PKI (public/private key pairs) but does not derive its control authority therein. Whereas a decentralized entity may derive its control authority over a DID solely from the entropy in the random seed used to generate the private key in a PKI public/private key pair.

A key technology underpinning DIDs is cryptographic signatures by which the control authority over the associated DID and affiliated resources may be verified by any user of the DID. In contrast, an administrative identifier always has, as a last recourse, appeal to the authority of the administrative entity and to whatever means by which that authority is established.

Indeed, given the foregoing explanation, the most important task facing a user of a DID is to cryptographically verify control authority over the DID so that the user may then further cryptographically verify any attestations of the controller (entity) about the DID itself and/or affiliated resources. The verifications must be cryptographic because, with a decentralized root of trust, the original control authority was established cryptographically and the conveyance of that control authority may only be verified cryptographically. With DIDs it's cryptographic verification all the way down.

From this insight we can recognize that a DID-Doc should support a primary purpose and a secondary purpose as follows:

  • Primary: Aid the user in cryptographically verifying the current control authority over the DID.

  • Secondary: Aid the user in discovering and verifying anything else affiliated with the DID based on the current control authority.

If the user cannot determine the current control authority over the DID then the information in the DID Doc cannot be authoritatively cryptographically verified. _Consequently, absent verified control authority, any use of the DID Doc for any purpose whatsoever is at best problematic._

Process Model for Establishing Cryptographic Control Authority

As mentioned above a fully decentralized identifier is self-certifiable. Other partially decentralized identifiers may be created on a ledger or registry with decentralized governance. The first case is the most important from a process model point of view. The second case is less informative.

The root of trust in a self-certifying identifier is the entropy used to create a universally unique random number or seed. Sufficient entropy ensures that the random seed is unpredictable (collision resistant) to a degree that exceeds the computational capability of any potential exploiter for some significant amount of time. Currently 128 bits of entropy is considered sufficient.

That random seed is then converted to a private key for a given cryptographic digital signature scheme. Through a one-way function, that private key is used to produce a public key. The simplest form of self-certifying identifier includes that public key in the identifier itself. Often the identifier syntax enables it to become a self-certifying name-space where the public key is used as a prefix to a family of identifiers. Any attestation signed with the private key may be verified with the public key. Because of its universal collision resistance no other identifier may be associated with a verifiable attestation. This makes the identifier self-certifying.

Furthermore, instead of the public key itself the identifier may include a fingerprint of the public key. In order to preserve the cryptographic strength of the root of trust in the random seed, the fingerprint must have comparable collision resistance to the original random seed. Further one-way functions can be applied successively to produce successive derived fingerprints. This is similar to how hierarchical deterministic key chains are generated. To restate, a one-way function may be applied to the public key producing a derived fingerprint and then another to that fingerprint and so on. The collision resistance must be maintained across each application of a one-way function.

Instead of merely deriving a simple fingerprint, one could take the public key and use it as a public seed that when combined with some other data may be transformed with a one-way function (such as a hash) to produce yet another fingerprint. As long as the process of creation of any derived fingerprint may be ascribed universally uniquely to the originating public/private key pair, the resultant derived identifier may be uniquely associated with attestations signed with the private key and verifiable with the public key. This makes the eventually derived identifier also self-certifiable.
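As a rough illustration of the fingerprint derivation described above, here is a minimal Python sketch. It assumes SHA-256 as the one-way function, a hypothetical `did:example:` prefix, and raw public key bytes supplied by the caller; none of these choices come from the proposal itself.

```python
import hashlib


def fingerprint(data: bytes) -> bytes:
    """One-way function. SHA-256 is an assumed choice for this sketch."""
    return hashlib.sha256(data).digest()


def derive_identifier(public_key: bytes, extra: bytes = b"", rounds: int = 1) -> str:
    """Derive an identifier from a public key used as a public seed.

    `extra` is optional additional data combined with the key, and `rounds`
    is the number of successive one-way applications. Plain concatenation is
    used here for brevity; a real scheme would need unambiguous framing.
    """
    digest = fingerprint(public_key + extra)
    for _ in range(rounds - 1):
        digest = fingerprint(digest)
    # "did:example:" is a hypothetical method prefix used for illustration.
    return "did:example:" + digest.hex()


def is_self_certifying(identifier: str, public_key: bytes,
                       extra: bytes = b"", rounds: int = 1) -> bool:
    """Check that the identifier is the one derived from this public key."""
    return identifier == derive_identifier(public_key, extra, rounds)
```

A verifier who is handed an identifier and a public key can recompute the derivation and compare. Only then does it make sense to check signatures made with the matching private key.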

Rotation

The problem is that over time any public/private key pair used to sign attestations becomes weakened due to exposure via that usage. In addition, a given digital signature scheme may become weak due to a combination of increased compute power and better exploit algorithms. Thus to preserve cryptographic control of the identifier in the face of exposure, the originating public/private key may need to be rotated to a new key pair. In this case the identifier is not changed, only the public/private key pair that is authoritative for the identifier is changed. This provides continuity of the identifier under changes in control of the identifier. This poses a problem for verification because there is no longer any apparent connection between the newly authoritative public/private key pair and the identifier. That connection must be established by a rotation operation that is signed by the previously authoritative private key. The signed attestation that is the signed rotation operation transfers authoritative control from one key pair to another. Each successive rotation operation performs a transfer of control.

State Machine Model of Control Authority

To summarize, control authority over a decentralized identifier is originally established through a self-certification process that uniquely associates an identifier with a public/private key pair. Successive signed rotation operations may then be used to transfer that control authority to a sequence of public/private key pairs. The current control authority at any time may be established by starting at the originating key pair and then applying the successive rotation operations in order. Each operation is verified via its cryptographic signature.

The process and data model for this is a state machine. In a state machine there is a current state, an input event and a resultant next state determined by state transition rules. Given an initial state and a set of state transition rules, replaying a sequence of events will always result in the same terminal or current state. This is a simple unambiguous process model. The data model is also simple. It must describe the state and the input events. There is no other data needed. The state is unambiguously and completely determined by the initial state, the transition rules and events. No other context or inference is needed. A simple representation will suffice.
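The replay of signed rotation events can be sketched in a few dozen lines of Python. Because the standard library has no public-key signature scheme, this sketch substitutes Lamport one-time signatures built from SHA-256; that is a stand-in chosen for self-containment, not something the proposal prescribes. The event layout (`sn`, `next_public`, `signature`) and all function names are likewise hypothetical.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(H(a), H(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    """The 256 bits of the message digest, most significant bit first."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit. Each key must sign only once."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(H(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


def key_id(public) -> str:
    return H(b"".join(a + b for a, b in public)).hex()


def rotation_message(sn: int, next_public) -> bytes:
    return f"rotate:{sn}:{key_id(next_public)}".encode()


def rotate(current_private, sn: int, next_public) -> dict:
    """Rotation event signed by the currently authoritative private key."""
    return {"sn": sn, "next_public": next_public,
            "signature": sign(current_private, rotation_message(sn, next_public))}


def replay(inception_public, events):
    """State machine: state is (authoritative public key, expected sequence number)."""
    current, expected = inception_public, 1
    for event in events:
        message = rotation_message(event["sn"], event["next_public"])
        if event["sn"] != expected or not verify(current, message, event["signature"]):
            raise ValueError(f"invalid rotation event at sn={event['sn']}")
        current, expected = event["next_public"], expected + 1
    return current
```

Replaying the same inception key and event log always yields the same authoritative key. An event that is out of order, or signed by anything other than the key authoritative at that point, stops the replay.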

Once the current control authority for a DID has been established to be a given key pair (or key pairs) then any other information affiliated with that DID may be cryptographically verified via a signed attestation using the current key pair(s). The important information needed to establish the authoritative stature of any additional information such as encryption keys or service endpoints is the current authoritative signing key pair(s) for the identifier and that the version of the information in the DID Doc is sourced from the controlling entity of the current key pair(s). This means the DID Doc may benefit from an internal identifier that corresponds to the latest rotation event that establishes the current key pair(s) or some other identifier that associates the DID Doc with specific signing key pair(s). This process of first establishing authoritative key pair(s) greatly simplifies the cryptographic establishment of all the other data.

There are various mechanisms that may be employed to maintain the state and associated event sequence. These could be as simple as a set of servers with immutable logs for the events/states that also run the code for the state transition logic. A more complex setup might rely on a distributed consensus ledger to maintain the state.

The DID Doc in and of itself, however, is insufficient to fully establish the current authoritative key pair(s). Other infrastructure is required. Merely including a set of rotation events in a DID Doc only establishes control authority up to the latest included rotation event. But other rotation events may have happened since that version of the DID Doc was created. Consequently a DID Doc's main role in this respect is to help a user discover the mechanisms used to establish current control authority. This must be done with some care because in a sense the DID Doc is bootstrapping discovery of the authority by which one may trust the discovery provided in the DID Doc. Nonetheless in order to be authoritative, the other information in the DID Doc that is not part of discovering authoritative control does not need an event history but merely a version identifier linking it to the authoritative key pair(s) and an attached authoritative signature from the current authoritative key pair(s).

In other words the DID Doc is used to bootstrap discovery of the current authoritative controlling keys and then to provide authoritative versioned discovery of affiliated information.
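One way to picture the "versioned discovery" check is the Python sketch below. It assumes the DID Doc is plain JSON with a hypothetical `rotationSn` field naming the rotation event it was issued under and a hypothetical `proof` field holding the signature. It also assumes the latest rotation number and current key come from the external infrastructure mentioned above, and that signature verification is supplied by the caller. The sorted-key serialization is an assumed convention for obtaining stable bytes to sign.

```python
import json
from typing import Callable


def signing_bytes(doc: dict) -> bytes:
    """Stable byte form of the doc without its proof (assumed convention)."""
    body = {key: value for key, value in doc.items() if key != "proof"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def is_authoritative(doc: dict, current_key_id: str, latest_sn: int,
                     verify: Callable[[str, bytes, object], bool]) -> bool:
    """Accept the doc only if it is tied to the latest rotation event and
    signed by the currently authoritative key.

    `verify(key_id, message, proof)` is a caller-supplied signature check.
    """
    if doc.get("rotationSn") != latest_sn:
        return False  # issued under an older (or unknown) key state
    proof = doc.get("proof")
    if proof is None:
        return False
    return verify(current_key_id, signing_bytes(doc), proof)
```

The version identifier is compared before the signature is checked, so a document signed by a key that has since been rotated out is rejected even though its signature was once valid.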

RDF Complications

The RDF model uses triples to canonicalize a directed graph. This graph may be used to make inferences about data. This model attaches a context to a given DID Doc that must be verified as authoritative. This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly a clever implementation of a cyclic directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclic graphs, making the implementation of evented state machines at best problematic and versioned discovery documents more cumbersome. This forces a particular, potentially unnecessarily complex methodology on implementing versioned discovery documents or evented state machines, rather than whatever might be easiest or most convenient for the implementer.

All 84 comments

@SmithSamuelM Thanks for posting such an exhaustive case for an abstract data model. This was the issue I raised in #103, and I think this largely supersedes that thread.

Although I originally proposed defining the abstract data model using a modeling language like UML, I am persuaded by your argument that doing it in a simple, universal encoding like JSON will make it more approachable to developers and thus better for adoption.

I fully understand that this is ripping off the bandaid on the tension between JSON and JSON-LD for DID documents. Given how low DIDs and DID documents are in the trust infrastructure stack, I am heavily in favor of "the simplest thing that could possibly work"—above all because of the need for this layer to be as rock-solid as possible from a security standpoint.

I am in favor of this proposal. While I recognize that JSON-LD provides some expressive power that ordinary JSON does not, I think the cost-vs-benefit for that expressive power is not a good tradeoff at the DID level. DID docs should be simple; they are a foundation that many things build on, and should not introduce onerous dependencies. Developers shouldn't have to learn JSON-LD to process DID docs.

I think the case for the expressive power of semantic-web-style constructs like RDF/JSON-LD is stronger at the VC level than at the DID doc level.

From a requirements perspective the simplest necessary and sufficient representation should be preferred over any unnecessary but sufficient representation, especially if the latter is more complex than the former. This proposal does not forbid the latter but merely enables the former.

I'm very concerned that we don't have good examples of DID Methods that don't use JSON-LD at all. So people who don't understand JSON-LD, just kind of hack around it which leads to weakening JSON-LD...

I'm happy to help clarify this issue.

I think we need to provide some clear examples for how to use DID Core spec without JSON-LD and with it, and how to not muddy the waters, and improve the security understanding for either decision.

To be clear, I'm actually a huge fan of JSON-LD, and intend to keep using it with did:elem... I just want to be able to explain better to those who don't want to use it, how to do so in a way that protects JSON-LD and the implementers.

_...to actually address this issue proposal directly_

I'm in favor of this proposal, and I would like to see JSON-Schema used to help provide better clarity on what is and isn't allowed.

My question is: do we intend to define a DID document exhaustively, i.e., will we define all keys (terms) that can be used in a DID document, or do we envisage that other actors (methods, applications, controller, whatever) may _add_ keys to a DID document that are not defined in this spec?

The power of JSON-LD comes if we allow for the latter. On the other hand, if we want to define _all_ possible keys a DID document may contain then the advantages of using JSON-LD become a question.

@iherman I don't believe that we need to exhaustively define all keys up front. JSON is an extensible self-documenting data format that supports hierarchical mapping constructs. This makes it possible to discover extended content. The NoSQL database world is filled with examples of document oriented databases where this is a standard practice. The RDF construct imbues a specific semantic that many find useful, especially if one is building a graph database, but a graph database is not necessary to provide extensibility, especially at the low level where DID Docs operate. Verifiable Credentials on the other hand are a different story. But my concern is that RDF has become a greedy paradigm that at least for the DID spec has resulted in unwarranted complexity and moreover due to its unfamiliarity causes unproductive confusion. This proposal does not preclude a JSON-LD implementation, it merely facilitates a specification that does not have the RDF data model as a dependency in order to better foster universal adoption.

@OR13

+1 Exactly. I think this is the next step. In many previous attempts to do this we have become bogged down by the complications of the "right way" to do this in RDF as opposed to not using RDF as the mental model. IMHO given the primary purposes of a DID Doc outlined above, the cryptographic considerations are paramount.

@iherman,

My question is: do we intend to define a DID document exhaustively, i.e., will we define all keys (terms) that can be used in a DID document, or do we envisage that other actors (methods, applications, controller, whatever) may add keys to a DID document that are not defined in this spec?

The power of JSON-LD comes if we allow for the latter. On the other hand, if we want to define all possible keys a DID document may contain then the advantages of using JSON-LD become a question.

I think we absolutely intend to do the latter. This is self-sovereign technology with an aim at decentralized extensibility. I disagree with the premise that there is a "lot of time being expended unproductively in community meetings" on this subject. I would also argue that we will spend significantly more time rewriting/reinventing the parts of the JSON-LD standard that we're using here to accomplish the same goals. Either that, or we will have to head in an entirely different direction, and start assuming we know everything about how things should work and close off innovation at the edges. In other words, while I think this proposal is well intentioned, I suspect, if we were to adopt it, the outcome would be a need to duplicate significant complexity into our own spec instead of relying upon the work others have already put in (and that has already been standardized). All of this would also come at the cost of interoperability.

I don't think people realize all of the benefits we're getting from piggybacking on top of JSON-LD (e.g., SS/decentralized extensibility, generic data model that can be understood by tools (already) written once, ability to reference objects in the data model by ID using an existing standard, hash resolution rules, and more would come to light as we painfully discover what we've lost....). Taking any other approach will be necessarily closed world or a reinvention of the wheel. Furthermore, I think our spec already insufficiently expresses all of the things we're assuming work a certain way and we're working hard to improve this. To cut out the layers it depends on would only increase this burden as the benefits we assumed we had slip away.

@SmithSamuelM,

I don't believe that we need to exhaustively define all keys up front. JSON is an extensible self-documenting data format that supports hierarchical mapping constructs. This makes it possible to discover extended content. The NoSQL database world is filled with examples of document oriented databases where this is a standard practice.

This is all siloed data that cannot be combined with anything else. This is exactly what we want to avoid and exactly why having a more generic data model that expresses relationships is useful for decentralized extensibility.

@dlongley

There is much value in what has already been done. This need not and should not be discarded. The problem is that the full syntax and semantics of the RDF model are not replicable in other encodings, at least not without major effort. Consequently we want just the good stuff. The essential constructs that are both valuable and universally applicable. An abstract data model does this and what is proposed is that this abstract data model be expressed in JSON. It certainly can have the "right" semantics that may be essentially the same as JSON-LD without requiring all that JSON-LD requires. This makes it not siloed. Siloing is not the same as not using JSON-LD. Any standard representation with agreed upon syntax and semantics is not siloed. An extensible hierarchical mapping data construct is perfectly adequate for expressing interoperable semantics. The process of defining those semantics is important. This allows for extensibility over time. Attempting to canonicalize a universal data graph up front is a difficult if not impossible task and is one reason not to be drawn into an RDF approach.

@iherman hits the vital point. If we want a DID Document to be extensible without namespace conflicts, we need JSON-LD (or its equivalent). If we want to define a concise and limited set of specific properties that define a DID Document, JSON alone is fine.

There may be other JSON-LD features we'd lose (I seem to recall something about language-specific things like character order), but it is the extensibility that appears to be the most significant.

One thing I keep seeing as a point of confusion from advocates of JSON is that UNLESS someone exercises extensibility, JSON-LD is JSON. So all of the tools and practices for a fixed-schema JSON work just fine with an un-extended JSON-LD serialization. As long as the context is unchanged and the JSON properties are of the constrained set, then you can treat JSON-LD as JSON. It is only when the document is extended that you need to evaluate the contexts. Which is exactly when JSON alone runs into trouble.
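The rule of thumb in the previous paragraph might look like this in Python. The context URL and the set of known property names below are illustrative assumptions, not a normative list from the DID spec.

```python
import json

# Assumed values for illustration only; the real context URL and the set of
# core properties are whatever the DID spec defines.
KNOWN_CONTEXT = "https://www.w3.org/ns/did/v1"
KNOWN_KEYS = {"@context", "id", "publicKey", "authentication", "service"}


def is_unextended(doc: dict) -> bool:
    """True if the doc uses only the known context and the constrained key
    set, so a plain JSON consumer can process it without JSON-LD tooling."""
    context = doc.get("@context")
    if isinstance(context, list):
        context_unchanged = context == [KNOWN_CONTEXT]
    else:
        context_unchanged = context == KNOWN_CONTEXT
    return context_unchanged and set(doc) <= KNOWN_KEYS


doc = json.loads('{"@context": "https://www.w3.org/ns/did/v1", "id": "did:example:123"}')
plain_json_ok = is_unextended(doc)
```

A document that adds a second context entry or an unrecognized top-level key fails the check. That is the point at which the contexts would need to be evaluated.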

That makes the real question the one I started with. Is extensibility important?

@SmithSamuelM's last comment came in as I wrote this and I'm not sure how to interpret his comments on extensibility. No one is proposing a universal data graph up front. Certainly not the JSON-LD advocates. The point of advocacy is an open world data model where extensibility is afforded from the start. "The process of defining those semantics" sounds like you mean that a DID v2 could extend the specification. Yes, that's true, but you could only do so through testing non-compliant implementations unless you start out with an extensible serialization. It is that limited definition of properties implied by JSON only, that I believe @dlongley means by siloed.

@SmithSamuelM,

The problem is that the full syntax and semantics of the RDF model are not replicable in other encodings, at least not without major effort. Consequently we want just the good stuff.

You can encode the RDF model in JSON (this is what JSON-LD is) -- and the argument here is to use JSON. JSON-LD is JSON. Could you provide a concrete example of the problem you're highlighting?

An abstract data model does this and what is proposed is that this abstract data model be expressed in JSON. It certainly can have the "right" semantics that may be essentially the same as JSON-LD without requiring all that JSON-LD requires.

My reading of this is exactly what we want and already have... but it translates to: use JSON-LD and keep the core simple for JSON-only consumers. Someone treating JSON-LD as any other JSON (unless they want to use the extensibility features) shouldn't notice any difference. This is the same approach we took with VC with success.

Attempting to canonicalize a universal data graph up front is a difficult if not impossible task and is one reason not to be drawn into an RDF approach.

There are libraries to do this and specs in the works for future standardization (Note: I don't think we say anywhere that you must do this anyway). I don't think this is a strong reason to avoid the approach, especially given the other benefits we get from it. But, again, I feel like we are already where we need to be with respect to getting extensibility from JSON-LD/RDF and simplicity from JSON.

I see this more from a philosophical perspective than from a practical one. I don't think it's super hard to process JSON-LD if you only have plain JSON tools and knowledge, and vice versa I don't think JSON-LD provides that much extra needed functionality for DID documents that can't also be done with plain JSON. So in terms of how hard, or secure, or extensible it is, I think it doesn't matter that much.

For me the main purpose of DIDs is to try and model digital identity in a way that approximates as much as possible how identity works in the physical world. This is why I'm a big fan of @SmithSamuelM's KERI work, where the root of trust is entropy alone, which is available to everyone without dependency on anything else.

This also means that for me, the question of data format of the DID document is primarily about describing who you are in the digital world, and how to interact with you. This is also why it's important to talk about metadata about the DID subject vs metadata about the DID document (https://github.com/w3c/did-core/issues/65), about httpRange-14, and similar very theoretical topics.

From this perspective, I believe a description of (the core of) my physical identity in the digital world can be more appropriately done with a semantic RDF graph model, than with a plain JSON object tree of keys and values. So I like JSON-LD DID documents better than plain JSON DID documents. I believe getting these conceptual foundations right is more important than mass adoption.

I'm also in favor of describing the data model in an abstract way and then allowing different formats such as JSON-LD, plain JSON, CBOR, XML (https://github.com/w3c/did-core/issues/103). But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore.

But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore.

+1 to that. A JSON serialization is a very concrete one, not abstract.

@peacekeeper Unfortunately, I disagree in the strongest terms with these two statements:

the DID document is primarily about describing who you are in the digital world

and

a description of (the core of) my physical identity ...

This is the wrong mental framing.

If you see DID Documents as about the Subject, you are creating a privacy nightmare. Period.

DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except these are alleged means of secure interaction.

DIDs are NOT your physical identity--online or off. They are a means to communicate with a counter party how to bootstrap secure interactions. I give you a DID and, in theory, I'm giving you a way to interact with the Subject. That's it. FULL STOP.

DIDs should never be tied to a specific person, because that can change. Yes. If you didn't get that, you need to understand that a given DID's Subject can change from one physical person to another. If that's outside the scope of what you have imagined so far, simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time, it actually doesn't refer to any specific person at this moment in November 2019. Sometime in the next decade or two, it almost certainly will. And that is completely independent of whatever might be recorded in a DID Document.

Similarly, DID Documents should never contain information about a specific person other than that which enables specific secure interaction. I've made this argument already. Imagining the DID Document as about the Subject, without filter, will absolutely create privacy harms. Real ones. And when we achieve the scope of ambition we have for these identifiers, those harms will escalate to loss of liberty and even life. Don't imagine for a minute that privacy leaking DID Documents won't eventually kill someone.

This is EXACTLY why many definitions of "persistence" as a goal for DIDs are flat out wrong.

I've voiced this before and I'll voice it until my dying breath.

DIDs are intentionally, and should always be, a fundamental separation of concerns between the physical and the digital. Framing it any other way paves the path for exceptional abuses of this technology.

@jandrieu

DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except these are alleged means of secure interaction.

Agreed, but those means of secure interaction can still be considered statements about the person, semantically not so different from saying what your name or address is. I am not saying DID documents should contain any more than the minimum amount of information for secure interaction, but semantically, DIDs are still identifiers for the DID subject. They are more than just something like an IP address for reaching the DID subject.

At least that's my own personal perspective, I won't insist on it strongly. I can also understand the arguments for simple, constrained, robust, plain JSON documents that are similar to DNS records, and that fulfill their well-defined purpose on a lower, separate layer than the actual "identity layer" that establishes your digital self.

DIDs should never be tied to a specific person, because that can change. [..] simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time,

Are you suggesting we drop the "persistence" principle of DIDs? How would you be able to cryptographically prove that control of the DID has been transferred to the new King? The traditional thinking has been that in this case, the new King would have a different DID than the old King. The old King's Verifiable Credential would get revoked, and a new Verifiable Credential would get issued to the new King.

Don't imagine for a minute that privacy leaking DID Documents won't eventually kill someone.

Agreed that it's super important to avoid privacy leaking DID documents. I think I could also argue that if the DID is only seen as a lookup key for some technical metadata, not as the root of your digital existence, then that wouldn't fully set you free and make you "self-sovereign". But I can also understand your view, see above. I am not sure if there's any contradiction here, does the question of the DID document format have anything to do with the goal of avoiding privacy leaking data in those documents?

I can also understand the arguments for simple, constrained, robust, plain JSON documents that are similar to DNS records, and that fulfill their well-defined purpose on a lower, separate layer than the actual "identity layer" that establishes your digital self.

I just wanted to chime in to say that I agree with this limited conception of DID Documents, which I think is close in spirit to the one Joe is arguing for. I do not agree with a richer conception that overloads them with lots of meaning and infinite extensibility. I think other resources, accessed through service endpoints, is where that belongs.

Simpler is better, at the relatively primitive communication-enabling level where DIDs belong.

DIDs should never be tied to a specific person, because that can change. Yes. If you didn't get that, you need to understand that a given DID's Subject can change from one physical person to another. If that's outside the scope of what you have imagined so far, simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time, it actually doesn't refer to any specific person at this moment in November 2019. Sometime in the next decade or two, it almost certainly will. And that is completely independent of whatever might be recorded in a DID Document.

Can we get more precise? If I parse this statement very, very carefully, I don't disagree with it--but a lighter reading gives what I consider a faulty impression.

Here is what the DID spec currently says about persistence:

4.9 Persistence
A DID is expected to be persistent and immutable, i.e., bound exclusively and permanently to its one and only subject. Even after a DID has been deactivated, it is intended that it never be repurposed.

Now, Joe's statement doesn't say that the DID subject can change; it says that the person associated with the DID subject can change. I agree with that. If a DID's subject is "King of England", then the DID's subject hasn't changed when the person playing the role of "King of England" changes. The subject is stable; the person associated with that subject is what changed. This is more or less how we expect organizational DIDs to work. The staff of a company evolves over time, but the DID's subject--the company--remains constant.

But this is not how DIDs for ordinary people are expected to work. For ordinary people, the people are the subject. And a DID like this can't be an identifier for Alice today, and Bob tomorrow. So when the subject of a DID is a person instead of a role, the person in question is immutable.

Agreed?

Markus: But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore.

Dmitri: +1 to that. A JSON serialization is a very concrete one, not abstract.

JSON is a notation. Hence the 'N' in its name.

It is true that it can also be a serialization format--but we do not have to view it that way for the purposes of writing a spec. JSON as a notation is terser, clearer, and easier to work with in text than UML or fuzzy human language. Expressing the hierarchy and sequences of a data model with {...} and [...] makes much better sense to me than deliberately picking something clunky and less precise. As long as we say that the notation can be rendered in various serializations (including JSON-as-serialization, CBOR, etc), I think it's an optimal choice.

@dlongley

A couple of historical complications of JSON-LD
1) Namespace collisions with openschema.org. We have changed top-level block names to avoid collisions with openschema.org. This adds a dependency that complicates things at no value to those who do not use openschema.org.

2) An unused @context is an invitation to a malicious injection attack on a did method resolver.

Both of these suffer from the complication of making external unexpressed dependencies part of the DID Doc. At least in an abstract data model we can make all dependencies internal. Implementers of an optional JSON-LD encoding could expand their dependency space at their leisure without encumbering the spec for everyone else.

3) From a cryptographic perspective, in order to establish and verify authoritative attestations, data is needed about those attestations. What is of primary importance is whether the attestation can be cryptographically verified as emanating from the current set of controlling keys. Whether or not the data refers to a subject (is it the DID, or the controlling entity of the DID) or whether or not it is meta-data with respect to JSON-LD/RDF is of secondary importance. We are led to make poor crypto choices out of a desire to achieve JSON-LD/RDF purity.

For example if a user sees two different versions of a DID Doc that are both signed with the same key pair(s), how does the user know which one to trust or which one is the most recent? There are many mechanisms for helping the user make this determination such as a sequence number, a hash in a chained set of hashes, a date time stamp, a version number etc. That information needs to be inside the signature. It needs to be unique in the document. But these sorts of questions often take a long time to answer when encumbered by JSON-LD semantics and syntax.
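To make the point about ordering information living inside the signature concrete, here is a minimal sketch. It rests on loud assumptions: an HMAC with a demo key stands in for a real public-key signature, and the field names `sn`, `doc`, and `tag` are hypothetical, not taken from any spec. It only illustrates that a verifier can pick the most recent document once the sequence number is covered by the signed bytes.

```python
import hashlib
import hmac
import json
from typing import Optional

# Assumption: an HMAC with a demo key stands in for a real public-key signature.
DEMO_KEY = b"demo-key-not-for-production"


def canonical_bytes(doc: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(doc: dict) -> dict:
    """Wrap a document with a tag computed over the whole document, 'sn' included."""
    tag = hmac.new(DEMO_KEY, canonical_bytes(doc), hashlib.sha256).hexdigest()
    return {"doc": doc, "tag": tag}


def verify(envelope: dict) -> bool:
    """Recompute the tag over the document and compare in constant time."""
    expected = hmac.new(
        DEMO_KEY, canonical_bytes(envelope["doc"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


def latest_verified(envelopes: list) -> Optional[dict]:
    """Return the verified document with the highest sequence number, if any."""
    valid = [e["doc"] for e in envelopes if verify(e)]
    return max(valid, key=lambda d: d["sn"], default=None)
```

Because the hypothetical `sn` field is part of the signed bytes, an attacker who bumps the sequence number on an old envelope invalidates its tag, so the forged copy is dropped before the comparison happens.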

@dlongley

We are using two different definitions of extensible. What I mean by extensible is that the document has a core set of defined contents and may be extended by adding additional contents. What appears to me is that the JSON-LD folks take extensible to mean that a DID-Doc is extended by an external world model. In other words a DID-Doc is an intensive part of an extensive data model.

The latter definition is the problem. It explodes the dependency space. It makes discussion difficult. We need to discover the authoritative keys for the DID. Once we discover those we need to discover a few other things like how to access service endpoints that provide other functions or resources. But in a cryptographically verifiable way. That discovery needs to be authoritative. There are a few core things we need to know to make that discovery authoritative. Once we have made it authoritative there are a few other things we now know of that are common that we need to discover like service endpoints and how to talk to them. We can define these in JSON and then add others as they become important over time (extend the core contents of the document to allow that). Extending the document to include a world data model is mixing the larger question of identity with the smaller questions of how to do discovery of the authoritative keys and services. I frankly am having a hard time appreciating why a DID Doc has become the source of this greater problem. It makes doing the simple tasks harder. It is mixing concerns. A DID Doc is meta-data to bootstrap authoritative verification of attestations made by the controller of a DID. All these extended world model usages could much more appropriately be included in a verifiable credential about the controlling entity. Let’s just have a bootstrap to a service endpoint that provides that verifiable credential. The verifiable credential then has the extensible world model available to it. This is what I call paradigm greed. That is trying to apply the schema centric approach of an extended world model to the bootstrap needed to credibly verify a document (verifiable credential) describing the intensive part of an extensive world. Not every computation task is best described via an extensive world data model. 
We need clean separation of concerns to do secure cryptographic bootstrapping to a state where a verifiable credential can then provide the world model. Many of the things I see being suggested for the DID Doc could be put in a verifiable credential. Let’s do that and keep DID Doc simple.

Indeed, I propose this criterion: any information about the subject entity of the DID that could be provided via a verifiable credential obtained from a service endpoint should not be in the DID-DOC. The only things that should be in a DID-DOC are those items needed to first bootstrap the control authority needed to bootstrap secure communication to such an endpoint and validation of said verifiable credentials.

Verifiable credentials are wonderful things. Let’s have more of them. But not disguised as a DID Doc.

DID Docs should be "extensible" in the same way and to about the same extent as HTTP headers are extensible: you can add extra stuff without breaking anything, and if the entity you're communicating with groks that extra stuff, fine. Otherwise, it has no effect. We do not need "extensibility" if it means namespacing, a complex resolution/processing model, @contexts, etc.

Those additional complexities are very reasonable when you need a true semantic graph (as with VCs)--but the power of DIDs is tied more strongly to their simplicity than to the semantic power of DID docs. If you want semantic power, use services at endpoints, not the DID doc itself.
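The HTTP-header style of extensibility described above can be sketched roughly as follows. This is an illustration under assumptions, not a proposal for the spec: the set of core names is illustrative, and `read_core` is a hypothetical helper. A consumer keeps what it understands and silently skips everything else, so extra properties have no effect.

```python
import json

# Assumption: this set of core names is illustrative, not taken from the spec.
CORE_NAMES = {"id", "publicKey", "authentication", "service"}


def read_core(raw: str) -> dict:
    """Parse a DID document and keep only the properties this consumer understands."""
    doc = json.loads(raw)
    if not isinstance(doc, dict) or "id" not in doc:
        raise ValueError("a DID document must be a JSON object with an 'id'")
    # Unknown names are dropped without error, like unrecognized HTTP headers.
    return {name: value for name, value in doc.items() if name in CORE_NAMES}
```

A party that does understand an extra property can read it from the raw document; a party that does not is unaffected by its presence.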

The more I think about the above suggestions @SmithSamuelM @dhh1128 the more I am convinced that they cut to the root of the issue. Do not put anything in a DID Doc that can be provided by a verifiable credential at a service endpoint. Only put in the DID Doc what is essential to access the verifiable credential. With that filter we will have very little left to put in the DID Doc merely the bare essentials and these will hardly need an extensive semantic model. Because if they did then they could be provided by a verifiable credential. We just need the minimum to bootstrap.

It might help crystallize the mental model to change from DID-Doc to DID Discovery Doc or simply DID Discovery.Data

The problem, as I see it, is did:peer - that should not be a DID method. In the context of pairwise communications the semantic issues are vastly different than in a 1:* communication about an identifier - where we desperately need JSON-LD. In pairwise communications we do not need machine processable semantics, the semantics should be determined as part of the communication protocol - but in terms of VCs and 1:* DIDs, general purpose, extensible semantics are critical.

I need to understand a use-case where two communicating parties, or a small group of parties, are communicating and need to appeal to some global semantic mechanism. I just do not see it - and with that, issues like service endpoints, fragment processing, persistent reference get in the way. If I have a DID assigned to a specific pairwise communication, or to a specific credential, the need to discover how to communicate is unclear - it is like calling someone on their phone and then asking them for their phone number. If you can call someone on their phone you do not need a zero-knowledge disclosure process for discerning and validating their phone number - you have that already.

On the other hand, when trying to discover and communicate with people, organizations, and things - when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of "fat DID documents" - and when you deal with fat DID documents you need machine processable semantics - which is what JSON-LD provides. Opting for a 2nd layer of meta-configuration about the semantic milieu adds enormous and unwarranted complexity - avoiding JSON-LD in order to re-create semantic negotiation adds tremendous complexity and inhibits adoption.

We need to separate out PIDs (Peer/Pairwise DIDs) and DIDs (public decentralized identifiers) - if a ledger is involved, it is a public DID - why else go to the trouble of anchoring it to some global oracle of authoritative state? PIDs are critical - but they are so dramatically different in their use-case domain that trying to get a one-size-fits-all DID document leads to exactly the sort of confusion we are struggling with.

I want to see DID:peer -> PIDs and I want to see pairwise DIDs removed from our lexicon - let JSON-LD rule the landscape of interoperable, multi-system, multi-platform identification. They can share some root utilities - like KERI, but these are apples and oranges.

@ewelton : That's a fascinating take. Initially I hated it, but now I'm stepping back and trying to evaluate more thoughtfully.

I'm curious about the broad claim that "when trying to discover and communicate with people, organizations, and things--when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of 'fat DID documents.'" This seems doubtful to me, because we have systems like this today (only not as decentralized), and I don't see them as needing what you claim. But say more about that; maybe you can lead me along...

(Possibly we should drop off this comment thread, though, if we veer too much into a tangent from Sam's original intent for this issue...)

I agree with this statement:

With that filter we will have very little left to put in the DID Doc merely the bare essentials and these will hardly need an extensive semantic model. Because if they did then they could be provided by a verifiable credential. We just need the minimum to bootstrap.

and @dhh1128 - I believe, strongly, in your vision of pushing computation to the edge. I remember attending your presentation in Basel, and that picture with client-server and client-blockchain stuck in my head. I think you are right.

but I also believe strongly in JSON-LD - at a recent meeting with a government I was pitching DIDs as an alternative to centralized governmental certificate authorities. One of the selling points was that "you don't have to be bound to semantics, @context gives you control"

With Sam's proposal, I lose the flexibility that, just last week, I used to try to sell DIDs to a government in lieu of a centralized authority.

What it comes down to is "why are you resolving a DID" - the reason is that you want to engage it - either to perform authentication, or to open a communication channel. That is completely reasonable in the context of people, organizations, and things who participate in a society using DIDs. Participation in a society, especially when that crosses borders requires semantic negotiation - and I think that JSON-LD is about the best offering on that front in the last few decades.

On the other hand - there is absolutely no need for that level of semantic capacity when in a "micro-society" - one of the whole points of a private communication is the benefit of a shared semantic. This is what drives "inside jokes between friends"

In fact - I've been working (since Basel) on an actual mathematical result around this - it should be possible to exceed the naive Shannon information capacity of a channel through "inside jokes" - between friends you can benefit from a form of steganography, so that communication remains secure even if the raw crypto is cracked. This is because the sender/receiver have a semantically tuned system - pairwise communications should not just be about syntax, it should be semantically pairwise - and that means that JSON-LD is useless overhead.

I understand what @ewelton is saying, however I strongly disagree, both about peer DIDs needing separate treatment and about the requirement to have an extensible semantic graph model at the DID level of decentralized infrastructure.

Ironically peer DIDs prove the point that @SmithSamuelM is making: all DID-to-DID communications require bootstrapping a cryptographically secure connection—whether the connection is peer-to-peer or one-to-many. The same underlying mechanisms—persistent identifiers, public keys and service endpoints—are needed in both cases. Sam's argument is that this is all that is needed, and that adoption will be easier and security (and privacy) will be stronger if this is all that is included in the data model (in other words, follow the dictum of the simplest thing that could possibly work.)

On the other hand, when trying to discover and communicate with people, organizations, and things - when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of "fat DID documents".

While I can understand someone coming from this POV, let me make sure it is clear why Sam and I and others on this thread have been arguing the exact opposite: if what you're trying to solve is a generalized discovery problem, then you not only need tools like a semantic graph model, you also need name services, directory services, search protocols, etc. That's a whole different problem space. And there are tools and technologies that already work very well for that problem space. All those tools and technologies need to do is add DIDs to become even more useful for discovery.

If OTOH the problem you are trying to solve is the Decentralized Identifier problem space: entity-controlled persistent decentralized identification and bootstrapping of cryptographically secure communications, you neither need nor want any of those other features.

Think of it like the difference between DNS and Web searching. The former uses a highly constrained type of identifier and very simple flat record format to solve one specific problem very well at scale. The latter uses a rich set of identifier schemes (URIs) and highly extensible markup languages. The latter is where you need a semantic graph model, not the former.

One more point for everyone on this thread: allowing a JSON encoding to be defined in pure JSON without reference to a semantic graph model does not prevent those who want to use a semantic graph model from defining an encoding in JSON-LD, or N-Triples, or N-Quads, or Turtle.

Nor does it prevent the CBOR community from defining an encoding in CBOR.

@talltree I need to apologize for not quite expressing my point well - I'd had a document open in vi for a while, but the conversation is happening at internet speed!

The issue to me is not one of 1:1 or 1:N, it is about the context of the conversation. Context and multiplicity are orthogonal - the semantic is a property of the community, not of the individual. Once upon a time there was a famous poet named Pootie Tang, and, according to his acolytes he was too cool for normal words - so, even though you never knew what he was saying, you always knew what he meant.

That's why I can say "sine your pity on the runny kine" and communicate as much as when I say "sah-dey-tah" - and while these utterances are sometimes joyful, they always seem to land me in secondary at the border crossing en route to IIW. No matter how many times I say "sah-dey-tah" it always takes an hour or more before I am released. I feel this is a "failure to communicate" - and that this FTC is semantically driven.

It is always about context - and that context defines the semantic in play.

If identity is contextual, then so is communication about an identity - and it is the communicative context that picks the semantic. The need for JSON-LD is in communication negotiation.

I don't want to denigrate this - negotiating the context of communication is key - and when unknown parties are bootstrapping communication, semantics are not well defined. If I have agreed to have N parties in a conversation, it seems that picking "we will speak the Queen's English (and use JWT)" is natural.

Ironically peer DIDs prove the point that @SmithSamuelM is making: all DID-to-DID communications require bootstrapping a cryptographically secure connection—whether the connection is peer-to-peer or one-to-many. The same underlying mechanisms—persistent identifiers, public keys and service endpoints—are needed in both cases.

I agree that 'bootstrapping' a conversation requires much of the same structure - KERI for example. On the other hand, service_endpoints and persistent identifiers - I am less convinced. If I have a throw-away pairwise communication DID that is intended for one conversation, why do I need persistence? If I am opening negotiations with myself, USCIS/CBP, and a bevy of lawyers - do we need to argue about "what do you mean by 'name'"?

I definitely do not want to see a world where I have a public-DID resolution that requires two levels of semantic processing - one to determine which semantic layer is in play, and a second to determine what the document means in the context of the previously selected semantic milieu. The idea of "just get rid of semantics, and stick with syntax" is equally abhorrent to me.

I think that JSON-LD provides a nice middle ground - what does JSON-LD fail to provide that CBOR or N-Quads or jada-jada doesn't?

@dlongley said in https://github.com/w3c/did-core/issues/128#issuecomment-559199898:

My reading of this is exactly what we want and already have... but it translates to: use JSON-LD and keep the core simple for JSON-only consumers. Someone treating JSON-LD as any other JSON (unless they want to use the extensibility features) shouldn't notice any difference. This is the same approach we took with VC with success.

I think it is very important to understand this point as this may have been lost in the discussion. There are a number of specifications that have been developed with a similar philosophy; to quote two of these that I was involved with (beyond VC mentioned by @dlongley): Web Annotations or Publication Manifest, but that could also be said of the way search engines relying on schema.org operate. What it means is:

  • The specification defines a set of JSON (note: I say _JSON_ and not _JSON-LD!_) terms with well specified meaning and a processing model;
  • The JSON "shape" so defined can also be checked using a JSON Schema defined alongside the specification;
  • However, the resulting JSON "shape" is defined in a way that, _if needed_ a JSON document defined along those lines can also be _seen and processed as linked data_ by way of a JSON-LD syntax. This means:

    • defining a @context that ensures some sort of a "mapping" from the specific JSON shape to the linked data world;

    • it is the Working Group's responsibility to define the terms in such a way that, _if used_, that mapping is semantically sound.

All this requires a little bit more care for the Working Group in defining the underlying JSON, but with no adverse consequence for users or implementers.
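As a rough illustration of the approach just described, the sketch below checks the JSON "shape" of a document with plain Python and never looks at `@context`. The property names, the `ExampleKeyType` value, and the context URL used in the usage note are all hypothetical, and a real deployment would more likely use a JSON Schema alongside the specification, as noted above.

```python
def check_shape(doc: dict) -> list:
    """Return a list of shape problems; an empty list means the document passes."""
    problems = []
    if not isinstance(doc.get("id"), str):
        problems.append("'id' must be a string")
    # Assumption: each key entry needs at least an 'id' and a 'type'.
    for index, key in enumerate(doc.get("publicKey", [])):
        if not isinstance(key, dict) or "id" not in key or "type" not in key:
            problems.append(f"publicKey[{index}] needs 'id' and 'type'")
    # '@context' is deliberately never inspected by this JSON-only consumer.
    return problems
```

Under these assumptions, a document such as `{"id": "did:ex:123", "publicKey": [{"id": "did:ex:123#keys-1", "type": "ExampleKeyType"}]}` passes the check whether or not a line like `"@context": "https://example.org/context/v1"` is added to it; only a linked-data consumer would act on that extra member.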

As an example, there is a specific section in the upcoming Publication Manifest document that defines how the processing of a manifest should be done by an agent (in that case, e.g., an audiobook reader) and that processing is defined _without any reference to linked data, RDF, etc_. At the same time, the same manifest _may_ be used, if so required, as part of a larger linked data cloud, combining the content with vocabularies defined by very different communities out there.

This approach has always been one of the driving forces for the development of JSON-LD.

I am not taking sides on whether the DID document should be "pure" JSON or JSON-LD. But we should take this decision understanding what the usage of JSON-LD really means...

@iherman:

The specification defines a set of JSON (note: I say JSON and not JSON-LD!) terms with well specified meaning and a processing model

I believe that the construct that you're calling a "term" here is a JSON-LD-ism, not a JSON-ism, so your note doesn't compute. And I think that is the beginning of the dissonance. To even understand our process of spec-writing, or to read the spec itself, we are requiring people to understand technical definitions that are rooted in JSON-LD. And the notion that we need to "define a processing model" as part of the spec compounds this impression; the processing model for JSON is plenty clear without elaboration in a new spec, if we are not aiming for fancy constructs that JSON-LD needs.

If, instead of defining terms and a processing model, our spec limited itself to constructs so simple and primitive that they required no explanation, and if we knew these could be mapped onto a set of terms and a processing model for those that wanted to do so, that would be different.

The tax on spec developers and (more importantly) spec readers/implementers for JSON-LD as a foundation is not zero. When I asked why our spec demanded that all values of id be fully qualified, I was told that it was because of demands from JSON-LD's processing model. There's a big debate happening about key formats--part of the dissonance relates to JSON-LD's opinion about use of id versus JWK's use of kid, and the semantic mismatches between them. I could cite other examples.

I don't think the viability of starting from JSON-LD but letting JSON aficionados ignore it is the relevant question. Clearly it can be done. The question is whether the juice is worth the squeeze. What features of JSON-LD do we actually need? I would be very interested in concrete answers to that question, rather than theoretical ones. I think that, not discussions about process or theory or precedent, ought to push this issue one way or the other.

Based on lack of good examples so far, I am suspecting that such features, if they exist, may turn out to be uncompelling or just plain wrong-headed. This is based on your own observation, restated by Joe:

If we want a DID Document to be extensible without namespace conflicts, we need JSON-LD (or its equivalent). If we want to define a concise and limited set of specific properties that define a DID Document, JSON alone is fine.

And also Sam's point:

Verifiable credentials are wonderful things. Let’s have more of them. But not disguised as a DID Doc.

I'm as well in favour of having a simple JSON based DID core specification.

JSON is fully sufficient to describe the abstract data model. JSON can avoid namespace conflicts by registering new vocabulary (which might be needed for new features) with IANA. While I appreciate some of JSON-LD's extensibility features, JSON-LD does not solve that issue as a whole. Implementers will still have to implement the interpretation of these features to achieve interoperability. On the other hand, it would always be possible to define additional specs that describe how to use these new features/vocabulary. The spec authors will then be in charge of registering new vocabulary with IANA, or of choosing terms that are collision resistant. I agree that JSON-LD introduces an unnecessary overhead for implementers that just want to use features described in this issue.

@dhh1128

I believe that the construct that you're calling a "term" here is a JSON-LD-ism, not a JSON-ism, so your note doesn't compute.

This is not JSON-LD-ism, but Herman-ism... You are right, the official terminology on JSON is "name". However, I hear the word "key", "name", "term" all around me among JSON users; b.t.w., the JSON-LD spec does not use a different terminology either. Blame it on me.

And the notion that we need to "define a processing model" as part of the spec compounds this impression; the processing model for JSON is plenty clear without elaboration in a new spec, if we are not aiming for fancy constructs that JSON-LD needs.

Again, it may be some terminology mismatch, but I respectfully disagree. The original text in this issue, as written by @SmithSamuelM describes, in abstract terms, a process model that defines, in effect, what should happen with the, ehem, names of a JSON representation (if this is the representation we choose for the data model). I do not think that a spec "just" listing names without any specification of what those names should be used for would be o.k.

As I said, I am _not_ taking sides on whether we should use JSON-LD or simply JSON. My only goal was to help make things clearer to everyone. And yes, actually, I do find:

Do not put anything in a DID Doc that can be provided by a verifiable credential at a service endpoint. Only put in the DID Doc what is essential to access the verifiable credential.

(from @SmithSamuelM) compelling.

I would be very happy with this change. It would make it much simpler for developers to deal with it.

It also seems to me to be much more secure to have a document for specifying public keys etc to not have external dependencies such as contexts etc.

Just some additional (intended to be neutral) observations:

  1. Switching to plain JSON will make it impossible to use LD proofs and signatures. This could be considered a good thing since it makes everything even simpler, and JWS could be used instead. But in the DID world, proofs have been used that are much more diverse than what JWS offers, e.g. a "Satoshi audit trail" or Sovrin state proofs. It has been argued in this thread that the core features of DID documents are so simple that they don't need RDF semantics or namespaces, but when we get into proofs (and verification methods), we may see a much greater need for extensibility and open world semantics in the future. Note that there is a discussion about whether such metadata even belongs in the DID document or in a separate "DID resolution result" data structure (but would that then be in plain JSON too?)

  2. We would have to define additional rules for the "id" property (or would we change it to "sub", to be compatible with JOSE?). In JSON-LD, the "id" is a built-in construct, it contains the identifier of the RDF subject that is being described. One thing that's nice about the current spec is that "id" is used on the top level for the DID subject, and it's also used for services and public keys. We would have to consider how JWK's use of "kid" would fit in.

  3. We would have to define additional rules for fragments in DID URLs. The media type application/ld+json defines how fragments work (they are directly based on the "id" fields, see above). The media type application/json does NOT define how fragments work, so we would have to pick something. The obvious choice (but not the only one) would then be JSON Pointer. RFC6901 talks about the use of JSON Pointer as a URI fragment. This would mean that DID URLs like did:ex:123#keys-1, which we are using today, would change to something more complex.

  4. DIDs would probably lose their potential compatibility with WebID and Solid. On one hand you could argue that this doesn't matter for the applications most people in this community are working on (OIDC, DIDComm, etc.), since those applications only require simple mechanics for discovering keys and services. But WebID and Solid should still be considered important theoretical work on aligning traditional web architecture and philosophy with digital identity, and it may be desirable for DIDs to be compatible with that architecture.

  5. If the decision is to go with plain JSON, then perhaps we should just extend JRD instead of inventing a new "DID document" format.
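To make item 3 above concrete, here is a minimal sketch of the two fragment styles. It is only an illustration: the document contents, the `publicKey` layout, and both helper functions are assumptions made for this example, not anything defined by the spec. The first helper mimics the id-based lookup that today's `did:ex:123#keys-1` relies on; the second dereferences an RFC 6901 JSON Pointer, which would turn the same reference into something like `did:ex:123#/publicKey/0`.

```python
from urllib.parse import unquote

# Hypothetical DID document; field names follow the examples used in this thread.
DOC = {
    "id": "did:ex:123",
    "publicKey": [
        {"id": "did:ex:123#keys-1", "type": "Ed25519VerificationKey2018"},
    ],
}


def resolve_by_id(node, did_url):
    """id-based style: find the object whose "id" equals the full DID URL."""
    if isinstance(node, dict):
        if node.get("id") == did_url:
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = resolve_by_id(child, did_url)
        if found is not None:
            return found
    return None


def resolve_json_pointer(doc, fragment):
    """RFC 6901 style: the fragment is a percent-encoded JSON Pointer."""
    pointer = unquote(fragment)
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer is empty or starts with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


key_by_id = resolve_by_id(DOC, "did:ex:123#keys-1")
key_by_pointer = resolve_json_pointer(DOC, "/publicKey/0")
```

Both calls return the same key object here, but the pointer form ties the reference to the key's position in the array, so reordering the keys would silently change what the DID URL refers to.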

I like the specificity of the list of specific impacts provided by @peacekeeper , and I like the simple clarity of the original questions posed by @SmithSamuelM

  • Is JSON a sufficient encoding for the purpose of DID Docs ?
  • Would JSON foster greater adoption than some other encoding such as JSON-LD ?
  • Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD ?

It definitely seems that JSON is up to the task, subject to some changes like those described by @peacekeeper

I also very much like the idea of a DID-Doc being as simple as possible and focused entirely on bootstrapping communication - this is required in all DID use cases. I also very much like the sentiment that we should let VCs do the work of supporting everything beyond communication bootstrapping.

This is where I see a distinction between advertised, subject-oriented, global, persistent, published, discoverable DIDs and DIDs which are used only to secure communication or to tag a device or datum. I see a difference between a DID that represents a persistent, publicly discoverable communication endpoint - like my business or myself - and a DID that is tagging a pencil.

In the former case, when we resolve a DID we immediately need to ask questions such as "how do I communicate with you" or we want to use the DID as part of the persistent address of a resource - probably via fragments. In DNS this was dealt with adding more record types and more complex sub-structures in TXT records. I see the extensibility of the DID-Doc being analogous - it provides a structured pathway for a DID-controller to publish information about the DID and to link that DID into the global data ecology. Moreover, that ability to publish is directly tied to the DID itself - allowing the DID-Doc to be a single, authoritative, source of truth expressing "this is me" - or, more correctly, expressing the mechanics of learning about me.

The alternative model - a simplified DID-Doc - forces a multi-step process - I can't use a DID to point to a resource - I am required to use a far heavier process that involves new communication protocols and a suite of toolkits. The simplification of the DID-Document results in substantially higher complexity. If you are focused on pairwise or small-group communication, then you do not need anything beyond the simplest DID-doc - but if you are advertising a persistent DID that you expect to share generally - e.g. a DNS-like model - then you need more. This is why I think that there is a rough difference along the lines of did:peer and did: - private, pairwise, anti-correlable DIDs are subject to a different set of pressures than public, correlation-positive DIDs - and public, correlation-positive DIDs seem to me to beg for a systematic semantic extensibility.

This is why I don't quite understand the argument that JSON simplifies anything - and, in particular I am not clear about this:

This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly a clever implementation of a cyclical directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclical graphs making the implementation of evented state machines at best problematic and versioned discovery documents more cumbersome.

What I keep looking for is a concrete example of how JSON-LD gets in the way? What is an example of the simplification that occurs? If I am going to throw away the simplicity of an authoritative and expressive, semantically extensible, public, correlation-positive DID-Document - if I am going to give up my ability to make clear, discoverable, public statements and give up the ability to clearly and unambiguously point to persistent resources I will expect some very substantial and very compelling improvements.

@awoie has suggested a centralized, authoritative registry define and restrict how I can use my DID-Doc. @pelle says

It also seems to me to be much more secure to have a document for specifying public keys etc to not have external dependencies such as contexts etc.

The solution of using JSON simply replaces a formal, explicit, decentralized model (JSON-LD) with a centralized, implicit, and restrictive model. There is no requirement that you actively look up the @context URIs against a network. The external semantic dependency always exists, it is a question of where and how this is done - is it extensible and decentralized, or is it fixed and restrictive.

Before we introduce a hard-coded, dictatorial, centralized, restrictive semantic and remove our ability to use DIDs as persistent resource bases I think it is worthwhile to really see some of this simplification in action. If possible, can someone who feels JSON simplifies provide a short list of specific examples, like @peacekeeper 's list - showing a very concrete example of the processing improvement?

To be clear - I am sympathetic to the sensibility of simplifying the DID-doc to only express a DPKI registration - however, I think we lose a tremendous amount of value in the context of published, correlation-positive DIDs. We owe it to ourselves to be very clear about the cost/benefit tradeoff before we jettison those abilities.

I would also like to respond to @dhh1128 - I think this is a very good question

I don't think the viability of starting from JSON-LD but letting JSON aficionados ignore it is the relevant question. Clearly it can be done. The question is whether the juice is worth the squeeze. What features of JSON-LD do we actually need? I would be very interested in concrete answers to that question, rather than theoretical ones. I think that, not discussions about process or theory or precedent, ought to push this issue one way or the other.

What I want is the ability to say "@context:[the-did-core-bootstrapping-context-uri]" for a DID that has the minimal data required to bootstrap communication. In this case, I can just ignore the JSON-LD and deal with the JSON - hardcoding the semantic. This is particularly useful in pairwise handling - it is particularly useful for anonymity, for transient DIDs, for did:peer dids, and a whole range of use cases. I expect this to be the mainstay.

Alternatively, I might be exploring multi-DID relationships in complex guardianship or custodial relationships. In that case, I can add a context to a specific DID document and include the relevant fields. This can be done by just the subset of DIDs for which it makes sense - and there is no need to try to solve all the possible guardianship or IoT situations and get them into the centrally registered specification.

Perhaps there is other information - versioning information, timestamping, or whatnot that is helpful for a particular DID method - in those cases I can simply add another URI to the @context array. I may want to associate some additional information about a novel form of key-recovery.

That to me is the key value proposition of JSON-LD - the ability to formally declare additional @context elements on a per-DID basis.
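As a rough illustration of "ignore the JSON-LD and deal with the JSON", the sketch below reads a DID document with an ordinary JSON parser and treats `@context` as a list of opaque strings compared against hardcoded values. Nothing is fetched from the network. The two context URIs and the function name are hypothetical placeholders invented for this example.

```python
import json

# Hypothetical URIs: the real core context would come from the spec,
# and extension contexts from whoever defines them.
CORE_CONTEXT = "https://example.org/did-core-bootstrap/v1"
KNOWN_EXTENSIONS = {"https://example.org/guardianship/v1"}


def read_did_doc(text):
    """Parse as plain JSON and report any @context entries we do not understand."""
    doc = json.loads(text)
    contexts = doc.get("@context", [])
    if isinstance(contexts, str):
        contexts = [contexts]  # JSON-LD allows a single string instead of an array
    if CORE_CONTEXT not in contexts:
        raise ValueError("document does not declare the core context")
    # Anything that is not a recognised string (including inline context
    # objects) is reported as unknown rather than interpreted.
    unknown = [
        c for c in contexts
        if not (isinstance(c, str) and (c == CORE_CONTEXT or c in KNOWN_EXTENSIONS))
    ]
    return doc, unknown
```

A consumer that only needs the bootstrapping fields can skip the `unknown` list entirely; one that cares about a particular extension checks whether its URI was declared.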

It is the case that I can also do all of this with VCs - but then I can't simply use the DID to "point" to a resource - I must always provide a pair - the DID and a 2nd URI for the VC that contains the linked information. And for public statements I want to make about myself, I can always issue myself a VC and find some place to host that for general availability.

There is a certain clarity and elegance to that - but I don't necessarily see it as a simplification - it is, rather, a bit of added complexity.

Perhaps the issue is whether or not DIDs should be thought of as resource anchors - if they are not resource anchors, and are solely for DPKI registration then why not move service_endpoints and perhaps authentication out and make those VCs as well? That does seem a bit extreme - but it is also oddly pure.

Again - that's why I think it comes down kinda along the public vs. private and the peer/published distinction. If I am registering a DID on a global ledger, then I am making a public announcement of a resource base - but if I am making a private DID for secure, correlation resistant, p2p communication then the sense of a published resource base is gone. I bootstrap communications and exchange VCs, and we never, ever need to touch a ledger - in that case we use the minimal set of DID-Doc fields. However, if I am advertising information on a ledger in a sort of generalized DNS style then I want to use @context to formally declare what information is being published - and I want the author to be in control of that.

I think the use of @context gives us the flexibility to deal with those two differing roles of a DID at a relatively low cost - and the did-core-context can be hardcoded and DID documents can be processed as JSON, completely ignoring the JSON-LD if that helps. I just see the @context as a critical pressure-release valve - it is a minimal intrusion which opens the door to a universe of possibilities - it lets us use DIDs as communication anchors and as resource namespaces.

That could be the core of the problem - because I definitely agree w/ @talltree here

While I can understand someone coming from this POV, let me make sure it is clear why Sam and I and others on this thread have been arguing the exact opposite: if what you're trying to solve is a generalized discovery problem, then you not only need tools like a semantic graph model, you also need name services, directory services, search protocols, etc. That's a whole different problem space. And there are tools and technologies that already work very well for that problem space. All those tools and technologies need to do is add DIDs to become even more useful for discovery.

If a DID is only an anchor supporting DPKI registration then I think there is definite value in getting rid of JSON-LD and going with a single, centralized, authoritative specification. There is definite value in that - but there is also then very little significance to did methods - because it does not matter where the DID-document comes from. did:anything is basically the same everywhere - it is the DID-document and not the DID itself which matters.

This impacts the rubric conversation - because did:facebook is just as valid as did:sov - the method simply does not matter - as long as the identifiers are self-certifying, they can not be "taken away" - they can only be unpublished from directory services. Decentralization does not matter - only the integrity of the DID document.

That is why, to me, if we are talking about decentralized registries of information, with multiple methods - then we expect variation around the DID-document itself. That variation is what gives sense to different did methods. The entire topic of Resolution is based on the concept of a public registry. JSON-LD gives us a means of coping with the variation in DID-Doc around multiple registries and purposes.

If we are not talking about the registry issue, and only talking about DPKI, then a self-certifying document and associated processing toolkits are enough - and JSON-LD is clearly too much.

I see real value in an open world data model for certain purposes, e.g., VC data, but I'm questioning if this is needed for features in the DID spec when we consider, we already have registries for DID methods, revocation status, LD Signature Suites and potentially more upcoming. These registries are authoritative to implementers in the same way as IANA registries. And that is not necessarily a bad thing because otherwise we won't achieve interoperability.

If however JSON-LD is needed to include certain parts of the DID community, then I won't be opposed to it. However, I also believe if we cannot express the data model in JSON, then we would have failed. If it is good enough if resolver implementations or the application logic can decide whether to support or require a certain data model (de)serialisation format, then we should consider this. I want to second what @jricher said here https://github.com/w3c/did-core/issues/103#issuecomment-559190860.

TBH, I am not sure what the value is in making a DID-document so dramatically multi-format - that seems like additional overhead of dubious value - especially if the data is a fixed set of fields. You'd have to link in dozens of additional libraries to any system just to handle all the possible formats you might get back - but that is only if you want to allow did:* to work everywhere. However, it is more likely that different methods will gravitate to different environments and use-cases and languages.

Being able to represent a DID-Doc in format A, B, and C is different from saying DID-processing software and methods must be able to handle format A, B, and C interchangeably. The fact that some industry uses XML and another uses JSON does not tell me that all DID-processing MUST support XML and JSON formats interchangeably. It seems like a bigger ask to force a lightweight TypeScript application to include the full XML processing stack just in case I got a DID document in XML.

In fact, in #103 - dhuseby references a rant from an implementer's perspective where he says that JSON-LD is about how and not what should be in the document. It is that 'what' that we want to be extensible - and JSON-LD provides a mechanism for that, although it is apparently painful - and perhaps it is because JSON-LD is too expansive for the little bit of assistance we need in extending "what" is in a document?

For example: consider the descriptions of adding certificates in #69 and #103 - it seems like this might be a prime candidate for a method-specific variation and/or a DID specific variation - e.g. DID-Doc A says "I have standard core material and support cert-type-A and cert-type-B" while DID-Doc B says "I have only standard core material"

This kind of thing could be handled w/o JSON-LD - but the question remains - how do I, the controller of a DID document, extend the DID-document if it makes sense for my use case? I want to somehow, systematically, flag "I have a set of additional attributes with the following meaning and metadata" and I want a way to say "does this DID-doc support X, Y, or Z"

Again - if we restrict the role of DIDs to be DPKI registration alone and block their use as the root of a resource domain then we have a very tight, encoding agnostic, model. If we want fluid extensibility of DID-doc feature sets what alternatives do we have to JSON-LD? Would simply replacing "@context" with "@features" and being encoding agnostic solve it?
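To show what that question might look like in practice, here is a purely hypothetical sketch. The `@features` name comes from the question above, but its shape (a flat list of feature identifiers) and the `supports` helper are assumptions made only for illustration.

```python
def supports(doc, *features):
    """Answer "does this DID-doc support X, Y, or Z" from a declared feature list."""
    declared = set(doc.get("@features", []))
    return {feature: feature in declared for feature in features}


# Hypothetical documents mirroring the cert-type example from #69 and #103.
doc_a = {"id": "did:ex:a", "@features": ["cert-type-A", "cert-type-B"]}
doc_b = {"id": "did:ex:b"}  # only standard core material

answer_a = supports(doc_a, "cert-type-A", "cert-type-C")
answer_b = supports(doc_b, "cert-type-A")
```

A flat list like this carries over to any encoding, but it leaves open who decides what each feature identifier means, which is the part `@context` answers by pointing at a definition.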

I just wanted to chime in to say that I agree with this limited conception of DID Documents, which I think is close in spirit to the one Joe is arguing for. I do _not_ agree with a richer conception that overloads them with lots of meaning and infinite extensibility. I think other resources, accessed through service endpoints, is where that belongs.

Simpler is better, at the relatively primitive communication-enabling level where DIDs belong.

+1

I believe our goal is to foster broad adoption. Any complexity or extensibility that can turn a DID-Doc into some beast that was never intended is not a great idea.

As an example I point to a different domain. OASIS has a standard called Common Alerting Protocol. It is used for Amber alerts, weather warnings, and many more emergency and non-emergency alerts. It allows for an AREA on the planet to be specified - as a circle or polygon.

In the creation of a vessel tracking system a developer decided that since vessels are best represented by points, a circle with zero radius would suffice. Large volumes of vessel tracks were created. That's fine when they stood alone in a system.

But then someone tried to share them "using a standard" (that had been subverted) and caused unknown levels of pain as systems tried to understand the hacked alerts. Two big problems - they weren't alerts and they weren't really using an Area.

Now - this "hack" caused some thinking in the community that a zero-radius circle (i.e. a point) was actually valuable so the community adjusted over a long period of time. However, the confusion created caused system crashes and the reality was that OASIS CAP is not for tracking vessels... There were other standards needed for that type of data.

My point here is that bounding things to keep them simple helps interoperability. Allowing or expecting infinite extensibility may make interoperability far harder than it needs to be.

This issue is talking about too many things, at too abstract a level of discussion. There are also a LOT of misconceptions in the "Let's use JSON as an abstract data model" proposal and I can't see how it actually translates into a working specification given the ecosystem that has sprung up around Verifiable Credentials.

I'll answer the questions raised by the issue, but am concerned that the answers won't really get us toward a concrete set of text that we can discuss.

Is JSON a sufficient encoding for the purpose of DID Docs ?

No, it isn't, because JSON only deals with local information, whereas DID Docs (and Verifiable Credentials) deal with global information (aka global semantics). JSON-LD was invented to address global semantics while enabling the folks working with JSON tooling to continue to use that tooling.

Going to JSON would be a step backwards that would see us reinventing parts of JSON-LD that took years to come to consensus and standardization on. We'd have to do that work again in this group with a very questionable return on investment on the table.

Would JSON foster greater adoption than some other encoding such as JSON-LD ?

This question is impossible to answer with any amount of certainty, which will lead to us debating using anecdotal evidence. The website design community asked the same question years ago wrt schema.org's use of JSON-LD and publishing machine-readable information... and today 27.5% of websites use JSON-LD.

Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD ?

For what feature set? Again, the question is so vague that it could be interpreted in a variety of different ways by different people given how the question is posed. It's probably also a very bad idea to make this a popularity contest (by asking random web developers what they'd like -- might as well ask them which cryptographic algorithms they'd like to use). There is a set of features that are important for the spec to have due to the requirements, and the current spec meets those requirements. Moving to a less capable data model and syntax is going to force the group to re-open work that has already been done (years ago).

This issue needs a PR so we can talk about things concretely... let's not get into a philosophical debate, let's see a concrete proposal -- that'll be easier to analyze than what this issue is turning into.

Refocus Discussion

This discussion has been highly informative and I very much appreciate the detailed thoughtful comments. They have been illuminating and insightful. In this comment, however, I hope to refocus the discussion to the core questions posed by this proposal and re-frame the questions in light of the discussion.

Original Questions

As stated above the original questions are as follows:

1) Is JSON a sufficient encoding for the purpose of DID Docs ?

2) Would JSON foster greater adoption than some other encoding such as JSON-LD ?

3) Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD ?

Stated Assumptions

These questions were based on some stated assumptions in the introduction leading up to the questions. These may be summarized as follows:

A) That better fostering universal adoption was a goal of this community. This means not just giving lip service to the idea of universal adoption but actually designing and expressing the DID-Doc spec in a way that conveniently facilitates broader adoption.

B) Universal adoption requires simplicity in the data model and the corresponding baseline encoding. This maximizes the adoptability of the baseline encoding and also maximizes the ease of translating to other encodings and hence the adoptability potential of other encodings.
This combination best fosters more universal adoption.

Given that the community agrees to A) and B) then the next stated assumption may be summarized as:

C) The best approach to fostering universal adoption is to represent the DID-Doc specification as a simple abstract data model that is directly expressible in a baseline encoding that may be conveniently translated to other encodings.

That means an encoding other than JSON-LD or naive JSON with JSON-LD syntactic artifacts as the baseline encoding. As proposed this means clean JSON. This makes CBOR trivial, and makes other encodings like PDF easier.
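One way to see why clean JSON maps so directly onto CBOR: the JSON value types (null, booleans, integers, strings, arrays, objects) each have a matching CBOR major type or simple value. The sketch below is a deliberately minimal encoder written for this comment, following the head-byte rules of RFC 8949. It is not a complete CBOR implementation (no floats, byte strings, tags, or canonical key ordering), and the example document is hypothetical.

```python
import json
import struct


def _head(major, n):
    """Initial byte plus argument: major type in the top 3 bits, length or value after."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("integer too large for this sketch")


def cbor_encode(value):
    """Encode the JSON-compatible subset of Python values as CBOR bytes."""
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for unsigned, major type 1 stores -1 - n for negatives.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


# Hypothetical DID document text, parsed with the ordinary JSON tooling.
did_doc = json.loads('{"id": "did:ex:123", "service": []}')
encoded = cbor_encode(did_doc)
```

The translation is a straight walk over the parsed structure, with no graph expansion or context processing in between.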

Given the community agrees with A) B) and C) then questions 1) 2) and 3) become relevant.

What I see above are many arguments that JSON-LD/RDF in an open world is either superior or necessary as the encoding for a DID-Doc.

Every argument that says JSON-LD is necessary is an argument against one or all of A) B) and C). Any argument for the unique capabilities of JSON-LD is an argument against A) B) and C)

Issues to Argue

So let's all be fair and argue the issues.

If one thinks that JSON-LD is necessary, then state that and accept the ensuing consequence that encoding DID-Docs in other encodings will be more difficult if not problematic and will likely become more difficult over time as more and more stuff gets put into the DID-Doc spec. As the DID Spec itself states, the open world data model on a semantic graph is more complex. So let's not pretend it isn't.

On the other hand if JSON-LD is not necessary then you accept A) B) and C). So begin your argument there. If you accept A) B) and C) but not 1) 2) 3) then argue that. If one doesn't want JSON as the language for the abstract data model then argue that point and stop arguing for JSON-LD. Argue instead for the best language to represent the abstract data model. UML is a good alternative candidate. I would love to have a discussion on the relative merits of UML vs JSON rather than endlessly argue why one wants to encode in JSON-LD.

Nowhere is it proposed that one can't use JSON-LD as an optional encoding. The question is not whether or not JSON-LD is good as an encoding for those that want to use it but whether or not universal adoption is critically important and if so whether or not some other encoding would better serve that purpose as the baseline spec encoding.

Focusing on the questions

Because I accept A) B) and C) then I am led to find criteria for deciding 1) 2) and 3).

That means understanding the essential purposes of a DID-Doc. Nothing more. Nothing less.

There is a vast difference between serving the _essential_ purposes and merely serving _useful_ purposes. Arguments for useful purposes are only material if there is no other way to serve those useful purposes.

An optional encoding in JSON-LD will allow all the useful purposes of DID-Doc that JSON-LD provides without encumbering the essential purposes that every encoding must provide.

So arguments for features of JSON-LD are immaterial if they are not essential to every encoding.

The reason for the detailed exposition of purpose of the DID-Doc in the Appendix above was to frame the discussion of what are the essential purposes of a DID-Doc so that we could agree on those as a precursor to deciding if JSON or some other encoding best provides the necessary and sufficient functions of the baseline encoding.

Based on the discussion above it seems that expanding the discussion of essential purposes will be helpful.

DID Subject

One frequent issue in DID WG discussions is what is the DID Subject.

The DID spec https://www.w3.org/TR/did-core/ defines it thusly:

5.2 DID Subject

The DID subject is denoted with the id property. This is the entity that the DID document is about, i.e., it is the entity identified by the DID and described by the DID document.

DID documents MUST include the id property.

id
The value of id MUST be a single valid DID.

This is much too ambiguous a definition and is the source of confusion even in this thread, as each participant imbues their definition of subject with a different meaning for their usage of the DID-Doc. Indeed one discussion was about whether the DID-Doc itself could or should be the subject of the DID-Doc.

The dictionary definition of ambiguity is as follows:

  • open to or having several possible meanings or interpretations; equivocal:

  • an expression exhibiting constructional homonymity; having two or more structural descriptions.

So what do I mean by my contention that this definition is too ambiguous? It's too ambiguous if there is a tighter definition that better serves the essential purpose of a DID-Doc.

The only entity that is _essential_ with respect to the primary essential purpose of a DID-Doc is the controller of the private key or keys that are authoritative for the DID.

This entity is the only one we need to care about when determining the authoritative set of keys.

Whether or not that needs to be, or is better, expressed as the subject id of a DID-Doc is a good question. It certainly can be implied. There is always an implied authoritative controller entity of the DID-Doc. This entity is authoritative in a cryptographic sense by virtue of its control over the associated key pair or pairs.

Only the controller can make verifiable authoritative nonrepudiable statements about the DID or resources affiliated with the DID. These include statements about the controller itself.

Not recognizing this fact is where the discussion goes off the tracks. A DID Controller may make authoritative statements about anything it chooses, including other entities such as the DID-Doc itself and any resource affiliated with the DID. The DID Controller may also choose to make statements about itself. But most importantly the DID Controller may choose to make authoritative statements about the key pair or pairs by which those authoritative statements may be verified in the future.

What matters foremost is validating if a given statement is authoritative w.r.t. the DID Controller.

A DID-Doc's primary purpose is to bootstrap the user of the DID-Doc to the point where that validation may be made.

Any other purposes are secondary and may be provided in other ways.

A useful and potentially essential other purpose is to bootstrap the discovery of affiliated resources such as those that may be obtained from service endpoints. This useful other purpose could be so practically useful as to be viewed as essential to the meaningful use of a DID.

Given that these two purposes, in order of importance, are provided by a DID-Doc, any other purpose may be met by resources external to the DID-Doc in the form of verifiably authoritative statements made by the DID Controller.

This conclusion does not preclude putting additional resources in the DID-Doc as _useful_ things but they are not _essential_ things and are therefore _optional_ things.

If the _unique_ value of JSON-LD lies in its ability to do the _useful_ things, and it is not uniquely able to do the _essential_ things, then JSON-LD, due to its complexity, is not a good candidate for the _essential_ or baseline encoding. It is best used as an _optional_ encoding.

DID-Doc as an Ersatz Root VC for the DID Subject

It seems clear that the community is divided over this discussion. What appears to me as the underlying but unstated source of that division is the mental model the two sides have for a DID-Doc.

  • One side has the mental model that the DID-Doc provides intensive semantic knowledge about the DID subject in an extensive world model. This looks like an ersatz root verifiable credential (VC) issued about the subject. The arguments for this mental model fit the arguments that one would make for a self-issued root VC. If it walks and talks like a VC it is a VC.

  • The mental model for the other side is that a DID-Doc provides a cryptographically verifiable bootstrap that enables validation of authoritative statements made by the DID Controller. A subset of these authoritative statements includes VCs or ersatz VCs issued by the DID Controller. This subset includes self-issued VCs about the DID-Controller itself.

The former mental model must still establish the purpose of the latter mental model but does not recognize it as such. It co-mingles both in the DID-Doc. It does not separate the concerns. This complicates things. The former is dependent on the latter. Therefore _they are not equivalent models. This is the cause of the division._

A combination of these two mental models that separately embeds and encapsulates the associated two purposes may be provided in one DID-Doc. The first is to provide the essential bootstrap data needed to validate authoritative statements by the DID controller. Once that is provided the DID-Doc may optionally also provide ersatz VCs issued by the DID-Controller. These could include self issued VCs where the DID Controller is the subject.

Given this clean separation of concerns and associated encapsulation of data by purpose (_essential_ and _optional_), other encodings would not need to be encumbered with the optional embedded ersatz VCs but could merely provide the bootstrap data in the DID-Doc and provide the ersatz VCs in another way such as via a service endpoint.

Thought Experiment

To clarify my own thinking I often conduct thought experiments about meaningful edge cases.

In my thought experiment I wondered what would be a maximally adoptable encoding for DID-Docs. The answer is a QR code. In order to fit a DID-Doc in a QR code I would have to limit the data to the bare essentials. Could I do that and still enable someone to practically use the DID? The answer is yes, as long as additional information that may be needed to use the DID could be provided via a service endpoint, and this other data could include a self-issued root verifiable credential (VC). So I would only need enough information to validate the inception event for that originating key-pair for the DID and a service endpoint to get everything else.
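The size constraint in this thought experiment can be checked mechanically. The following is only a sketch: the member names (`publicKey`, `service`, and so on), the `did:example` identifier, and the key placeholder are illustrative assumptions, not spec text. The one hard number is the byte-mode capacity of the largest QR symbol (version 40, error correction level L), which is 2953 bytes.

```python
import json

# Byte-mode capacity of the largest QR symbol (version 40, error correction L).
QR_MAX_BYTES = 2953

# Hypothetical bare-essentials DID-Doc: one key to validate control and one
# service endpoint from which everything else can be fetched.
minimal_did_doc = {
    "id": "did:example:123456789abcdefghi",
    "publicKey": [{
        "id": "did:example:123456789abcdefghi#key-1",
        "type": "Ed25519",
        "publicKeyBase64": "A" * 43 + "=",  # placeholder for a 32-byte key
    }],
    "service": [{
        "id": "did:example:123456789abcdefghi#more",
        "type": "Bootstrap",
        "serviceEndpoint": "https://example.com/did/123456789abcdefghi",
    }],
}


def compact_bytes(doc: dict) -> bytes:
    """Serialize without insignificant whitespace to minimize size."""
    return json.dumps(doc, separators=(",", ":"), sort_keys=True).encode("utf-8")


def fits_in_qr(doc: dict, limit: int = QR_MAX_BYTES) -> bool:
    """True if the compact JSON form fits in a single QR symbol."""
    return len(compact_bytes(doc)) <= limit


encoded = compact_bytes(minimal_did_doc)
print(len(encoded), fits_in_qr(minimal_did_doc))
```

A document that embeds a full self-issued VC would be much more likely to exceed the limit, which is the point of moving that data behind the service endpoint.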

Now I have a path to truly universal adoption. This path does not preclude someone from using an encoding such as JSON-LD that embeds the ersatz root VC in the DID-Doc. They may go to town with that and benefit from all the goodness that JSON-LD brings. But all other encodings have a simple direct path to full functionality.

So maybe the abstract data model baseline encoding should be a QR Code =) (just joking)

@SmithSamuelM I really think that was tremendously clear - thank you.

I guess, for me, I don't strongly agree with B - we pay a price for B and I think it is a 'might help' sort of priority. In terms of the two mental models I would say that I am somewhere in the middle - but I might use slightly different adjectives and I would not say that one is about the DID-subject and the other is about the DID-controller - which is critical to pay attention to.

To me the difference is like a doorknob and lock. Separation of concerns would say "always use a deadbolt and separate doorknob" - but for a lot of doors in my life, I like the simple fused lock/doorknob combination. When I come home from the store I almost always have bags in my hands, and so I value the ability to turn the key and the knob at once - it really makes things much easier and simpler. An example of "doorknob extension" is whether or not you wanted the door to automatically lock, or if locking required explicit action - there really are a lot of options with a simple door. I may have different keys for a simple lock - but the subject is clear, it is the portal itself, and it is under the control of people with keys.

It could be that a good compromise is dealing with this in the context of resolution. In other words, a DID-Doc with fixed, non-extensible semantics that are easily mapped through any format which is peered, inherently, with a VC and stored in the same medium as the DID-Doc using the same identifier.

Note - this is not a service_endpoint model. To me the "special VC" has two additional properties that separate it from other VCs - it is intrinsic to the method and medium in which the DID-Doc itself is stored - it is the locus for the DID's communication with the method itself and it has no sense other than about the DID/DID-Doc pair (including the distinction between the controller(s) and subject). Those properties are hard to get if you use the service_endpoint approach - and, most importantly, the relevance of these properties is sensitive to the public/private role of the DID in the ecosystem.

Also, just thinking in terms of performance engineering - consider the evolution of HTTP/3 - where minimizing the number of discrete connections and round trips is a primary engineering concern - and what we have with HTTP/3 is a structured fusion of concerns. In terms of scanning blockchains and calculating the docs from transaction chains, or establishing the trust relationships through the resolver chain - that overhead is only done once - I effectively get to ask the resolver for either a "core report" or a "full report about the DID" - I either get (DID-Doc,null) or (DID-Doc,Method-Bound-VC) - and I can do likewise with updates and revocations (which impact both elements of the pair in - ideally - an atomic operation). This is another area where there are different pressures for public/private DIDs - largely because there is only one method for pairwise DIDs that I know of, meaning that if you are focused on peer DIDs you will not feel the tension to the same degree.
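The "core report" versus "full report" idea can be sketched as a resolver interface. This is a minimal illustration under assumptions: the in-memory store, the function names, and the document contents are all hypothetical, and no real DID method or resolver API is being described.

```python
from typing import Dict, Optional, Tuple

# Hypothetical method storage: each DID maps to a (DID-Doc, method-bound VC)
# pair kept in the same medium under the same identifier.
_STORE: Dict[str, Tuple[dict, dict]] = {
    "did:example:alice": (
        {"id": "did:example:alice", "publicKey": []},
        {"issuer": "did:example:alice", "claims": {"name": "Alice"}},
    ),
}


def resolve(did: str, full: bool = False) -> Tuple[dict, Optional[dict]]:
    """Return (DID-Doc, None) for a core report, or
    (DID-Doc, method-bound VC) for a full report."""
    doc, vc = _STORE[did]
    return (doc, vc if full else None)


def update(did: str, new_doc: dict, new_vc: dict) -> None:
    """Replace both elements of the pair in a single assignment, so a reader
    never observes a new doc paired with an old VC."""
    _STORE[did] = (new_doc, new_vc)


core = resolve("did:example:alice")
full_report = resolve("did:example:alice", full=True)
```

The expensive work of computing the pair happens once inside the resolver; the caller only chooses how much of the result it wants back.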

My apologies if I've driven this conversation off track - but I think @SmithSamuelM really captured the friction well - much better than I did. I do think this is all a coherent thread though - it is the cost/benefit analysis of B relative to (x) distinction between DID-subject/DID-controller and (y) the different mental models behind the doc.

Similar to Eric I push back on B. The base assumption that N encodings are
needed doesn't make a ton of sense to me, especially if it is simple.

I have two points related to this that both relate to the evolution of a
specification/standard.

ADVICE POINT 1. Start Simple but Plan for Evolution.

Specifications get started with the best of intentions. Technology areas
with broad goals are attacked, ambitions are stated and rich depth and
breadth are discovered.

Then things start to get a little wonky. Edge cases and corner cases start
to be built before they are truly understood.

Successful specifications start simple and allow for evolution. They do the
bare minimum - often to the detriment of performance early on. Consider the
earliest versions of http, smtp, pop, ftp, amqp, etc. They generally
started out crudely but allowed the job to be done. The community used
them, learned, and extended (hacked) them where they didn't quite work.
Those learnings and hacks were examined, and where appropriate, built into
subsequent versions of the specifications. Often the specifications evolved
quite quickly.

I suggest we do our best to keep things as simple as possible and allow
optional extensions. After we see more adoption we will likely learn where
the broad community needs something (build that into the next version) or
where a specific need is unique to a community (leave that in another
specification or in the optional extension).

Supporting multiple encodings doesn't fit the simple category. I think that
the JSON-LD portions could easily be supported in an extension area. Over
time the parts that make the cut can be built into subsequent versions of
the specification.

ADVICE POINT 2. Multiple encodings slow things down - and may make future
evolution impossible.

I'll throw in OASIS CAP (again) as a warning. CAP 1.0 was dead simple and
has an XML encoding. CAP 1.1 learned from 1.0 and the "1.0 hacks" that were
used in the wild to compensate for flaws and things that were missing. At
this point, the ITU was brought in as an additional SDO that would add
gravitas to the standard. That worked from an adoption perspective but
required an additional encoding (ASN.1). This meant we had 2 encodings: XML
and ASN.1.

Later, CAP 1.2 was created but hasn't changed much - partly due to the
technical limitations of ASN.1 and lack of political will to evolve. This
has limited the movement of CAP and, in my opinion, stunted its possible
utility. Efforts to move to a v2.0 have been squashed, partly due to the
multi-encoding.

This pattern has repeated under numerous OASIS standards, especially with
the downplay of XML/XML Schema in favour of JSON renderings. This has
already become a problem for several areas - especially as JSON Schema
limitations have been found (and thus it has begun to look like a
shadow/mirror of XML Schema).

Compatibility becomes a problem with multiple encodings as well. Imagine
"equivalent" JSON and JSON-LD representations of exactly the same data. To
answer the question "are they really equivalent?" is actually quite hard.
Once you get into the (ahem) semantics of the two representations at some
point we end up with data that are explicit in one area, and implicit in
another. How do you definitively say that the two are functionally
equivalent? The only way that I have seen this work - and it was extremely
expensive - was to create testing suites that could be run by third-party
labs to accredit/certify results. Not ideal at all.

I believe that a dead simple JSON specification that can be cast into
whatever formats are needed in a specific domain will allow the broadest
use.

Well, that's my Sunday afternoon rant/thinking about this. I believe in
keeping things simple until complexity is truly warranted.

Have an awesome remainder of the weekend folks!

cheers,

Darrell

Darrell O'Donnell, P.Eng.

I totally agree with @darrellodonnell’s statement “_I believe that a dead simple JSON specification that can be cast into whatever formats are needed in a specific domain will allow the broadest use._” He hits the nail on the head. Let’s quickly get consensus to build this and get to work on it!

Next, @dhh1128’s statement on extensibility is exactly right: “_DID Docs should be "extensible" in the same way and to about the same extent as HTTP headers are extensible: you can add extra stuff without breaking anything, and if the entity you're communicating with groks that extra stuff, fine. Otherwise, it has no effect._” On the other hand, the arguments I’m hearing from others that begin with “JSON isn’t extensible”, appear to be starting from a false premise, which doesn’t help achieve consensus.

I suggest that we adopt tried and true extensibility principles by including language like this in the specification: “All JSON members not defined by this specification MUST be ignored when not understood.” Allowing new fields to be added without breaking existing implementations enables the JSON to be extended over time. A way to ensure that this extensibility is actually implemented is to add undefined fields to the JSON in the conformance test suite (like "Don'tRejectThis!":true) and test that implementations ignore inputs they don’t understand. Let’s please also do that.
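A must-ignore rule of this kind is simple to implement and to test. The sketch below assumes a hypothetical set of spec-defined members (`KNOWN_MEMBERS`) and a hypothetical parser name; only the rule itself comes from the proposal above.

```python
import json

# Hypothetical stand-in for "members defined by this specification".
KNOWN_MEMBERS = {"id", "publicKey", "authentication", "service"}


def parse_did_doc(text: str) -> dict:
    """Parse a DID-Doc, keeping understood members and silently
    ignoring everything else instead of rejecting the document."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("a DID-Doc must be a JSON object")
    if "id" not in raw:
        raise ValueError("missing required member: id")
    return {key: value for key, value in raw.items() if key in KNOWN_MEMBERS}


# Conformance-style input: a valid document plus a member nobody defined.
sample = '{"id": "did:example:123", "service": [], "extraMember": true}'
doc = parse_did_doc(sample)
```

A conformance suite could feed inputs like `sample` to every implementation and fail any that raises an error or changes behavior because of the extra member.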

Thanks for all the thoughts on this thread.

If it's possible to provide a clear spec with just JSON, I think we have (had?) an obligation to do that before we add(ed?) a layer on with JSON-LD.

It's taken me a while to be semi-proficient with JSON-LD; almost all of the pain I experienced was the result of using tools that comprehended JSON-LD with data from people who didn't.... (myself included).

https://w3c.github.io/did-core/#contexts
https://w3c.github.io/vc-data-model/#contexts

DID documents MUST include the @context property.

This means that with high probability, DID Documents that are constructed by people who do not understand JSON-LD will not work with software that understands JSON-LD...

When someone who does understand JSON-LD tries to sign such a DID Document, they will most likely receive the following error:

The property "ethereumAddress" in the input was not defined in the context.

This is in direct contradiction of the language described above: "All JSON members not defined by this specification MUST be ignored when not understood."

Similarly removing the context property will result in another error:

Error: "@context" property needs to be an array of two or more contexts.

When people don't understand things, they ignore them or remove them... doing either of these things will result in a did document which is not valid JSON-LD, and which directly harms interop with systems that plan to handle JSON-LD...
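The failure mode described here can be mimicked with a toy strict check. This is not a JSON-LD processor: the term set below is a hypothetical stand-in for what a context defines, and real processing (context loading, expansion, signing) is not modeled. It only illustrates why a document that merely looks like JSON-LD gets rejected by strict tooling.

```python
# Hypothetical stand-in for the terms a context defines.
TOY_CONTEXT_TERMS = frozenset({"id", "publicKey", "authentication", "service"})


class StrictProcessingError(ValueError):
    """Raised when the toy strict check rejects a document."""


def strict_check(doc: dict, terms: frozenset = TOY_CONTEXT_TERMS) -> None:
    """Reject documents with no @context or with terms the context
    does not define, instead of silently ignoring them."""
    if "@context" not in doc:
        raise StrictProcessingError('missing "@context" property')
    for key in doc:
        if key.startswith("@"):
            continue  # keywords such as @context are not looked up as terms
        if key not in terms:
            raise StrictProcessingError(
                f'The property "{key}" in the input was not defined in the context.'
            )


well_formed = {"@context": "https://example.org/ctx", "id": "did:example:123"}
looks_like_jsonld = {
    "@context": "https://example.org/ctx",
    "id": "did:example:123",
    "ethereumAddress": "0x0000000000000000000000000000000000000000",
}
strict_check(well_formed)  # passes without raising
```

Calling `strict_check(looks_like_jsonld)` raises, which is the opposite of the must-ignore behavior quoted earlier in the thread.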

If you are planning on not supporting JSON-LD, the context definitions, and the human readable documentation, I'd rather not find that out when I try and sign your "looks like JSON-LD" did document / vc... I'd rather you not include an @context...

There are currently no implementations of DID Methods or VCs that do not support JSON-LD (every DID Document and VC has a context per the specs). This means that every DID Method and VC that includes that context but does not contribute to the documentation is kinda making this problem worse, and just has tons of not properly documented extensions... This prevents them from adopting JSON-LD Signatures or other JSON-LD tooling...

Things that look similar but are not are very dangerous. Including an @context and not using it properly is a recipe for security issues. It should not be permitted.

We seem to have 2 options:

  1. Keep JSON-LD and make comprehension and support of it a requirement (because not doing so creates security issues).

  2. Remove JSON-LD as a requirement and make it super clear that certain DID Documents do not support JSON-LD.

I worry that (2) is actually not achievable, and instead we will be seeing @id, @type and Ed25519VerificationKey2018 in did documents / vcs that have no idea what JSON-LD is :(

These ^ are security issues, they muddy the waters, and they weaken the standards (JSON-LD, VC & DID). They are resolved by understanding why you should not use those names... that involves understanding JSON-LD today, and with medium-high probability forever.

If we are serious about (2) someone should sit down with JOSE and build a fully documented DID Method that supports JWS/JWE, show us why it's better, and how it won't just make things more confusing.

If we are serious about (2) someone should sit down with JOSE and build a fully documented DID Method that supports JWS/JWE, show us why it's better, and how it won't just make things more confusing.

+1, specifically, demonstrate how you will achieve at least the following things that we are currently depending on JSON-LD for:

  • Global identification mechanism (using machine-readable URLs, if possible)
  • Linking to other data elsewhere on the Web
  • Instance type expression
  • Data type expression (dates, times, units of measure, etc.)
  • Global semantics / no local key conflicts in key-value pairs
  • Versioning support
  • Internationalization support
  • Metadata annotation (information about information)
  • Decentralized extensibility w/o deeply understanding how to publish a spec at IETF/W3C.

We use every one of those features in almost every DID Method.

At the very least, please suggest spec text changes in a PR so we can do a proper analysis on how this impacts implementations. I'm concerned that there is so much miscommunication in this issue that multiple parties are talking right past each other in this thread.

@msporny : I am dubious about the assertion that almost every DID method uses features like internationalization support, metadata annotation, or instance type expression. Did you mean that almost every DID method uses at least one item from your list, instead? Or do you mean something more subtle, like they use the features invisibly, without realizing it? I am only familiar with the impl details of 4 or 5 DID methods--but I can't think of how any of them use any of the items on your list except linking to other data. And we don't need JSON-LD to put URLs into strings...

Part of what some are claiming here is that your list of used JSON-LD features may be troubling rather than insightful; maybe DID methods shouldn't be doing some of these things, because they are confusing the rich problem domain of VCs with the simpler problem domain of DIDs. For instance, do we really need versioning in DID docs? Really? There is elaborate versioning in SOAP, but some of the most popular RESTful APIs on the planet don't version their JSON payloads at all. Instead, devs ignore fields they don't understand, and use the fields they do. This works in production, all day long, every day--in part because versioning has turned out to be an occasional, minor concern in these interfaces. If the interfaces were going to be upgraded many times, with many subtleties in play at every upgrade, maybe it would be different... So, do we expect DID Docs to need lots of versioning sophistication, or to be more like JSON payloads in popular REST APIs?

It seems to me that VC semantics are rich enough to justify the versioning mechanism; DID Docs, not so much. And the same for many other fancy features. If that's the case, then it's hard to credit the assertion, for each item in the list, that A) we desperately need this feature; or B) we will have to reinvent it if we don't get it from JSON-LD. Remember the story about the mechanical engineers who invented sensors to detect empty boxes coming off the assembly line, and the minimum wage worker who just turned on a fan and blew the empty ones off?

To @dhh1128's point, if you look at all the DID methods at https://w3c-ccg.github.io/did-method-registry/, none of them use all of the bullets you state, @msporny, except the Veres One DID method, which is yours. All of the participants in this issue have presented well-researched solutions and answers while I feel yours are opinion based. If you can cite a similar historical issue from W3C, let us know, because we'd all be interested to read it in the archives.

Based on the comments from @darrellodonnell and @selfissued, I am revising B) and C).

Old:
B) Universal adoption requires multiple encodings. That means encodings other than JSON-LD or naive JSON with JSON-LD syntactic artifacts. It means clean JSON, CBOR, PDF to name a few.

New:
B) Universal adoption requires simplicity in the data model and the corresponding baseline encoding. This maximizes the adoptability of the baseline encoding and also maximizes the ease of translating to other encodings and hence the adoptability potential of other encodings.
This combination best fosters more universal adoption.

Given that the community agrees to A) and B) then the next stated assumption may be summarized as:

C) The best approach to fostering universal adoption is to represent the DID-Doc specification as a simple abstract data model that is directly expressible in a baseline encoding that may be conveniently translated to other encodings.

That means an encoding other than JSON-LD or naive JSON with JSON-LD syntactic artifacts as the baseline encoding. As proposed this means clean JSON. This makes CBOR trivial, and makes other encodings like PDF easier.

@SmithSamuelM,

Premise "B" simply assumes away the point of contention in this thread, which is a logical fallacy.

How so @dlongley? Please elaborate.

Information Model Agreement

Broken Mental Model

@msporny @dlongley @ewelton et al.

In my attempt to re-focus and re-frame the debate I tried to draw attention to the purposes of a DID-Doc and classify them as either essential or useful. Instead of addressing this foundational issue the responses have been largely to jump directly to lists of features and then claim these features are essential without actually addressing the core purposes of a DID-Doc.

The reason we are at this impasse is that we have a broken mental model of what a DID-Doc must accomplish. There is obvious cognitive dissonance in how the community approaches this problem. This is prima facie evidence of a broken mental model.

To be more rigorous, what I have labeled in previous comments in this thread as the "mental model" is better expressed as the "Information Model" as per RFC 3444 (https://tools.ietf.org/html/rfc3444), which makes a distinction between an Information Model (IM) and a Data Model (DM).

To quote:
"The main purpose of an IM is to model managed objects at a conceptual level, independent of any specific implementations or protocols used to transport the data. The degree of specificity (or detail) of the abstractions defined in the IM depends on the modeling needs of its designers. In order to make the overall design as clear as possible, an IM should hide all protocol and implementation details. Another important characteristic of an IM is that it defines relationships between managed objects."

Until we agree on the information model, aka the purposes and the functional dependencies of those purposes, we will remain trapped in this conflict. Much of the confusion and unproductive discussion in the meetings arises from the inherent greater complexity of an open world mental model. It's less about the syntactical artifacts of JSON-LD and more about the informational artifacts.

Starting with a concrete data model encoding such as JSON-LD as the precursor to the Did-Doc spec is an extreme example of putting the cart before the horse.

I am reminded of the time my wife and I went to the car dealer to buy a new car. We found a model/package that had all the functional features we wanted. We then asked what colors of that model/package the dealer had on the lot. The salesman responded that we could get that model/package in any color we wanted as long as that color was blue. They only had a blue one.

Likewise it seems to me that responses to my comments agree that I may pick any data model I want as long as it's JSON-LD.

Is universal adoption a goal of this community? I think it is, or at least should be. Fair discussion of this goal starts with considering the truly necessary, essential features rather than the wanted but optional, useful features.

It seems that there is not broad agreement within this working group that universal adoption is of vital importance. My motivation is to create a DID spec that provides truly portable identifiers that foster self-sovereign identity and trust over IP.

In other words, what I want most is a DID spec that acts as an adoption vector to realize greater self-sovereign identity and better trust over the internet. My concern is that some within this community want most to leverage the momentum behind self-sovereign identity and trust over IP as an adoption vector for JSON-LD.

I am not against graph data models. I strongly support graph data models when used appropriately but not universally. The original white paper I wrote on decentralized identity in early 2015 calls out that identity is a recursive composition that is best expressed as an identity graph. It is at the compositional information layer where privacy and selective disclosure play an important role.
https://github.com/SmithSamuelM/Papers/blob/master/whitepapers/open-reputation-low-level-whitepaper.pdf

Proposed Information Model

With respect to RFC 3444, agreement on an abstract information model is a precursor to agreement on an abstract data model. As stated above, information models represent relationships between information especially dependencies. These dependencies include functional ordering and layering. If you get the dependencies wrong or mix up dependencies and layers you have a broken information model. It is clear to me based on this discussion that the JSON-LD proponents are largely using a broken information model. Or at the very least are using an information model that is so far incomprehensible to me.

Once we have the correct information model then and only then should we worry about the data Model. The problem I am trying to fix is what I see as a broken information model.

Historically the DID spec WG jumped to a concrete JSON-LD Data Model without first getting agreement on the information model; actually, without even discussing the information model per se. This is why I spent time deriving the purposes of a DID Doc in the appendix of the original post in this thread. It was my way of capturing the information model so that I had a good foundation for discussing a data model. But the JSON-LD proponents' responses to my original post have largely ignored the information model outlined in the appendix above.

As a result I am going to further focus my discussion on the information model.

_The correct information model is a layered relationship with a hard dependency between the layers!_

_Layer 1:_ Bootstrap from a root of trust to the authoritative signing key or keys. This is the only functionality necessary to the bootstrap layer. In no way does this layer benefit from any of the unique features of JSON-LD and the open world information model of RDF. Indeed an open-world model violates best practices for informational security. This is a fatal flaw of using RDF in the bootstrap.

_Layer 2:_ Verifiable Attestations using the authoritative Signing Keys from Layer 1. A verifiable attestation is any information signed with the signing keys.
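The hard dependency between the two layers can be sketched in code. Everything here is an illustrative assumption: a Lamport one-time signature over SHA-256 stands in for whatever signature scheme a DID method would actually use (it is chosen only because it needs nothing beyond the standard library, and each key may sign only one message), and the `did:example:` identifier derived from a hash of the public key is a hypothetical self-certifying identifier.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport key pair: one pair of secrets per bit of a SHA-256 digest."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def _bits(message: bytes):
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, message: bytes):
    """Reveal one secret per digest bit (one-time use only)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(sig) == pk[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


def identifier_for(pk) -> str:
    """Hypothetical self-certifying identifier: a hash of the public key."""
    return "did:example:" + _h(b"".join(a + b for a, b in pk)).hex()


def bootstrap(identifier: str, pk) -> bool:
    """Layer 1: establish that pk is the authoritative key for identifier."""
    return identifier_for(pk) == identifier


def check_attestation(identifier: str, pk, message: bytes, signature) -> bool:
    """Layer 2: an attestation is accepted only after Layer 1 succeeds."""
    return bootstrap(identifier, pk) and verify(pk, message, signature)
```

Note that Layer 1 never inspects the content of any attestation, and Layer 2 is free to carry any payload format, because all it requires from Layer 1 is the authoritative key.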

Given agreement on this information model we can then talk about data models. At layer 2, a subset of the class of verifiable attestations is the class of verifiable messages. A subset of the class of verifiable messages is the class of verifiable documents. A subset of the class of verifiable documents is the class of JSON-LD documents that use an open world model. A subset of JSON-LD documents is the class of verifiable credentials.

That means we shouldn't even discuss JSON-LD until we are way down the information representation hierarchy of data models that fit our information model.

Indeed the name DID-Doc implies an information model that is inaccurate. A document in this context is dependent on a bootstrap function and therefore before we can talk about a DID Doc spec we should talk about a bootstrap to a cryptographically verifiable source of authoritative control spec.

Then and only then should we decide if we want to represent both layers with a single encoding type and if so if we want to include both layers in a single document or to keep them separate. There are lots of ways to bundle two layers in a single message or a single document without breaking, mixing or co-mingling the layers.

Trusted Computing Implicit Identifier Information Model

A closely related and very informative spec that shares the information model proposed above is the following from the Trusted Computing Group. https://trustedcomputinggroup.org

https://trustedcomputinggroup.org/wp-content/uploads/TCG-DICE-Arch-Implicit-Identity-Based-Device-Attestation-v1-rev93.pdf

The trusted computing IM may be simply summarized as follows:

  1. Bootstrap to derived implicit identifiers.

  2. Make verifiable attestations using the bootstrapped implicit identifiers.

Because the information model used in the implicit identifier spec is essentially identical to the two layered IM model proposed above, the proposed information model for DIDs stands on solid best practices security ground. My argument is that this is the most appropriate information model given we want to fix trust over IP.

They use the term _implicit identity_ to refer to identifiers that are _self-certifying_. (Apparently they are not familiar with the self-certifying work in this space from the 90s so have invented a new term). They use the name Device ID (same acronym as DID but not a W3C DID). The root of trust in their Device ID is the entropy in a random number generator that operates on first power up of a device so it is private to the device. From this root identifier other identifiers called aliases may be derived. Once these self-certifying identifiers have been created, the next layer in their IM stack uses those identifiers to make verifiable attestations that will be sent over the network.

The intent of their specification is to lay the groundwork for a future where all network capable computing devices will have this bootstrap functionality. This will enable trusted computing capability without the need for a heavy weight TPM. Just the bootstrap to a derived identifier(s) based on a root implicit identifier and then make verifiable attestations thereby.

We need to use the appropriate information model and then use the most appropriate data model for that information model and then pick an appropriate encoding to express that data model.

Given a layered information model, a different data model and encoding may be used at each layer.

Trust over IP is only possible if we use good cryptographic practices in our information model. Thus a layered model. The semantic web is an optional data model for a subset of what one may choose to do in layer 2 but has no place in layer 1. Indeed it is a huge security problem to put anything like an open world model in layer 1.

JSON and CBOR are sufficient for layer 1. Security considerations make JSON-LD a bad candidate for layer 1. Putting the bootstrap in layer 1 simplifies layer 2.
This layering makes it easier to support multiple layer 2 encodings.
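
As a minimal sketch of what "separable layers in one document" could look like, the snippet below assumes a hypothetical envelope with `layer1` and `layer2` members. Layer 1 is parsed as plain JSON against a fixed, closed key set, while layer 2 is carried as an opaque string tagged with its own encoding. The envelope shape, the key names in `LAYER1_KEYS`, and the function name are all assumptions made for illustration; nothing here comes from the DID spec.

```python
import json

# Hypothetical closed vocabulary for the bootstrap layer.
LAYER1_KEYS = {"id", "publicKey"}


def split_layers(raw: str):
    """Return (layer1 dict, layer2 encoding, layer2 raw payload)."""
    envelope = json.loads(raw)
    layer1 = envelope["layer1"]
    unknown = set(layer1) - LAYER1_KEYS
    if unknown:
        # Closed world at layer 1: keys outside the fixed set are rejected.
        raise ValueError(f"unexpected layer 1 keys: {sorted(unknown)}")
    layer2 = envelope.get("layer2", {})
    # Layer 2 is not parsed here; the caller picks a parser by encoding.
    return layer1, layer2.get("encoding"), layer2.get("payload")
```

With this shape, a consumer that only needs the bootstrap never touches the layer 2 payload, and a layer 2 consumer can hand the payload to a JSON-LD, JSON, or CBOR processor as declared.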

@longley as prima facie evidence of B)
This post establishes B), i.e., layering clarifies dependencies and removes constraints on layer 2. Some applications may only need to do minimal work at layer 2. All must do layer 1, but layer 1 does not benefit at all from JSON-LD.
One of the editors of IETF Remote Attestation Procedures WG told me that using JSON-LD with its open world data model for DIDs makes DIDs DOA for trusted computing. IMHO, DIDs with JSON-LD have little chance of formal adoption by any of the trusted computing standards.

My concern is that some within this community want most to leverage the momentum behind self-sovereign identity and trust over IP as an adoption vector for JSON-LD.

Nope. You can safely discard this concern. Let's move onto figuring out how to turn this into a productive discussion.

@dlongley

Assumption B) is my assumption. You can argue against my assumption. You don't have to accept it. It is clear that many in this community share assumption B). My post above explains from an information model point of view that indeed a simpler encoding is sufficient for layer 1.

Are you making the assertion that an open world model is simpler than a closed one? I assume you agree with the contrary statement in the existing DID spec. So it's not clear what your basis is for arguing against the assumption of B). It's clear that from your viewpoint JSON-LD is essential to what you want to do. But you have not made the case that it is essential for everyone else.

Layer 1 is essential. In contrast almost everything in layer 2 is optional. It may be practically useful but not essential.

(@SmithSamuelM) Universal adoption requires simplicity in the data model and the corresponding baseline encoding.

While adoption is clearly a goal, I would like to challenge a bit the assumption that simplicity and universal adoption should be absolute objectives above all others. The goal should be to design a new type of identifier that can serve as a basis for a universal, next-generation decentralized identity infrastructure based on fundamental URI and web architecture. If there is a trade-off between universality and extensibility at the cost of added complexity on one hand, and pure simplicity on the other hand, then I don't think the decision is entirely clear on what is more important. While I agree that very constrained JSON documents are "simpler" and probably also more secure, I am a bit worried that some opinions on this thread may be motivated primarily by a desire to make a small set of protocols or products easy to implement, without considering the "bigger picture" of decentralized identifiers as a future building block for potentially many applications.

(@SmithSamuelM) One side has the mental model that the DID-Doc provides intensive semantic knowledge about the DID subject in an extensive world model.

(@SmithSamuelM) The mental model for the other side is that a DID-Doc provides a cryptographically verifiable bootstrap that enables validation of authoritative statements ...

(@SmithSamuelM) The correct information model is a layered relationship with a hard dependency between the layers! Layer 1: Bootstrap from a root of trust to the authoritative signing key or keys. This is the only functionality necessary to the bootstrap layer.

(@jandrieu) DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except these are alleged means of secure interaction.

I know this may be a minority opinion, but I would also challenge the assumption that the second mental model should be the only one supported by DID documents. Yes, putting too much information into a DID document is a threat to privacy and people's lives. But a dependency on an external service endpoint may also be a threat. In certain scenarios, I believe the open world model is the right model not just for VCs but also for the DID document. Especially for DIDs that identify organizations or things. And I know that @jandrieu and others will strongly disagree now, but in some cases perhaps even for personal DIDs I may want to be able to have open world statements inside the DID document rather than behind a service endpoint.

@SmithSamuelM,

Any argument that suggests JSON-LD is "too open" and that JSON would therefore be a better choice makes no sense. JSON-LD is JSON with additional constraints. It is a subset of JSON.

These constraints are introduced to encourage extension interoperability by requiring people to extend in a particular way. Loosening these constraints such that extensions can happen any way that you please would not help resolve any concern about information in a DID Document that is not understood by a consumer. It would only harm interoperability -- and, thus, harm "universal adoption".

Anyone not interested in writing extensions need only consider the DID spec's description of the data model and its expression as JSON, nothing more. This would not change if the group were to abandon JSON-LD in favor of JSON; it would only harm extension interoperability, interoperability with the Linked Data ecosystem, and our ability to reuse existing definitions and work. I've yet to be convinced that there's some significant cost that is being paid for these advantages that makes it not worth while.
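
To illustrate the syntactic side of this point only: a JSON-LD document is accepted by any ordinary JSON parser, and a consumer that does not care about extensions can read the core properties directly. The sketch below assumes a hypothetical helper `read_core_id` and an example document; it uses no JSON-LD library and says nothing about the information-model question debated elsewhere in this thread.

```python
import json

DOC_TEXT = """
{
  "@context": "https://www.w3.org/ns/did/v1",
  "id": "did:example:123"
}
"""


def read_core_id(text: str) -> str:
    # An ordinary JSON parser is enough; "@context" is just another member.
    doc = json.loads(text)
    return doc["id"]
```

Calling `read_core_id(DOC_TEXT)` reads the `id` member without ever interpreting the `@context` value.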

There are clearly some fundamental misunderstandings that still need to be resolved here. I do want to say that I suspect there's more material being poured into this thread than can be consumed at a reasonable rate by very busy working group members. I think we need to see concrete PR(s) or we'll just keep spinning our wheels.

@peacekeeper

A layered model is not the same as a service endpoint model. A layered model allows a service endpoint model but does not require it. This is why I described one option for those that want an all-in-one approach to separate the information via encapsulation. Both layers may be provided in a single document as long as they are separable. This means if you want to use JSON-LD for layer 2 you can choose to also use JSON-LD for layer 1 and take the hit on security. But others may use JSON for layer 1 and 2 or use JSON for layer 1 and JSON-LD for layer 2. And may also choose to use a service endpoint model.

@dlongley

An open world information model is not a subset of a closed world information model. This is the disconnect. That the concrete data model uses syntax that could be classified as an extension of a simpler syntax does not make the expanded information model simpler but makes it more complex. Semantics and syntax are two different types of complexity. This is why the information model agreement is so vital.
@dlongley
@peacekeeper
Layering enables the best of both worlds. It allows extensibility at layer 2 without encumbering layer 1. Please explain why the layered information model is wrong. Please explain why the bootstrap from the root of trust is not essential to everything else.

This hard functional dependency is exactly why layering is the appropriate model. We can now encapsulate and separate the functionality of the two layers. This removes the most difficult security problems from layer 2.

@SmithSamuelM,

An open world information model is not a subset of a closed world information model. This is the disconnect.

I didn't argue this -- so I don't think it's the disconnect. I think we'd all like to get to more common understanding. This is another reason for us to try and focus on a concrete PR. We may find there's actual agreement on whatever you put forward -- or, we may find out where the disconnects really are!

That the concrete data model uses syntax that could be classified as an extension of a simpler syntax does not make the expanded information model simpler but makes it more complex.

I agree with this, I'm not sure why you think otherwise. Perhaps you think I agree with the maxim "the simplest thing possible is always the best approach". If so, I don't -- what I think is that complexity trade offs should be worth it. Adding constraints (as JSON-LD does) increases complexity for producers of extensions. However, these constraints are added because they have a net decrease in complexity for consumers of extensions and create an increase in interop and reusability. Of course, this is generally why constraints are added. Saying we won't have any constraints just means that there won't be any interop -- what we've created is "too simple" for it.

The priority of constituencies allows for spec writers and extension authors to take on more complexity such that consumers may take on less. Consumers know that extensions must all abide by those constraints -- and devs can often write applications or tools just once that are able to consume any information that uses the same approach. Often the complexities we must deal with in writing specs and constructing data models need not be understood at all by other parties -- yet they reap benefits from this approach. An alternative approach would be to force anyone who wants to write an interoperable extension to form a WG and go through the standardization process.

Anyway, it's fine to argue that you think we could simplify things to only support the use cases you're interested in. However, a WG is about compromise -- where we attempt to support everyone's use cases to the best of our ability. I think it would be much easier to decide whether certain use cases will be harmed if there's a concrete proposal (a PR) on the table rather than talking about all this in the abstract.

I'm really impressed by @SmithSamuelM's Information Model Agreement post. It contains a lot of actionable truth that should help us focus the discussion and reach consensus on a simple, secure, privacy-preserving information model to inform the DID specification.

@SmithSamuelM,

Layer 1: Bootstrap from a root of trust to the authoritative signing key or keys. This is the only functionality necessary to the bootstrap layer.

I think this is insufficient. I believe it is essential to also be able to discover other information about a DID subject at the root level. In fact, some DID subjects may not have any keys or may not have keys that can be used to make assertions, so you go straight from the root of trust to these other pieces of information.

This use case is missing in your analysis and I believe explains at least one of the disconnects in this issue.

I do think this conversation has been insightful and valuable, and I definitely think @SmithSamuelM brought a lot of very clear, valuable, and excellently expressed insight. I think we all brought positive insights to the table despite the echoes of tooth-grinding. At the end of the day we are not doing this to prove points to one another, or to vie for rightness, it is about advancing the core of the internet forward in a very meaningful way.

I would like to second the efforts to move the discussion into a PR that reflects the restricted use-cases and limited semantics of the proposal. Issue #65 for example, touches on many of the same abstract discussion points - it would be very helpful to be able to evaluate issues like the discussion in #65 against a concrete vision of a semantically restricted spec - we could then clearly evaluate the impact of the proposed spec in the issues of DID-resolution, DID-metadata, the role of service endpoints, and cases where the DID is not referencing an Aries attached layer-2 agent.

I would like to see the reduced, restricted, and simplified model of DIDs in a formal PR - that PR should also clear up the ambiguity introduced by the terminology in the current reference draft.

How can we best move towards that PR?

One of the editors of IETF Remote Attestation Procedures WG told me that using JSON-LD with its open world data model for DIDs makes DIDs DOA for trusted computing.

I am shocked and rather appalled by this statement, reportedly coming from someone who should be an expert in the areas of which they speak, but who demonstrates with this statement that they understand neither JSON-LD nor the Open World data model. (I won't dig into the logical fallacy of [unverifiably] Appealing to Authority, but that's also worth noting.)

JSON-LD is not a data model, it is a data serialization format, which is a subset of JSON. If JSON is viable for trusted computing, JSON-LD is also viable. If JSON-LD is not viable, neither is JSON. Note: I believe both are viable for such use, depending primarily on the data serialized therein.

The Open World Assumption in this context basically says that "anything that isn't explicitly stated, is unknown" (and that "anyone can say anything about anything", but says nothing about the veracity of those assertions) -- which is a much stronger base for security than the Closed World Assumption, which is basically that "anything that isn't explicitly stated, is not so".
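
The difference between the two assumptions can be sketched as a three-valued versus a two-valued lookup. This is only an illustration with hypothetical function names and toy string "statements"; it is not how any RDF or JSON-LD tool represents facts.

```python
from typing import Optional, Set


def holds_closed_world(facts: Set[str], statement: str) -> bool:
    # Closed world: anything not explicitly stated is taken to be false.
    return statement in facts


def holds_open_world(facts: Set[str], denied: Set[str],
                     statement: str) -> Optional[bool]:
    # Open world: anything not explicitly stated is unknown (None).
    if statement in facts:
        return True
    if statement in denied:
        return False
    return None
```

Given the same set of stated facts, the closed-world lookup answers `False` for an unstated claim, while the open-world lookup declines to answer.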

I'm guessing that the speaker described above was referring to the common "anything not explicitly permitted is forbidden" security mantra (which is commonly placed in opposition to "anything not explicitly forbidden is permitted"), which has nothing to do with the Open World Assumption, nor with JSON-LD.

How can we best move towards that PR?

This is easy, someone does the work and puts forward a pull request on the spec in this repo. Any member of the WG (including any employee of any organization that is a member and invited experts) can raise a PR against the spec. If you are not a member of the group, you can always talk to the Chairs/Staff to see if they'll grant you Invited Expert status if you do the work to put together this PR and it looks like it's going somewhere.

I'm sure there are longer W3C issue threads than this one, but it's definitely the longest one I've ever been involved in. I was at the Hyperledger Aries Connect-a-thon in Provo all this week and each night when I tried to catch up with it I could never read it to the end ;-)

However today on my flight back to Seattle I was finally able to finish. So let me share two thoughts.

First, I believe this discussion, as long as it has been, has been valuable to the community as it has drawn in a wider set of views about the purpose and information architecture of DIDs and DID documents than have been present at the Credentials Community Group stage of the spec.

Second, RE next steps, a number of posts have asked for a "concrete PR" so we could stop arguing in the abstract. While of course someone could simply draft a PR redefining the data model in JSON and removing all dependencies and references to the JSON-LD spec, it’s not at all clear to me that’s the right next step. Rather I expect it might simply result in triggering the same discussions all over again and polarize us further.

Instead, I believe this discussion shows there are deeper issues we need to come to agreement on first. But rather than argue those in the abstract, what I would like to suggest is that we break them down into a series of relatively concrete decisions we can discuss and make together. That will result in steady progress towards consensus on the way forward.

Once we have done that, what should be in an eventual PR (or set of PRs) will likely be far more obvious and far less controversial.

My plane having landed, I am going to grab a Lyft and then start a new issue on the first of those concrete decisions I think we can make together.

For the sake of argument I created a DID Method based on did:key, but using JOSE, that has no @context ... so it's not valid according to the DID spec.

https://github.com/transmute-industries/did-jose

As I note on this issue which is related: https://github.com/w3c-ccg/vc-json-schemas/issues/7

AFAIK there won't be interop without the non-JSON-LD users accepting a context which they ignore. JSON-LD is stricter than JSON, so if you want interop with it, you need at least the @context.

Also noted on that issue is that @context is mandatory in did core and vc spec. This means that pretty much anything that has those 2 things as dependencies should take the same approach IMO.

I think this approach of requiring the @context is the crux of the issue....

without the context, it's just normal JSON and all the features of JSON-LD are lost... how will we maintain interop? what is the extension model? so many different ways we could solve these issues, and each feature that we lose will need to be addressed in some fashion...

with @context it's JSON-LD and likely invalid JSON-LD if the method implementers are not paying attention to document properties and definitions.

The more I think about trying to solve this by somehow getting rid of JSON-LD and replacing it with more relaxed normal JSON, the more I feel like it's maybe not a good idea... because while it's easy to delete the @context, it's hard to recover all the features we would need to agree on as a community to maintain interop.

sure, not everyone uses all these features, but we get them for the price of an @context, and a requirement to understand it IF you want to interop with JSON-LD.... how much will deleting it really cost?

@OR13 You are going right to the heart of what I believe is at the very center of this debate (and the reason that this thread is so long): there are two different worldviews in conflict here.

One worldview, which I'll call the "JSON-LD worldview" or more generally the "open world semantic graph worldview" believes in the power of semantic graphs and wants DID docs to have all (or most) of the features that @msporny describes here.

The other worldview, which I'll call the "plain JSON worldview" or more generally the "hierarchical deterministic worldview" feels just the opposite. They do not want to deal with semantic graph models and do not want most of those features because in their view those features represent challenges to: a) simplicity, b) security, and c) privacy, all of which make life more complex for developers and threaten to hinder adoption.

In my experience, there are no simple solutions to worldview problems. Almost by definition, both groups are starting not just from different assumptions, but more importantly, from different value models, i.e., views of what it is important and what is not important.

Again, that's why this discussion has gone so deep and so wide. Each group is trying to convince the other about its entire worldview. That's a hard, hard problem.

The reason I started issue #140 was to start to explore one potential solution which I'll describe briefly here since it's relevant to this issue as well and also to #103 (which started this whole discussion).

The essence of the idea is to stop trying to get the two groups to agree on a worldview before we can move forward. Instead, turn things on their head and do this:

  1. Let the JSON-LD folks proceed as quickly as they can to develop a complete JSON-LD-based data model with all the features they want to support.
  2. At the same time, in parallel, let the plain JSON folks work as quickly as they can to develop the simple JSON-based data model with the minimal features they want to support.

Then, when both groups are done (or far enough along to be ready), get the two groups together and compare/contrast/discuss where they have landed and why.

My guess is that the plain JSON folks will have developed a hierarchical deterministic model that is an easy-to-describe subset of the JSON-LD model.

If so, aligning the two will actually be pretty easy. We'd end up with two encodings: one in plain JSON that's fairly restrictive (but meets the plain JSON folks' requirements), and one in JSON-LD that's much richer (and meets all the JSON-LD folks' requirements). And both can work!

I'm very curious what you (and others) think of this possible path for moving us forward.

@OR13 @SmithSamuelM @dhh1128 @ewelton (and others) - Please always wrap @context (and other @-things) in backticks, i.e. --

`@context`

-- except where you are intentionally tagging a github user.

(Optimally, go back and edit your previously posted comments to do the same.)

There is a github user with the context handle, and every time an unwrapped @context occurs, they get a notification -- which they don't want from us, as they are not working with us.
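
For anyone cleaning up a long draft comment before posting, the wrapping can be automated. The following is a small sketch, assuming a hypothetical helper `wrap_keywords` and a keyword list limited to the two strings named in this thread. It only skips occurrences that already have a backtick directly on either side, so it would still rewrite a keyword sitting in the middle of a longer code span.

```python
import re

# Keywords to protect; extend as needed. Real user handles are left alone.
KEYWORDS = ("context", "references")

_UNWRAPPED = re.compile(
    r"(?<!`)@(?:" + "|".join(KEYWORDS) + r")\b(?!`)"
)


def wrap_keywords(comment: str) -> str:
    # Surround each unwrapped keyword with backticks so GitHub does not
    # treat it as a user mention.
    return _UNWRAPPED.sub(lambda m: f"`{m.group(0)}`", comment)
```

Handles that are not in `KEYWORDS` pass through unchanged, so intentional mentions of participants still notify them.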

@talltree. Well stated +1.

When the JSON folks say they want the simplicity of not having an open world extensible model, the JSON-LD folks respond that simplicity comes from that very same extensibility. These are two different types of simplicity and they are based on two different design aesthetics. The JSON folks have a very clear view of what they want to do and how to do it and they rationally have concluded that they don't need JSON-LD. Likewise the JSON-LD folks have a very clear view of what they want to do and how to do it and they rationally have concluded that they need JSON-LD. It's like someone telling someone else they are irrational for preferring pizza over ice cream. What is irrational is to believe that the other side is irrational and that one can persuade them to change their aesthetic. It takes more than that: it takes finding a common aesthetic that overrides the conflicting world model aesthetics. So absent that, the practical question is how best to support both aesthetics. And an abstract data model is likely the only approach that could work for both.

I am a bit worried by the approach that you propose in https://github.com/w3c/did-core/issues/128#issuecomment-564434490, @talltree; you may underestimate the difficulty of "merging" the two approaches at the end of such a process.

My approach would be a little bit different, namely to do this jointly with some principles in mind.

  1. The goal should be (and I believe already is) that a DID processor should be able to process (whatever that means) a DID Doc _without any JSON-LD knowledge_
  2. This also means that the problem @OR13 mentioned should _not_ occur: it should be o.k. for a processor to process a (simple? basic?) DID Doc _without_ the presence of a @context. Put it in spec speak: the presence of a @context in a DID Doc should be a _SHOULD_ (or even a _MAY_?) but _not_ a _MUST_.
  3. For each key ("name" in JSON speak) that we add to the DID Doc definition we should make it clear under what circumstances the usage of those names would really benefit from the Linked Data aspect and, therefore, would require the author to add @context and consider JSON-LD features. I.e., what it _means_ to put those in a Linked Data context. Authors/methods/etc. can then decide and/or require to use JSON-LD or not. Generic statements on Linked Data would not be helpful enough and would just scare some people away.
  4. Yes, we may hit some hurdles along the way when the "worldviews" clash, but I do not think that happens very often. I was not part of the CCG discussions but, looking at the document right now, the only place where I can see an issue is the one referred to in #65 (and it seems that a proper compromise has been found there, essentially using the same pattern as for VC).

    In general, using the "non-LD worldview" might keep us in line to get the simplest possible set of concepts even if it requires some compromise on the LD side; using the "LD worldview" might force us to do a cleaner modeling of our data.

Is this a viable design method moving forward?

@SmithSamuelM has posted his comment almost at the same time :-) He said:

And an abstract data model is likely the only approach that could work for both.

and that is perfectly fine and true. But the abstract data model has to be embodied in a syntax, and I am worried that creating too many syntaxes in parallel might backfire on us.

@iherman

I think that making @context optional, as in MAY versus MUST (as it is now), would go a long way to resolving the conflict. I believe that would also mean that any use of @references in a document is a MAY, not a MUST.

The proposal to use JSON as the default encoding would minimize syntaxes and would be compatible with having JSON-LD syntax be a MAY versus a MUST. But that proposal did not seem to go over well. Hence the alternative of an abstract syntax. But I agree that your proposal is a reasonable way to enable the two approaches to the world model to co-exist.

@SmithSamuelM -

Please note that github user @Drummond (whose human-world name is Valerie Drummond) is not the same as github user @talltree (whose human-world name is Drummond Reed).

Also, github users @context and @references do not need to be notified of your comments here. Please edit your latest to wrap those strings in backticks!

We really need to be more careful in how we refer to entities!

@iherman I see your point and agree what you suggest could be a constructive way for the two groups (representing the two worldviews) to work together on the semantics. I'd like to explore that in more detail as it may be the fastest way forward.

RE the @context statement, in a discussion with @SmithSamuelM and @tplooker at the Hyperledger Aries Connectathon last week, Sam made a point I'd never heard before, and which resonated strongly with me. What he said was that DID document authors need a way to explicitly indicate that no JSON-LD processing should be applied to a DID document. There can be multiple different reasons for this, but the two we discussed were:

  1. The author may want the DID document to be consumed in a security context that does not accept open world document formats (Sam says this is true of the Trusted Computing Group TEE environments).
  2. The author may want to signal to resolvers or other DID document consumers that no vocabulary is used beyond the "plain JSON" vocabulary defined in the DID spec (for speed in high volume applications, ease of processing, etc.)

When I started looking at the @context statement through that lens, I saw it in a whole new light. Rather than it being a MUST or a SHOULD or a MAY, the rules could be:

  1. If a DID document author wants the option of having JSON-LD processing applied to the DID document, the DID document MUST include the @context statement.
  2. If a DID document author does not want JSON-LD processing applied to the DID document, the DID document MUST NOT include the @context statement.

That would be a very clean way for us to have our cake and eat it too, i.e., for all DID documents to share the "simple JSON" syntax and then for DID document authors who want to use the features of JSON-LD to be able to do that with a clear indication of that processing model.
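
On the consumer side, the two rules above reduce to a check on whether the `@context` member is present. The sketch below is one possible reading, not spec text: the function name, the mode strings, and the contents of `CORE_VOCABULARY` are assumptions. Rejecting extra members in a document without `@context` is likewise an assumed interpretation of the "no vocabulary beyond plain JSON" signal.

```python
# Assumed stand-in for the "plain JSON" vocabulary defined by the DID spec.
CORE_VOCABULARY = {"id", "publicKey", "authentication", "service"}


def processing_mode(did_document: dict) -> str:
    if "@context" in did_document:
        # The author opted in: JSON-LD processing may be applied.
        return "json-ld-allowed"
    extra = set(did_document) - CORE_VOCABULARY
    if extra:
        # No context, so nothing outside the core vocabulary is expected.
        raise ValueError(f"unknown members without @context: {sorted(extra)}")
    return "plain-json"
```

Because the decision depends only on the presence or absence of one member, it is also straightforward to cover in a conformance test.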

@talltree,

When I started looking at the @context statement through that lens, I saw it in a whole new light. Rather than it being a MUST or a SHOULD or a MAY, the rules could be:

  • If a DID document author wants the option of having JSON-LD processing applied to the DID document, the DID document MUST include the @context statement.
  • If a DID document author does not want JSON-LD processing applied to the DID document, the DID document MUST NOT include the @context statement.

Looking at it from the point of view of testing (that will become a core issue in the rec process later), what can be tested is the presence or absence of the @context in the JSON file. A statement like "does not want" is not really testable. That may be an issue with your formulation. In this respect, something which simply states that the presence of the @context makes it possible to use the DID Doc in a Linked Data setting seems to be enough for me; _not_ putting the @context into the file is equivalent to what @SmithSamuelM said, i.e., that the author does not intend this DID Doc to be treated as JSON-LD.

But I must admit I do not have a strong feeling about this, I see it as a stylistic difference. I let the document editor work this out :-)


We should all be careful about the usage of @context, we are sending a series of pings to some guy out there...

@talltree wrote:

explicitly indicate that no JSON-LD processing should be applied to a DID document.

Finally, something we can work with! Thank you @talltree!

The author may want the DID document to be consumed in a security context that does not accept open world document formats (Sam says this is true of the Trusted Computing Group TEE environments).

Good, this is a concrete requirement that enables me to write a concrete PR against the requirement.

The author may want to signal to resolvers or other DID document consumers that no vocabulary is used beyond the "plain JSON" vocabulary defined in the DID spec (for speed in high volume applications, ease of processing, etc.)

Yes! Another good requirement. For the rest of you in this thread, these are the sorts of things that help editors write text that may achieve consensus.

Ok, so I've now spent close to 5 hours reading and re-reading this thread and just spent two hours trying to construct text that I think may have a chance at achieving consensus. Here's a concrete PR that attempts to synthesize this issue into a concrete spec change:

https://github.com/w3c/did-core/pull/142

Please jump to the PR and let's see if we can hammer on the language and get something that achieves consensus (note: I didn't say "makes everyone happy"... everyone in this thread is going to have to start compromising).

@iherman -
re https://github.com/w3c/did-core/issues/128#issuecomment-564963432

We should all be careful about the usage of @context, we are sending a series of pings to some guy out there...

I have to note that the quoted section of your comment includes two unwrapped instances of @context ... which unwrapping appears to have been done by a copy-paste of that section. (Github's browser-based "Quote reply" function retains such wrapping and other Markdown markup.)

Can we close this? We are trying to support 3 representations using the new https://github.com/w3c/did-core-registry

IMO this issue was resolved at the F2F... and if it was not, we should focus our criticism on the did core registry.

I am fine with closing it.

I too am fine with closing it.


Thanks, closing because the issue submitter (and concerned parties) are ok with closing it, there was a resolution to specify an abstract data model, and the specification has been changed to include an abstract data model section (that is waiting for content, but everyone expects that content to be written soon), and there is now a registry that assumes the existence of an abstract data model.
