We need a consistent rule on how to handle unknown properties when consuming a DID document, in any format. There are three common modes:
Are there any other approaches?
This mostly has implications for implementations using "local" properties that aren't registered. We want to enable that, but as a consequence we need to define the behavior of other systems that run across these "local" properties in the wild.
The JSON and CBOR mappings MUST ignore unknown member names, to enable extensibility.
@selfissued It seems they must at least retain all members, known or unknown, when transforming between representations. Otherwise, we won't be able to have lossless extensibility.
I'm ok with the requirement of ignoring them otherwise, but they should not be disposed of, IMO.
Which is to say the first option, ignore but retain.
@jandrieu Agreed. That was my intent.
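A minimal sketch of what "ignore but retain" could look like in a consumer. The `KNOWN_PROPERTIES` set is an illustrative subset (not the registry), and `split_members` / `reassemble` are hypothetical helper names. The consumer acts only on members it understands, yet carries the rest along so nothing is lost when the document is written back out:

```python
from typing import Any, Dict, Tuple

# Illustrative subset only (assumption); a real consumer would derive this
# from the spec and the registries.
KNOWN_PROPERTIES = {"id", "authentication", "service"}


def split_members(doc: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate understood members from not-understood ones, discarding neither."""
    understood = {k: v for k, v in doc.items() if k in KNOWN_PROPERTIES}
    retained = {k: v for k, v in doc.items() if k not in KNOWN_PROPERTIES}
    return understood, retained


def reassemble(understood: Dict[str, Any], retained: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the full document: unknown members were ignored, not dropped."""
    return {**understood, **retained}


doc = {"id": "did:example:123", "localThing": {"a": 1}}
understood, retained = split_members(doc)
round_tripped = reassemble(understood, retained)
```

Note that this sketch does not preserve member order, which may or may not matter depending on the representation.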
- reject the document; this is common with strict type parsers and is the default in JSONLD processors
JSON-LD processors drop unknown predicates. See, for example, this example on the jsonld playground
@dlongley
@iherman At least for jsonld.js/jsonld-signatures, there are controls for this. When dealing with signatures, the processor will optionally error on unknown properties. That's on purpose to ensure everything present is signed. For instance, in the above playground link, hit the "Signed with RSA" tab.
@davidlehn, thanks, I did not know that. However, that does not affect the issue originally raised by @jricher. It seems that there is a consensus among participants on this thread that dropping unknown properties is preferable for CBOR and JSON, and the original remark on JSON-LD seemed to suggest that this approach is not doable for JSON-LD. I just pointed out that this is not the case.
I think how this would affect the signature of a DID document is again common to all three formats. That issue has to be dealt with separately...
The signature isn't going to survive document translation in any event, though. The signature format needs to account for its source, including any unknown fields.
Are there any other approaches?
We could introduce forward compatibility by requiring that any new property carry a field stating how to handle the property if it is not recognized: ignore, drop or reject.
(For old folks like me: this is exactly how forward compatibility was introduced in the second release of Signalling System #7 for digital telephony)
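To make the idea concrete, here is a rough sketch under stated assumptions: the `ifUnknown` field name, the `KNOWN_PROPERTIES` subset and the `DocumentRejected` exception are all hypothetical, invented for illustration, and nothing like them is defined in any spec. A property without the hint is treated as "ignore", meaning it is retained but not acted upon:

```python
# Illustrative subset only (assumption).
KNOWN_PROPERTIES = {"id", "authentication"}


class DocumentRejected(ValueError):
    """Raised when an unrecognized property asks for the document to be rejected."""


def apply_unknown_handling(doc):
    """Return the document as seen by a consumer honoring the hypothetical 'ifUnknown' hint."""
    result = {}
    for name, value in doc.items():
        if name in KNOWN_PROPERTIES:
            result[name] = value
            continue
        # No hint present (or value is not an object): default to "ignore".
        action = value.get("ifUnknown", "ignore") if isinstance(value, dict) else "ignore"
        if action == "reject":
            raise DocumentRejected(f"unrecognized property {name!r} requires rejection")
        if action == "drop":
            continue
        # "ignore" (and any hint this sketch does not recognize) keeps the member.
        result[name] = value
    return result
```

For example, a document carrying `{"localA": {"ifUnknown": "drop"}}` would come back without `localA`, while one carrying `{"localB": {"ifUnknown": "reject"}}` would raise.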
As a Method Implementer trying to support multiple representations ("JSON" + "JSON-LD"), I'm strongly in favor of:
reject the document; this is common with strict type parsers and is the default in JSONLD processors
Note that this does not mean that you can't define arbitrary JSON in JSON-LD... it just means you have to explicitly state where you expect to see it.
The same is true of adding support for additionalProperties in JSON Schema.
"additionalProperties": true => your did document can have anything you want!
@csuwildcat are you sure you want arbitrary JSON to be allowed in did documents?
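As a rough illustration of the `additionalProperties` point, here is a hand-rolled check that mimics only the boolean forms of that keyword. It is not a JSON Schema validator, and the function name `undeclared_members` is made up for this sketch:

```python
def undeclared_members(doc, schema):
    """List members that a schema with "additionalProperties": false would refuse.

    Only the boolean forms of additionalProperties are handled here; when the
    keyword is absent, additional members are permitted (the JSON Schema default).
    """
    declared = set(schema.get("properties", {}))
    if schema.get("additionalProperties", True) is not False:
        return []
    return sorted(name for name in doc if name not in declared)


strict_schema = {"properties": {"id": {}}, "additionalProperties": False}
open_schema = {"properties": {"id": {}}, "additionalProperties": True}
doc = {"id": "did:example:123", "localThing": 1}

refused_when_strict = undeclared_members(doc, strict_schema)
refused_when_open = undeclared_members(doc, open_schema)
```

With the strict schema the extra member is reported; with the open schema anything goes, which is the behavior being questioned above.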
We should retain normative text saying that unknown properties MUST be ignored. Otherwise, DIDs would not be extensible.
Is it possible to do bidirectional lossless transformations with unknown properties being ignored?
We should retain normative text saying that unknown properties MUST be ignored. Otherwise, DIDs would not be extensible.
Mike,
Another way to introduce forward compatibility is that every property has an attribute that explains what should be done if the property is unknown, where the default (i.e. no forward-compatibility attribute) would be to ignore the property. Unfortunately, such forward compatibility can be hard to introduce in a backward compatible way. For further discussion ...
Oskar
Didn't we have a discussion about extensions to the DID document outside of JSON-LD that would help solve this?
Is it possible to do bidirectional lossless transformations with unknown properties being ignored?
I think so, so long as "ignored" means retained, not dropped?
FWIW, the reason to not ignore undocumented properties... is that they alter signatures and are an attack vector for content-addressed systems...
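A small sketch of why an extra member matters for content addressing. Sorted-key JSON stands in for a real canonicalization scheme here (an assumption made for brevity), and `content_address` is a hypothetical helper name:

```python
import hashlib
import json


def content_address(doc):
    """Hash a deterministic serialization of the document.

    Sorted-key, compact JSON is only a stand-in for a proper canonicalization
    algorithm; it is enough to show that any added member changes the digest.
    """
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


base = {"id": "did:example:123"}
extended = {**base, "localThing": "anything"}

same_input_same_address = content_address(base) == content_address(dict(base))
extra_member_changes_address = content_address(base) != content_address(extended)
```

Anything signed over, or addressed by, the serialized bytes is therefore sensitive to members the consumer otherwise "ignores".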
I had typed up several different responses here while considering the issue -- mostly around trying to highlight how there are several different concepts of "unknown".
I think what I settled on is that we should refer to "unknown" properties as "ambiguous" properties (or properties with ambiguous semantics) because it helps highlight the problem. Such properties could be interpreted in multiple ways; it isn't just that there is one global meaning and our application doesn't happen to be programmed to understand it.
In that light, I believe this issue really just comes down to whether a consuming application is only locally consuming a DID Document -- without performing any transformations on it -- or if it is doing something more.
The group has already made the decision to create the DID spec registries to ensure that it is possible to losslessly transform a DID Document from one concrete representation to another. For properties without entries in the DID spec registries, i.e., properties that have ambiguous semantics across concrete representations, it has been decided that there is no reasonable expectation that such a transformation is possible.
This seems to indicate that if data is only locally consumed and not transformed, then ambiguous properties can be safely ignored (and retained). However, for any other case, the transformation involved defines what happens. There are various types of transformations that may happen in the JSON-LD spec, but the behavior and expectations of those aren't for the DID WG to define. Here we need to say what happens if the transformation is from one DID Document concrete representation to another. And, I think there are only two options that are on the table: dropping ambiguous properties or throwing an error when they are encountered. Since there can be no reasonable expectation that a lossless conversion can occur with these "local" properties -- that are by nature not defined in the DID spec registries -- I think we should throw errors to avoid creating false expectations.
So, I propose that local only, non-transformative use cases do not require an application to throw an error, only that these applications should ignore ambiguous properties. All other cases require an error to be thrown when an ambiguous property is encountered.
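The proposal could be sketched roughly as follows. The `REGISTERED` set stands in for the DID spec registries, and the names `consume_locally`, `transform` and `AmbiguousPropertyError` are hypothetical. The JSON dump is only a placeholder for a real conversion to another concrete representation:

```python
import json

# Stand-in for the DID spec registries (illustrative subset, assumption).
REGISTERED = frozenset({"id", "authentication", "service"})


class AmbiguousPropertyError(ValueError):
    """Raised when a transformation meets a property with no registry entry."""


def consume_locally(doc, registered=REGISTERED):
    """Local, non-transformative use: read registered members, ignore the rest.

    The input document is left untouched, so ambiguous members are retained.
    """
    return {k: v for k, v in doc.items() if k in registered}


def transform(doc, registered=REGISTERED):
    """Representation-to-representation use: refuse when losslessness cannot be promised."""
    ambiguous = sorted(k for k in doc if k not in registered)
    if ambiguous:
        raise AmbiguousPropertyError("cannot transform losslessly: " + ", ".join(ambiguous))
    # Placeholder for the actual conversion to the target representation.
    return json.dumps(doc, sort_keys=True)
```

Passing a larger `registered` set models a property being added to the registries, at which point the same document transforms without error.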
-1 to retaining unknown properties... is SSN an unknown property? is dateOfBirth an unknown property?
this is privacy cancer :)
Per the discussion on today's call, ignoring properties that are not understood means that implementations MUST NOT throw an error when encountering them.
Per the discussion on today's call, ignoring properties that are not understood means that implementations MUST NOT throw an error when encountering them.
To be clear, this was @selfissued's position on the call today. In my comment above and on the call, I noted that this is good advice when a consuming application is not performing any transformations on the data. However, if format transformations are performed and ambiguous properties are encountered, I recommend that an error be raised. Ambiguity can be resolved in these cases by using the DID spec registries (i.e., if the ambiguous property is added to the registry, transforming applications could avoid throwing an error because the ambiguity could be resolved via the registry).
https://tools.ietf.org/html/rfc7519#section-12
A JWT may contain privacy-sensitive information. When this is the case, measures MUST be taken to prevent disclosure of this information to unintended parties. One way to achieve this is to use an encrypted JWT and authenticate the recipient. Another way is to ensure that JWTs containing unencrypted privacy-sensitive information are only transmitted using protocols utilizing encryption that support endpoint authentication, such as Transport Layer Security (TLS). Omitting privacy-sensitive information from a JWT is the simplest way of minimizing privacy issues.
Omitting privacy-sensitive information from a JWT is the simplest way of minimizing privacy issues.
Aside from using a private / unpublished DID over a secure channel, there is no other mechanism for protecting sensitive properties in a DID Document (encryption has a shelf life... do not publish cipher text to ledgers / public networks).
DID Documents that contain properties that are not registered are almost certainly going to include properties that impact privacy... since they are, at the very least, vectors for fingerprinting, reducing anonymity... https://en.wikipedia.org/wiki/Degree_of_anonymity
See also: https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html ....input validation is related to "ignoring" unknown properties
I think we need to address this via content-type and default parser behavior...
This needs a PR to address this: the group needs to decide whether things get ignored, passed through, handled as errors, or whatever -- as long as it's consistent across representations.
I can take a stab at improving this, in did core... I tend to agree that we should allow unknown properties to be ignored... and note that JSON parsers do that by default... and that if someone asks specifically for JSON-LD... that maybe that means that unknown properties are dropped... that seems fine... I would never ask for JSON-LD if that were the case :)
I believe that we should move the statement that properties that are not understood MUST be ignored from the JSON section to the abstract data model section, so that it applies across representations.
+1 to @OR13 that this would be handled separately depending on content-type. Do we have a full list of content-types?
if content-type == JSON-LD {
    ...
} else if content-type == JWT {
    ...
} else if content-type == JSON {
    ...
} else if content-type == CBOR {
    ...
} else {
    ... but I'm not sure what the default handling of unknown properties should look like.
}
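One possible shape for that dispatch, as a sketch only. It covers just the two media types that have come up in this thread, and the handler names and the choice to raise on an unrecognized content type are assumptions rather than anything decided here:

```python
def handle_json(doc):
    """application/did+json: forward the parsed document untouched."""
    return doc


def handle_json_ld(doc):
    """application/did+ld+json: expects an @context (assumed requirement).

    What to do with properties that the context does not define is exactly
    the open question in this thread, so this sketch leaves them alone.
    """
    if "@context" not in doc:
        raise ValueError("JSON-LD representation without @context")
    return doc


HANDLERS = {
    "application/did+json": handle_json,
    "application/did+ld+json": handle_json_ld,
}


def consume(content_type, doc):
    """Pick the handler by content type; refusing unknown types is one possible default."""
    handler = HANDLERS.get(content_type)
    if handler is None:
        raise ValueError(f"unsupported content type: {content_type}")
    return handler(doc)
```

A table of handlers keeps the per-representation rules in one place, so adding CBOR or another representation later would be a matter of registering one more entry.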
so far we have seen application/did+json and application/did+ld+json... I would expect unknown properties to be forwarded without tampering when asking for application/did+json.... when asking for application/did+ld+json I would expect one of 2 things...
The end result of this behavior is that only people who really understand json-ld will request it... and that seems to be in line with how things should be.
To be clear, I'm still not sure what is meant by unknown. Is it not in the JSON-LD context? Is the JSON-LD not resolvable at the given time of validation/verification? Are the terms/attributes not in the DID-spec-registry? Are they not in a registered IETF/IANA registry, like published algorithms for JWT, JOSE or COSE (i.e. ES256k-R)? Is it unknown to a specific verifier because it is DID method specific, or is it unknown because the specific crypto library is simply out of date and doesn't have a good implementation of the IANA registry items?
yes, this was why I attempted to import sufficient language regarding did core into the registries in this PR:
https://github.com/w3c/did-spec-registries/pull/115
In essence:
- application/did+json documents have no "unknown" properties.... they have known ones, which are in the registries.
- application/did+ld+json documents have "unknown" properties... you can get a list of them, by applying JSON-LD Compact / Expand and counting the properties that are dropped... all properties not included in the @context will be dropped.

For application/did+json, this is easy... do not do anything with "unknown/known" properties... just forward the JSON... it's not your job to mess with it at all anyway.
For application/did+ld+json, there are 3 options:
1. do the same thing as application/did+json but forward the result as application/did+ld+json (don't drop unknown properties)
2. drop unknown properties and forward the result as application/did+ld+json
3. throw an error.

DID Methods appear to be doing 1 today, and they would need to write extra code to do something else... I expect no amount of normative text will actually change this behavior... we should avoid telling people they MUST do stuff they will not actually do, and have not to date done.
The Universal Resolver is doing 2 today.... I believe this will result in either of the following scenarios:
a. nobody ever requests application/did+ld+json
b. universal resolver decides to throw an error instead of returning a document with dropped properties when someone requests application/did+ld+json
At the heart of this issue and the JSON vs JSON-LD issue, is the conceit that somehow this WG can control what DID Methods do.... we cannot.
If the DID Core spec says a DID Method can return JSON, that JSON can be JSON-LD or not.... the DID Method can decide... by supporting application/did+ld+json and/or application/did+json or not... by including an @context or not, and by making sure it always contains no "unknown" properties or not.... it's up to the DID Method to decide this stuff... and per the spec today, a did method could comply with the spec and return the following did document:
{
  "id": "did:predator:data uri image of a pedofile",
  "imageOfHomeAddress": "data uri image of a house",
  "streetAddress": "..."
}
^ that's a valid JSON DID Document, making use of the preservation of unknown properties feature.
If you think this problem is solved by JSON-LD... think again....
{
  "@context": ["https://www.w3.org/ns/did/v1", "https://predators.example.com/ns/did/v1"],
  "id": "did:predator:data uri image of a pedofile",
  "imageOfHomeAddress": "data uri image of a house",
  "streetAddress": "..."
}
Now both are valid did documents...
and no "unknown" properties would be dropped from either, regardless of which representation you requested...
Does anyone think that having:
{
  "@context": ["https://www.w3.org/ns/did/v1"],
  "id": "data uri image of a pedofile",
  "imageOfHomeAddress": "data uri image of a house",
  "streetAddress": "..."
}
get transformed to
{
  "@context": ["https://www.w3.org/ns/did/v1"],
  "id": "data uri image of a pedofile"
}
Is doing anything for us?... especially in a world where the did method author controls the resolution process?
DID Core cannot compel unknown properties to be dropped without destroying interoperability between JSON and JSON-LD... we have imported the security implications of vanilla application/did+json... for better or worse, this was the result of the F2F in Amsterdam... we just suck at explaining what actually happened in plain English / kicked the can on the hard topics like "unknown" properties.
IMO, this is fine because we have already heard on numerous WG calls that additional validation is required before a consumer will process a JSON document... and it's the responsibility of consuming software to do that... and they are not required to do anything other than parse the JSON / CBOR and then decide if they care about JSON-LD or JSON Schema, or SHACL, or IPLD Schemas, or whatever.... all that stuff is not our concern.... our job is to explain what standard properties look like, and how they are represented in supported representations (JSON).
We have the did spec registries to do this, and it says.... we use application/did+json for JSON and application/did+ld+json for JSON (which sometimes contains unknown properties and sometimes does not)... all is as it should be :)
For application/did+ld+json, there are 3 options:
- do the same thing as application/did+json but forward the result as application/did+ld+json (don't drop unknown properties)
- drop unknown properties and forward the result as application/did+ld+json
- throw an error.
I must admit I am not sure what you mean by "forwarding" in this case. However, per https://github.com/w3c/did-core/issues/205#issuecomment-589865739 (see also the playground example) a JSON-LD parser _drops_ the unknown property when turning the document into RDF. What this suggests to me (again, I am not sure what you mean by "forwarding") is that (2) above is the approach compatible with a vanilla JSON-LD processor.
(Acknowledging the remark of @dlongley on the signature issue but that is probably an orthogonal problem insofar as it raises a problem both in JSON and in JSON-LD.)
@iherman
- drop unknown properties and forward the result as application/did+ld+json
By this I mean the result of the following:
https://github.com/gjgd/jsonld-checker/blob/master/packages/lib/src/check.ts#L18
https://json-ld.org/spec/latest/json-ld-api/#dom-jsonldprocessor-expand
https://json-ld.org/spec/latest/json-ld-api/#dom-jsonldprocessor-compact
By doing this, all properties absent from the @context are dropped.
It is possible that a JSON-LD Producer might return application/did+ld+json with properties that are not in the context...
It is also possible that we could define that a JSON-LD Consumer not drop them... (we all know that whenever expand + compact gets called later, they will be dropped).
The question is whether we want to say that all JSON-LD Consumers are always doing the following in this order:
If all JSON-LD consumers are always doing this, or we want to require them all to, we should say:
A JSON-LD Consumer MUST drop all properties from the document not defined in the context (+note on how to do this).
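To show the effect of such a rule without pulling in a JSON-LD library, here is a deliberately simplified stand-in. It is not the expand/compact algorithm: it only looks at top-level members, and the `ILLUSTRATIVE_CONTEXTS` table (context URL to a set of term names) and the `https://example.com/ns/local/v1` URL are made up for the sketch rather than being the real contents of any context:

```python
# Hypothetical, locally stored view of which terms each context defines.
ILLUSTRATIVE_CONTEXTS = {
    "https://www.w3.org/ns/did/v1": {"id", "authentication", "service"},
    "https://example.com/ns/local/v1": {"localThing"},
}


def drop_undefined(doc, contexts=ILLUSTRATIVE_CONTEXTS):
    """Keep only top-level members defined by the contexts the document declares."""
    declared = doc.get("@context", [])
    if isinstance(declared, str):
        declared = [declared]
    defined = set()
    for url in declared:
        defined |= contexts.get(url, set())
    # Keywords such as @context are always kept.
    return {k: v for k, v in doc.items() if k.startswith("@") or k in defined}


only_core = {
    "@context": ["https://www.w3.org/ns/did/v1"],
    "id": "did:example:123",
    "localThing": 1,
}
with_local = {
    "@context": ["https://www.w3.org/ns/did/v1", "https://example.com/ns/local/v1"],
    "id": "did:example:123",
    "localThing": 1,
}

core_result = drop_undefined(only_core)
local_result = drop_undefined(with_local)
```

The same member is dropped in the first document and survives in the second, purely because of which contexts the producer chose to list.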
So the question really boils down to: "What is a consumer"... is it equivalent to an HTTP client that:
You can see that if we stopped at 2, JSON and JSON-LD would be consumed the exact same way... if JSON-LD goes further to 3, unknown properties (properties not in the context) will always be dropped.
Finally, what I mean by forwarding... is that the consumer has requested application/did+ld+json for some reason.... and it will provide that data to software... for example, a documentLoader used to verify credentials, or some other system which cares deeply enough about linked data to desire that it not be malformed... A consumer is not the end of the road... but it is the end of getting application/did+ld+json into a JSON object representation...
We already know that a JSON consumer produces JSON with unknown properties. (fancy way of saying that it does not alter what a producer has produced)
The question is should a JSON-LD consumer produce JSON-LD with unknown properties or not :)
This will be a topic of the upcoming virtual face-to-face meeting.
Addressed in #454
This issue will be closed when PR #454 is merged.
PR #454 has been merged; closing.