Dxwg: Clarify the scope of `dcterms:identifier`

Created on 21 Feb 2019 · 23 Comments · Source: w3c/dxwg

It would be helpful to clarify that, in the context of a dcat:Resource, the value of a dcterms:identifier denotes the _resource being catalogued_.
This is different to the identifier for the _metadata record_ or _catalog entry_, which of course is the URI of the dcat:Dataset or dcat:DataService node, etc.
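
A minimal Turtle sketch of that distinction. The catalog URI, title, and DOI are hypothetical placeholders, not taken from any real catalog:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical catalog entry: the node URI identifies the entry in the catalog.
<https://catalog.example.org/dataset/42>
    a dcat:Dataset ;
    dcterms:title "Example geology survey" ;
    # Placeholder identifier that denotes the catalogued resource itself.
    dcterms:identifier "doi:10.9999/example-survey" .
```

Under this reading, the subject URI and the dcterms:identifier value are allowed to differ, because they answer different questions.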

This is in response to a discussion in a meeting of the Australian government geospatial metadata working group, which I am currently sitting in.

Currently the definition of dct:identifier just copies the DCMI definition 'A unique identifier of the item.'

dcat Dataset Resource future-work identification

dr-shorthair

All 23 comments

@dr-shorthair , I don't see the risk of confusion here.

dct:identifier denotes the identifier of the subject resource.

So, for catalogue records, it's the identifier of the catalogue record (as what is called "metadata file identifier" in ISO 19115). This is also how it is used in DCAT-AP and GeoDCAT-AP.

andrea-perego on 21 Feb 2019

If I understand your comment @andrea-perego, you seem to be saying the opposite of what @dr-shorthair is saying. An Identifier for a catalog record should be in the dcat:CatalogRecord; this would be gmd:fileIdentifier, but dcat:CatalogRecord does not include an identifier. My understanding is that identifier in the dcat:Resource identifies the described (subject) resource that the catalog record is about. I think an explicit clarification of the intention is a good idea. I see plenty of variation in the use of fileIdentifier in ISO because such clarity is also lacking in their definitions. Many users don't get the distinction between the metadata record as a resource that is distinct from the resource it describes.

smrgeoinfo on 21 Feb 2019

I agree with @andrea-perego. I don't see the confusion either.

makxdekkers on 21 Feb 2019

Agree with @dr-shorthair & @smrgeoinfo. In the same meeting as Simon and there really is confusion in ISO land. It really would be beneficial to make absolutely clear what the URI of the resource v. the dct:identifier are intended to be used for and how to split the metadata record v. resource identification.

By the way, the catalogues I've used that speak DCAT (the old one) don't ever contain CatalogueRecord, only Dataset & Distribution so, if trying to distinguish metadata record v. resource in just the Dataset metadata then differentiated use of the Dataset URI & dct:identifier could do that.

I apologise if this has been catered for in other discussion and I'm not aware of it.

nicholascar on 21 Feb 2019

I, too, have struggled with dct:identifier in the RDF context. dct:identifier predates DC terms in RDF, and was developed in a time when one created records that described a resource and gave a publicly visible identifier to the _resource_ being described. So a record for a book would have the ISBN as an identifier. I don't recall any solution for providing the identifier for the metadata record itself, which was generally considered to be an internal number and not identifying of the resource. In any case, DC terms doesn't define a record format and therefore doesn't define a record identifier.

In RDF, there are no records, and _all_ resources have identifiers, and everything is a resource. But I see the IRI of the object of dcat:record as the identifier of the catalog record. It looks to me like this is a blank node unless one mints actual IRIs for them. A dct:identifier wouldn't be needed to identify the catalog record because it already has one by virtue of existing as an RDF "thing", but if there is a known, public identifier, then dct:identifier could be used.
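
A rough sketch of what that could look like in Turtle, assuming the catalog mints an IRI for each record. All IRIs and the "rec-42" string are hypothetical:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<https://catalog.example.org/> a dcat:Catalog ;
    dcat:record <https://catalog.example.org/record/42> .

# Minting an IRI for the record avoids a blank node.
<https://catalog.example.org/record/42> a dcat:CatalogRecord ;
    foaf:primaryTopic <https://catalog.example.org/dataset/42> ;
    # Optional: only useful if the record has a known public identifier.
    dct:identifier "rec-42" .

<https://catalog.example.org/dataset/42> a dcat:Dataset .
```

Here the record IRI already identifies the catalog record, and the dct:identifier statement on it is an extra, not a requirement.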

I'll check with Tom Baker over the weekend to see if he has a better response to this, but I think what I've given is correct.

kcoyle on 21 Feb 2019

@andrea-perego @makxdekkers This issue is not about changing usage or inferred semantics, merely about clarifying the explanation text, perhaps with an additional usage note. The DCMI definition (which is what DCAT currently quotes) is failing to convey the intention to some willing users.

But from @kcoyle and @smrgeoinfo comments, and as @nicholascar confirms, there may also be some real ambiguity here.

dr-shorthair on 22 Feb 2019

If it is known what the dct:identifier identifies then that can be used in the description, giving more context for users. The dct definition is purposely vague so as to allow the widest use. A specific description shouldn't conflict with that.

kcoyle on 22 Feb 2019

I'm not sure I see the problem here. In principle, anything can have an identifier in the sense of dct:identifier - a book, or a metadata description of that book. The subject of an RDF statement is already an identifier (a URI), and an RDF graph about the resource identified by that URI may not _need_ to have additional dct:identifier statements about that resource, but it could. Those identifiers would not need to be URIs, but they identify the same resource as the subject of the triples.

In other words, the triple:

    ex:p1234 dct:identifier "p1234"

declares the string "p1234" to be an identifier for the resource identified by the URI ex:p1234.

Is there any disagreement about dct:identifier on this level?

tombaker on 22 Feb 2019

OK - I think I see the problem here: it is about whether the RDF node that has type dcat:Dataset (etc) represents the dataset itself, or a catalog entry that describes the dataset.

I have been working on the assumption that the description-within-the-catalog is distinct from the dataset-in-its-repository, so the URI for the description is not the same as the URI for the dataset.

dr-shorthair on 24 Feb 2019

@dr-shorthair All of the property/object pairs with the subject IRI of the dcat:Dataset are about the resource that is an instance of the class dcat:Dataset. It's logical for that instance to have a dct:title and other descriptive properties. Any properties that describe the physical resource, such as bytesize, however, need to have the physical resource IRI as the subject of their triple, not the class dcat:Dataset instance.

kcoyle on 24 Feb 2019

I'm thinking from a linked data point of view:

- What do you get when you dereference the dataset URI?
  - I suppose it is a description of the dataset, structured as a dcat:Dataset RDF graph.
- Does this dcat:Dataset have the same URI if it is listed in different catalogs?

dr-shorthair on 24 Feb 2019

Maybe this guideline for DCAT-AP is useful?

https://joinup.ec.europa.eu/release/dcat-ap-how-use-identifiers-datasets-and-distributions

makxdekkers on 25 Feb 2019

Thanks @makxdekkers indeed that helps in that it lays out some of the issues and options, though I don't think it fully resolves it.

The geospatial metadata community wants to have an identifier for the dataset-metadata-record distinct from the identifier for the dataset, because they are managed and versioned separately. In a DCAT context that means that a dcat:CatalogRecord comes into play, since it is explicitly related to the lifecycle of the dcat:Dataset description. Conceptually that much is clear, and the separation of concerns in DCAT matches standard registry models.

However, I'm not sure that either the URI of the dcat:CatalogRecord or the URI of the dcat:Dataset identifies the _RDF graph_ that is the actual dataset description. Maybe the full story requires us to step up to 'named graphs'?

I realise I'm making fine distinctions here, and it has echoes of the notorious httpRange-14 discussion from >10 years ago. Most of the web has moved on from that, relying on informal understandings to recognize the distinction between external things and their web-presence, but I think the issue is still alive here.

dr-shorthair on 25 Feb 2019

Going back to my original comments, since there is confusion in a well-intentioned part of the user community, I believe some clarifications in the text are in order. The new chapter on identifiers explores some of the by-ways of identifier forms, but does not really address the semantics of the identifier-resource relationship.

As with all the DCT elements used in DCAT, merely re-iterating the external definition is not enough, for either or both of these reasons:
(a) the original definition is misleading or lame in some way
(b) usage patterns in the context of DCAT merit some additional explanation.

IMO 'both' is the case here - https://w3c.github.io/dxwg/dcat/#Property:resource_identifier

- Within the definition, 'unique' is misleading (I think the intention is that it is an inverseFunctionalProperty?) and 'the item' begs the question 'which?' (I think we mean 'the resource being catalogued' or 'the resource (dataset or data-service) which this description refers to').
- The usage note only refers (obliquely) to a rather trivial lexical pattern. I suggest that we also need a note something like:

The dct:identifier property records any identifier that denotes the catalogued resource, typically assigned by the resource provider, publisher, a repository manager, or some other registration authority. It is not unusual for a catalogued resource to have multiple identifiers.
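
To illustrate the "multiple identifiers" case in the proposed note, a small Turtle sketch. The dataset URI and both identifier strings are hypothetical:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://catalog.example.org/dataset/42> a dcat:Dataset ;
    dct:identifier
        "doi:10.9999/example-survey" ,  # e.g. assigned by a registration authority
        "GS-2019-0042" .                # e.g. assigned by a repository manager
```

Both literals denote the same catalogued resource. Nothing in the triples themselves says which authority assigned which string.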

Finally, I just noticed that we do not mention dct:identifier as a property recommended for dcat:Distribution, but I used it in several examples here: https://w3c.github.io/dxwg/dcat/#ex-elaborated-bag . Was I wrong or does it need to be added to the list for dcat:Distribution?

dr-shorthair on 3 Mar 2019

To be decided if this should go to milestone DCAT Future Priority Work.

andrea-perego on 22 Sep 2019

@dr-shorthair , based on what we have in DCAT 2, what work do you think is needed to address this issue?

andrea-perego on 13 Mar 2021

I think the documentation could be improved.

RDF Property: | dcterms:identifier
-- | --
Definition: | A unique identifier of the resource
Range: | rdfs:Literal
Usage note: | The identifier is a text string which is assigned to the resource to provide an unambiguous reference within a particular context. A resource may have more than one identifier assigned by different authorities, or for use in different contexts.
Usage note: | An identifier string might be used as part of the URI of the item

dr-shorthair on 14 Mar 2021

About the proposed usage note:

I think that suggesting that dcterms:identifier can be used for multiple identifiers has some issues (UC11 gives some background information) and it is not aligned with what is said in §7 Dereferenceable identifiers, which makes a distinction between primary and secondary identifiers following DCAT-AP (where dcterms:identifier is used for the primary identifier, and adms:identifier for secondary/additional identifiers).

The relevant usage notes in the DCAT-AP specification:

For dcterms:identifier:

This property contains the main identifier for the Dataset, e.g., the URI or other unique identifier in the context of the Catalogue.

For adms:identifier:

An identifier in a particular context, consisting of the string that is the identifier; an optional identifier for the identifier scheme; an optional identifier for the version of the identifier scheme; an optional identifier for the agency that manages the identifier scheme

The distinction between primary / secondary identifiers is also present in other metadata schemas, such as DataCite and the metadata guidance for Google Dataset Search.

My suggestion is therefore to limit the scope of dcterms:identifier to the primary identifier (the identifier assigned to the resource in the catalogue), and to add adms:identifier to the relevant class descriptions - as per https://github.com/w3c/dxwg/issues/761
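
A sketch of the primary / secondary split in Turtle. The URIs, the DOI, and the agency name are hypothetical, and the choice of skos:notation and adms:schemaAgency on the adms:Identifier node is my reading of ADMS, so check it against the ADMS vocabulary before relying on it:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://catalog.example.org/dataset/42> a dcat:Dataset ;
    # Primary identifier: the one assigned in the context of the catalogue.
    dcterms:identifier "https://catalog.example.org/dataset/42" ;
    # Secondary identifier, described as a structured adms:Identifier node.
    adms:identifier [
        a adms:Identifier ;
        skos:notation "doi:10.9999/example-survey" ;
        adms:schemaAgency "Example registration agency"
    ] .
```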

andrea-perego on 14 Mar 2021

@andrea-perego I don't think that you can limit the scope of dcterms:identifier to a single primary identifier - "primary" can vary based on usage by different communities. If you want an identifier that is specific to and limited to the catalog then I think you need a DCAT property for that.

kcoyle on 14 Mar 2021

My suggestion is therefore to limit the scope of dct:identifier to the primary identifier (the identifier assigned to the resource in the catalogue), and to add adms:identifier to the relevant class descriptions

Wouldn't it be simpler to stick to one property for everything, qualified if need be (thus indicating adms:identifier, rather than dct:identifier)? Surely it will confuse users to see two properties with the same ID but different namespaces and different modes of use.

So use adms:identifier for 'the identifier assigned to the resource in the catalogue' but create some scoping for that (perhaps that's just the un-qualified use of the property).

nicholascar on 15 Mar 2021

@andrea-perego I don't think that you can limit the scope of dcterms:identifier to a single primary identifier - "primary" can vary based on usage by different communities. If you want an identifier that is specific to and limited to the catalog then I think you need a DCAT property for that.

Thanks, @kcoyle . Defining a specific property in the DCAT namespace can be an option, but I suggest we deal with this in a separate issue, and that we first address the inconsistency between the possibility of using dcterms:identifier for both primary and secondary identifiers and what is said in the relevant guidance section (https://www.w3.org/TR/vocab-dcat-2/#dereferenceable-identifiers).

andrea-perego on 20 Mar 2021

My suggestion is therefore to limit the scope of dct:identifier to the primary identifier (the identifier assigned to the resource in the catalogue), and to add adms:identifier to the relevant class descriptions

Wouldn't it be simpler to stick to one property for everything, qualified if need be (thus indicating adms:identifier, rather than dct:identifier)? Surely it will confuse users to see two properties with the same ID but different namespaces and different modes of use.

So use adms:identifier for 'the identifier assigned to the resource in the catalogue' but create some scoping for that (perhaps that's just the un-qualified use of the property).

@nicholascar , I think dcterms:identifier / adms:identifier are already playing these roles - i.e., dcterms:identifier being used as the un-qualified form of adms:identifier.

andrea-perego on 21 Mar 2021