Provide means to distinguish the primary and alternative (legacy) identifiers.
Related to #53 & #68.
Following from discussions in #53 and @riccardoAlbertoni's proposal in the wiki, neither Dublin Core nor ADMS have terms to represent alternative identifiers.
In DATS we followed the DataCite approach by having a representation of: primary identifier, alternate identifiers and related identifiers.
Proposal 1 in the wiki suggests using
as far as I have understood, this was the guideline made by DCAT-AP to manage duplicates, though I am not 100% sure this suggestion is still valid in the newest DCAT-AP release.
@agbeltran Is this acceptable, or do you think we should add new specific terms in DCAT 2?
I guess @andrea-perego and @makxdekkers might have their own view on the opportunities/issues behind the reuse of such a guideline.
That proposal sounds OK to me.
+1 from me as well.
The issue I see with the proposed solution is that one may want/need to specify the agency that manages the identifier scheme also for primary identifiers, rather than just for secondary identifiers.
@agbeltran In my mind, the distinction between 'primary' and 'secondary' identifiers is related to what you want to do with them.
The 'primary' identifier in @riccardoAlbertoni's proposal and at https://joinup.ec.europa.eu/release/dcat-ap-how-manage-duplicates is used (a) to link back to the original publication of that dataset, and (b) to (string-)compare identifiers to see whether two descriptions refer to the same dataset.
The 'secondary' identifiers are ones that play a role in a wider context, and for which you need to declare that context to understand what they are.
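To make use (b) concrete, here is a minimal Python sketch of duplicate detection by plain string comparison of primary identifiers. The `Description` class, its field names and the example.org URIs are assumptions made up for illustration; nothing here is taken from DCAT-AP itself.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """Hypothetical, simplified dataset description as held by one catalogue."""
    catalogue: str
    primary_id: str  # refers back to the original publication of the dataset
    other_ids: list[str] = field(default_factory=list)  # secondary identifiers


def same_dataset(a: Description, b: Description) -> bool:
    # Plain string comparison of the primary identifiers; no normalisation is attempted.
    return a.primary_id == b.primary_id


original = Description("publisher", "https://example.org/dataset/42")
harvested = Description("aggregator", "https://example.org/dataset/42", ["XYZ123"])
unrelated = Description("aggregator", "https://example.org/dataset/43")

print(same_dataset(original, harvested))
print(same_dataset(original, unrelated))
```

The comparison only works if intermediaries leave the primary identifier untouched when they correct or enhance a description, which is the reason the distinction was introduced.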
From what I remember of the development of ADMS, the adms:Identifier class was created primarily for non-resolvable identifiers. For example, a publisher might have an identifier "XYZ123", either a local production number, or coined in some other (non-Web) context, in which case it would be necessary to express what it was or where to look it up.
During development of DCAT-AP, it was noted that in situations where descriptions were exchanged, shared or harvested, intermediaries could change, e.g. correct or enhance, a description along the way.
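As a rough illustration of why a non-resolvable identifier needs its context declared, the sketch below pairs the identifier string with the agency that manages its scheme. The `Identifier` class, the `scheme_agency` field and the agency name are hypothetical and are not the ADMS vocabulary itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    notation: str  # the identifier string itself, e.g. "XYZ123"
    scheme_agency: Optional[str] = None  # who manages the scheme (hypothetical field)

    def describe(self) -> str:
        # Without an agency, the bare string is all a consumer has to go on.
        if self.scheme_agency is None:
            return self.notation
        return f"{self.notation} (issued by {self.scheme_agency})"


local = Identifier("XYZ123", scheme_agency="Example Publisher production numbers")
resolvable = Identifier("https://example.org/dataset/42")

print(local.describe())
print(resolvable.describe())
```

A resolvable URI can be dereferenced to find out what it is, whereas "XYZ123" on its own tells a consumer nothing about where to look it up.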
It was then agreed that there needed to be a way to refer back to the original description of a dataset, and the notion of primary and secondary identifiers was introduced with different usage.
It might be that this is more an issue for a profile than for the base standard, though.
Thanks @makxdekkers. First, while I realise that the wiki and the discussions have focused on identifiers for datasets, the solution we provide should also tackle identifiers for other entities (catalogues, people, services, and even distributions).
For datasets, as we are mostly considering them in the context of a catalogue (even though we do have an issue about the relationship between datasets and catalogues #62), the primary identifier would be the identifier for the dataset in the catalogue being considered IMO. When dealing with these identifiers programmatically, I think it would still be useful to be able to indicate what is the identifier type for those primary identifiers.
@agbeltran , I think the catalogue context may apply also to identifiers for other resources.
Take ORCIDs as an example: we are using them in the JRC Data Catalogue for dataset authors/contributors, along with their name and, possibly, email. In the JRC Data Catalogue, contributors are identified by a specific URI, whereas the ORCID is specified both as an alternative URI (owl:sameAs) and identifier. On the other hand, in the ORCID catalogue / registry, their primary URI / ID is the ORCID.
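A small Python sketch of that situation, assuming a made-up local contributor URI and a placeholder ORCID URI (neither is an actual JRC Data Catalogue record): the same person has a different primary identifier depending on which registry is asked.

```python
# Every URI below is a placeholder chosen for illustration.
LOCAL_URI = "https://data.example.org/contributor/123"
ORCID_URI = "https://orcid.org/0000-0002-1825-0097"

# The same person as described in two registries; each treats its own URI as primary.
records = {
    "local-catalogue": {"primary": LOCAL_URI, "same_as": [ORCID_URI]},
    "orcid-registry": {"primary": ORCID_URI, "same_as": []},
}


def primary_in(context: str) -> str:
    """Return the identifier that the given registry treats as primary."""
    return records[context]["primary"]


def all_ids(context: str) -> set[str]:
    """Return every identifier the given registry knows for the person."""
    rec = records[context]
    return {rec["primary"], *rec["same_as"]}


print(primary_in("local-catalogue"))
print(primary_in("orcid-registry"))
```

Which identifier counts as "primary" is a property of the lookup context here, not of the identifier.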
@andrea-perego I think your comment indicates that the notions of primary and secondary identifiers are very much application-specific. I am still of the opinion that the base standard should not try to solve the issue. It might mention that there is a requirement in particular applications to make the distinction, but it should not mandate a general approach.
@agbeltran @andrea-perego Considering @makxdekkers' comment, I have changed the text of issue 67 about primary and secondary ids, which is now
The need to distinguish between primary and legacy identifiers for a dataset has been posed as a requirement. However, it is very much application-specific and should be better addressed in application profiles rather than by mandating a general approach.
I have moved the issue before the duplication guidelines. If the issue text captures the group agreement fairly well, we might reuse it directly in the document and close the issue. Otherwise, please feel free to suggest a rephrasing ...
@riccardoAlbertoni , I am happy with the revision, although we may need to come back to this, or at least the text may need to be further elaborated. E.g., we need to define what we mean by primary, legacy, and alternative / secondary identifiers.
Only, I would recommend, following @makxdekkers's consideration in https://github.com/w3c/dxwg/pull/614#issuecomment-444594450 , replacing "legacy" with "alternative" or at least with "legacy, and alternative".
@andrea-perego @riccardoAlbertoni My worry was indeed that it is not clear what the meaning of 'primary' and 'alternative' is, basically because it depends entirely on context. There are many examples of situations where things (people, cars etc.) have several identifiers that may be primary in one context and secondary in others (e.g. tax identification, social security numbers). In the case of datasets, @agbeltran argues that "_the primary identifier would be the identifier for the dataset in the catalogue being considered_"; other people think that the primary identifier should be the one in the catalogue where the dataset was first published. So I think it will be hard to come up with a generally applicable definition of the terms 'primary' and 'secondary/alternative'.
Having said that, I have no objections to the current formulation "_it is very much application-specific and should be better addressed in application profiles_" as this means we don't have to say anything more about it.
While there are one or two minor edits still to be done on PR #614, there appears to be consensus around the following conclusion:
"The group has agreed that distinguishing between primary and secondary (alternative) identifiers is very much application-specific and should be better addressed in application profiles rather than by mandating a general approach."
Flagging this as "due for closing" to encourage any concerns to be raised.
Closing this specific issue as 'resolved'; any further discussion/issues/proposals can be included in #675.