Dxwg: DCAT for particular Resources/CKAN catalog

Created on 22 Dec 2020  ·  8 Comments  ·  Source: w3c/dxwg

Posted by @Aymen-Charef

Dear colleagues,
I am working on adapting DCAT for particular resources to be published in the FAO CKAN catalogue. Besides datasets, we want to publish the international standard classifications that are the reference for FAO statistical work. The full list of the international family of classifications (with attached metadata) is on the UNSD website: https://unstats.un.org/unsd/classifications/Family/ListByDomain
In general, a classification is issued and maintained by a custodian agency or organization. Once adopted, it is revised infrequently (every 5 to 10 years), unlike a dataset, but each revision must be endorsed by a community.

Our users require specific metadata, and I am struggling to find the best match among DCAT terms in any of the classes Resource, Distribution or Record. The table below lists these fields with examples.
I hope you will consider this type of resource to enrich the use and scope of DCAT version 3. Once these classifications are published in FAO CKAN, I can share them with you as an implementation use case.

Metadata (UNSD) | DCAT | Comment
-- | -- | --
Year adopted | Property: release date / dct:issued | e.g. 1984
Last year endorsed | Property: update/modification date / dct:modified | e.g. 2018
Owner | Creator | Body of statistical community
Citation | DataCite | It is not data
Maintainer | Publisher (FAO) | e.g. FAO
Reasons of revision | ?? | e.g. changes in codification system, adding categories
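As a rough illustration of the table, here is a minimal Python sketch that turns one classification record into a DCAT description in Turtle. It is only a sketch under stated assumptions: the record field names and all example.org URIs are hypothetical, years are typed as xsd:gYear, and it uses plain string formatting rather than any RDF library. "Reasons of revision" is left out because no DCAT property has been identified for it at this point.

```python
# Minimal sketch: describe one UNSD-style classification record as a
# dcat:Dataset in Turtle, using plain string formatting (no RDF library).
# Field names and all example.org URIs are hypothetical placeholders.

TTL_PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .\n"
)

# UNSD metadata field -> DCTERMS property, following the table above
FIELD_TO_PROPERTY = {
    "year_adopted": "dct:issued",
    "last_year_endorsed": "dct:modified",
    "owner": "dct:creator",
    "maintainer": "dct:publisher",
}

# Fields holding a bare year, emitted as xsd:gYear literals
YEAR_FIELDS = {"year_adopted", "last_year_endorsed"}


def record_to_turtle(uri: str, record: dict) -> str:
    """Return a Turtle description of one classification record."""
    statements = [f"<{uri}> a dcat:Dataset"]
    for field, prop in FIELD_TO_PROPERTY.items():
        if field not in record:
            continue  # no value for this field: emit nothing
        value = record[field]
        if field in YEAR_FIELDS:
            statements.append(f'    {prop} "{value}"^^xsd:gYear')
        else:
            # agents (owner, maintainer) are referenced by URI
            statements.append(f"    {prop} <{value}>")
    return TTL_PREFIXES + "\n" + " ;\n".join(statements) + " .\n"


record = {
    "year_adopted": 1984,
    "last_year_endorsed": 2018,
    "owner": "https://example.org/agent/statistical-community",
    "maintainer": "https://example.org/agent/fao",
}
print(record_to_turtle("https://example.org/classification/example", record))
```

Running it prints a single dcat:Dataset with dct:issued, dct:modified, dct:creator and dct:publisher; fields missing from a record are simply skipped.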

I would be glad if you could point me to documentation on STATDCAT. I would like to explore it, as we are using SDMX for data processing and dissemination (e.g. for SDG data).
Example: http://www.fao.org/faostat/en/#data/QA/metadata

Best regards,
Aymen Charef
Office of Chief Statistician (OCS)
Viale delle Terme di Caracalla
00153, Rome, Italy



All 8 comments

@Aymen-Charef, here is a quick first reply based on my personal view. Other DXWG members or DCAT editors might have different and more elaborate ideas about your case.

You want to provide classifications as items of the catalog. Did I get this right?

In that case, as a first attempt, I would consider providing the classifications as datasets. That might make sense, especially if your classifications are provided in more than one concrete form (i.e., have different distributions), e.g., CSV, SKOS.

You could distinguish among different kinds of datasets using "soft types" (i.e., dct:type with a value identifying the type of dataset).
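A minimal sketch of this approach, assuming hypothetical example.org URIs throughout, including the soft-type URI: the classification is described as a dcat:Dataset, flagged with dct:type, and offered through two distributions, one CSV file and one SKOS file serialized as Turtle. It is plain Python producing Turtle text, not the API of any particular library or of CKAN.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch only: every example.org URI below is a hypothetical placeholder.
IANA = "https://www.iana.org/assignments/media-types/"
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
)


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass
class Distribution:
    uri: str
    download_url: str
    media_type: str                    # IANA media type, e.g. "text/csv"
    conforms_to: Optional[str] = None  # e.g. the SKOS namespace

    def to_turtle(self) -> str:
        props = [
            f"<{self.uri}> a dcat:Distribution",
            f"    dcat:downloadURL <{self.download_url}>",
            f"    dcat:mediaType <{IANA}{self.media_type}>",
        ]
        if self.conforms_to:
            props.append(f"    dct:conformsTo <{self.conforms_to}>")
        return " ;\n".join(props) + " .\n"


@dataclass
class ClassificationDataset:
    uri: str
    title: str
    soft_type: str  # URI of the kind of dataset, emitted as dct:type
    distributions: List[Distribution] = field(default_factory=list)

    def to_turtle(self) -> str:
        props = [
            f"<{self.uri}> a dcat:Dataset",
            f"    dct:title {literal(self.title)}@en",
            f"    dct:type <{self.soft_type}>",
        ]
        if self.distributions:
            links = ", ".join(f"<{d.uri}>" for d in self.distributions)
            props.append(f"    dcat:distribution {links}")
        blocks = [" ;\n".join(props) + " .\n"]
        blocks += [d.to_turtle() for d in self.distributions]
        return PREFIXES + "\n" + "\n".join(blocks)


dataset = ClassificationDataset(
    uri="https://example.org/classification/example",
    title="Example classification",
    soft_type="https://example.org/dataset-type/classification",
    distributions=[
        Distribution(
            uri="https://example.org/classification/example/csv",
            download_url="https://example.org/files/example.csv",
            media_type="text/csv",
        ),
        Distribution(
            uri="https://example.org/classification/example/skos",
            download_url="https://example.org/files/example.ttl",
            media_type="text/turtle",
            conforms_to="http://www.w3.org/2004/02/skos/core",
        ),
    ],
)
print(dataset.to_turtle())
```

The type flag lives on the dataset, so a catalogue can filter classifications from ordinary datasets without a new class, while each concrete format stays a separate distribution.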

Data and classifications are two different things in many cases. However, the notion of dataset in DCAT is quite broad.

Would this be reasonable for you?
If your use case has specific constraints I have overlooked please feel free to mention them.

I agree that it is quite sensible to consider a 'classification' (a kind of controlled vocabulary?) as a 'dataset'.
A classification is a particularly high-value dataset, you might also call it a 'reference dataset' but managing and cataloguing it as a dataset makes sense.

I want to have the classifications as items of the catalog.

Thank you. I can have them as "reference datasets".

As for the terms, I found that "Version information" is the best match to accommodate "Reasons of revision".

@Aymen-Charef said:

I will be glad if you can point me to documentation related to STATDCAT, I would like to explore it, as we are using SDMX for data processing and dissemination (e.g for SDGs data).
Example: http://www.fao.org/faostat/en/#data/QA/metadata

StatDCAT-AP is available at:

https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/statdcat-application-profile-data-portals-europe

There you can find also links to the existing releases.

The working space is on GitHub: https://github.com/SEMICeu/StatDCAT-AP . In case you have questions, you can create an issue there: https://github.com/SEMICeu/StatDCAT-AP/issues

Finally, an overview of StatDCAT-AP is provided in the following papers:

I am currently working on a project where a DCAT-based application profile is used for the description of "reference data assets" -- things like code lists and controlled vocabularies -- that are modelled as dcat:Dataset.

To complement what has been said so far:

  1. Existing examples of classifications (as well as code lists, thesauri, mappings, etc.) documented as datasets can be found in the EU Open Data Portal and the European Data Portal. E.g., all reference data maintained by the EU Publications Office (EuroVoc included) are published as datasets - see: https://data.europa.eu/euodp/en/data/publisher/publ
  2. The same applies to the reference data maintained in the INSPIRE Registry. This is also reflected in their RDF representation, where a code list is typed both as a skos:ConceptScheme and as a dcat:Dataset - see, e.g.: https://inspire.ec.europa.eu/theme/theme.en.rdf
  3. The DCAT profile which probably best fits your use case is BRegDCAT-AP (DCAT application profile for base registries), which is also complemented by a number of support tools: https://joinup.ec.europa.eu/collection/access-base-registries/solution/abr-bregdcatap-tools/about - @makxdekkers, I wonder whether this is the project you were referring to.
  4. As suggested by @riccardoAlbertoni in https://github.com/w3c/dxwg/issues/1286#issuecomment-749616859, you can add a "flag" saying that this is a classification by using dct:type. This approach is used in DCAT-AP and related extensions. The Dataset Type Named Authority List maintained by the EU Publications Office could be used for this purpose: https://op.europa.eu/en/web/eu-vocabularies/at-dataset/-/resource/dataset/dataset-type
  5. About the terms to be mapped:

    • Dates: You may also consider using other properties from DCTERMS, such as dct:dateSubmitted and dct:dateAccepted, or complementing the date with the classification status (approved, endorsed, etc.). We have included a draft section on these aspects in the first public Working Draft of DCAT3 (see §10.3 Resource life-cycle).

    • Citation: It is not clear to me if this is meant to specify how to cite the classification, or to point to a document citing the classification. In the latter case, DCAT recommends using dct:isReferencedBy (see §C.3 Link datasets and publications). In the former case, you may consider using dct:bibliographicCitation. An example of its use is DCAT-AP-JRC (a DCAT-AP extension for research data) - see https://ec-jrc.github.io/dcat-ap-jrc/#dataset-bibliographic-citation

    • Maintainer: Whether to map it to dct:publisher depends on what "maintainer" means in your context, and whether maintainer and publisher could be different. Note that DCAT provides a general pattern for the specification of roles for which no specific property is defined - see §14.1 Relationships between datasets and agents

    • Reasons of revision: Indeed, the closest notion in DCAT3 is "version notes" in §10.2 Version information (adms:versionNotes), which is also used in DCAT-AP and related extensions
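Putting the suggestions above together, one possible description of a classification could look like the following minimal Python sketch, which again emits Turtle via plain string formatting. Assumptions: all example.org URIs, the maintainer role URI and the citation text are hypothetical placeholders; in DCAT-AP the dct:type value would come from the Dataset Type Named Authority List, whose exact concept URI is not given here. The sketch types the resource as both dcat:Dataset and skos:ConceptScheme (as in the INSPIRE Registry example), uses adms:versionNotes for the reasons of revision, dct:bibliographicCitation for how to cite it, dct:isReferencedBy for documents citing it, and the qualified-attribution pattern (prov:qualifiedAttribution with dcat:hadRole) for a maintainer role that has no dedicated property.

```python
# Minimal sketch of the mapping discussed in this thread, emitted as Turtle
# with plain string formatting. Every example.org URI, the role URI and the
# citation text are hypothetical placeholders.

NAMESPACES = {
    "adms": "http://www.w3.org/ns/adms#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def lit(text: str, suffix: str = "") -> str:
    """Turtle literal with escaping; suffix is a datatype or language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"{suffix}'


def describe_classification(c: dict) -> str:
    head = "".join(f"@prefix {p}: <{ns}> .\n" for p, ns in NAMESPACES.items())
    props = [
        # Typed both as a dataset and as a SKOS concept scheme
        f"<{c['uri']}> a dcat:Dataset, skos:ConceptScheme",
        f"    dct:type <{c['dataset_type']}>",  # "soft type" flag
        f"    dct:issued {lit(str(c['adopted']), '^^xsd:gYear')}",
        f"    dct:modified {lit(str(c['endorsed']), '^^xsd:gYear')}",
        f"    dct:creator <{c['owner']}>",
        f"    dct:publisher <{c['publisher']}>",
        f"    dct:bibliographicCitation {lit(c['citation'])}",  # how to cite
        f"    adms:versionNotes {lit(c['revision_reason'], '@en')}",
        # Qualified attribution for a role with no dedicated property
        "    prov:qualifiedAttribution [\n"
        "        a prov:Attribution ;\n"
        f"        prov:agent <{c['maintainer']}> ;\n"
        f"        dcat:hadRole <{c['maintainer_role']}>\n"
        "    ]",
    ]
    # Documents that cite or reference the classification
    for doc in c.get("referenced_by", []):
        props.append(f"    dct:isReferencedBy <{doc}>")
    return head + "\n" + " ;\n".join(props) + " .\n"


classification = {
    "uri": "https://example.org/classification/example",
    "dataset_type": "https://example.org/dataset-type/classification",
    "adopted": 1984,
    "endorsed": 2018,
    "owner": "https://example.org/agent/statistical-community",
    "publisher": "https://example.org/agent/fao",
    "maintainer": "https://example.org/agent/fao",
    "maintainer_role": "https://example.org/role/maintainer",
    "citation": "Placeholder citation text for the classification.",
    "revision_reason": "Changes in the codification system; categories added.",
    "referenced_by": ["https://example.org/publication/1"],
}
print(describe_classification(classification))
```

Keeping dct:publisher alongside the qualified attribution is a design choice, not a requirement: if maintainer and publisher are always the same agent, the attribution can be dropped.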

@Aymen-Charef: Does the thread answer your request? If yes, can we close this GitHub issue?

Dear all, thank you very much for these clarifications. Much appreciated.
