Dxwg: Examples of a manifest for DCAT2

Created on 26 Jun 2020  ·  8 Comments  ·  Source: w3c/dxwg

Opening an issue to track the request by @ericstephan [1]. A copy of the request is attached below.


Hi all,

I'm interested in the concept of a manifest file to identify datasets that belong to a dataset. In the DCAT and DCAT2 recommendations, the following phrase appears: "Aggregated DCAT metadata can serve as a manifest file as part of the digital preservation process." Unfortunately, I can't find examples of DCAT manifests.

Are you aware of examples of aggregated DCAT metadata that support both a manifest that contains a listing of datasets and composition?

I am also interested in whether there is a concept of manifests for a profile that relies on other profiles.

Thank you,

Eric Stephan
US DOE Pacific Northwest National Laboratory

dcat due for closing feedback

All 8 comments

This is essentially the bag-of-files example https://www.w3.org/TR/vocab-dcat-2/#examples-bag-of-files, but would perhaps be better if dcterms:hasPart were used in place of dcterms:relation

Note that all the sub-properties of dcterms:relation are already mentioned as being available on all dcat:Resource - see https://www.w3.org/TR/vocab-dcat-2/#Property:resource_relation - though the scope note could be improved to _recommend_ use of a more specialized sub-property to get better semantics.

Also see https://www.w3.org/TR/vocab-dcat-2/#qualified-relationship

(for membership of a data-series - #806 #868 use dcterms:isPartOf/dcterms:hasPart)
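
As a rough illustration of the dcterms:hasPart suggestion above, the sketch below builds a small manifest as plain triples and prints them as N-Triples. It is not taken from the DCAT2 recommendation: the example.org IRIs and the helper names `build_manifest` and `to_ntriples` are hypothetical, and the parts are left untyped because the thread does not say how they should be typed.

```python
# Minimal sketch: a manifest as (subject, predicate, object) IRI triples.
# All example.org IRIs and helper names are hypothetical placeholders.
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_manifest(parent, parts):
    """Declare `parent` a dcat:Dataset and link it to each part via hasPart."""
    triples = [(parent, RDF_TYPE, DCAT + "Dataset")]
    for part in parts:
        triples.append((parent, DCT + "hasPart", part))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


manifest = build_manifest(
    "https://example.org/dataset/bag",
    [
        "https://example.org/dataset/bag/file-1",
        "https://example.org/dataset/bag/file-2",
    ],
)
print(to_ntriples(manifest))
```

Swapping `DCT + "hasPart"` for `DCT + "relation"` would give the weaker, unspecialized link that the comment above recommends replacing.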

@ericstephan: You can find some examples of DCAT used as a manifest on the European data portal; see the following two:

https://www.europeandataportal.eu/data/datasets/diplomati-stranieri-a-scuole-di-specializzazione-anno-solare-2013?locale=en

https://www.europeandataportal.eu/data/datasets/593c5b46-5c92-475e-90fa-20077fc3c06e?locale=en

Do the guidance provided by @dr-shorthair's reply and the above examples address your question? Or did you have more specific use cases in mind?

@dr-shorthair - I prefer hasPart because it has an inverse relation isPartOf that I can use to represent a list, tree or any dag structure for a dataset composed of other datasets.
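
The point about inverses can be sketched as follows, assuming the composition is held in memory as a mapping from each dataset to its direct parts. The function names and dataset identifiers are hypothetical. The sketch derives isPartOf links from hasPart links and checks that the structure is a DAG, that is, that no dataset is transitively a part of itself.

```python
from collections import defaultdict


def invert_has_part(has_part):
    """Derive isPartOf (child -> parents) from hasPart (parent -> children)."""
    is_part_of = defaultdict(set)
    for parent, children in has_part.items():
        for child in children:
            is_part_of[child].add(parent)
    return dict(is_part_of)


def is_acyclic(has_part):
    """Return True if no dataset is (transitively) a part of itself."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False  # reached a node already on the current path: cycle
        visiting.add(node)
        ok = all(visit(child) for child in has_part.get(node, ()))
        visiting.discard(node)
        if ok:
            done.add(node)
        return ok

    return all(visit(node) for node in list(has_part))


# Hypothetical example: two runs share one model, so this is a DAG, not a tree.
has_part = {
    "study": ["run-1", "run-2"],
    "run-1": ["shared-model"],
    "run-2": ["shared-model"],
}
print(invert_has_part(has_part)["shared-model"])
print(is_acyclic(has_part))
```

A list is the special case where each dataset has at most one part, and a tree is the case where each dataset has at most one parent.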

@riccardoAlbertoni - A classic example comes from the modeling and simulation (M&S) world, where a published dataset is comprised of many other datasets (in this case a dataset could be a file or a directory of files).

Simulation run dataset A is composed of other datasets that include:

  • human readable description of the simulation experiment
  • input parameters
  • model that was used for the simulation (for example, a network feeder model for a power grid study)
  • configuration settings for the simulation
  • domain specific output produced by the simulation
  • logfiles or provenance explaining lineage, errors during the simulation

A sensitivity study is where the simulation is run a thousand times, altering only one or two input parameters, to understand the inputs' impact on simulation output. In this case, the sensitivity study is a dataset, the 1000 simulation datasets are a part of the sensitivity study, and all of the input parameter, model, configuration settings, output, and logfile datasets are a part of a given parent simulation.
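
The nesting described above can be sketched as a two-level hasPart structure. This is only an illustration: the identifiers, the component names, and the helpers `sensitivity_study` and `all_parts` are made up for the example, and each run is assumed to own its six component datasets rather than share them.

```python
# Component datasets of one simulation run, following the list above.
COMPONENTS = [
    "description",
    "input-parameters",
    "model",
    "configuration",
    "output",
    "logfiles",
]


def sensitivity_study(study_id, n_runs):
    """Map each dataset id to the ids of its direct parts (hasPart)."""
    has_part = {study_id: []}
    for i in range(1, n_runs + 1):
        run_id = f"{study_id}/run-{i}"
        has_part[study_id].append(run_id)
        has_part[run_id] = [f"{run_id}/{name}" for name in COMPONENTS]
    return has_part


def all_parts(has_part, root):
    """Collect every dataset reachable from `root` by following hasPart."""
    found, stack = [], list(has_part.get(root, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(has_part.get(node, []))
    return found


study = sensitivity_study("study-A", 1000)
print(len(study["study-A"]))             # 1000 direct parts (the runs)
print(len(all_parts(study, "study-A")))  # 7000: 1000 runs + 6000 components
```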

Yes, this does overlap with Provenance for the simulation team running the experiments, but the consumers of published simulation runs and sensitivity studies are more interested in the composition of the dataset when analyzing published experiments.

Does this help?

Thanks for this clarification, @ericstephan .

May I ask if your requirements concern only describing the specific role played by each component (input parameters, model, etc.), or also the actual simulation workflow?

I think that the DCAT qualified relationships pointed out by @dr-shorthair work well in the case illustrated by Eric. Of course, he might need to mint domain/case-specific roles.
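
A minimal sketch of what such qualified relationships might look like, emitted as Turtle from Python. The properties dcat:qualifiedRelation, dcterms:relation and dcat:hadRole and the class dcat:Relationship come from DCAT2. The role IRIs under example.org stand in for the domain-specific roles that would need to be minted, and the helper name `qualified_relation` is hypothetical.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def qualified_relation(run_iri, component_iri, role_iri):
    """Return a Turtle statement linking a run to a component with a role."""
    return (
        f"<{run_iri}> dcat:qualifiedRelation [\n"
        f"    a dcat:Relationship ;\n"
        f"    dcterms:relation <{component_iri}> ;\n"
        f"    dcat:hadRole <{role_iri}>\n"
        f"] .\n"
    )


# Hypothetical IRIs: the roles below are NOT part of any published vocabulary.
run = "https://example.org/simulation/run-1"
doc = (
    PREFIXES
    + "\n"
    + qualified_relation(
        run,
        run + "/input-parameters",
        "https://example.org/role/inputParameters",
    )
    + qualified_relation(
        run,
        run + "/model",
        "https://example.org/role/model",
    )
)
print(doc)
```

Compared with a bare dcterms:hasPart link, each relationship node records what the component is for, at the cost of one extra node per link.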

If representing the workflows and their execution is a need, I would suggest reusing OPMW-PROV, which relies on PROV and OPMW.

@ericstephan: Is that combination of vocabularies working for your case? Or do you think there is any specific missing part we should consider adding to DCAT?

@riccardoAlbertoni exactly, I think the existing provenance and workflow vocabularies are sufficient for describing the role of the various components. @andrea-perego based on @riccardoAlbertoni's suggestion, this would provide the clarification necessary for inputs, outputs, etc.

@ericstephan I suppose you know this, since you're the originator of both this enquiry and another about PROF, but just for the purposes of this thread: the Profiles Vocabulary (PROF)'s ResourceDescriptor class supports a hasRole property, and we have a vocabulary of Role instances, so that a PROF Profile can be linked to a number of resources with roles, formats, notes on their conformance, etc. This, then, is a profile manifest.

While PROF's ResourceDescriptors are conceptually related to Distributions, there's no sense in which the ResourceDescriptors are a part, as in part/whole of the Profile since multiple profiles can refer to resources and to external resources etc.
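
To make the idea of a profile manifest concrete, here is a small sketch that renders a PROF Profile with its ResourceDescriptors as Turtle. It covers only roles and formats. The PROF terms used (prof:Profile, prof:ResourceDescriptor, prof:hasResource, prof:hasRole, prof:hasArtifact) and the roles role:schema and role:guidance are from the Profiles Vocabulary; the profile and artifact IRIs, the media type IRIs, and the Python names are illustrative assumptions.

```python
from dataclasses import dataclass

PREFIXES = (
    "@prefix prof: <http://www.w3.org/ns/dx/prof/> .\n"
    "@prefix role: <http://www.w3.org/ns/dx/prof/role/> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


@dataclass
class ResourceDescriptor:
    artifact: str    # IRI of the file that implements the resource
    role: str        # prefixed name from the PROF roles vocabulary
    media_type: str  # IRI identifying the format (illustrative)

    def to_turtle(self) -> str:
        return (
            "    prof:hasResource [\n"
            "        a prof:ResourceDescriptor ;\n"
            f"        prof:hasRole {self.role} ;\n"
            f"        prof:hasArtifact <{self.artifact}> ;\n"
            f"        dcterms:format <{self.media_type}>\n"
            "    ]"
        )


def profile_manifest(profile_iri, descriptors):
    """Render a profile and its resource descriptors as one Turtle subject."""
    statements = ["    a prof:Profile"] + [d.to_turtle() for d in descriptors]
    return f"{PREFIXES}\n<{profile_iri}>\n" + " ;\n".join(statements) + " .\n"


# Hypothetical profile with a schema and a guidance document.
manifest = profile_manifest(
    "https://example.org/profile/simulation",
    [
        ResourceDescriptor(
            "https://example.org/profile/simulation/shapes.ttl",
            "role:schema",
            "https://w3id.org/mediatype/text/turtle",
        ),
        ResourceDescriptor(
            "https://example.org/profile/simulation/guide.html",
            "role:guidance",
            "https://w3id.org/mediatype/text/html",
        ),
    ],
)
print(manifest)
```

Note that the artifacts are referenced rather than contained, which matches the point above that the descriptors are not parts of the profile in a part/whole sense.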

If I am not misreading, we have dealt with this issue.
Let's wait for the acceptance of pull #1244, and then I think we can close this issue.

@ericstephan: Am I right?
