dxwg 🚀 - Change domain or create superclass of dcat:Distribution

I assume this is to assist in managing representations of standards/profiles?

Dataset --> Distribution is well-known terminology in the data space, but elsewhere Resource --> Representation is more typical. So I'd favour the superclass/super-property route. How about:

dcat:hasRepresentation a owl:ObjectProperty ;
    rdfs:domain dcat:Resource ; 
    rdfs:range dcat:Representation .

dcat:distribution rdfs:subPropertyOf dcat:hasRepresentation .

dcat:Representation a owl:Class . 

dcat:Distribution rdfs:subClassOf dcat:Representation .
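A minimal sketch of what these axioms would mean for existing DCAT-2014 data, assuming the proposed (not yet published) terms `dcat:hasRepresentation` and `dcat:Representation`; the `ex:` IRIs are hypothetical. Nothing in the instance data changes; the comments list the triples an RDFS reasoner would add.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/ns#> .

# Existing DCAT-2014-style data, unchanged.
ex:dataset1 a dcat:Dataset ;
  dcat:distribution ex:dist1 .

ex:dist1 a dcat:Distribution .

# With the proposed axioms loaded, RDFS entailment adds:
#   ex:dataset1 dcat:hasRepresentation ex:dist1 .  (rdfs7: subPropertyOf)
#   ex:dist1 a dcat:Representation .                (rdfs9: subClassOf, also rdfs3: range)
#   ex:dataset1 a dcat:Resource .                   (rdfs2: domain of hasRepresentation)
```

Because the new terms only sit above the existing ones, DCAT-2014 data stays valid as-is.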

dr-shorthair on 22 Aug 2018

dcat:Resource needs the equivalent property

@rob-metalinkage Why is this? If a dcat:Resource is a dcat:Dataset, it has dcat:distributions; if it is a dcat:DataService, it does not need a dcat:distribution. I always pictured dcat:Resource as an abstract class.

Are there any examples where you actually have an instance of dcat:Resource which is neither a dataset nor a data service and it needs to have a representation? And does this instance belong to DCAT catalogs?

jakubklimek on 22 Aug 2018

I understand that the concern is to be able to add Profiles (and other technical standards) to a Catalog. Is a Profile a subclass of Dataset, or is it another kind of Resource?

dr-shorthair on 23 Aug 2018

I don't think a profile is either a Dataset or a Service, so it's a direct subclass of Resource, hence the need to have the equivalent of a Distribution, and the suggestion to model this consistently with the Dataset/Distribution relationship. I think Services and service descriptions will need to be treated the same too, e.g. an OAS API document for a service API.

rob-metalinkage on 24 Aug 2018

@jakubklimek: In the Use Cases we have documented exactly this practice, where DCAT-AP profiles have been catalogued as pseudo-Datasets.

see #238

rob-metalinkage on 24 Aug 2018

As per #238 - It's been on the plenary agenda but we never discussed it. Should I move it up so that we cover it at an upcoming meeting? It seems to be addressing some things that are needed so we should see if the WG will accept it.

kcoyle on 24 Aug 2018

@rob-metalinkage: I agree that a profile is different from a dcat:DataService, but I'm not so sure that a profile cannot be seen as a dcat:Dataset. In what way do you see it being different from "_A collection of data, published or curated by a single agent, and available for access or download in one or more formats_"?

makxdekkers on 24 Aug 2018

@makxdekkers I think it depends on how "data" has been defined. If it is defined so broadly that any file of ones and zeroes is data, then it includes an e-copy of War and Peace. I don't think that's the object. But it doesn't look like "data" is itself defined, which may be the problem here. I wouldn't consider a profile that is a PDF a dataset. I'm not clear in my own mind if I would consider a profile expressed in SHACL a dataset. Yet I can see serving SKOS vocabularies as datasets.

Summary: I think this needs to be solved by defining the term "data", of which a set is called a "dataset"

kcoyle on 24 Aug 2018

@kcoyle If I remember correctly, the Government Linked Data working group that developed the 2014 version of DCAT spent hours and hours trying to define the boundaries of 'data' and 'dataset' and ended up with the current definition. Every time someone suggested a boundary, someone else brought up an example of something clearly outside of the proposed boundary that everybody agreed could be seen as a dataset. Even in your examples, we could talk endlessly about why you think a concept scheme with a SKOS/RDF expression is a dataset and a profile with a SHACL/RDF expression and a printable PDF expression isn't. It's very personal. The GLD group decided that the discussion would never lead to consensus, and settled on the view that DCAT just provided a model and a set of properties that anyone could use to describe anything that they considered to be 'curated data'. I would suggest we do not reopen that discussion and try to define 'dataset' beyond the current definition. I honestly think we're not going to achieve a 'better' definition.
And, yes, it means that someone can describe the book War and Peace as a Dataset, even if other people may think that it doesn't make sense.

makxdekkers on 24 Aug 2018

+1 to Makx. In any case, I think it can be useful to distinguish between a dataset and a profile. I can also think of other things that can be resources distributed with a dataset but are not generally considered datasets, such as images provided as visualizations of the data, or code lists, or written protocols. Whether or not you see any of those things as datasets, it can be useful for someone else to distinguish them.

agreiner on 24 Aug 2018

I think some of this was already discussed in https://github.com/w3c/dxwg/issues/64, which mentions a wide range of potential dataset types. Profile could well be a genre of dataset.

makxdekkers on 24 Aug 2018

@makxdekkers That's fine with me if we accept the broader definition, which implies, although it does not state, that anything can be a dataset. In that case, a profile could be a dataset, but I'm not sure that treating it as a dataset is useful in our context. I think the criterion for deciding is "what functionality do we want around profiles?", not "what is the formalism that describes them?" Everything is an rdf:Resource, but we usually go on to define more specific classes for our specific uses.

kcoyle on 24 Aug 2018

@kcoyle a PDF or a SHACL document are both concrete representations of things, therefore they cannot be Datasets - which are conceptual things. A dataset might be _represented_ by PDF or SHACL Distributions.

This is the core of the issue that @rob-metalinkage raised - by implication at least. i.e. that there is a separation between Profile and Profile-Description (or more generally between Technical Specification and Specification Document) which is parallel to the one that we already have between Dataset and Distribution.

dr-shorthair on 27 Aug 2018

If we really cared (and I'm not convinced we should), then we would need to define both data _and_ set. To my mind, what distinguishes a dataset from the more general Resource is that it's a set of things, and we can make statements about both the types of things in the set and the membership of the set. I don't see documents fitting that very well, as there is not much useful to say other than that 1s and 0s are ordered members of the document.

Because we would want to qualify the relationship between concrete representations and the abstract thing (i.e. a SHACL document expresses constraints against an RDF vocabulary, vs a document containing guidance) - modelling Profiles is similar to modelling Datasets and their possible distributions, or Services.

We do not need to axiomatise disjointness between subclasses of Resource, but separate models do seem to be useful, according to decisions taken already in DCAT group, so this issue is just a natural consequence of that.

rob-metalinkage on 27 Aug 2018

@rob-metalinkage What is the actual motivation behind the original issue? If it is to accommodate for profiles, I would just create a dcat:Profile (or prof:Profile) as a subclass of dcat:Resource, create the dcat:distribution equivalent there and leave the dcat:Resource still as an "abstract" class.

Specifying the rdfs:domain of dcat:distribution as dcat:Resource seems odd to me, as it would mean that suddenly dcat:DataServices have dcat:Distributions, which were coupled quite tightly with dcat:Datasets.

If we are looking for modeling something like frbr:Work, frbr:Manifestation and frbr:Expression, then I think we would have to have a new superproperty with domain dcat:Resource and its subproperties, where one of those would be dcat:distribution for dcat:Datasets, and other subproperties for dcat:DataServices and prof:Profiles.
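A minimal sketch of the superproperty option, keeping dcat:Resource "abstract" and leaving the domain of dcat:distribution untouched. Every `ex:` name here (`ex:hasRealization`, `ex:Profile`, `ex:profileRealization`, `ex:serviceDescription`) is hypothetical and only stands in for whatever terms the group might choose.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical general property on the abstract class.
ex:hasRealization a owl:ObjectProperty ;
  rdfs:domain dcat:Resource .

# The existing DCAT property becomes one specialization;
# it stays tied to datasets, so data services gain no distributions.
dcat:distribution rdfs:subPropertyOf ex:hasRealization .

# Hypothetical specializations for other kinds of resource.
ex:Profile rdfs:subClassOf dcat:Resource .

ex:profileRealization rdfs:subPropertyOf ex:hasRealization ;
  rdfs:domain ex:Profile .

ex:serviceDescription rdfs:subPropertyOf ex:hasRealization ;
  rdfs:domain dcat:DataService .
```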

jakubklimek on 27 Aug 2018

If we need to allow for representations of things other than Datasets then I think we need a super-class of dcat:Distribution and a corresponding super-property associated with dcat:Resource - perhaps like this:

[Diagram: Resource + Representation]

We already have

dcat:Dataset rdfs:subClassOf dcat:Resource .

This would just add the complementary

dcat:Distribution rdfs:subClassOf dcat:Representation .
dcat:distribution rdfs:subPropertyOf dcat:hasRepresentation .

Then if any new type of thing is also catalogued as an individual of a subclass of dcat:Resource (such as a profile) then a representation of this can be associated without forcing everything into the Dataset and Distribution boxes.
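A minimal sketch of how a profile might then be described, assuming the proposed terms `dcat:hasRepresentation` and `dcat:Representation` and a hypothetical `ex:Profile` subclass of dcat:Resource. The SHACL file and the guidance document are both concrete forms of the same abstract profile, and neither is forced to be a dcat:Distribution.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical subclass of dcat:Resource for profiles.
ex:Profile rdfs:subClassOf dcat:Resource .

ex:myProfile a ex:Profile ;
  dct:title "Example profile"@en ;
  # No Dataset/Distribution involved: the general property
  # links the abstract profile to its concrete forms.
  dcat:hasRepresentation ex:myProfile-shacl , ex:myProfile-guide .

ex:myProfile-shacl a dcat:Representation ;
  dct:conformsTo <https://www.w3.org/TR/shacl/> ;
  dct:description "Machine-checkable constraints"@en .

ex:myProfile-guide a dcat:Representation ;
  dct:description "Human-readable guidance document"@en .
```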

dr-shorthair on 31 Aug 2018

@dr-shorthair This seems clean and reasonable.

What do you think of designating dcat:Resource and dcat:Representation as abstract, so that if anything else pops up (like a Profile), a subclass of Resource has to be created for it, making clear what it is? That is, discourage creating instances of just dcat:Resource and dcat:Representation.
Doesn't this align with the FRBR approach? Shouldn't it be reused in a more explicit way?

jakubklimek on 31 Aug 2018

Yes. The class names are in italics in the diagram which is the UML convention for 'Abstract'

And there is a usage note on dcat:Resource which says

It is strongly recommended to use a more specific sub-class when available.

A similar recommendation would be made on dcat:Representation.
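A sketch of how that recommendation might be carried in the vocabulary itself, assuming `skos:scopeNote` as the annotation property for usage notes (an assumption about how the note would be encoded). It is an annotation only: no RDFS or OWL reasoner will reject direct instances of the class.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Proposed class, carrying the same usage note as dcat:Resource.
dcat:Representation a owl:Class ;
  rdfs:label "Representation"@en ;
  skos:scopeNote "It is strongly recommended to use a more specific sub-class when available."@en .
```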

Absolutely. In a different thread somewhere I already made this point too ... though FRBR has more layers (though arguably by sticking to just four it might not be expressive enough for some cases). However, I don't think FRBR is that widely known outside of library circles, and even there it is somewhat controversial (@kcoyle wrote a book on how it has hindered more than helped!)

dr-shorthair on 31 Aug 2018

I do get a FRBR-ish vibe from this, something like Resource being FRBR:Expression, but that would require the subclasses of Resource to have a link to Representation (hasRepresentation) rather than a direct dcat:Distribution. That's one of the downsides of FRBR as a data structure: its linearity.

Will there be any properties associated with dcat:Representation? It does seem that many of the properties of dcat:Resource would be appropriate. That could imply that there is a need for a class of "everything" with properties appropriate to both. But that would disrupt the current dcat:Resource.

kcoyle on 31 Aug 2018

@kcoyle note the proposed subclass and subproperty relations above. The term 'Representation' is taken from Fielding, where it is used for the concrete realization of all resource types. I understood that 'Distribution' is essentially just a special word for a 'Dataset-representation'. The proposed model above says: if the resource is a Dataset, then its representation is a Distribution and the link between them is dcat:distribution. If the resource is an individual from some other sub-class of Resource then its representation is Representation _or a sub-class_, and the link between them is dcat:hasRepresentation _or a sub-property_. I tried to make the terminology consistent with both Fielding and the DCAT-2014 legacy.
Concerning potential properties of dcat:Representation: first we should consider if any of the properties of dcat:Distribution should be promoted to a superclass. We might also look for overlaps with the properties of dcat:Resource, but I'm disinclined to start looking for a higher superclass, unless you can think of a useful scenario where this is needed to support some reasoning (i.e. what's the use case?). owl:Thing will do for me (which is entailed by these all being instances of owl:Class anyway).

dr-shorthair on 2 Sep 2018

Oh, and regarding the 'FRBR-ish vibe' - to me that seemed intrinsic to the original DCAT backbone model, with Dataset and associated Distributions, also consistent with Fielding's 2-layer REST model.

The proposed model above merely responds to the decision to add the generalization Resource by adding a parallel generalization of Distribution, here called Representation.

dr-shorthair on 2 Sep 2018

Sorry, I may have missed part of the discussion, but I wonder whether the idea of re-using ADMS in profiledesc has been dropped altogether. In that scenario, adms:Asset and adms:AssetDistribution could address the issue discussed here, as far as I can understand.

andrea-perego on 2 Sep 2018

I am a little nervous about these 'coathanger' classes, dcat:Resource and dcat:Representation. These are things that people aren't supposed to use, right? They are just for convenience in the specification, so that we don't have to repeat properties that are common to the subclasses. However, it would be perfectly valid (although we could include notes to discourage such use) for an implementation to describe everything in the catalogue as dcat:Resource with dcat:Representation. Aren't we 'dumbing down' DCAT this way?

makxdekkers on 2 Sep 2018

@andrea-perego I also suggested to look at ADMS as a starting point for profile descriptions at https://github.com/w3c/dxwg/issues/279#issuecomment-417358855.

makxdekkers on 2 Sep 2018

@makxdekkers yes I understand and to a certain extent agree that there is a risk. However, I'd make a slightly different emphasis concerning the 'coathanger classes'.

The point is that there really is a common pattern here: there are (1) conceptual things, and (2) concrete realizations of them, with often more than one realization of each conceptual thing. In the REST literature they are called Resource and Representation. DCAT-2014 considered a _subset_ of them and called them Dataset and Distribution. This is all good, as the different terminology between DCAT-2014 and REST reflects the more limited scope of DCAT-2014.

Now in DCAT-rev we have expanded the scope, and we have agreed that new things - in particular _services_ - are not _datasets_, but some other kind of resource. The proposed model respects all of these precedents, preserving the terminology from each in a consistent way, so it should not surprise anyone, while still being backward compatible with DCAT-2014.

I've been involved in a number of initiatives that set up an abstract framework (GML, O&M), and have observed that implementers are incredibly reluctant and non-creative in actually extending the abstract- or stub-classes in the way that we had intended. So your prediction that implementers might just use the 'abstract' classes as-is is based in experience. Is this enough reason to shy away from a clean model? In my opinion, no - for one thing, we would have to replace it with something else, which will likely be messier!

In previous discussion you have advocated making sure the documentation is right, and also supplementing it with good pedagogical material. I think we should do that here.

dr-shorthair on 3 Sep 2018

I'm with Makx about the "abstract" classes, but even more so because I don't see the need for dcat:Representation unless it will be the domain of something in the DCAT RDF. I am also a bit concerned about the alignment of properties with classes in the diagram when those properties in the .ttl file do not have that class as a domain. This means that a property like dcat:contactPoint can actually be used without regard to any class adherence because it has no domain (unless I missed it, sorry). I can understand organizing a UML diagram with these structural concepts as a form of documentation, but only if we agree that this doesn't reflect the graph structure. To manage the structure in this diagram you of course need a validation language like SHACL or ShEx, or an application profile.

For this reason I conclude that dcat:Representation is not needed. However, I defer to @azaroth42 who did a brilliant analysis of the necessary and less necessary classes of an RDF/OWL ontology in the museum world, and managed to reduce the ontology to its necessary parts. RobS?

kcoyle on 3 Sep 2018

The 'UML' representation was prepared in response to your suggestion @kcoyle - but it is just a convenient graphical rendering and is non-normative (as is the rest of clause 5).

Regarding the alignment of properties to classes, DCAT is axiomatized using a mixed model. Some but not all of the DCAT properties have RDFS domains and ranges. However, DCAT makes extensive use of Dublin Core, and there are no axioms in DC's RDF formalization to tie any of those to DCAT classes. We discussed this and agreed that the statements in the Recommendation document are definitive, where we say "The following properties are recommended for use on this class: ..." so I just reflected these up into the diagram. I might have missed a few, but was guided initially by this diagram from DCAT-2014.

dr-shorthair on 3 Sep 2018

Concerning the possible properties of the Representation class - the diagrams above are just to seed the conversation about the major backbone relationships and potential subsumption relations. Probably I should not have shown any properties in any classes, as that was not the point at this stage of discussion. As mentioned above - "first we should consider if any of the properties of dcat:Distribution should be promoted to a superclass"

dr-shorthair on 3 Sep 2018

@kcoyle "...a property like dcat:contactPoint can actually be used without regard to any class adherence because it has no domain..."

There's been a trend to relax domain constraints, which @dr-shorthair knows all about, in schema.org and the SSN ontology, using domainIncludes or just no domain, rather than specifying a domain.

And, this is the way with all of DC after all! Means we can use DC properties just about anywhere.
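A small sketch of the difference between the two styles; the `ex:` properties are hypothetical and say nothing about how DCAT actually declares its terms. With `rdfs:domain`, using the property is an assertion about the subject's type; with `schema:domainIncludes` the intended subjects are merely documented.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/ns#> .

# Strict: any subject of ex:strictContact is inferred to be a dcat:Dataset (rule rdfs2).
ex:strictContact rdfs:domain dcat:Dataset .

# Relaxed: documents the expected subjects but licenses no inference.
ex:looseContact schema:domainIncludes dcat:Dataset , dcat:DataService .

ex:thing1 ex:strictContact ex:alice .  # entails: ex:thing1 a dcat:Dataset
ex:thing2 ex:looseContact ex:alice .   # entails nothing about the type of ex:thing2
```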

nicholascar on 3 Sep 2018

@andrea-perego "I wonder whether the idea of re-using ADMS in profiledesc has been dropped altogether":

Not entirely but there are properties that we are finding useful in profileDesc that are not present in ADMS, such as the core prof:profileOf and the specialised prof:profileOfTransitive & prof:resourceRole. The first and second go where ADMS does not - transitive dependence on other assets (mappable to prov:wasDerivedFrom between adms:Asset instances perhaps?). The third could perhaps be implemented with some sort of role property added to adms:AssetDistribution (there are loads of role properties to choose from, I suppose).
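A sketch of the mapping floated here, using the draft term name `prof:profileOf` quoted in the comment; the `prof:` namespace IRI and the `ex:` resources are assumptions for illustration. The subproperty axiom makes every profile link also a derivation link, but not the reverse, so a plain `prov:wasDerivedFrom` cannot tell you that one asset is a profile of another.

```turtle
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <http://example.org/ns#> .

# Candidate alignment: a profile link is a specialised derivation link.
prof:profileOf rdfs:subPropertyOf prov:wasDerivedFrom .

ex:baseStandard a adms:Asset .

ex:myProfile a adms:Asset ;
  prof:profileOf ex:baseStandard .
# Entailed (rdfs7): ex:myProfile prov:wasDerivedFrom ex:baseStandard .
```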

nicholascar on 3 Sep 2018

Note the above is 'shoehorning' of profiling concerns into DCAT/ADMS land which I just warned against in issue/279 since not all the profiling requirements are yet known.

nicholascar on 3 Sep 2018

@nicholascar I agree totally with the relaxation of domains; my comments here were instead about the depiction of properties in the UML diagrams as properties of a class when in fact they are not properties of the class. There's a gap between the RDF for DCAT and the depiction given here, and I'm wondering if there is any suggestion for how to close that gap, or if it is considered unnecessary.

My own approach is that the gap would be closed by a profile that contains constraints that are not available in RDF.* This is a bit frightening because it means that even DCAT is a profile. In fact, DCAT could be a profile under the DCMI definition.

*(We know that OWL 2 has attempted some of this, but the axiomatic basis for OWL is so complex that I fear there is almost no way to use it "correctly." Can anyone define what an OWL constraint actually constrains? I only know one person who can, and that person is not in this discussion.)

kcoyle on 3 Sep 2018

Regarding the UML-style diagram, @kcoyle would your concerns be allayed with a note in the figure caption to point out that

(a) the figure is non-normative (as is the rest of Clause 5), and

(b) the properties shown on the classes correspond to the (normative) recommendations in Clause 6.

The recommendation to use UML for the diagram was discussed in #186 - I mis-remembered, it was @makxdekkers not @kcoyle .

dr-shorthair on 4 Sep 2018

Profiles are semantic assets IMHO, so a profile of ADMS for Profiles (:-)) makes perfect sense, bringing in the description of profiles from profileDesc. We can make sure profileDesc does not re-specify anything from ADMS, but generally making it align with DCAT should make this easy.

rob-metalinkage on 4 Sep 2018

I guess the biggest impact of adopting ADMS for cataloguing profiles is the axiomatisation that a profile is a sort of Dataset; hence we would not need to create a superclass of dcat:Distribution, but merely add the alignment prof:ImplResourceDesc rdfs:subClassOf dcat:Distribution.

This doesn't negate the argument for a "Representation" superclass for clarity, but it's a separate issue. What it does show is that profileDesc can complement ADMS and provide an interoperable way of describing key aspects of profiles within an ADMS-conformant environment.
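A minimal sketch of a profile catalogued the ADMS way, with the alignment from the comment. The `prof:` namespace IRI and all `ex:` resources are assumptions. Since the comment treats an adms:Asset as a sort of Dataset, the ordinary `dcat:distribution` link is enough and no new superclass is involved.

```turtle
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <http://example.org/ns#> .

# The alignment suggested above.
prof:ImplResourceDesc rdfs:subClassOf dcat:Distribution .

ex:myProfile a adms:Asset ;
  dct:title "Example profile"@en ;
  dcat:distribution ex:myProfile-shapes .

# A SHACL file implementing the profile, typed with the profileDesc class.
ex:myProfile-shapes a prof:ImplResourceDesc ;
  dct:conformsTo <https://www.w3.org/TR/shacl/> ;
  dcat:downloadURL <http://example.org/profiles/my-profile.ttl> .
```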

(PS: we can easily tweak ProfileDesc to align more closely and formally with defined ADMS roles, but prov:wasDerivedFrom is not expressive enough by itself to describe the constraint conformance mandated by the relationship, so we need some vocabulary somewhere to do this for us.)

rob-metalinkage on 4 Sep 2018

@dr-shorthair I tried to find some more information on UML-OWL - folks have written about it but naturally (:-() the best stuff is behind a paywall. I think that one could offer the UML to readers as a conceptual model. It could also be input to a SHACL or ShEx representation that implements the required constraints.

My concern is that while an OO approach may require an abstract class (Representation) that is not the case in RDF/OWL. So there's a problem of reconciling the UML with RDF/OWL, and it may not be useful to carry over the abstract class from UML to OWL. As Makx said, you cannot enforce that a class is abstract in RDF. You could do so in SHACL or ShEx, but in OWL the class serves no functional purpose that I can see. Somewhere above (here or another thread?) I believe Nick offered that it's fine that Distribution default to being a subclass of owl:Thing. I agree with that.
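A sketch of how "abstract" could be checked in SHACL rather than OWL, assuming the proposed `dcat:Representation` and a hypothetical subclass `ex:ProfileRepresentation`. The shape only raises a warning when a node is typed dcat:Representation without also being an instance of one of the listed, more specific classes.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical shape: flag direct use of the "abstract" class.
ex:UseSpecificRepresentationShape
  a sh:NodeShape ;
  sh:targetClass dcat:Representation ;
  # In a node shape, sh:class tests the focus node itself.
  sh:or (
    [ sh:class dcat:Distribution ]
    [ sh:class ex:ProfileRepresentation ]  # hypothetical subclass
  ) ;
  sh:severity sh:Warning ;
  sh:message "Use a more specific sub-class of dcat:Representation when available." .
```

The trade-off is that the list of acceptable subclasses is closed: anyone extending the model would also have to extend the `sh:or` list.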

kcoyle on 4 Sep 2018

Thanks @kcoyle .

I suggest not fretting too much about UML. This is not a UML model. The diagram is merely a rendering of the OWL model using UML-style notation. The UML-style notation has been found to be expressive and familiar enough that it is scattered through many (most?) W3C Recommendations that describe ontologies. However, you do raise important concerns about not getting sucked into the OO paradigm.

Whenever I'm stressed about RDFS/OWL modeling I remind myself that a _set based_ analysis is the correct approach. Venn diagrams would send that message better, but it is hard to show lists of class attributes.

From a set-theoretic point of view the 'abstract' class dcat:Resource exists - it is the union of dcat:Dataset, dcat:DataService plus any other catalogable things that people want to use DCAT for. I think we are all OK with that? There is no notion of 'abstract class' in RDFS or OWL - the best in this context would probably be an anonymous union class, but by giving it a name we provide an extensibility point for other kinds of catalog.
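A sketch contrasting the two options mentioned here; `ex:KnownCatalogable` and `ex:Profile` are hypothetical names. The anonymous union closes the set, so any individual asserted to be an `ex:KnownCatalogable` must be a Dataset or a DataService. The named class is open, and third parties can add members with a single subclass axiom.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/ns#> .

# Closed: exactly the union of the two known kinds.
ex:KnownCatalogable owl:equivalentClass [
  a owl:Class ;
  owl:unionOf ( dcat:Dataset dcat:DataService )
] .

# Open: a named class acting as an extensibility point.
dcat:Dataset     rdfs:subClassOf dcat:Resource .
dcat:DataService rdfs:subClassOf dcat:Resource .
ex:Profile       rdfs:subClassOf dcat:Resource .  # third-party extension
```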

What we are now thinking about is whether there is a corresponding class of dcat:Representation which has any members that are not dcat:Distribution. To resolve this question we should ask whether the Resource corresponding to every notional Representation is a dcat:Dataset. If yes, then every Representation is a dcat:Distribution and we don't need a new class. @rob-metalinkage confirmed that he created this issue because

I dont think a profile is either a Dataset or a Service - so its a direct subclass of Resource, hence the need to have the equivalent of a Distribution

So maybe we should focus on that question first. We can then deal with whether that or anything else requires an (abstract?) class for Representation subsequently.

dr-shorthair on 5 Sep 2018

Returning to this topic, I have a hunch that a dcat:Representation superclass would also help explain the 'bag of files' UC more clearly.

Any _file_ that is _not_ a representation of the dataset itself - such as a _schema_, or _supporting documentation_ - is a Representation but not a Distribution. A taxonomy of representations could exist though it is unlikely that we could actually express it (it would mix format and schema and role).

dr-shorthair on 1 Feb 2019

@dr-shorthair This is contradictory. You write "_Any file that is not a representation of the dataset itself - such as a schema, or supporting documentation - is a Representation but not a Distribution_." So a file that is not a representation [...] is a Representation (?)
I still feel that the approach recommended at https://w3c.github.io/dxwg/dcat/#bag-of-files to use dct:relation for any files that are not proper representations is sensible, and I would be against complicating the model if it is not strictly necessary.

makxdekkers on 1 Feb 2019

Yes. A file that is not a representation of the _dataset_ is a representation of _something else_.

Perhaps I'm reading too much into the class name, but I was assuming that dcat:Distribution was only the class of representations-of-datasets, which would mean that we don't have a class for representations-of-anything-else (such as a schema or documentation).

Does the class dcat:Distribution have any properties that are specific to representations-of-a-dataset? Or is it just a representation-of-a-resource? If it is the latter, then the issue with the current design is only that the class that is the range of dcat:distribution is called dcat:Distribution. It would be better if the class were called dcat:Representation.

dr-shorthair on 3 Feb 2019

@makxdekkers said:

I still feel that the approach recommended at https://w3c.github.io/dxwg/dcat/#bag-of-files to use dct:relation for any files that are not proper representations is sensible, and I would be against complicating the model if it is not strictly necessary.

I agree.

andrea-perego on 3 Feb 2019

Sorry - I'm certainly not proposing a change to this pattern.
I am merely asking if there is a class that is the (implicit or explicit) range of dct:relation in this context - it being the more generalized case.

dr-shorthair on 3 Feb 2019

... and which would allow us to provide a description of the related artefact, without implying that it is a distribution of a dataset, i.e. promote most of the properties of dcat:Distribution up to a more general dcat:Representation.
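A sketch of what "promoting" a property could look like, using `dcat:downloadURL` as the example and assuming the proposed `dcat:Representation`; `ex:chartImage` is hypothetical. If dcat:downloadURL is declared with `rdfs:domain dcat:Distribution`, an RDFS reasoner would type every node that uses it as a Distribution (rule rdfs2), including supporting files linked with dct:relation. Moving the domain up avoids that.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical revision: the generic file-description property lives on the superclass.
dcat:Distribution rdfs:subClassOf dcat:Representation .
dcat:downloadURL rdfs:domain dcat:Representation .

# A supporting file (e.g. the target of a dct:relation link) ...
ex:chartImage dcat:downloadURL <http://example.org/chart.jpg> .
# ... is now inferred to be a dcat:Representation, but not a dcat:Distribution.
```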

dr-shorthair on 3 Feb 2019

Here's an example to illustrate the issue. This is one of my own datasets. This DCAT description is adapted from an earlier example that I used for the bag-of-files use-case.

The first four blank nodes are rdf:type dcat:Distribution because they are informationally-equivalent representations of the actual dataset.

The next three blank nodes are shown un-typed. They are supporting resources, not representations of the dataset. However, it is useful to provide descriptive properties taken from the set that is associated with dcat:Distribution since these are files or representations, but _not_ of the actual dataset.

I guess it is fine from an RDF point of view to leave these nodes un-typed, but maybe they deserve a type? The 'model' is the same as dcat:Distribution, but they are _not_ distributions-of-the-dataset.

dap:d33937
  rdf:type dcat:Dataset ;
  dct:description "A set of RDF graphs representing the International [Chrono]stratigraphic Chart, comprising Turtle serializations of data from the 2017-02 version, ..." ;
  dct:identifier "https://doi.org/10.25919/5b4d2b83cbf2d" ;
  dct:issued "2018-07-07"^^xsd:date ;
  dct:license <https://creativecommons.org/licenses/by/4.0/> ;
  dct:publisher <http://www.csiro.au> ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.jsonld" ;
      dcat:mediaType "application/ld+json" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.nt" ;
      dcat:mediaType "application/n-triples" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.rdf" ;
      dcat:mediaType "application/rdf+xml" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.ttl" ;
      dcat:mediaType "text/turtle" ;
    ] ;
  dct:relation [
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.jpg> ;
      dcat:mediaType  "image/jpeg" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.pdf> ;
      dcat:mediaType "application/pdf" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      dcat:downloadURL <http://resource.geosciml.org/ontology/timescale/gts> ;
      dcat:mediaType "text/turtle" ;
      dct:conformsTo <https://www.w3.org/TR/owl2-overview/> ;
      dct:description "This is an RDF/OWL representation of the GeoSciML Geologic Timescale model, which has been adapted from the model described in Cox, S.J.D, & Richard, S.M. (2005) A formal model for the geologic timescale and GSSP, compatible with geospatial information transfer standards, Geosphere, Geological Society of America 1/3, 119–137." ;
      dct:issued "2011-01-01"^^xsd:date ;
      dct:issued "2017-04-28"^^xsd:date ;
      dct:title "Geologic Timescale model" ;
    ] ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:33937> ;
.

dr-shorthair on 4 Feb 2019

@dr-shorthair I think you are complicating the issue more than necessary. Your explanation that things that are not representations of the dataset are representations of something else got my head spinning. You are using the notion of 'representation' which is already used in the revision (both the current version and 2PWD) and suggested as a synonym of distribution - but I think that term is confusing. For example the definition of dcat:Distribution now states: "_represents an accessible form or representation of a dataset ..._" so it "represents ... a representation". Not helpful. I think a dcat:Distribution is just "_represents an accessible form of a dataset ..._".
Is there really a need to make it more complicated by creating a new class dcat:Representation?
In your example, the two images can be modelled as foaf:Document with dct:type Image, and the third one could be modelled as an adms:Asset with dct:type http://purl.org/adms/assettype/DomainModel and one or more adms:AssetDistributions.
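A sketch of the third related resource modelled as suggested, with `ex:gts-asset` as a hypothetical IRI for the asset. The values are taken from the example above; the dataset would then point at `ex:gts-asset` with `dct:relation` instead of an untyped blank node.

```turtle
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/ns#> .

# The Geologic Timescale model as a typed, related asset.
ex:gts-asset a adms:Asset ;
  dct:type <http://purl.org/adms/assettype/DomainModel> ;
  dct:title "Geologic Timescale model" ;
  dcat:distribution [
    a adms:AssetDistribution ;
    dcat:downloadURL <http://resource.geosciml.org/ontology/timescale/gts> ;
    dcat:mediaType "text/turtle"
  ] .
```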

makxdekkers on 4 Feb 2019

I agree that the distribution should represent an accessible form of a dataset; as such, shouldn't each distribution have either an accessURL or a downloadURL?
As far as the related resources (the dct:relation items) I think @makxdekkers typing suggestion is a good solution-- the relation is to a resource that has a type, and that resource has its own distributions (which I guess can be represented by adms:AssetDistribution). The only thing I think would be useful to add is a way to specify what the relation is about (documentation, guidance, examples, critique, annotation...).

smrgeoinfo on 4 Feb 2019

Thanks @makxdekkers - I think you are suggesting that there are more than enough classes available in existing vocabularies to enable suitable typing of the target of dct:relation properties. That is true, though it does require users to be familiar with the options. I guess I was trying to gather them together in the scope of DCAT but I'll stop now.

Regarding this phrasing

the definition of dcat:Distribution now states: "represents an accessible form or representation of a dataset ..." so it "represents ... a representation".

Yes, that is unfortunate. The noun 'representation' in this sentence is a nod to Fielding's REST architectural style, where it contributes the 'R'. You will also recall that an earlier iteration of the definition avoided the repetition, as it read "_describes_ an accessible form or representation of a dataset ...", but then either you or @kcoyle requested the change! But it is probably not necessary to introduce this in the overview; it is clarified in the normative clause anyway.

dr-shorthair on 4 Feb 2019

So this more complete example could be added to Appendix D?

```turtle
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
# Hypothetical namespace: the original example does not declare the dap: prefix
@prefix dap:  <http://example.org/dap/> .

dap:d33937
  rdf:type dcat:Dataset ;
  dct:description "A set of RDF graphs representing the International [Chrono]stratigraphic Chart, ..." ;
  dct:conformsTo <http://resource.geosciml.org/ontology/timescale/gts> ;
  dct:identifier "https://doi.org/10.25919/5b4d2b83cbf2d" ;
  dct:issued "2018-07-07"^^xsd:date ;
  dct:license <https://creativecommons.org/licenses/by/4.0/> ;
  dct:publisher <http://www.csiro.au> ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.jsonld" ;
      dcat:mediaType "application/ld+json" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.nt" ;
      dcat:mediaType "application/n-triples" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.rdf" ;
      dcat:mediaType "application/rdf+xml" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.ttl" ;
      dcat:mediaType "text/turtle" ;
    ] ;
  dct:relation [
      rdf:type foaf:Document ;
      dct:type <http://purl.org/dc/dcmitype/Image> ;
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.jpg> ;
      dcat:mediaType "image/jpeg" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      rdf:type foaf:Document ;
      dct:type <http://purl.org/dc/dcmitype/Image> ;
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.pdf> ;
      dcat:mediaType "application/pdf" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      rdf:type adms:Asset ;
      dct:type <http://purl.org/adms/assettype/DomainModel> ;
      dcat:downloadURL <http://resource.geosciml.org/ontology/timescale/gts> ;
      dcat:landingPage <http://resource.geosciml.org/ontology/timescale/gts> ;
      dcat:mediaType "text/turtle" ;
      dct:conformsTo <https://www.w3.org/TR/owl2-overview/> ;
      dct:description "This is an RDF/OWL representation of the GeoSciML Geologic Timescale model ..." ;
      dct:issued "2011-01-01"^^xsd:date ;
      dct:modified "2017-04-28"^^xsd:date ;
      dct:title "Geologic Timescale model" ;
    ] ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:33937> ;
.
```

dr-shorthair on 4 Feb 2019

@rob-metalinkage I think we moved away from the concern that originally motivated you to create this issue. But I think the sense of the DCAT team is that your suggestions would be best dealt with outside the core DCAT vocabulary, e.g. in a profile of DCAT for profiles, if you like.

dr-shorthair on 4 Feb 2019

One solution to the concern about saying that a distribution "represents ... a representation" would be to remove the first occurrence, the verb "represents", from the definition. The word "represents" at the beginning of nearly all the definitions has bothered me, too: a dcat:Distribution doesn't represent a distribution; it is a distribution. For example:
A dcat:Catalog is a dataset in which each individual item...
A dcat:Resource is an individual item in a catalog...
A dcat:Dataset is a dataset in a catalog...
A dcat:Distribution is an accessible form or representation of a dataset...

agreiner on 4 Feb 2019

I don't want to unduly broaden this issue, but the discussion on dcat:distribution / dct:relation brings to my mind a couple of considerations:

1. Do we already have UCs and/or implementation evidence on the use of specific subproperties of dct:relation which can be used to model some specific resource types? If so, should we provide guidance on when to use them? E.g., in the requirements for data to be documented in the JRC Data Catalogue we identified 3 resource types that should be supported: distributions (dcat:distribution), publications about a dataset (dct:isReferencedBy), and those not fitting in either of the previous categories (dct:relation).
2. @dr-shorthair's example includes documents that could be seen as dataset "visualisations". Should they be considered or not as distributions? Or, at least, should this be something to be left to the decision of the data provider?
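To make the three JRC resource types concrete, a Turtle sketch might distinguish them as follows. All IRIs are hypothetical example.org placeholders.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://example.org/dataset/42>
  a dcat:Dataset ;
  # 1. A distribution: an accessible form of the dataset itself
  dcat:distribution <http://example.org/dataset/42/csv> ;
  # 2. A publication that references or cites the dataset
  dct:isReferencedBy <http://example.org/publication/paper-1> ;
  # 3. Anything else related to the dataset, relationship left unspecified
  dct:relation <http://example.org/docs/readme> .

<http://example.org/dataset/42/csv>
  a dcat:Distribution ;
  dcat:downloadURL <http://example.org/files/dataset-42.csv> ;
  dcat:mediaType "text/csv" .
```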

I'll create separate issues for these 2 points.

andrea-perego on 4 Feb 2019

@andrea-perego about this:

> 1. Do we already have UCs and/or implementation evidence on the use of specific subproperties of `dct:relation` which can be used to model some specific resource types? If so, should we provide guidance on when to use them? E.g., in the requirements for data to be documented in the JRC Data Catalogue we identified 3 resource types that should be supported: distributions (`dcat:distribution`), publications about a dataset (`dct:isReferencedBy`), and those not fitting in either of the previous categories (`dct:relation`).

We have an issue about linking datasets and publications https://github.com/w3c/dxwg/issues/63, and we are using dct:relation for things that don't fit distributions or publications (as per https://w3c.github.io/dxwg/dcat/#bag-of-files). So I don't think we need a new issue for this?

agbeltran on 5 Feb 2019

> 2. @dr-shorthair's example includes documents that could be seen as dataset "visualisations". Should they be considered or not as distributions? Or, at least, should this be something to be left to the decision of the data provider?

I do agree that this decision should be left to the data provider, but I believe @dr-shorthair was pointing out a way to show which files are distributions (as informationally equivalent representations of a dataset) and which are other files that are about the dataset but are not distributions of it. IMO, it would be up to the data provider to decide whether a 'visualisation' of the dataset is an informationally equivalent representation of the dataset with respect to the other available representations.

So, I think this issue should be analysed in conjunction with https://github.com/w3c/dxwg/issues/411 and possibly also #433 and #531. I created a project to see if it helps us address all these issues simultaneously: https://github.com/w3c/dxwg/projects/8

agbeltran on 5 Feb 2019

@andrea-perego wrote:

> 2. @dr-shorthair's example includes documents that could be seen as dataset "visualisations". Should they be considered or not as distributions? Or, at least, should this be something to be left to the decision of the data provider?

They might be visualizations of the underlying data, but they are definitely not visualizations of "A set of RDF graphs representing ..."

dr-shorthair on 6 Feb 2019

Ironically, the Use Case that motivated this was the one @makxdekkers referred to, where Profiles were catalogued and resources describing different aspects of them were modelled as Distributions. So it is OK to say that example is "wrong" (i.e. inconsistent with DCAT semantics).

The Profiles Ontology does not depend on DCAT, so it is not affected. DCAT simply has no reusable way of attaching resources to dcat:Resources in general: only one specific sense, dataset distributions, is supported, and people will have to "roll their own" using other vocabularies or keep using DCAT in catalogues under looser interpretations than the DCAT spec.

rob-metalinkage on 6 Feb 2019

> DCAT simply has no reusable way of attaching resources to dcat:Resources in general

That's not correct. Use dct:relation - see https://w3c.github.io/dxwg/dcat/#Property:resource_relation

dr-shorthair on 6 Feb 2019

"dct:relation SHOULD be used where the nature of the relationship between a catalogued item and related resources is not known. A more specific sub-property SHOULD be used if the nature of the relationship of the link is known. The property dcat:distribution SHOULD be used to link from a dcat:Dataset to a representation of the dataset, described as a dcat:Distribution"

"in general" includes the important case when the relation type is known, so "a more specific sub-property" is called for, but not defined by DCAT. I guess this is just a qualified relationship pattern.

rob-metalinkage on 6 Feb 2019

All the immediate sub-properties in the dcat: and dct: namespaces are listed right there. And immediately following is the description of the qualified-relation pattern. There might be a tweak needed, but otherwise I suggest that all the ingredients that could reasonably be provided by DCAT are already there. Maybe modify

> A resource with an unspecified relationship to the catalogued item.

→
"A resource with a general or unspecified relationship to the catalogued item."

Maybe also list sub-properties prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAttributedTo.
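For a relationship whose nature is known, the two options discussed above might be applied along these lines. This is a sketch assuming the DCAT 2 terms dcat:qualifiedRelation, dcat:Relationship, dcat:hadRole and dcat:Role; the dataset, source and role IRIs are hypothetical, and the chart PDF is the one from the fuller example earlier in the thread.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical dataset IRI
<http://example.org/dataset/isc2017>
  a dcat:Dataset ;
  # Known relationship expressed with one of the more specific properties suggested above
  prov:wasDerivedFrom <http://example.org/dataset/isc-source> ;
  # Known relationship with no dedicated property: qualify it with a role
  dcat:qualifiedRelation [
      a dcat:Relationship ;
      dct:relation <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.pdf> ;
      dcat:hadRole <http://example.org/role/illustration> ;
    ] .

# Hypothetical role, defined by a publisher or a profile rather than by DCAT
<http://example.org/role/illustration>
  a dcat:Role ;
  rdfs:label "illustration" .
```

The role vocabulary is left to the community, so what a relation is "about" (documentation, guidance, examples, ...) can be stated without new DCAT properties.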

dr-shorthair on 6 Feb 2019

On 'visualisations', there is a vocabulary at the Publications Office of the EU for "Distribution type": https://publications.europa.eu/en/web/eu-vocabularies/at-concept-scheme/-/resource/authority/distribution-type/. This controlled vocabulary was created for DCAT-2014, where several types of non-file distributions were allowed. Using dct:type on dcat:Distribution, one can indicate that the distribution is a visualisation of the dataset. This is recommended in the specification of StatDCAT-AP (https://joinup.ec.europa.eu/release/statdcat-ap-v100, section 7.2.3).
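A Turtle sketch of this approach follows. The distribution is typed with a concept from the Publications Office "Distribution type" table. The concept IRI assumes that table uses a `VISUALIZATION` code and should be checked against the published vocabulary. The example.org IRIs are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://example.org/dataset/42>
  a dcat:Dataset ;
  dcat:distribution [
      a dcat:Distribution ;
      # Assumed concept IRI for "visualisation" in the EU distribution-type table
      dct:type <http://publications.europa.eu/resource/authority/distribution-type/VISUALIZATION> ;
      # Hypothetical URL of an interactive chart of the dataset
      dcat:accessURL <http://example.org/charts/dataset-42> ;
    ] .
```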

makxdekkers on 6 Feb 2019

The key phrase there is

> one can indicate that the distribution is a visualisation of the dataset.

@dr-shorthair's reaction above shows that he doesn't see those pictures as visualisations of the dataset (though they probably are visualisations of the underlying information). Other publishers of this data might make a different choice (and would probably use different language to describe the dataset as well). I think we decided that the right positioning for the DCAT vocab (as opposed to any design guidance or training) was to stay silent on that choice.

davebrowning on 6 Feb 2019

(Noting this has been closed at least once before, though there has been quite a bit of discussion in #482 that touches on who decides what can be a distribution and what is just some other 'thing'. That other issue drove some significant clarification of the concepts, and also of where DCAT deliberately remains silent.)

Suggest we either close this or mark it as future work. Views welcome, particularly from @rob-metalinkage and @dr-shorthair (or indeed anyone with a strong view).

[In the absence of a countervailing view my instinct is to close it as the spec has moved on and discussion of the issue covered quite a bit of ground that has since been modified]

davebrowning on 23 Sep 2019

While I stand by the model sketched above in principle, there is probably an overriding principle here: if an abstract superclass has only one concrete subclass, then it is probably redundant and confusing. Although it might be convenient for the Profiles vocabulary to be tied into a generalized Resource --> Representation pattern, that requirement is external to DCAT, so I don't think we have an obligation to meet it here.

dr-shorthair on 23 Sep 2019
