Dxwg: Confusion between major classes as newly defined in DCAT

Created on 3 Oct 2018 · 12 Comments · Source: w3c/dxwg

From the quick overview definitions, it was not clear to me what the difference is between a resource and a catalogRecord. I think it would help to explain up front that a resource is being used as a parent class. At this point, it's not clear to me that it should be the parent of a dataset rather than a distribution. One reason is that CKAN has a different meaning for the term "resource" that is often used as a distribution. But I also just think that the term "resource" suggests a less abstract thing than a dataset. To me this hinges on the question of whether distributions vary in ways other than format.
If we find that a distribution is a resource, I think we would make a small change to the overview diagram and move the connector between dataset and resource to connect distribution and resource. I would also change the arrow from DataDistributionResource to point to the Distribution rather than the Dataset.

dcat

All 12 comments

The use of the term 'resource' is always confusing if you only look at the term itself.
In DCAT, resources are things published or curated by a single agent. (Note that the definition at https://w3c.github.io/dxwg/dcat/#Class:Resource defines 'resource' as a 'resource', so it probably has to be made more explicit, like "Parent class for Datasets, DataServices and any other things described in a Catalog".)
In CKAN, “resources” are said to "hold the data itself", so there it is used as a superclass for anything that can be associated to a dataset.
In RDF, resources are "All things described by RDF".
As I argued before, it is essential to consider the definition of a term in a particular context and not what someone would think that a term means.

I agree with Makx's remark

it is essential to consider the definition of a term in a particular context and not what someone would think that a term means.

and I strongly believe that users must read definitions and shouldn't solely rely on the class name.

However, I have also had the impression that dcat:Resource is quite a confusing name. As a matter of fact, in the document, the paragraph related to dcat:Resource is titled "Catalogued Resource".

I suspect we'd better use dcat:CataloguedResource instead of just dcat:Resource. That would clarify that the class indicates anything which can be catalogued and not any kind of thing as in the case of rdfs:Resource.

@agreiner Would this make it a little clearer what we mean by resource in DCAT? At least it would explain why dcat:CataloguedResource doesn't include dcat:Distribution as a subclass.

@riccardoAlbertoni I don't feel that the name change solves it, though I think moving away from an already overloaded term where possible is good. My concern is that I think that a different distribution would deserve to have its own catalog entry, certainly its own URL, so that people know the various options exist.

The name of the class was discussed at the f2f in Genova. The conclusion there was that the dcat: namespace was sufficient to indicate the scope of the class. Note that the rdfs:label of the class in the RDF representation (line 352 today) is "Catalogued resource" (the namespace is not visible there to give the hint) and this is what is used in the document. I think that is all consistent.

While class names should be informative (when they are not deliberately opaque), they do not substitute for the definition. And there is no requirement for the class name to match the label.

From the quick overview definitions, it was not clear to me what the difference is between a resource and a catalogRecord.

The difference is described in the CatalogRecord usage note (see https://w3c.github.io/dxwg/dcat/#Class:Catalog_Record): "It exists for catalogs where a distinction is made between metadata about a dataset or service and metadata about the entry in the catalog about the dataset or service."
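To make the distinction in that usage note concrete, here is a minimal Python sketch. It is not taken from the DCAT specification: the class fields (`title`, `issued`, `primary_topic`, `listed`, `modified`) and the sample values are simplified assumptions chosen only to show that metadata about the dataset and metadata about its catalog entry are separate things.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    """Metadata about the dataset itself (hypothetical, simplified fields)."""
    title: str
    issued: date  # when the dataset was published


@dataclass
class CatalogRecord:
    """Metadata about the catalog entry, not about the dataset."""
    primary_topic: Dataset  # the resource this record registers
    listed: date            # when the entry was added to the catalog
    modified: date          # when the entry was last changed


# Illustrative values only.
dataset = Dataset(title="Example air quality readings", issued=date(2017, 5, 1))
record = CatalogRecord(
    primary_topic=dataset,
    listed=date(2018, 10, 3),
    modified=date(2018, 11, 20),
)

# The dataset's publication date and the record's listing date are
# independent facts, which is why the two descriptions are kept apart.
print(record.primary_topic.issued, record.listed)
```

A catalog that does not need to track when entries were listed or changed could skip the record and describe the dataset directly.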

I would suggest changing the definition from
"A record in a catalog, describing the registration of a single dataset or data service."
to
"A record in a catalog, describing the registration of a single dcat:Resource."

I think it would help to explain up front that a resource is being used as a parent class. At this point, it's not clear to me that it should be the parent of a dataset rather than a distribution.

This is explained in Resource usage note (see https://w3c.github.io/dxwg/dcat/#Class:Resource)

I think it would help to explain up front that a resource is being used as a parent class.

To address this, I added more details in the Vocabulary Overview section when introducing dcat:Resource. See here:

https://rawgit.com/w3c/dxwg/dcat-issue431/dcat/index.html#vocabulary-overview

If we find that a distribution is a resource, I think we would make a small change to the overview diagram and move the connector between dataset and resource to connect distribution and resource.

We are not considering distributions as catalogued resources, as these are associated with datasets, which are the resources that can be included in a catalogue.
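The position above can be sketched as a small lookup, assuming only the relationships stated in this thread: dcat:Dataset and dcat:DataService are subclasses of dcat:Resource, and dcat:Distribution is not. The `SUBCLASS_OF` table and the `is_catalogued_resource` helper are hypothetical names for illustration, not part of any DCAT tooling.

```python
from typing import Optional

# Hypothetical, simplified view of the class hierarchy discussed here.
# Each key maps a class to its direct parent class.
SUBCLASS_OF = {
    "dcat:Dataset": "dcat:Resource",
    "dcat:DataService": "dcat:Resource",
}


def is_catalogued_resource(cls: str) -> bool:
    """Walk up the subclass chain and report whether it reaches dcat:Resource."""
    seen = set()
    current: Optional[str] = cls
    while current is not None and current not in seen:
        if current == "dcat:Resource":
            return True
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return False


for name in ("dcat:Dataset", "dcat:DataService", "dcat:Distribution"):
    print(name, is_catalogued_resource(name))
```

Under this reading a distribution is reachable only through its dataset, so it never appears as a catalog entry in its own right.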

@agreiner: given the previous changes, and the latest one in this PR: https://github.com/w3c/dxwg/pull/612

do you agree that this issue can be closed?

The addition of the sentence about Resource being a parent class addresses my concern here. I do feel that the group still needs to reach agreement on how much distributions can differ from each other. If they are informationally equivalent, then I agree that cataloging at the level of datasets still makes sense, but if they can differ, then it seems to me that we would want to catalog at the level of distributions. If distributions are informationally equivalent, then profiles cannot be used to generate different distributions of the same dataset. So we are left needing a term for the variants of a dataset that conform to different profiles, or else saying that profiles generate different datasets (but what are the things from which they generate them?). What we have in the doc now is at least a helpful clarification. I suppose we could close this particular issue and handle the larger question of whether distributions are informationally equivalent elsewhere.

Thanks @agreiner - then we will merge the PR (https://github.com/w3c/dxwg/pull/612) and close this issue, as the discussion about distributions is happening in a few other issues (e.g. #433 #531 #411).

In my experience working with distributions in ISO metadata, the distribution section is used to provide various 'affordances' for accessing representations of the data, e.g. links to dataset landing pages, links that directly download (possibly varying) file representations, links to services providing API access (data distribution services in the DCAT revision), and links to FTP directories containing collections of files that may represent parts or versions of the dataset.

Assuming dcat:Distribution has a similar intention to the ISO19115 MD_Distribution, I'm guessing it has come up before in discussion that perhaps distribution should be defined something like 'Options for accessing representations of the dataset'. I would hope that the dct:conformsTo property on the distribution would let me know what kind of distribution it is.

@smrgeoinfo according to your description, it seems to me that ISO19115 MD_Distribution does not have a similar intention to dcat:Distribution. As you mentioned, MD_Distribution is about accessing the dataset, while dcat:Distribution is about the dataset representation.

See the usage note here: https://w3c.github.io/dxwg/dcat/#Class:Distribution

"[dcat:Distribution] represents a general availability of a dataset. It implies no information about the actual access method of the data, i.e. whether by direct download, API, or through a Web page. The use of dcat:downloadURL property indicates directly downloadable distributions."
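The last sentence of that usage note can be illustrated with a minimal Python sketch. The dictionaries, the `is_directly_downloadable` helper, and the example.org URLs are assumptions made up for illustration; only the property names dcat:accessURL and dcat:downloadURL come from DCAT.

```python
# Hypothetical distribution descriptions as plain dictionaries.
# The example.org URLs are placeholders, not real endpoints.
csv_file = {
    "dcat:accessURL": "https://example.org/data/landing-page",
    "dcat:downloadURL": "https://example.org/data/readings.csv",
}
landing_page_only = {
    "dcat:accessURL": "https://example.org/data/landing-page",
}


def is_directly_downloadable(distribution: dict) -> bool:
    """True when the description carries a non-empty dcat:downloadURL value."""
    return bool(distribution.get("dcat:downloadURL"))


print(is_directly_downloadable(csv_file))
print(is_directly_downloadable(landing_page_only))
```

Both descriptions state that the data is available; only the first one says anything about a direct download.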

Doesn't 'general availability' have more to do with access than representation?
Working from Merriam-Webster definitions, 'availability' would appear to mean "the quality or state of being present or ready for immediate use". It seems to me that interpreting the distribution as an option for accessing one or more representations (that is, as describing how the data is available) is completely consistent with the #Class:Distribution text.
Separating the concept of 'how to get a representation' from 'what is the representation' would avoid questions about whether the representations should have separate catalog items -- it puts the ball in the metadata creator's court to decide when the representation represents a different resource.

This issue about clarifying the definitions of the new classes in DCAT was addressed in two PRs: #497 and #612 and can be closed. The discussion about distributions should be moved to the relevant issues, including the previous comment by @smrgeoinfo.
