Dxwg: Digest for DCAT distributions

Created on 6 Jan 2021 · 16 Comments · Source: w3c/dxwg

A digest of the file may be useful for downloadable dataset distributions, in order to verify the authenticity of the downloaded file and to verify that the dataset has not been updated since the digest was created (the digest should be updated each time the dataset is updated).

dcat Distribution due for closing feedback requirement

All 16 comments

@cristianolongo, I wonder whether the use case you are proposing is one of those addressed in DCAT-AP and its extensions by using spdx:checksum. Or do you have additional or different requirements?

Thanks @andrea-perego, yes, spdx:checksum covers the case. However, I can't find the DCAT-AP specification.

https://github.com/SEMICeu/DCAT-AP

great, thanks

Hi @cristianolongo @andrea-perego - this seems to me quite a generic use case that might be good to consider in DCAT too.

Of course, it is relevant for all downloadable datasets, and maybe also for datasets provided via a SPARQL endpoint. Should I reopen the issue?

@agbeltran said:

Hi @cristianolongo @andrea-perego - this seems to me quite a generic use case that might be good to consider in DCAT too.

Agreed.

For our records:

DCAT-AP 2.0.1 describes the purpose of spdx:checksum as follows:

This property provides a mechanism that can be used to verify that the contents of a Distribution have not changed. The checksum is related to the download URL.

An example of its use:

https://www.europeandataportal.eu/sparql?default-graph-uri=&query=describe+%3CnodeID%3A%2F%2Fb621486445%3E&format=text%2Fturtle

````turtle
@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The distribution points to a checksum node (a blank node here).
<https://europeandataportal.eu/set/distribution/2f3d36a4-79de-4cfb-85d7-706519be7b25> spdx:checksum _:b621486445 .

# The checksum node records the algorithm and the hex-encoded digest.
_:b621486445 a spdx:Checksum ;
    spdx:algorithm spdx:checksumAlgorithm_sha1 ;
    spdx:checksumValue "71bf58e542a47d7092ed1924f34db91bb24fe2c2"^^xsd:hexBinary .
````
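A consumer could use a description like this to check a downloaded file. The following is only a minimal Python sketch: it assumes the file has already been downloaded and that the algorithm and value have been read from the metadata. The names `SPDX_TO_HASHLIB`, `file_digest` and `verify_checksum` are illustrative and not part of DCAT, DCAT-AP or SPDX.

```python
import hashlib

# Hypothetical mapping from SPDX algorithm individuals to hashlib names.
SPDX_TO_HASHLIB = {
    "checksumAlgorithm_sha1": "sha1",
    "checksumAlgorithm_sha256": "sha256",
}


def file_digest(path, spdx_algorithm, chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(SPDX_TO_HASHLIB[spdx_algorithm])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, spdx_algorithm, expected_hex):
    """Compare the computed digest with an spdx:checksumValue (hex, any case)."""
    return file_digest(path, spdx_algorithm) == expected_hex.strip().lower()
```

For the example above, the call would look like `verify_checksum("downloaded-file", "checksumAlgorithm_sha1", "71bf58e542a47d7092ed1924f34db91bb24fe2c2")`, where the file name is a placeholder.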

@cristianolongo: we discussed the adoption of spdx:checksum in the last DCAT meeting (see meeting minutes).
I wonder if you could describe the use case you mentioned for "datasets provided via a SPARQL endpoint" in more detail, and whether you have an example SPDX solution for SPARQL-distributed datasets.

Yes, I dump my dataset via a CONSTRUCT query like

CONSTRUCT { ?x ?y ?z } WHERE { ?x ?y ?z }

With appropriate ordering clauses, the output should be predictable (depending on the knowledge base content of course).
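To illustrate why ordering matters for a digest of a query result, here is a minimal Python sketch. It assumes the CONSTRUCT result has been serialized as N-Triples with one triple per line and contains no blank nodes (whose labels may differ between runs). `graph_digest` is a hypothetical helper, SHA-256 is an arbitrary choice, and sorting lines is not a full RDF canonicalization.

```python
import hashlib


def graph_digest(ntriples_text):
    """Hash an N-Triples dump independently of the order of its lines."""
    # Drop empty lines and duplicate triples, then sort for a stable order.
    lines = sorted({line.strip() for line in ntriples_text.splitlines() if line.strip()})
    canonical = "\n".join(lines) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dumps of the same (made-up) triples, returned in a different order.
triple_1 = '<http://example.org/a> <http://example.org/p> "1" .'
triple_2 = '<http://example.org/b> <http://example.org/p> "2" .'
dump_a = triple_1 + "\n" + triple_2 + "\n"
dump_b = triple_2 + "\n" + triple_1 + "\n"

print(graph_digest(dump_a) == graph_digest(dump_b))  # True
```

Without the sorting step, the two dumps would hash differently even though they describe the same graph.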

Other cases may be datasets exposed via a REST API that returns JSON-LD.

However, I'm not fully convinced that these examples are in the scope of DCAT.

For what it's worth, I have used checksums for datasets in a web app, though not with SPARQL. The use case was differentiating datasets of electronic potentials for use in quantum mechanical calculations.

I agree with the need for a "Checksum" (algorithm + value) for dataset integrity (as in SPDX).
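On the publisher side, the algorithm and value pair could be generated when the distribution file is produced. The sketch below is one possible way to do it in Python, reusing the SPDX terms from the DCAT-AP example earlier in the thread. `checksum_turtle`, `TEMPLATE` and the example IRI are hypothetical, and only SHA-1 and SHA-256 are handled.

```python
import hashlib

# Turtle template following the spdx:checksum pattern quoted from DCAT-AP.
TEMPLATE = """@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<{distribution}> spdx:checksum [
    a spdx:Checksum ;
    spdx:algorithm spdx:checksumAlgorithm_{algorithm} ;
    spdx:checksumValue "{value}"^^xsd:hexBinary
] .
"""


def checksum_turtle(distribution_iri, content, algorithm="sha256"):
    """Return a Turtle description of the checksum of `content` (bytes)."""
    if algorithm not in ("sha1", "sha256"):
        raise ValueError("unsupported algorithm: " + algorithm)
    value = hashlib.new(algorithm, content).hexdigest()
    return TEMPLATE.format(
        distribution=distribution_iri, algorithm=algorithm, value=value
    )


# Example with a made-up distribution IRI and file content.
print(checksum_turtle("https://example.org/distribution/1", b"a,b\n1,2\n"))
```

The checksum is written as an anonymous node here; a named or labelled blank node, as in the portal example, would work equally well.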

I've created a draft PR to integrate the relevant SPDX class and properties: https://github.com/w3c/dxwg/pull/1323

A preview of the newly added sections:

I've included a couple of EDNOTEs about additional issues to be discussed.

Please review.

I would update the definition to be _"The Checksum includes the algorithm and value that allows the integrity of a file to be verified to ensure no errors were detected in transmission or storage."_

@riannella said:

I would update the definition to be _"The Checksum includes the algorithm and value that allows the integrity of a file to be verified to ensure no errors were detected in transmission or storage."_

I've added it as a usage note to spdx:Checksum - see https://raw.githack.com/w3c/dxwg/dcat-distribution-digest/dcat/index.html#Class:Checksum

The relevant updates have been merged into the ED via PR https://github.com/w3c/dxwg/pull/1323

Unless there are any objections, I propose we close this issue.

We are closing this issue as proposed above and as a result of tonight's DCAT subgroup meeting
