Dxwg: Service vs endpoint

Created on 21 Jun 2020 · 7 Comments · Source: w3c/dxwg

I propose we consider revisiting the service-endpoint relationship, based on existing examples where this relationship is one-to-many.

DCAT 2 includes two properties - dcat:endpointURL and dcat:endpointDescription - for specifying a service endpoint, plus property dct:conformsTo, which is used to specify its "protocol". As these properties take as subject a dcat:DataService, the service-endpoint relationship is 1-to-1.
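
For reference, the one-to-one pattern described above might be sketched as follows. This is only an illustration: the `ex:` resources and the `example.org` URLs are placeholders, not taken from a real catalogue.

````turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# One service, one endpoint: URL, description and protocol all hang off the service itself.
ex:service a dcat:DataService ;
  dcat:endpointURL <http://example.org/api> ;
  dcat:endpointDescription <http://example.org/api/description> ;
  dct:conformsTo <http://example.org/spec/protocol> ;   # placeholder protocol URI
  dcat:servesDataset ex:dataset .
````

Because all three properties share the same subject, adding a second endpoint to `ex:service` would leave no way to tell which URL goes with which description and protocol.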

There are however cases where services have more than one endpoint.

As an example, see the following record:

https://sdi.eea.europa.eu/catalogue/idp/api/records/29e08b66-e6f6-4b4f-95ad-b582d9fe3df5

The record describes a geospatial "view" service (i.e., a service portraying data on a map) with 2 endpoints, both serving the same dataset, but using different protocols (WMS and ArcGIS REST), and with different endpoint descriptions and URLs. Transformed into DCAT, this record will then be as follows:

````turtle
:eea_v_4326_250_k_wise-eionet-monitoring-sites_service a dcat:DataService ;
  ...
  dcat:endpointDescription <https://water.discomap.eea.europa.eu/arcgis/rest/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer> ;
  dcat:endpointURL <https://water.discomap.eea.europa.eu/arcgis/rest/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer> ;
  dct:conformsTo <https://developers.arcgis.com/rest/> ;
  dcat:endpointDescription <https://water.discomap.eea.europa.eu/arcgis/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer/WMSServer?request=GetCapabilities&service=WMS> ;
  dcat:endpointURL <https://water.discomap.eea.europa.eu/arcgis/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer/WMSServer> ;
  dct:conformsTo <http://www.opengeospatial.org/standards/wms> ;
  dcat:servesDataset :eea_v_4326_250_k_wise-eionet_p_2001-2020_v01_r04 ;
  ...
  .
````

This case is outlined in DCAT 2 in Example 49, where the solution is to duplicate the service record, changing only the endpoint protocol, description, and URL. So, the record above should result in two different ones:

````turtle
:eea-rest a dcat:DataService ;
  ...
  dcat:endpointDescription <https://water.discomap.eea.europa.eu/arcgis/rest/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer> ;
  dcat:endpointURL <https://water.discomap.eea.europa.eu/arcgis/rest/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer> ;
  dct:conformsTo <https://developers.arcgis.com/rest/> ;
  dcat:servesDataset :eea_v_4326_250_k_wise-eionet_p_2001-2020_v01_r04 ;
  ...
  .

:eea-wms a dcat:DataService ;
  ...
  dcat:endpointDescription <https://water.discomap.eea.europa.eu/arcgis/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer/WMSServer?request=GetCapabilities&service=WMS> ;
  dcat:endpointURL <https://water.discomap.eea.europa.eu/arcgis/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer/WMSServer> ;
  dct:conformsTo <http://www.opengeospatial.org/standards/wms> ;
  dcat:servesDataset :eea_v_4326_250_k_wise-eionet_p_2001-2020_v01_r04 ;
  ...
  .
````

I think this approach should be complemented with the option of keeping a single service with 2 endpoints, as in the original record. It would be up to data providers to decide which option suits them best.

A possible solution is to define a new property (e.g., dcat:endpoint), which specifies the endpoint (possibly typed itself as a dcat:DataService), along with the endpoint description, URL and protocol.

The example above would then be re-written as follows:

````turtle
:eea_v_4326_250_k_wise-eionet-monitoring-sites_service a dcat:DataService ;
  ...
  dcat:endpoint [ a dcat:DataService ;
    dcat:endpointDescription <https://water.discomap.eea.europa.eu/arcgis/rest/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer> ;
    dcat:endpointURL <https://water.discomap.eea.europa.eu/arcgis/rest/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer> ;
    dct:conformsTo <https://developers.arcgis.com/rest/> ;
  ] ;
  dcat:endpoint [ a dcat:DataService ;
    dcat:endpointDescription <https://water.discomap.eea.europa.eu/arcgis/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer/WMSServer?request=GetCapabilities&service=WMS> ;
    dcat:endpointURL <https://water.discomap.eea.europa.eu/arcgis/services/WISE_SoE/EIONET_MonitoringSite_WM/MapServer/WMSServer> ;
    dct:conformsTo <http://www.opengeospatial.org/standards/wms> ;
  ] ;
  dcat:servesDataset :eea_v_4326_250_k_wise-eionet_p_2001-2020_v01_r04 ;
  ...
  .
````

Labels: dcat, DataService, requires discussion

andrea-perego

👍2

Most helpful comment

@jakubklimek , @smrgeoinfo , thanks for your feedback, and sorry for my late reply.

This issue is yet to be discussed by DXWG, so no position has been taken for the moment. However, IMO, you rightly pointed out the key issue here, namely "what identifies a service".

I perfectly agree that, from a data-centred perspective, a service may correspond to an endpoint, using a given protocol, etc. This is actually what has been implemented in DCAT 2 for distributions accessible via a service/API. However, in DCAT 2, services/APIs have also been introduced as first-class citizens, and, as such, their existence may not necessarily be bound to the data they serve, and therefore it is arguable whether a service actually corresponds to an endpoint.

My example was related to what happens in the geospatial domain, where services are first-class citizens of a catalogue. There, as you know, a service is not identified by the endpoint protocol, but rather by a conceptual definition of its "type" (download, view, transformation, invoke service), which can be implemented by using different protocols.

This may not now be considered the "right" way of doing it - there's a lot of discussion about the disadvantages of the service-centred approach used in the geo domain. But, as a matter of fact, there's a wide community following this approach, and producing metadata which are also made available using DCAT - in addition to their native ISO 19115 / ISO 19119 / ISO 19139 representation.

As you say, @jakubklimek , having two different ways of describing the same thing does not help usability. This is definitely something that the DXWG will take into account when discussing this issue. However, this needs to be carefully weighed against the fact that the way services are currently represented in DCAT may prevent / limit its use in specific communities.

andrea-perego on 5 Jul 2020

👍3 👀1

All 7 comments

@andrea-perego What is the benefit of describing this case (and similar cases) as one service with multiple endpoints as opposed to multiple services?

> geospatial "view" service (i.e., a service portraying data on a map) with 2 endpoints, both serving the same dataset, but using different protocols (WMS and ArcGIS REST)

Why is this one service and not two services? What makes the identity of a service?

Creating two ways of describing this again makes it harder for consumers of such metadata to work with it: more possibilities means harder to work with. There needs to be a clear benefit to this new approach to justify the increased complexity.

jakubklimek on 21 Jun 2020

👍2 👀1

I have been approaching this from the point of view of someone looking first for data, and then for how to access the data. From this point of view, the DataSet is the primary resource of interest, and different services to access the data would be different distributions. This is similar to Example 49, but I would add distribution elements in the DataSet object that point to the DataService objects via the accessService property.

The question of what identifies a service is key. I'd argue that from the point of view of an application parsing the metadata for a dataset to determine how to get the data in a format it can use, the service is defined by:

1. the protocol for communicating (transport, request syntax and semantics)
2. information model for the content
3. the serialization scheme(s) for the data in service responses (xml, xml schema, JSON, JSON schema, rdf, rdf vocabulary)
4. [edit 2020-07-05] the operations that the service offers (I left off this important one)
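
The dataset-first arrangement described in this comment might look roughly like the following in Turtle. This is a sketch, not a worked-out proposal: all `ex:` names and `example.org` URLs are hypothetical, and only `dcat:distribution`, `dcat:accessURL`, `dcat:accessService`, `dcat:endpointURL`, `dcat:servesDataset` and `dct:conformsTo` are actual DCAT 2 / DCMI terms.

````turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# The dataset is the entry point; each way of accessing it is a distribution.
ex:monitoring-sites a dcat:Dataset ;
  dcat:distribution ex:monitoring-sites-wms , ex:monitoring-sites-rest .

# Each distribution points to the service that delivers it.
ex:monitoring-sites-wms a dcat:Distribution ;
  dcat:accessURL <http://example.org/wms> ;
  dcat:accessService ex:wms-service .

ex:monitoring-sites-rest a dcat:Distribution ;
  dcat:accessURL <http://example.org/rest> ;
  dcat:accessService ex:rest-service .

# One service record per protocol, as in Example 49.
ex:wms-service a dcat:DataService ;
  dcat:endpointURL <http://example.org/wms> ;
  dct:conformsTo <http://www.opengeospatial.org/standards/wms> ;
  dcat:servesDataset ex:monitoring-sites .

ex:rest-service a dcat:DataService ;
  dcat:endpointURL <http://example.org/rest> ;
  dct:conformsTo <https://developers.arcgis.com/rest/> ;
  dcat:servesDataset ex:monitoring-sites .
````

A client that starts from `ex:monitoring-sites` can then follow `dcat:distribution` and `dcat:accessService` to find a service whose protocol it supports.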

smrgeoinfo on 22 Jun 2020

I'd argue that the only services that make sense to include as 'first class citizens' (geospatial or not) are processing services. I don't see the logic of cataloging a data service (view, download) separately from the data it serves.

What is an endpoint? Suggestion: an endpoint is a web location (identified by a base URL) at which one can access a particular service. That service might offer various processing and data access options, but in the end there should be a service specification that is implemented at that endpoint that accounts for the 4 aspects mentioned above. An endpoint is a particular implementation of a service, possibly with bindings to particular data. I'd suggest that 'service' be thought of as a specification that might have various software implementations, and each of those implementations might be exposed by various endpoints, and those endpoints might have binding to different data.

- Service: protocol, information model, interchange format(s) used, operations offered
- Endpoint: URL, service specification, coupled data (optional)
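
The Service/Endpoint split suggested here could be pictured along these lines. Note that this is purely illustrative: the `ex:` classes and properties (`ex:ServiceSpecification`, `ex:Endpoint`, `ex:protocol`, `ex:implements`, and so on) are invented for this sketch and are not part of DCAT or any other published vocabulary, and the operation names and URLs are placeholders.

````turtle
@prefix ex: <http://example.org/vocab#> .
@prefix :   <http://example.org/data/> .

# Hypothetical: a service is a specification, independent of where it is deployed.
:view-spec a ex:ServiceSpecification ;
  ex:protocol <http://www.opengeospatial.org/standards/wms> ;
  ex:informationModel :map-layer-model ;
  ex:interchangeFormat "image/png" ;
  ex:operation "GetCapabilities" , "GetMap" .

# Hypothetical: an endpoint is one deployment of that specification,
# optionally bound to particular data.
:endpoint-1 a ex:Endpoint ;
  ex:url <http://example.org/wms> ;
  ex:implements :view-spec ;
  ex:coupledData :monitoring-sites .
````

Under this reading, two endpoints that implement the same specification but serve different data would share `:view-spec` and differ only in `ex:url` and `ex:coupledData`.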

smrgeoinfo on 5 Jul 2020

👀1

> I'd argue that the only services that make sense to include as 'first class citizens' (geospatial or not) are processing services.

This was discussed at some length during the development of DCAT2. The key points are:

The simplest download service - which just gets a file from a file-system - might not merit a service description much beyond 'conforms to HTTP v1.1'. However, there is a rich spectrum beyond that. At the very least, even if the data is coming from a static datastore, there is usually a query mechanism to select an extract from the whole, both in terms of which records are retrieved, and which properties (columns). And a data service usually provides at least a method to project the result according to some 'schema' described in the request. Then there may be re-sampling, or coordinate transformation, or other processing as well.

The client needs a way to find out about these options - selection/query operations, parameter ranges, response schema and format options - so there needs to be a service description somewhere (it may be implied by the standard that it conforms to).

Some services are tightly bound to a single datastore, some not. But even a processing service that can connect to multiple data sources is also initialized with (i.e. bound to) some 'data' - for example coordinate transformation parameters, or coefficients used in some other numeric.

So I don't think it is so clear where the boundary between 'download' and 'processing' is.

Every service we are interested in in the DCAT context delivers 'data'. Sometimes this is retrieved from a static(-ish) store. Sometimes generated on-the-fly somehow. But the client machine doesn't know and usually doesn't care what happens behind the interface. That's why, after an initial discussion about a small taxonomy of service-types, we decided to just have a single class dcat:DataService.

dr-shorthair on 6 Jul 2020

👀1

@dr-shorthair I think we're in agreement.

smrgeoinfo on 6 Jul 2020

Also note this W3C note which proposes extensions to schema.org aligned with the DCAT2 model for DataService description - https://webapi-discovery.github.io/rfcs/rfc0001.html

dr-shorthair on 8 Jul 2020
