Dxwg: Indicate a conventional way to automatically validate data instances of application profiles

Created on 26 Jan 2019 · 29 Comments · Source: w3c/dxwg

Feedback from @paulwalk:

we don't have a conventional way to automatically validate data instances of application profiles (i.e. data which allegedly conforms to the constraints of a given application profile)

Original email from Paul

Labels: due for closing, feedback, profiles-vocabulary

All 29 comments

You can imagine a method, derived from the structures of the PROF ontology, where code (a Linked Data crawler) could find a Profile's resource with prof:role Validation (and perhaps a particular formalism of Validation resource, such as SHACL or similar) and then recurse up an isProfileOf hierarchy, finding all similar resources. Then, joining all those resources together, data claiming conformance could be validated against all of them. This would both ensure data is valid against a Profile and its dependencies and potentially allow Profile implementors to define only their extensions to the things they profile, not the full set of constraints.

In pseudo code, for data_x claiming adherence to profile_y:

profiles_metadata = get_profiles(profile_y)
validators = gather_validating_resources(profiles_metadata, conformance_type)
validate(data_x, validators)

function get_profiles(profile_uri):
    recurse up the profile hierarchy indicated by isProfileOf statements and return all profiles' RDF

function gather_validating_resources(profiles_metadata, conformance_type):
    profile_validators = empty list
    for each profile in profiles_metadata:
        for each resource with role Validation and conformsTo conformance_type:
            add resource content to profile_validators

    return profile_validators
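
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration: the PROFILES dictionary is a hypothetical in-memory stand-in for PROF metadata (a real crawler would dereference profile URIs and parse RDF), and the artifact URIs and the "guidance" role are invented for the example.

```python
# Hypothetical in-memory stand-in for PROF metadata; every URI below other
# than the SHACL namespace is made up for illustration.
PROFILES = {
    "http://example.org/profile/x": {
        "isProfileOf": ["http://example.org/profile/base"],
        "resources": [
            {"role": "validation",
             "conformsTo": "http://www.w3.org/ns/shacl",
             "artifact": "http://example.org/profile/x/shapes.ttl"},
        ],
    },
    "http://example.org/profile/base": {
        "isProfileOf": [],
        "resources": [
            {"role": "validation",
             "conformsTo": "http://www.w3.org/ns/shacl",
             "artifact": "http://example.org/profile/base/shapes.ttl"},
            {"role": "guidance",
             "conformsTo": None,
             "artifact": "http://example.org/profile/base/guide.html"},
        ],
    },
}


def get_profiles(profile_uri, catalogue):
    """Return profile_uri plus every profile reachable via isProfileOf."""
    seen = []
    stack = [profile_uri]
    while stack:
        uri = stack.pop()
        if uri in seen or uri not in catalogue:
            continue  # skip cycles and profiles without a description
        seen.append(uri)
        stack.extend(catalogue[uri]["isProfileOf"])
    return seen


def gather_validating_resources(profile_uris, catalogue, conformance_type):
    """Collect artifacts with role 'validation' for one constraint language."""
    artifacts = []
    for uri in profile_uris:
        for resource in catalogue[uri]["resources"]:
            if (resource["role"] == "validation"
                    and resource["conformsTo"] == conformance_type):
                artifacts.append(resource["artifact"])
    return artifacts


profiles = get_profiles("http://example.org/profile/x", PROFILES)
validators = gather_validating_resources(
    profiles, PROFILES, "http://www.w3.org/ns/shacl")
print(validators)
```

The actual validate step is left out on purpose, since it depends entirely on the constraint language chosen.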

I'm not sure I quite followed this. I'm talking about the need for a conventional way to validate data against a specified application profile. If any given software validation process has to first traverse a graph in order to assemble and reconcile a collection of related validation resources, then I can't imagine that being widely implemented, or even scaling if it is. The opportunity, it seems to me, is for a documented application profile to offer a single, documented resource to facilitate automated validation of data which declares 'conformance' with that application profile.

Most probably I have misunderstood though!

@paulwalk I'm not sure you'll get exactly what you want, as no one's yet able to agree on The Correct Way to validate against a profile (application profile). For instance: what constraint language is to be used, does validation need to validate against all dependency validators or not, what if dependencies use different forms of validators, etc. At the moment, everyone's talking SHACL as a constraint language... except for those who prefer ShEx. And in my XML life, I use Schematron. So right there we don't have language uniformity.

Let's break this point down into sub points, just as I listed them above, but adding a few more lower-level ones:

To "Indicate a conventional way to automatically validate data instances of application profiles" PROF may need to:

  1. Indicate which validators are available
  2. Indicate a particular constraint language validator (standard & format)
  3. Indicate the role the validator plays (all constraints, partial etc.)
  4. Indicate whether dependency validators need to be consulted for validation or if this profile's validator is sufficient

For 1.: Currently Possible. The Profile just includes one or more prof:ResourceDescriptor instances describing validators.

<http://example.org/profile/x> a prof:Profile ;
    prof:hasResource [
        prof:hasArtifact <SOME_FILE_URI> ;
        ...
    ] ;
    ....

For 2.: Currently Possible. PROF suggests use of dct:conformsTo & dct:format to indicate this:

<http://example.org/profile/x> a prof:Profile ;
    prof:hasResource [
        prof:hasArtifact <SOME_FILE_URI> ;
        dct:conformsTo <http://www.w3.org/ns/shacl> ;
        dct:format <https://w3id.org/mediatype/text/turtle> ;
        ...
    ] ;
    ....

For 3.: Currently Possible. PROF uses prof:hasRole to indicate the particular role a prof:ResourceDescriptor plays, with suggested roles defined at https://w3c.github.io/dxwg/profilesont/resource_roles.html, but more are possible.

<http://example.org/profile/x> a prof:Profile ;
    prof:hasResource [
        prof:hasArtifact <SOME_FILE_URI> ;
        dct:conformsTo <http://www.w3.org/ns/shacl> ;
        dct:format <https://w3id.org/mediatype/text/turtle> ;
        prof:hasRole <http://www.w3.org/ns/dx/prof/role/validation> ;
        ...
    ] ;
    ....

For 4.: Not currently spelled out but perhaps able to be indicated with roles. E.g. if a validator with Role Full Constraints is used (defn: "Complete set of constraints for a profile") then it is sufficient to use this for profile validation. If only a validator with Part Constraints is available, then more info is needed.

I could imagine a new Role: "Differential Constraints" which would be this Profile's constraints, i.e. only those that this profile adds on top of those belonging to the things it is profiling. If this were available, you'd know you'd have to pull in dependency constraints to perform a complete validation.
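
As a rough sketch of the decision described for point 4, a consumer might map the roles of the available validators to a strategy like this. The role identifiers and the returned strategy names are hypothetical; "Differential Constraints" in particular is only a proposal at this point in the thread.

```python
# Illustrative role identifiers, not official PROF role URIs.
FULL = "full-constraints"
PART = "part-constraints"
DIFFERENTIAL = "differential-constraints"


def validation_strategy(validator_roles):
    """Decide how to validate, given the roles of a profile's validators."""
    roles = set(validator_roles)
    if FULL in roles:
        # A complete set of constraints: this validator alone is enough.
        return "self-sufficient"
    if DIFFERENTIAL in roles:
        # Only the profile's own additions: dependencies must be pulled in.
        return "combine-with-dependencies"
    if PART in roles:
        # Partial constraints, with no indication of what is missing.
        return "undetermined"
    return "no-validator"


print(validation_strategy([PART, DIFFERENTIAL]))
```

Checking for a full validator first reflects the statement above that a Full Constraints resource is sufficient on its own, whatever else is attached to the profile.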

I will suggest updates to the Roles Vocab for this.

Thanks, Nick, for taking the time to go into the detail like this.

I think we have a fundamental difference in understanding on this. In the use-case where someone wishes to validate data against some declared conformance with some metadata profile, I just don't see any advantage in the validation process being able to somehow automatically interrogate the application profile to determine which mechanisms it supports. This can be simply documented in prose - it is for the systems developer to decide which mechanism they want to use. The mechanisms themselves need formal descriptions, but that is a separate issue being taken care of by those communities.

This slightly reminds me of the huge efforts made to introduce Universal Description, Discovery, and Integration (UDDI) to web-services around the beginning of this century. Then Web2.0 came along and showed that all that was really needed was some nice clear documentation aimed at developers - what has become known as the 'Web API'.

I think Paul's interpretation is what we should be aiming for. He says "The opportunity, it seems to me, is for a documented application profile to offer a single, documented resource to facilitate automated validation of data which declares 'conformance' with that application profile." +1
The discussion around inheritance and traversing a graph just to check conformance seems to be introducing barriers to use and to mixing vocabulary from multiple standards.

The discussion around inheritance and traversing a graph just to check conformance seems to be introducing barriers to use and to mixing vocabulary from multiple standards.

@agreiner I completely agree. Notions of inheritance differ across the various technologies that people are likely to use, and it does not seem realistic to coin properties, such as prof:isInheritedFrom, with the assumption that people will share a common notion of inheritance.

OK - so let's just focus on what is proposed again:

1) the ability to declare conformance of a resource to a specific profile
2) the ability to indicate that a specific profile conforms to more general profiles

(and I think we have just shown that this can't be done with any universally applicable constraint language - so that's the motivator here)

3) a flexible way of referencing the various forms of documentation in a way a machine can find an appropriate resource for a particular function.

Whilst we can understand a desire for a universal validation use case, that's just not realistic (that would be the UDDI equivalent). So this related Use Case is limited to finding what resources are available and a canonical description of them, not a canonical form of them.

So, we can accept the comment "we don't have a conventional way to automatically validate data instances of application profiles (i.e. data which allegedly conforms to the constraints of a given application profile)" as a truism. I don't see a concrete need or proposal for change beyond stressing the scope. Adding in the "competency questions" section ( #732 ) should help keep scope in mind.

re "The discussion around inheritance and traversing a graph just to check conformance seems to be introducing barriers to use and to mixing vocabulary from multiple standards." - this is _exactly_ what happens if you import RDFS or OWL axiomatisation. It is a barrier perhaps - but it's the chosen set of constraints languages and how they are used that determines if navigation is necessary. The Profiles Vocabulary is agnostic about this choice. It provides the new capability of declaration of intent w.r.t. conformance in hierarchies however, and this thread has not affected that underlying requirement or the proposed solution, so I think it can be closed as out-of-scope.

@rob-metalinkage writes:

So this related Use Case is limited to finding what
resources are available and canonical description of
them

Okay, the requirement is resource discovery.

its the responsibility of the chosen set of constraints
languages and how they are used that determines if
[traversing a graph to check conformance] is necessary.
The Profiles Vocabulary is agnostic about this choice.

The Profiles Vocabulary implies a generalized notion of
"inheritance". Can SHACL, PDF, Schematron, CSV, etc, be
said to follow a common notion of inheritance?

  1. the ability to declare conformance of a resource to a
    specific profile

"Declaring conformance" goes well beyond resource
discovery. The Profiles Vocabulary cannot be used to
_determine_ conformance of data to a specific profile;
that is the responsibility of the chosen constraint
language. Is the goal simply to record a conformance
result in RDF data? If so, are we to assume that
profiles and datasets are immutable? (Because otherwise,
the RDF data could be making an assertion that is no
longer true.)

Users can use appropriate technologies to test for
themselves whether a resource conforms. Is this not
better than trusting a "declaration of intent" (as you
put it) made in some RDF data at a particular point in
time?

Or is this not about conformance of data at all?

It is confusing to talk about conformance of "a resource"
to a profile, when profiles are typically used to test
the conformance of data, where you actually seem to mean
conformance of a particular expression of the profile
(e.g., SHACL, Schematron) to The Profile in a more general
sense.

"Resource" is unhelpful as a choice of words because in
RDF, everything is a resource.

  2. the ability to indicate that a specific profile
    conforms to more general profiles

(and I think we have just shown that this can't be done
with any universally applicable constraint language - so
that's the motivator here)

I'm seeing several types of conformance here:

1) Conformance of a profile to a standard used to express
the profile (one example in PV shows something,
presumably a SHACL graph, which conformsTo SHACL).

2) Conformance of data to a profile (which is perhaps out
of scope of the PV?).

3) Conformance of a profile to another profile. I am
struggling to see what this means in the general
sense. Does a profile "conform" if it restricts
another? If it "extends" another? Or do you mean to
say that one profile actually validates another?
Or do you mean that alternative expressions of a
profile conform to each other?

A requirement for discovering profiles related to a
particular dataset would make sense to me, but not a
requirement for taking a SHACL/ShEx/Schematron/whatever
validation result at a given point of time and expressing
it as a "declaration of intent" in RDF.

  3. a flexible way of referencing the various forms of
    documentation in a way a machine can find an
    appropriate resource for a particular function.

Right, resource discovery, though I'm unclear on what
"particular function" means here.

It provides the new capability of declaration of intent
w.r.t. conformance in hierarchies however, and this
thread has not affected that underlying requirement or
the proposed solution, so I think it can be closed as
out-of-scope.

I do not see a requirement to express "declaration of
intent w.r.t. conformance in hierarchies" in RDF data
about profile documentation. Where is the use case? I
am struggling to see why this is considered to be in
scope.

@paulwalk Even though we have suggested ways of doing this here and lots of discussion, actually proposing a conventional way to automatically validate data instances of application profiles is out of scope for the Profiles Vocabulary document so we won't be taking any action here.

@nicholascar In my reading, @paulwalk is not asking that PROF propose a conventional way to automatically validate data instances of application profiles, as you imply in your first and last postings to this issue. Rather, he is questioning the value, for validation, of documenting dependencies between the documents needed for validation in PROF-based metadata (the "graph to be traversed"). As he writes above:

In the use-case where someone wishes to validate data against some declared conformance with some metadata profile, I just don't see any advantage in the validation process being able to somehow automatically interrogate the application profile to determine which mechanisms it supports. This can be simply documented in prose - it is for the systems developer to decide which mechanism they want to use. The mechanisms themselves need formal descriptions, but that is a separate issue being taken care of by those communities.

@paulwalk is questioning whether developers really need, or would use, PROF-based metadata in support of validation as opposed to more community-specific mechanisms. Or @nicholascar, do you mean to imply that PROF-based metadata is not intended to have any relevance for validation?

I haven't digested all of this lengthy thread, but it seems to me that it would be useful to include a link to a validation resource (xml schema, schematron, SHACL) that can be used to test conformance with a profile, and the existing prof:hasResource and prof:hasRole seem that they would support this kind of linkage (as in https://github.com/w3c/dxwg/issues/698#issuecomment-458102895, above). For more complex validation procedures, the 'hasResource' could point to a document describing a validation procedure. If that's so, the existing design doesn't need modification, just some role vocabulary.

I'm basically confused, but I did add a link to Paul's original email to the first comment as a way of getting back to that context.

If the discussion here is whether PROF should have a role "validation" and a description of the validation resource (as in https://github.com/w3c/dxwg/issues/698#issuecomment-458102895, above), then I don't think there is disagreement. So I would +1 that, and this should be finalized in the discussion of roles.

If the discussion is about defining validation of profiles with hierarchical relationships, then I think that discussion could take place at #1043, although it needs links back to prior discussions, which I will try to add there. We've had some rich discussion about this which we shouldn't lose.

@kcoyle - correct, those concerns are dealt with in different issues.

There is one other concern hidden in here, but it's out of scope for this issue as stated - and that is whether it is possible to create a "consolidated" form of validation resource for a given choice of constraints language (and the answer is: of course, but that's a matter for constraints languages) and describe the role it plays.

We have a role "validation" with a comment "This role implies inclusion or import of inherited constraints" which is distinct from "constraints" which are the specific constraints that a profile adds to its base specifications. These are not disjoint - you can declare that a resource both defines constraints and can validate conformance to the profile.

2PWD has quite explicitly indicated a way to do this, and individuals may or may not "see the need" for this - it doesn't require an open issue to capture non-interest in part of a specification.

So we should close this issue and only raise another issue if someone wants to propose improvements to those existing role names or definitions which were already released in 2PWD.

@rob-metalinkage Can you explain what a "consolidated form of validation resource" is?

@paulwalk can you please comment on where you feel this issue is up to?

@nicholascar Thanks for the invitation to comment - I have just read through all of this again. I have found some of this to be quite confusing and I suspect this is because people are talking about different things. I'm especially struck by the comment from @rob-metalinkage a couple of comments up from here:

...individuals may or may not "see the need" for this - it doesn't require an open issue to capture non-interest in part of a specification.

My position, as stated in my email feedback (linked at the top of this issue - thanks @kcoyle), is that I and, I believe, other people interested in using application profiles to constrain data would like to see:

a conventional way to automatically validate data instances of application profiles (i.e. data which allegedly conforms to the constraints of a given application profile)

I believe the business of formally managing the inheritance of constraints from one application profile to another is somewhat orthogonal to this requirement, in the sense that it ought to be possible to satisfy the requirement without developing such a mechanism. For this reason, I "do not see the need" for the latter - or at least do not recognise that it is a necessary foundation to solve the requirement.

In very simple terms, expressed as a "user-story":

_As a systems engineer, I would like to understand from the formal documentation of an application profile exactly how to go about validating the conformance of this big pile of data I am planning to ingest/publish to that application profile, so that I may interoperate at a syntactic and semantic level with some other system which will supply/consume the data._

This issue, then, is about simplifying the process of understanding how this validation can be done by establishing some convention(s) for documenting and implementing this.

I'm not at all sure that the discussion here has got us closer to that. However, I freely admit that I have not entirely understood the discussion - insofar as some of it seems to be about somewhat orthogonal issues to do with modelling inheritance between the application profiles themselves.

@nicholascar You asked @paulwalk to clarify comments he made seven months ago, on January 25, and to which the WG never responded. In such cases, please cite your sources. The burden should not be on readers of this (or any other) thread to click, scroll, or google around to find what you are referring to. Given the time passed, and as a courtesy to commenters (and readers of the thread), you should also please say in such requests whether anything of relevance to the issue changed in the 2PWD. I intend to make this point every time I see a response to commenters that does not cite its sources.

@paulwalk

I believe the business of formally managing the inheritance of constraints from one application profile to another is somewhat orthogonal to this requirement, in the sense that it ought to be possible to satisfy the requirement without developing such a mechanism. For this reason, I "do not see the need" for the latter - or at least do not recognise that it is a necessary foundation to solve the requirement.

I agree with your point but would add that there is no common notion of "inheritance" shared by designers of profiles. Put another way, there is no common notion of "inheritance" that is shared by the technologies used to express profiles. Each technology has _its own way_ of expressing dependencies among documents (sourcing, importing, citing, declaring namespaces...). Trying to reflect those dependencies in metadata seems redundant and, because metadata is often an afterthought, brittle.

This issue, then, is about simplifying the process of understanding how this validation can be done by establishing some convention(s) for documenting and implementing this.

I agree and, to be clear, "establishing some conventions" for documentation should not imply that there is some universal way to model validation.

I really like the suggestion from @smrgeoinfo that one link to a validation resource when possible but that when more complex validation procedures are involved, just point to a document describing the validation procedure.

@tombaker I presume you missed my email to Paul that accompanied this request: https://lists.w3.org/Archives/Public/public-dxwg-comments/2019Aug/0005.html

@nicholascar I did indeed miss the mail to the list! It is a bit confusing with so many channels in parallel, so may I suggest we simply try to link between the two as much as possible.

@paulwalk responding to your "user-story":

As a systems engineer, I would like to understand from the formal documentation of an application profile exactly how to go about validating the conformance of this big pile of data I am planning to ingest/publish to that application profile, so that I may interoperate at a syntactic and semantic level with some other system which will supply/consume the data.

I think that what you ask for is there in PROF by design! If a profile is described using PROF, it will have each of the parts (files/resources) identified and allocated a role. A starting role list is in the specification in Section 9 and it includes a role "Validation" defined as "Supplies instructions about how to verify conformance of data to the profile". So, in your user-story, the actor would look within the profile for a Resource with that role.

A community could extend the Roles vocabulary - they are expected to - to create a specific version of "Validation" if they wanted more specific resources indicated, perhaps "Validation-using-SHACL" or similar.
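
A minimal sketch of that lookup, assuming the profile description has already been parsed into plain Python dictionaries. The validation role URI is the one used earlier in this thread; the "validation-using-shacl" role, the BROADER mapping, and all example.org URIs are hypothetical.

```python
VALIDATION = "http://www.w3.org/ns/dx/prof/role/validation"
# Hypothetical community-defined role that specialises VALIDATION.
SHACL_VALIDATION = "http://example.org/roles/validation-using-shacl"
# Assumed mapping from a narrower role to the role it specialises.
BROADER = {SHACL_VALIDATION: VALIDATION}


def is_validation_role(role, broader=BROADER):
    """True if role is VALIDATION or a (transitively) narrower role."""
    seen = set()
    while role is not None and role not in seen:
        if role == VALIDATION:
            return True
        seen.add(role)  # guards against cycles in the mapping
        role = broader.get(role)
    return False


def find_validation_artifacts(profile):
    """Return artifacts of one profile whose role counts as validation."""
    return [
        descriptor["hasArtifact"]
        for descriptor in profile["hasResource"]
        if is_validation_role(descriptor.get("hasRole"))
    ]


profile_x = {
    "id": "http://example.org/profile/x",
    "hasResource": [
        {"hasRole": VALIDATION,
         "hasArtifact": "http://example.org/profile/x/validation-notes.html"},
        {"hasRole": SHACL_VALIDATION,
         "hasArtifact": "http://example.org/profile/x/shapes.ttl"},
        {"hasRole": "http://example.org/roles/guidance",
         "hasArtifact": "http://example.org/profile/x/guide.html"},
    ],
}
print(find_validation_artifacts(profile_x))
```

Note that this looks only at the one profile's own description, with no traversal of any profile hierarchy.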

We do hope to maintain a live vocabulary of roles that may grow once the specification is complete and immutable. We've set up a namespace for them for that purpose.

@nicholascar that sounds reasonable to me - so long as the framework recognises that preceding standards or application profiles - even where there is a notional "inheritance" - may not have been described with the same ontology, and that this would not prevent validation of data against a declared application profile.

This issue has gone a bit wild... but if we can all agree that:

  • the "convention for finding how to validate data against a profile" would be that consumers of PROF profile metadata would seek to access the Resources with role "validation" and try to make sense of them
  • that making sense of them cannot be specified further because there are too many validation models/languages that could be used (btw for me a "validation resource" could be actually a set of human-targeted instructions...)

Then I think I'd be happy with that :-)
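
To illustrate what "try to make sense of them" might mean in practice, here is a small sketch under the assumption that a consumer keeps its own table of constraint languages it can process and treats everything else as instructions for a human. The handler name "shacl-engine" and the example.org URIs are hypothetical.

```python
SHACL = "http://www.w3.org/ns/shacl"


def plan_validation(validation_resources, handlers):
    """Split validation resources into automatable and manual ones."""
    automated, manual = [], []
    for resource in validation_resources:
        handler = handlers.get(resource.get("conformsTo"))
        if handler is None:
            # Unknown or unstated formalism: leave it for a person to read.
            manual.append(resource["artifact"])
        else:
            automated.append((handler, resource["artifact"]))
    return automated, manual


# The consumer, not PROF, decides which languages it can handle.
handlers = {SHACL: "shacl-engine"}
resources = [
    {"conformsTo": SHACL,
     "artifact": "http://example.org/profile/x/shapes.ttl"},
    {"conformsTo": None,
     "artifact": "http://example.org/profile/x/how-to-validate.html"},
]
automated, manual = plan_validation(resources, handlers)
print(automated, manual)
```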

There could be some language in the spec about how users of a profile could look at the validation resources attached to related/"ancestor" profiles (i.e. in a prof:isProfileOf relationship), if this could bring them some useful validation resources. But we should also be very cautious here. I.e., there can be models/languages (say, OWL) that allow re-use of specifications across different levels of specifications built on top of each other. But then the re-use of the validation resources could be specified at the level of the resources (say, with owl:imports) without needing to see what's available around the prof:isProfileOf hierarchy.

@smrgeoinfo @tombaker +1 to separating out the concerns of the validation procedure from the resources needed to perform it. This is not currently supported in proposed roles - so if we have a concrete example available it could be added. Create new issue to suggest a name and description for such a role if desired.

All, we have #1049 for discussion of what roles to include, and how to define them.

Continuing action occurring elsewhere so closing this Issue after a period of due-for-closing and no further actions.

@nicholascar Can you say what you think was resolved here? And also please link to the related discussions in the closing message. There are a lot mentioned above and it isn't clear to me which ones carry forward this discussion. Thanks.

we don't have a conventional way to automatically validate data instances of application profiles

PROF indicates a. how you would identify a profile such that it can be used as a conformance target (with a PROF definition & a URI) and b. how to link resources to that conformance target (via ResourceDescriptor instances). Resources can be associated with roles (ResourceRole instances) from an open-ended vocabulary to indicate if they are suitable for validation or related purposes.

PROF leaves to communities how specific validation should occur for their profiles and this is described in the ED's Introduction.
