Dxwg: Revisiting the definition of "profile"

Created on 27 Jun 2019 · 91 Comments · Source: w3c/dxwg

As per the meeting of June 25, 2019, this is an area for the discussion of the definition of "profile". Let's put each definition in a separate comment so people can up or down vote them.

due for closing prof-due-for-closing profile-guidance profile-negotiation

All 91 comments

The current definition:

A named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.

A modification of the definition, suggested by Antoine in this email, changes "on one or more" to "can be based on", resulting in:

"A named set of constraints which can be based on one or more identified
base specifications, including the identification of any implementing
subclasses of datatypes, semantic interpretations, vocabularies, options
and parameters of those base specifications necessary to accomplish a
particular function."

A profile is a specification that applies additional constraints for validation of conformance targets to those defined in one or more base specifications. Resources that conform to a profile must conform to all provisions of the profile's base specifications.

Note: without the provision requiring conformance to the base specifications, a profile is just another specification.
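
One way to read the conformance provision in the definition above is as a conjunction of checks: a resource conforms to a profile only if it passes the profile's own checks and those of every base specification. The following is a minimal sketch under that reading, not part of any proposal in this thread; the `Spec` class, the `conforms` function, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[Record], bool]


@dataclass
class Spec:
    """A specification modelled (hypothetically) as a name plus testable checks."""
    name: str
    checks: List[Check] = field(default_factory=list)
    bases: List["Spec"] = field(default_factory=list)  # empty for a plain spec


def conforms(record: Record, spec: Spec) -> bool:
    """A record conforms if it passes the spec's own checks and all base specs."""
    own_ok = all(check(record) for check in spec.checks)
    return own_ok and all(conforms(record, base) for base in spec.bases)


# Invented rules: the base spec requires a title, the profile also requires a license.
base = Spec("base-spec", checks=[lambda r: "title" in r])
profile = Spec("example-profile", checks=[lambda r: "license" in r], bases=[base])

full_record = {"title": "Rainfall 2019", "license": "CC-BY"}
no_title = {"license": "CC-BY"}

print(conforms(full_record, profile))
print(conforms(no_title, profile))
```

In this sketch a `Spec` with an empty `bases` list behaves like any other specification, which mirrors the note: it is the link to base specifications that makes something a profile.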

@agreiner Annette responded to Antoine's thoughts with an email that states that

"Profiling is about acknowledging a reuse. The threshold at which that
acknowledgment is called for would be hard to define, but it seems to me
that it is up to the creator of the spec or profile what level of
acknowledgment they want to advertise. For example, I don't think that
DCAT becomes a profile of vCard by including vCard elements. And I don't
think we would want to advertise it as such, because we don't have the
goal of creating a version of vCard that suits the needs of a specific
community. I think it comes down to the intent."

data profile 

    A human- or machine-readable specification that
    clarifies, constrains, combines, excerpts,
    extends, or annotates one or more given data
    specifications.  

    A well-designed data profile provides
    information, useful for describing data in a
    given context, without semantically contradicting
    the data specifications on which it is based.

data specification

    A document, or family of related documents,
    possibly in alternative or complementary human-
    and machine-readable formats, that provide
    vocabularies or guidelines usable for describing
    data.

For further discussion, see https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Jun/0144.html

Karen suggests a slightly different wording at https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Jun/0146.html - which would also be fine with me:

A human- or machine-readable specification that defines additional
constraints, conventions, or extensions over one or more given data
specifications.

Looking at the W3C site, aside from our work, I see things like:

  • The Registered Organization Vocabulary is a profile of the Organization Ontology for describing organizations that have gained legal entity status

  • ADMS is a profile of DCAT

  • Appendix C was a profile of a specific XHTML syntax but not a profile of a specific HTML syntax, whereas Polyglot Markup is a profile of both the specific syntax of XHTML5 and the specific syntax of HTML5

  • WebCGM is a profile of the ISO Computer Graphics Metafile standard (ISO/IEC 8632:1999), tailored to the requirements for scalable 2D vector graphics

  • The value specified must adhere to the W3C Date and Time Format, which is a profile of ISO 8601

I think that the Antoine definition would describe the intent of these.

Elsewhere, outside of W3C, I see things like:

  • http://www.dcc.ac.uk/resources/subject-areas/biology with a few mentions of "metadata profile" or similar, and also "data profile" [e.g. FGDC/CSDGM Biological Data Profile] or "an extension of" [e.g. HISPID - Herbarium Information Standards and Protocols for Interchange of Data: "An extension to ABCD 2.06, it is designed to allow the storage and transmission of herbarium plant specimen data."]

I think that these also fit with Antoine's definition.

The docs for these mention that they are based on some prior standard and that this was done for a defined purpose [e.g. section 2 of https://github.com/tdwg/abcddna/blob/master/640-3005-1-SM.pdf].

Section 1.1.2 of https://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata/biodatap.pdf is usefully detailed:

This profile is intended to support the collection and processing of biological data. It is intended
to be useable by all levels of government and the private sector.
The profile was developed by defining information required by a prospective user to determine
the availability of a set of biological data; to determine the fitness of a set of biological data for an
intended use; to determine the means of accessing the set of biological data; and to successfully
transfer the set of biological data. As such, the profile establishes the names of extended data
elements and compound elements (groups of data elements) to be used for documenting
biological data, the definitions of these extended compound elements and data elements, and
information about the values to be provided for the data elements. The profile also describes any
modifications to the optionality or repeatability of non-mandatory elements and any
modifications to the domains of standard elements in the FGDC’s Content Standard for Digital
Geospatial Metadata.
The standard is not intended to reflect an implementation design. An implementation design
requires adapting the structure and form of the profile to meet application requirements. The
profile does not specify the means by which this information is organized in a computer system
or in a data transfer, nor the means by which this information is transmitted, communicated, or
presented to the user.

Antoine suggests that the definition of "profile" be modified to read:

A named set of constraints, which can be based on one or more other identified specifications...

I think we are all in agreement that a profile typically constrains some other (base) specification(s) and we might even accept that a profile is not _necessarily_ based on other specifications. However, this is not the part of the definition that is problematic.

The biggest problem lies with named set of constraints.

None of the examples cited by @pwin above, I would assert, can usefully and accurately be characterized as a "set of" anything, and certainly not as a "set of constraints", especially if the notion of "constraint" is not defined. The notion of a "named set" does not make the definition any better.

To test whether something is a set, it should be possible to enumerate the members of that set. The Registered Organization Vocabulary, to take one example, has a list of namespaces used, "Status of this document", header metadata, change log, property and class definitions, usage guidance, Table of Contents, Acknowledgements, References, copyright notice, and the like. Is it helpful to call this a "set of constraints"?

"Set of" is the language of mathematics and computer science. An RDF graph can usefully be described as a set of triples. A Python dictionary can be described as a set of key:value pairs. But to reduce profiles to "sets of constraints" implies a mathematical rigor to the notion of profile that is quite inappropriate.
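
To make concrete what "set" means in the data-structure sense referred to above, here is a small sketch. The triples and keys are invented for illustration, and the graph is modelled as a plain Python set rather than with any RDF library.

```python
# An RDF graph modelled as a plain set of (subject, predicate, object) triples.
graph = {
    ("ex:dataset1", "dct:title", "Rainfall 2019"),
    ("ex:dataset1", "dct:license", "ex:cc-by"),
}
# Adding a duplicate triple changes nothing: sets have no repeated members.
graph.add(("ex:dataset1", "dct:title", "Rainfall 2019"))

# A dictionary's items form a set-like view of key:value pairs.
record = {"title": "Rainfall 2019", "license": "ex:cc-by"}
pairs = record.items()

print(len(graph))                           # members can be counted
print(("title", "Rainfall 2019") in pairs)  # membership can be tested
```

Members of such structures can be enumerated, counted, and tested for membership, which is the kind of rigor the word "set" suggests.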

As for "constraint": if anything is a constraint, then nothing is a constraint. And if a constraint does not actually "constrain" something, then it fails the test of common sense.

Hence my definition (above) of "data profile" as a "specification" -- not a "set" -- that constrains, extends, or annotates a "data specification". This definition avoids the thorny question of how to usefully define "constraint". We can discuss whether a profile MUST refer to another data specification or MAY stand on its own, but that is a different discussion.

So, @tombaker , the challenge is that much of what was in my list above was from a different era, and that we need a development that looks forward to the increased use of profiles that are likely to have a certain amount of rigour based on set membership etc. If we had this approach in our minds at the start, things like licenses wouldn't be as difficult to work with as they currently are. So I think that we are needing to frame something that is (perhaps awkwardly) backwards compatible, but sets a sense of direction for future work.

@tombaker asked:

Is it helpful to call this a "set of constraints"?

Could they be enumerated in a checklist? If so, then 'set' is fine AFAICT, in fact it is a useful idea.

I understood that the goal of the profile-vocabulary was to help in expressing a profile more formally.
Wouldn't this converge towards enumerating the _set_ of concerns?

@dr-shorthair I'm not sure what exactly you are including in "they". If you look at DCAT-AP, which I assume we consider an application profile, then a significant portion of that document cannot be enumerated. There is introductory material, and the organization of the document includes the division of terms into categories, which would suggest that these are all subsets.

4 DCAT APPLICATION PROFILE PROPERTIES PER CLASS
A quick reference table of properties per class is included in Annex I.
4.1 Catalogue
4.1.1 Mandatory properties for Catalogue

If you are only considering the terms themselves, outside of the document, then you may be referring to the RDF or SHACL files as "enumerations", but those are greatly reduced in content from the AP itself, even lacking the usage notes and categories of the PDF document. Could the enumeration contain everything in the AP? I don't think we've shown that to be the case yet, although that is a worthy goal.

If we allow our definition to include APs that are expressed as documents, like DCAT-AP, then we cannot limit them to things that can be considered "sets" in the formal sense of that term, and the definition suggested by Antoine has an air of formality, so people would not be wrong to assume the mathematical sense of "set" rather than the very informal "any bunch of stuff".

@pwin You found one "profile" that was stated to be "an extension of" another. That is in contradiction to the definition that considers profiles to be "constraints". What fits better in the definition is the statement that a profile "defines additional constraints, conventions, or extensions", which comes from RFC 6906. Limiting to constraints - or I could say constraining our definition to constraints - leaves out a significant number of actual APs.

Yes @kcoyle . I wanted to find usage of the term 'in the wild' with examples that we might be able to use to clarify our thinking and also try to evidence the components of our definition. That example was from a US Federal source in 1998/9.

The question that interests me is whether 'profile' should mean something that has implications for software implementations, or if it is a sort of qualitative statement, something like 'a lot of what is in this spec is adopted from pre-existing spec X (or specs X, Y, and Z)'.

There are pre-conditions to be software actionable-- the base spec has to define some software-testable conformance requirements. If it doesn't, then there is no machine-actionable way to validate resources for conformance to that spec, and there is no point trying to formalize what a profile of that spec is in a machine-actionable sense.

What I'm really interested in here is machine interoperability-- aggregators can harvest metadata conforming to some specification (or profile of a spec) so as to consistently index harvested metadata, present results to users, and offer value-added services based on the harvested metadata. Software applications can advertise the kind of input formats they need (a spec or profile of a spec), and datasets can declare a spec or profile of a spec they conform to, so that search applications can get the user from data discovery to data utilization with fewer clicks. I tried to provide some concrete examples of this in ID21 and ID22. ID2 and ID3 are also related.

@smrgeoinfo I think that software development should be a (perhaps significantly increasing) subset of the use cases for a 'profile'. This is why I think we need to be inclusive in our definition, but quickly move to some broad taxonomy of profile types that can then permit development of good practice documentation. There might also be a migration path that could be suggested between things which at present require quite a bit of human in the loop to other processes that make much more machine use. There are many examples of maturity model that we might use for this - here I'm thinking about interoperability maturity models as referenced in https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/document/interoperability-maturity-model and interoperability reference architectures as described in https://ec.europa.eu/isa2/solutions/eira_en

@pwin I think your suggestion of a taxonomy of profile types is what is needed. Perhaps this could be approached (partly) as a taxonomy of relationships between specifications. This is based on the view that a profile is just a kind of specification that has a particular relationship with one or more base specifications.
Some possible relationships (just a start...)

  • inheritsFrom - source spec inherits some provisions from target (base) spec, with no implication that conformance to the source spec implies conformance to target spec
  • constrains -- source spec restricts provisions in target (base) spec, but does not add additional provisions (i.e. no new elements or properties); conformance to source implies conformance to target
  • inheritsAndExtends - source spec inherits some provisions from target (base) spec, and adds additional provisions for conformance that are not inconsistent with the base spec. conformance to source implies conformance to target
  • uses -- source spec inherits or restricts provisions in target (base spec), but adds additional provisions that are not necessarily consistent with the base spec. Conformance to source does not imply conformance to target.
  • related -- source spec is conceptually based on the target (base) spec, but one or the other spec does not assert testable constraints, so there is no implication of conformance to either source or target.

The specification taxonomy would focus on aspects like 'does the spec provide machine actionable conformance tests', 'has clearly defined conformance requirements', 'provides guidance and recommendations but not testable conformance requirements'
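
The relationship list above can be tabulated by whether conformance to the source spec implies conformance to the target (base) spec. This is only a sketch: the relationship names come from the list, while the `Relationship` enum, the lookup table, and the `implied_conformance` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Relationship(Enum):
    """Relationships between a source spec and a target (base) spec."""
    INHERITS_FROM = "inheritsFrom"
    CONSTRAINS = "constrains"
    INHERITS_AND_EXTENDS = "inheritsAndExtends"
    USES = "uses"
    RELATED = "related"


# Does conformance to the source spec imply conformance to the target spec?
IMPLIES_TARGET_CONFORMANCE = {
    Relationship.INHERITS_FROM: False,
    Relationship.CONSTRAINS: True,
    Relationship.INHERITS_AND_EXTENDS: True,
    Relationship.USES: False,
    Relationship.RELATED: False,
}


def implied_conformance(relationship: Relationship) -> bool:
    """Look up whether source conformance carries over to the target spec."""
    return IMPLIES_TARGET_CONFORMANCE[relationship]


# Relationships under which a validator for the base spec should also pass.
strict = [r.value for r in Relationship if implied_conformance(r)]
print(strict)
```

Only "constrains" and "inheritsAndExtends" carry the conformance implication in this reading; the other three describe reuse without any such guarantee.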

@smrgeoinfo , the relationships you outline look very similar to the ones defined in VOAF (http://purl.org/vocommons/voaf) - probably with the exception of the conformance aspect.

@pwin @kcoyle We need not look back to a "different era" to find profiles that "extend" (one of the patterns enumerated above by @smrgeoinfo). The GeoDCAT-AP and StatDCAT-AP listed in Section 14 of DCAT clearly describe themselves as "extensions". If the German adaptation of DCAT-AP listed there is at all typical, the language and country adaptations are not just translations, but include both constraints and extensions ("Einschränkungen und Erweiterungen").

@dr-shorthair

Tom: Is it helpful to call this a "set of constraints"?
Simon: Could they be enumerated in a checklist? If so, then 'set' is fine AFAICT, in fact it is a useful idea.

I think you are asking whether "constraints" defined in profiles could be enumerated. If we had a coherent definition of "constraints" (which we do not), I'm sure this could quite trivially be done. But it is terribly reductive to characterize profiles as "sets of constraints" as if they were data structures like RDF graphs ("sets of triples") or Python dictionaries ("sets of key:value pairs"). If the PDF of a profile cannot be algorithmically processed as a set, and an ordinary user cannot immediately recognize its contents as constituting a set, it should not be called a set.

I understood that the goal of the profile-vocabulary was to help in expressing a profile more formally.

But is that really the goal of the profile vocabulary? Technologies such as ShEx, SHACL, Schematron, etc, really do help express a profile more formally. The draft Profiles Vocabulary, as I see it, merely aims at describing relationships within a cluster of documents.

@tombaker scripsit:

But it is terribly reductive to characterize profiles as "sets of constraints" as if they were data structures [...]

Would it help to call them "collections of constraints" or "lists of constraints" in order to get away from the mathematical notion of sets?

@larsgsvensson wrote:

Would it help to call them "collections of constraints" or "lists of constraints" in order to get away from the mathematical notion of sets?

The problem is that profiles consist not just of constraints alone, but also of extensions and usage annotations. (I emphatically resist the notion that extensions and usage annotations can be characterized as "constraints". If constraints do not "constrain" something, they fail the test of common sense. If everything is potentially a constraint, then the term is effectively meaningless.) I cannot think of a single word that captures the range of things one might find in a profile.

The definition I proposed, or a variant thereof, gets around this problem by saying that a profile is a specification that "constrains, extends, or annotates" one or more data specifications. This definition defines the profile in terms of how it relates to other specifications, not in terms of the sets, lists, or collections of things it might contain.

@smrgeoinfo @pwin I agree re: need for taxonomy along these lines. This is precisely the sort of analysis one would wish from a Guidance document. Indeed, Section 1.2 of the Guidance draft points in that direction.

This is all getting rather convoluted - at this stage we don't need a taxonomy because we only need to deal with strict conformance - "uses", "guided by", or any other possible relationship may be described by some other process - it's an open world - it's not relevant to conneg.

Other rather obvious points: text in a document describing a profile is not the same as the profile - all specifications have background and informative information, and a set/list/collection/handbag of normative requirements - all that other stuff is metadata and annotations about the "thing". Nothing in the definition does, or should, stop you from describing any aspect any way you like - that's just confusing the representation of the concept with the concept itself.

Usage annotations are just an open-world view of the original specification - extensions, however, actually do constrain things if there is a requirement to use the extension to conform to the specification.

We simply don't have Use Cases around annotation, re-use, or publishing optional extension vocabularies - valid though they may be. It is true that existing profiles may also perform these roles w.r.t. specifications - but nothing stops them having multiple roles.

Feel free to suggest better wording, but don't introduce different requirements without following the agreed UCR discipline.

@rob-metalinkage

we don't need a taxonomy because we only need to deal with strict conformance

The taxonomy proposed by @smrgeoinfo nicely shows that different types of profile imply different notions of conformance. In such a context, what does "strict conformance" even mean?

Other rather obvious points: text in a document describing a profile is not the same as the profile

I think we can agree that DCAT-AP is a profile, correct? The various things _in DCAT-AP_, such as its text, are obviously not the same as the DCAT-AP.

that's just confusing the representation of the concept with the concept itself

I agree with Antoine's preference for "specification" over "document". While "document" implies a concrete instance of a profile, "specification" implies something more conceptual, which allows different representations to be grouped together (and which is what I think you are driving at with "concept"). This is why I use "specification" in the definition proposed above.

We simply don't have Use Cases around annotation, re-use, or publishing optional extension vocabularies - valid though they may be.

We do have:

6.12.1 Profile documentation [RPFDOCU]

A profile should have human-readable documentation that expresses for humans the main components of a profile, which can also be available as machine-readable resources (ontology or schema files, [SHACL] files, etc). This includes listing of elements in the profile, instructions and recommendations on how to use them, constraints that determine what data is valid according to the profile, etc.

6.11.1 Human-readable definitions [RPFHRDEF]
6.11.2 Global rules for descriptive content [RPGRDC]
6.11.9 User interface support [RPFUI]

Wow, this discussion really goes into too many aspects: using "constraints" or not, "being based on something" or not, being a "set" or not...
Maybe we should split and identify the different facets of the issue? (I'm pessimistic about it, as every attempt to do this seems to lump everything together again. But maybe we can try...)

On these different aspects:

  • group/list/collection works fine for me. Whatever gives the sense that a profile can specify several things.
  • "constraints" may be removed, but I would still like something that expresses that profiles seek to control how data is. Even when I annotate or create an extension of something, it is with the idea of having some form of control, otherwise it's pointless. To me the word "constraint" captures this (https://www.merriam-webster.com/dictionary/constraint) but if you want another word I won't have an objection as long as it sends the same sort of message.
  • on "based on something" or not, maybe we should have two different definitions (one subsuming the other) as hinted at https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Jul/0057.html . The more I think of it, the more I think DCAT would need the two anyway, as it uses the two (see my email at https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Jun/0106.html)

And I disagree with the notion of taxonomy proposed by @smrgeoinfo . Not that the notions are meaningless, it's rather that it presents them as alternatives, while to me they can be combined (i.e. an extension is a form of constraint on the data too, see my comment above)

@smrgeoinfo Here's what we have in the very drafty draft of profile guidance for profile "types" :

1.3 Examples of profiles and related work

Profiles can take a number of forms and can have a variety of relationships to existing vocabularies, standards, and other profiles. We recognise this variety, but for the purposes of this document we are focusing on the most general forms of profiles and profiling. Although it is not possible to list all of the types of profiles, some illustrations of frequently-used profiles include:

profiles that are subsets of a larger vocabulary. These reduce the vocabulary terms of a broad data standard to a smaller number of terms that are useful for a particular community member or application. An example of this is BIBFRA.me, which is designed for library materials and defines both a core set of terms as well as profiles for specialized communities such as cataloging of rare materials or early printing trade. In this community, all profiles use only terms from a single vocabulary.

profiles that can both reduce and extend a base standard. These profiles are developed by members of a data-sharing community but for reasons of jurisdiction or specialization need to add terms beyond the base standard vocabulary in order to meet their needs. They may also omit terms from the base standard that are not relevant to their implementations. An example of this is the data catalog vocabulary standard, DCAT, its primary profile, DCAT-AP, and the national variants (DCAT-AP-IT, DCAT-AP-NO, DCAT-AP-DE). While maintaining overall compatibility with the larger data catalog community, each of these profiles adds needed terms for the local variant. These profiles generally make use of terms from more than one namespace.

profiles that amend a base standard by inheriting or overriding values of that standard. The example here is of the Open Digital Rights Language (ODRL) which is a language to support rights in the use of digital content in publishing, distribution, and consumption of digital media. The ODRL language encodes a policy that has a core vocabulary that can be extended or overridden by individual instances called "profiles."

profiles that use some vocabulary terms from multiple standards without having a strong relationship to any base standard. These profiles develop new groupings of existing terms as vocabularies and may define new terms as needed. An example of this is the Asset Description Metadata Schema (ADMS) vocabulary [vocab-adms]

And I'd like to emphasize that @kcoyle 's "types" above are more exclusive than the other taxonomy. We've factored out notions like "use", "constrain" and "inherit" because all these types do these things, to some degree.

@aisaac As an English-speaker, extensions do not make sense as "constraints". Constrain really means to limit and is the opposite of expand or extend. This is why I favor listing constraints AND extensions in the definition. I understand how the dictionary definition could be used to include a profile with extensions, but it is going to be quite counter-intuitive to readers and will cause confusion. I see no problem with using both constrain and extend, which are easily understood.

@kcoyle my problem is really that my extensions are constraints on the data.
I think we're talking about "constraints/restriction over an existing specification" (your sense) and "constraints over data, independently on whether they are derived from an existing specification" (my sense). What word to use for my sense?

Maybe what I'm after is 'validation of conformance targets' as @smrgeoinfo has put it at https://github.com/w3c/dxwg/issues/963#issuecomment-506438552 but this looks quite complex an expression

@tombaker for the record I've -1'ed your proposal at https://github.com/w3c/dxwg/issues/963#issuecomment-506650168 mostly because of the definition of data specification, which I find too document centric. And (probably as a result) then the definition of data profile is only in terms of machine- or human- readable doc, while I still would like a conceptual aspect (which you've correctly identified in other messages!)

@aisaac I really like your distinction between specification and document, so if my definition can be improved to make that clearer, I'm happy. Instead of "family of documents", would reference to related "representations" be clearer?

@tombaker this is better but I'm still hesitant - it depends on the final wording. I see the specification as something conceptual, a "hub" between representations/documents, so I'd prefer something that hints that a data specification has a family of representations, rather than is a family of representations (even though of course the representations are an essential part of the data specification). Similarly for your definition of data profile, for me it's not about being human- or machine-readable, but rather about having human- or machine-readable representations (and btw I guess in Guidance we should say that there must be some human-readable representation, even partial, otherwise it's a strange spec...)

@smrgeoinfo Nothing in our definitions or the guidance document says that profiles CANNOT be actionable. Actionable profiles are one of the many possible kinds of profiles, and we have more than one use case that speaks to validation of data based on profiles (5.41 Vocabulary constraints [ID41]; 5.37 Europeana profile ecosystem: representing, publishing and consuming application profiles of the Europeana Data Model (EDM) [ID37]). If your need is to have a profile that you can compute over, that is fine. We do, however, want to include profiles that are not written as actionable code, such as DCAT-AP and many others. We can hope that our future is one with standardized and actionable profiles, but we have a present and legacy that also needs to be accommodated. My feeling is that excluding those will overly narrow the community that will participate with us.

Note that I come from a humanities environment where the data is much less rigorous and other than search and display very little computation takes place. Profiles have been written documents, and only now is there some profiling based on RDF classes and properties.

but rather about having human- or machine-readable representations

+1 to representations, without saying more about what they are or their relation (could be from a single SHACL or ShEx document; could be separate).

? Do we exclude profiles without one or the other? Can we say "may have" to be inclusive?

RE: Constraints

@aisaac when you say:

my problem is really that my extensions are constraints on the data.
I think we're talking about "constraints/restriction over an existing specification" (your sense) and "constraints over data, independently on whether they are derived from an existing specification

I suspect that we are tripping over the fact that we are sometimes talking about different things:

1) the profile in relation to an existing specification, where the profile can constrain or expand prior vocabularies
2) the profile in relation to the data it defines, where the profile defines/constrains the data

So we should talk about these two separately in the definition. First, what the profile is in relation to other specifications, and then its role in defining/constraining the data for a use or application. If we make that clear then I think we may have solved the confusion we have about "constrain" - that a profile's purpose is to constrain some data, but it does so by being based on prior specifications and may limit or expand on those.

data profile 

    A data specification, with human- and/or
    machine-processable representations, that defines
    the content and structure of data used in a given
    context, often by constraining, extending, or
    annotating other data specifications.

data specification

    A specification, with human- and/or
    machine-processable representations, that
    provides vocabularies or guidelines for
    describing data.

This proposal, based on earlier proposals, as presented on the public-dxwg-wg list, tries to retain features that were upvoted by Lars, Riccardo, Peter, and Karen, while addressing several issues:

Some comments on @tombaker 's definition (thanks! I like it better indeed):

  • I'd really like the "named" element of our earlier spec to be kept. It's essential to our view of profile
  • I also like the idea of a "collection" of constraints, even though we could replace constraints by something else
  • describing data is not prescriptive enough for me. I'd prefer to keep "defines/constrains" from @kcoyle 's proposal above
  • "that defines the content and structure of data used in a given context" is probably true for "data specification" as well, not just data profile
  • in fact I don't understand well what "annotating" means
  • some of the enumeration of what's contained in a profile from the earlier definition would really help. I think we didn't add these for no reason, at the time!

I don't see the value of calling something a profile if it doesn't constrain, extend, or annotate some other specifications-- in which case it would just be another specification as noted by @agreiner in the e-mail thread.

@tombaker 's definitions could reduce to

data specification
A specification, with human- and/or
machine-processable representations, that defines
the content and structure of data used in a given
context

data profile
A data specification that constrains, extends, or
annotates other data specifications.

With the understanding that a specification must be identifiable (named) to be useful. The name/identifier for either a data specification or data profile could be used for profile negotiation, or as the value for a 'dcat:conformsTo' property on a dcat:Resource or dcat:Distribution.

Annotate is interpreted to mean provide additional guidance or recommendations on usage to improve interoperability

@smrgeoinfo

Annotate is interpreted to mean provide additional guidance or recommendations on usage to improve interoperability

Yes, that is what I meant. I like your reduction of my proposed definitions!

@aisaac What does "named" actually add? Can we not take it for granted that a profile would have a name?

describing data is not prescriptive enough for me. I'd like more to keep "defines/constraints"

I agree, and that is indeed why I changed "describes" (from my earlier proposal) to "defines" - borrowed from Karen.

some of the enumeration of what's contained in a profile from the earlier definition would really help. I think we didn't add these for no reason, at the time!

When I dug a bit deeper, I found that the wording ("subclasses of datatypes, semantic interpretations, vocabularies, options and parameters... necessary to accomplish a particular function") was mostly borrowed from ISO 10000-1, which AFAICT was a software engineering standard. I questioned the notion of "sub-classes of datatypes" because datatypes are not classes, and I suspect that the original ISO standard actually was referring to software functions. I am also unclear what "parameters" refers to; does DCAT have parameters? That leaves vocabularies, semantic interpretations, and even options, which make more sense to me.

@tombaker yes "defines" is better. And I am not claiming that we should keep all the enumeration of examples verbatim. But I think some enumeration helps.

@tombaker I'm keen on "named" because this paves the way for URIs. And it's a requirement we've identified, I believe.

@smrgeoinfo @tombaker yes replacing "annotating" by your suggested paraphrase would be very helpful. "Annotate" is perhaps the only word that's more ambiguous than "profile" in our context ;-)

@smrgeoinfo as I've said on the mailing list, can live with not calling "profile" the things/specs that are not derived from other things/specs.

I like the reduced form of Tom's definitions

+1 from me too. Maybe with a slight revision:

Data profile
A data specification that constrains, extends, and/or annotates other data specifications.

I also wonder whether we should include "combine" for profiles based on multiple specifications:

Data profile
A data specification that constrains, extends, combines, and/or annotates other data specifications.

I don't think there is actually a consensus yet on what a profile is, so we can't vote for wording and expect a good outcome.

I have proposed a new UC which hopefully covers the concerns people have over things like "recommendations" in documents co-existing with testable requirements:

https://github.com/w3c/dxwg/issues/978

We may need to update examples to add identifiers for different "sets" within "documents", or define a convention that a conformance is strictly interpreted as conformance to the mandatory set of requirements, and that further description is necessary to handle cases of suggestion, optional re-use in an open world etc.

ps - i think we are finding we cannot "take it for granted that a profile would have a name" - life would be easier if we could :-(

Copying email comments to github (my bad)

Karen said:

What is meant by "named"? Is a name the same as an identifier? The same
as a title on a document?

Note that we have stated that for content negotiation the resource MUST
have a web-based identifier (URI/IRI). When a profile is a document
(e.g. PDF) then I would say that it SHOULD have an identifier, but if it
doesn't it does not cease to be a profile.

Antoine replied:

I think at the level of a definition we can give ourselves a bit of slack, and only refine later (i.e. in our guidance document) what are the possible "names" or identifiers and what level of mandatoriness we expect.

By the way I think what you outline in the second paragraph does not conflict with what is in https://w3c.github.io/dxwg/profiles/ (either as rather clean text or very drafty notes).

@aisaac I would be fine with giving ourselves slack on names if it weren't that the primary statement that is made in the definition is "A named set ...." That's pretty definitive. If the definition read "A set ..." and in the document we talked about optional and preferred naming, I would feel better about it. But the definition opens with "named" in a way that would make later "slack" very hard to pull off.

@kcoyle well my request for slack is for sorting out the detail later. I am fairly sure there is a need for a name, even though the type of name may be flexible (and I include "having a URI" among the options for "being named"). Otherwise how would you refer to the profile? I mean, having an anonymous profile sounds a bit pointless. All cases we have for profiles are somehow named.

It feels like we are at last getting to grips with this beast :-).

It seems the concept of specification is made more complex by the sense that it's something to conform to, and the use of it for documents that do more than specify things, or may contain multiple such specifications.

Simon was on the money when he explicitly modelled "conformanceClass" as the specifying component of a specification document - but that's not really a "mass market" accessible set of terminology.

In my mind I'm happy to think of the specification component of a specification document as a "profile" - so a specification document may have a mandatory profile and a recommended profile - or multiple discrete cases of each of these.

I think we can't fix use of "specification" as a noun (i.e. define it well enough) because of its conflicting but common usage senses - but as a verb it does describe a role well enough.

More than happy to try to explicitly model the idea of a specification document containing multiple profiles - the OGC example shows how a naming policy can be enforced - but that's a corner case against legacy documents.

I think we have a choice:
1) define a convention and mechanism for describing how "specification document" objects can be polymorphic (act as a mandatory profile and a recommendation profile and a guidance note and a set of examples etc) - nb I think this can already be handled in Profiles ontology using roles, but we may want to define role like "mandatoryCore" - so a profile can state it represents that part of a document.
2) limit scope to cases where the equivalent of "conformanceClass" is explicitly named - we just use those names as references - profiles or X ('specification' seems not to work)
3) specify a mechanism to provide names for the different components within a specification document and define how each named thing is related to the original document.
4) some other idea ?

Once we have consensus about the relationship of profile to "specification" I can model options to describe - but I feel that my original "computer scienc-y" view that each "conformanceClass" is naturally a profile of itself can be extended happily to "each specification document describes a number of possible profiles of the specification" - and we just need a single notion of profiles - as per conneg-by-ap (and consistent with @tombaker comments on this matter)

@rob-metalinkage just checking: is this comment really for this issue? As it mentions "documents", "specifications", "conformance classes" and "documents that contain multiple specifications", that makes me think of #978

I should have put the link to #978 into this issue - that UC is input into solving this issue - trying to untangle the conflated notions we have identified...

@rob-metalinkage it is still not too late: you could move your comment to #978 and replace it by a link?

No, the comment belongs here - it's all about the definition - which relates to what the thing is (I hope)

UPDATED PROPOSAL

  • Update 2019-07-10: added "combines" as per @pwin, "provides guidance" as per @aisaac

Taking into account the discussion above, here is an updated proposal for a definition of "profile" to use both in DCAT and CONNEG:

data profile

    A data specification that constrains, extends, 
    combines, or provides guidance or explanation 
    about the usage of other data specifications.

data specification

    A specification, with human- and/or
    machine-processable representations, that defines
    the content and structure of data used in a given
    context.

As per @aisaac's concerns about the ambiguity of "annotates", I suggest we add something like: "In
this context, 'annotate' means 'to provide guidance or explanation about usage'."

By my count, circa eight out of the eleven active members in this discussion have expressed support for these definitions (or a variant thereof), either by upvoting in Github or by commenting on calls.

I agree with Antoine that the definition of "profile" should acknowledge and cite two other usages of "profile" in a data context -- JSON-LD "profiles" ("syntactic profiles") and "data profiling" (a data analysis activity) -- if only to say that they are out of scope.

See https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Jul/0201.html for further discussion

I was alerted to this discussion by @tombaker's mail kindly forwarded by Peter. I was surprised to see that the profile definition is indeed still a concern. I had already voiced a strong opinion against the definition of "named set of constraints on one or more identified base specifications", which non-related concepts like a programming language also satisfy (see https://www.w3.org/2017/dxwg/wiki/ProfileContext#Comments.2Fobjections).

I think the above definitions by @tombaker are indeed closer to what we want. But I have some concerns there regarding it being a recursive definition. I don't think the essence of a profile (there being defined as a data specification) is that it relates to another specification. However, I see above that the notion of "profiling is about acknowledging a reuse" is strongly present.

FYI, this is what the IETF draft currently says:

In the context of this proposal, a profile is a description of the structural and/or semantic constraints of a group of documents.

The interesting difference being that the above definition also allows "extends" rather than "constrains"; however, I don't see this as a contradiction, as the IETF definition talks about constraints with regard to documents, not other profiles.

What matters from a purely technical perspective is that a profile constrains a document beyond a media type; other parts of the definition might indeed matter for usage, but not down at the negotiation level. So as long as there is no contradiction, all is fine.

@tombaker I would really push for replacing "annotates other specification" by "provides guidance or explanation about usage of other specifications". The addition is not very long and "annotates" can be really confusing.

@RubenVerborgh the problem you see with the recursive definition is one of naming. We agree that we need two things, one for specs based on other specs and one for specs that could be self-standing (https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Jun/0106.html). I was rather in favour of naming "data profiles" the latter, which I believe matches your "I don't think the essence of a profile (there being defined as a data specification) is that it relates to another specification". But it seems that more persons are in favour of keeping "profile" for the "recursive" case.
But well again it's a mere issue of naming, and I'd personally be ok changing that in the coming months if we discover something new. What's important to me is the split in two definitions, which captures the main divide we've got in our specs at the moment.

Thanks, @aisaac. I don't mind that strongly; I am only involved with the IETF part, and as long as we have compatibility (IETF will be more generic), then all is good.

@rob-metalinkage I think I'm still a bit reluctant to include your comment in the base definition. It's relevant, but for me it's the next step. I.e. we first agree on something short and then we expand on the details of how profiles can exist, what their components (documents) are and how they work together. What I have in mind is the current structure of the Profile Guidance draft, where the base definition is in section 1 and then sections 2, 3 and 4 dive into further characterization of profiles, which would include matter from #978 (which is why I'm so interested in having all your thoughts on #978 attached to that issue and not disseminated into the one here).

I want to include in @tombaker's proposal the word "combines", as first mentioned by @andrea-perego in https://github.com/w3c/dxwg/issues/963#issuecomment-508577664, as many APs such as CPSV-AP etc. are just that. They don't constrain, extend, etc. They just aggregate a few building blocks defined elsewhere.

@pwin Since I was the only one who had yet updated my proposal (above), I have added "combines".
@aisaac Fine with me to replace "annotate" with something more explicit.

I have amended the proposal.

https://www.w3.org/2002/09/wbs/99375/profile-def/

Please can colleagues vote on this.

@pwin , I confess it is not clear to me which is (are) the definition(s) we are voting on.

Can we have it/them explicitly included here in a specific comment?

@andrea-perego
The definition is in 2 parts, but is a single definition:

data profile

A data specification that constrains, extends, 
combines, or provides guidance or explanation 
about the usage of other data specifications.

data specification

A specification, with human- and/or
machine-processable representations, that defines
the content and structure of data used in a given
context.

The vote is a simple Yes/No to the question, are you happy that we use this for conneg and dcat

I have to say that I really like the essence of this clear, simple definition from the IETF document:

In the context of this proposal, a profile is a description of the structural and/or semantic constraints of a group of documents.

I'm a bit unclear on the "of a group of documents", but I like the "structural and/or semantic".

Also, I'd be happy to shorten the statement "or provides guidance or explanation about the usage of" in Tom's proposal because the "or"s there saddle us with some language ambiguity: "(or provides guidance or explanation) about the usage" or "(or provides guidance or (explanation about the usage))".

If we approve the definition, I'd like to have a chance to word-smith to make sure that it is clear. My gut tells me that we don't need both guidance and explanation in there, so we could clean that up easily.

I'm a bit unclear on the "of a group of documents"

Yes… the intent was "certain documents". Will change that.

but I like the "structural and/or semantic".

This was specifically meant to contrast with media types, which additionally provide syntactic constraints.

@RubenVerborgh I kinda like the IETF definition too, but I'm also unclear on "of a group of documents" because the wording seems to support two quite different interpretations:

  • constraints are somehow contained in a group of documents
  • there are some constraints that apply to a group of documents.

In any case, the term "document" is both very broad and very specific. It is specific because it implies "file", but it is broad because what print or digital file is _not_ a "document"? I take "group of documents" to refer to what we have been calling a "specification", which might consist of a family of related expressions of, say, a vocabulary, e.g. in RDF and PDF.

@kcoyle Good catch re: the ambiguity of "provides (guidance or explanation) about the usage" versus "provides guidance or (explanation about the usage)".

The IETF definition is now:

a profile is a description of structural and/or semantic constraints documents can conform to in addition to the syntactical interpretation provided by more generic MIME types.


It is specific because it implies "file"

It actually means "representation" to me, in the REST sense of the word, but I thought that would be too specific.

I take "group of documents" to refer to what we have been calling a "specification"

That was intended as "certain documents that conform to the specification"; clarified now.

My reading of that IETF definition is that it defines 'profile' to only apply to profiles of MIME types-- certainly a narrower scope than our discussions here.
On close parsing: _a profile is a DESCRIPTION of constraints {on documents that have MIME types}._ Use of 'CAN' is not standard specification language; my reading would interpret CAN to denote 'possible', not permitted (MAY), recommended (SHOULD) or required (MUST).

I don't think it's suitable for our purposes.

My reading of that IETF definition is that it defines 'profile' to only apply to profiles of MIME types

That's not what it says though.

my reading would interpret CAN to denote 'possible'

Actually, yes. It is possible that documents conform to a profile. If we change the _can_ into a MAY, then the constraints seemingly become optional. Let me see if I can further refine.

I think I can live with this (it's two definitions, but it has proven necessary to state that there is a special case of specification that relates to "data used in a given context"; this seems to match OK with "certain documents" in the IETF version), and I propose a useful clarification.

The only problem is that it may still be too hard to see how the (undefined) term "usage" resolves the ambiguity between re-use and profiling - you need to go through to the end of the data specification definition to the "data in a given context" and successfully infer that therefore a profile must also be constraining the same "given context". IMHO it would be useful to state this up front so the definition implications are clear, and vocab re-use is not swept up in a too broad definition of profiles.

I therefore suggest something along the lines of

A data specification that constrains, extends,
combines, or provides guidance or explanation about the usage of other data specifications whereby the "given context" or the profile remains compliant with these "base" specifications.

without this explicit clause it will be too hard for the definitions to be interpreted as a whole, and the ambiguity of the word "usage" is too great - would i be using something if I made a statement like?

:myClass owl:disjointWith you:yourClass

(yes - i'd be using it to clarify semantics of the thing i was defining - but i'm not profiling it in any sense)

(It would be easier if we had a definition equivalent to "conformanceClass" that we could make simple statements about, but I don't believe we have identified an acceptable term for this "given context".)

What is meant with "a description of" ?

@rob-metalinkage

to the "data in a given context" and successfully infer that therefore a profile must also be constraining the same "given context".

I do not believe that this is an inference that can be made from Tom's definitions. The definition of specification is not about A specification, but specifications in general. The definition of profile is that it IS a specification. There aren't two different things here so there's no "same" to be read into it.

That's not the point at all. It's the "context" that must be consistent with the base specification. Of course context is an undefined term introduced by this definition, but I am happy enough with it, except that other people don't seem to understand that it's the important part of the functional requirements for profiles. I don't think "conformance target" is great, but at least it highlights that it's an important concept.

Also, it's a syllogism to impute that what can be inferred from an instance because of the definition of the class is not about that definition...

@rob-metalinkage My point was that there is no "base specification" in Tom's definition. The relationship between specification and profile is an IS A relationship: a profile is a kind of specification, like a dog is a kind of mammal. There is no "specification" that a profile is a profile of, at least not in that definition. Whether that concept is included in, say, the profiles guidance document is not excluded, but the definition here is unrelated to that, and therefore there is nothing that must be "consistent". Specification is a class, not an instance, and a profile is a kind of specification, and is consistent with the definition of specification but not with any instance of specification.

The relationship between specification and profile is an IS A relationship

As per PROF so far:

:Profile rdf:type owl:Class ;
         rdfs:subClassOf dct:Standard .

So yes, every prof:Profile IS A dct:Standard but also:

:isProfileOf rdf:type owl:ObjectProperty ;
           rdfs:domain :Profile ;
           rdfs:range dct:Standard .
And it's this that really makes prof:Profile something a bit more than just a rewrite of the dct:Standard class. Profiles really are a profile of something, even if that profiling is trivial (a Standard being a profile of itself).

Don't forget the expected use of:

<something> dct:conformsTo <Profile_Y> .

So things can conform to profiles, identified by some URI.

Am I right in asserting that the above, very long discussion, won't require any changes to these core concepts in PROF?

The global constraint is not enough. It merely says 'if there is a prof:isProfileOf relationship then it is from a prof:Profile to a dct:Standard'. But it does not require that the relation be present in every instance. To achieve that you also need a local constraint on the prof:Profile class:

prof:Profile
  rdfs:subClassOf [
      rdf:type owl:Restriction ;
      owl:minCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty prof:isProfileOf ;
    ] ;
.

Aside: IS A is ambiguous. I assume you mean 'is a kind of', rather than 'is an instance of'
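
The two readings of IS A can be told apart in almost any typed language. The following is only an illustrative sketch in Python, not part of PROF; the class names `Specification` and `Profile` and the instance `my_profile` are hypothetical stand-ins for the concepts discussed in this thread.

```python
class Specification:
    """Hypothetical stand-in for the general notion of a data specification."""

    def __init__(self, name):
        self.name = name


class Profile(Specification):
    """A profile modelled as a kind of specification (subclass relationship)."""


# 'is a kind of': a statement relating two classes
profile_is_kind_of_specification = issubclass(Profile, Specification)

# 'is an instance of': a statement relating one individual to a class
my_profile = Profile("hypothetical application profile")
my_profile_is_a_profile = isinstance(my_profile, Profile)
my_profile_is_a_specification = isinstance(my_profile, Specification)

# The class Profile itself is not an instance of Specification
profile_class_is_instance = isinstance(Profile, Specification)
```

In this sketch, "a profile is a kind of specification" corresponds to the `issubclass` check, while "this thing is a profile" corresponds to the `isinstance` check.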

@dr-shorthair scripsit:

To achieve that you also need a local constraint on the prof:Profile class

Or you create a profile and declare the constraint using SHACL:

ex:ProfileShape
    a sh:NodeShape ;
    sh:targetClass prof:Profile ;    # Applies to all instances of prof:Profile
    sh:property [
        sh:path prof:isProfileOf ;
        sh:minCount 1 ;
        sh:class dct:Standard ;
    ] .
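
To make concrete what a "minimum count of one" check does when it is applied to actual data, here is a minimal sketch in plain Python. It is not a SHACL or OWL implementation: triples are modelled as simple tuples of strings, the check is closed-world (it only looks at the statements it is given), and the function name `profiles_missing_base` and the `ex:` identifiers are hypothetical.

```python
RDF_TYPE = "rdf:type"
PROFILE = "prof:Profile"
IS_PROFILE_OF = "prof:isProfileOf"


def profiles_missing_base(triples):
    """Return subjects typed prof:Profile that have no prof:isProfileOf statement."""
    profiles = {s for s, p, o in triples if p == RDF_TYPE and o == PROFILE}
    with_base = {s for s, p, o in triples if p == IS_PROFILE_OF}
    return sorted(profiles - with_base)


# Hypothetical example data
triples = [
    ("ex:profileA", RDF_TYPE, PROFILE),
    ("ex:profileA", IS_PROFILE_OF, "ex:standardX"),
    ("ex:profileB", RDF_TYPE, PROFILE),  # no base specification declared
]

violations = profiles_missing_base(triples)
```

Here `ex:profileB` is reported because it is typed as a profile but never says what it is a profile of.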

If you don't have a canonical URI or even artefact, for the dependency, you can always describe it, in a blank-node if necessary:

<> a prof:Profile ;
   prof:isProfileOf  [ a dct:Standard ;
      dct:description "lots of words here if necessary" ;
   ] .

If you want to define 'profile' using a formal, RDF-based notation then you should follow it through all the way.

https://stackoverflow.com/questions/2218937/has-a-is-a-terminology-in-object-oriented-language
http://java8.in/is-a-relationship-and-has-a-relationship/

IS A and HAS A are fairly commonly used both in O-O and in the preceding
philosophical branch. It refers to typing, and therefore is a class
relationship. I don't know of an equivalent that indicates "instance of"
although now that I think about it, "instance of" sounds ambiguous to me
(it would have to be an IS A relationship). I'm going to ponder that one
for a bit.

kc

--
Karen Coyle
http://kcoyle.net

@nicholascar scripsit:

Am I right in asserting that the above, very long discussion, won't require any changes to these core concepts in PROF?

As I see it, yes: Every profile is a kind of standard but not every standard a kind of profile, so the axiom holds:

:Profile rdf:type owl:Class ;
         rdfs:subClassOf dct:Standard .

And yes, given

:isProfileOf rdf:type owl:ObjectProperty ;
           rdfs:domain :Profile ;
           rdfs:range dct:Standard .

a standard can be a (trivial) profile of itself.
So

:aSomething dct:conformsTo :aProfile .
:aProfile a prof:Profile ;
    prof:isProfileOf :aStandard .

also means that aSomething conforms to :aStandard.
(aside: This brings up the question of whether prof:isProfileOf should be declared transitive, and also whether there is a rule (:a dct:conformsTo :b , :b prof:isProfileOf :c) -> :a dct:conformsTo :c.)
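
As a rough illustration of what that rule would entail if it were adopted, the sketch below applies it repeatedly until nothing new can be derived. This is only an assumption-laden sketch in plain Python, not anything defined by PROF: statements are modelled as pairs of strings, and the function name `inferred_conformance` and the identifier `:aBaseStandard` are hypothetical.

```python
def inferred_conformance(conforms_to, is_profile_of):
    """Apply (a conformsTo b, b isProfileOf c) -> a conformsTo c to a fixpoint.

    conforms_to:   set of (resource, specification) pairs
    is_profile_of: set of (profile, base specification) pairs
    """
    inferred = set(conforms_to)
    changed = True
    while changed:
        changed = False
        for resource, spec in list(inferred):
            for profile, base in is_profile_of:
                if profile == spec and (resource, base) not in inferred:
                    inferred.add((resource, base))
                    changed = True
    return inferred


# Hypothetical example data, extending the snippet above by one more level
conforms_to = {(":aSomething", ":aProfile")}
is_profile_of = {
    (":aProfile", ":aStandard"),
    (":aStandard", ":aBaseStandard"),
    (":aStandard", ":aStandard"),  # a standard as a trivial profile of itself
}

result = inferred_conformance(conforms_to, is_profile_of)
```

Because the rule is applied until a fixpoint is reached, conformance follows chains of prof:isProfileOf statements even without declaring the property itself transitive, and a standard that is a trivial profile of itself does not cause a loop.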

@nicholascar yes I believe that all these long discussions don't change what we've tried to specify for some months already. Also the profiles use cases and the profile guidance draft won't be impacted that much I believe.

I'm not a big fan of profiles being profiles of themselves, but as I see it in the formal definitions floated around this is merely a possibility not a duty so I can live with that :-)

@dr-shorthair @larsgsvensson yes I guess we can either define the Profile class with OWL and/or SHACL the way you've done it or similar, and that should end in the PROF ontology. Perhaps it's already there in fact. Btw just checking: your two definitions are equivalent, aren't they?

@aisaac scripsit:

Btw just checking: your two definitions are equivalent, aren't they?

Yes, I think they are.

Decision (13 for, 1 against) in poll. Final text is:

Data Profile

A data specification that constrains, extends, combines, or provides guidance or explanation about the usage of other data specifications.

Data Specification

A specification, with human- and/or machine-processable representations, that defines the content and structure of data used in a given context.

@kcoyle not sure we can put it due for closing. PROF is ok wrt having this definition represented, but Profile Guidance has not been updated yet. Maybe we can just remove the PROF label so that it doesn't stand in the way of PROF?

@aisaac better not to remove the label as we’ll lose the association, so I marked it “prof-due-for-closing” so the full “due-for-closing” can be added when Profile Guidance gets there.

This was to discuss the counter-proposal that Rob added to the poll on the definition. As the results of the poll were 13 for the definition based on Tom's definition, and 1 against (Rob), I think we can close this, with the assumption that Profile Guidance will use the definition that 13 members voted for, not the counter-proposal.

@kcoyle yes this is probably simpler. As a matter of fact I've just worked the new definition in the document that I (still) want to use to prepare for the next updates on Profile Guidance (https://docs.google.com/document/d/1Y4jP4SGZMnt63EpjTX11-hW6-3mxlaq1i-Lbiw4tx1M/)

Decision (13 for, 1 against) in poll. Alternate text has not gathered adherents. Therefore this issue to discuss the alternate text can be closed. Documents will use the final text, which is:

Data Profile

A data specification that constrains, extends, combines, or provides guidance or explanation about the usage of other data specifications.

Data Specification

A specification, with human- and/or machine-processable representations, that defines the content and structure of data used in a given context.
