Schemaorg: Provide HTTPS version of JSON-LD context document

Created on 25 Sep 2020  ·  5 Comments  ·  Source: schemaorg/schemaorg

Currently, via a Link header, JSON-LD clients when requesting https://schema.org (e.g. from the @context field) access https://schema.org/docs/jsonldcontext.jsonld.

Users can also explicitly specify https://schema.org/docs/jsonldcontext.jsonld as their @context.

However, the aforementioned jsonldcontext.jsonld only defines every term using the http:// protocol prefix. Some clients and applications would rather exclusively use https:// versions, for example those in https://schema.org/version/latest/schemaorg-current-https.jsonld.

Although we probably should prefer the behavior that https://schema.org links to the HTTP version at https://schema.org/docs/jsonldcontext.jsonld, adding a separate jsonld file (maybe jsonldcontext-https.jsonld) that users can explicitly set as their context would be helpful.

All 5 comments

I am a little confused as to your use case here.

The "schema": "http://schema.org/", line in the jsonldcontext file is there as a prefix to identifiers defined in the file. Their purpose is not, as such, to be web addresses.

If you are looking for detailed information about the structure of the vocabulary, I would recommend using one of the vocabulary definition files, which are designed for that purpose.

If one is using the jsonldcontext provided and a tool like RDF-dereference, then that library will try to fetch metadata about classes and properties by visiting the fully expanded IRI, which becomes an HTTP, not HTTPS URL because the context file uses an HTTP prefix.

The main use-case is zero-knowledge clients who request the context and then request metadata about the node types by following their IRIs, instead of having to have pre-installed vocabulary files. This is described more in Tim Berners-Lee's original note on Linked Data.

For these clients it would be good to have an option of using only HTTPS IRIs from the context.
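To make the expansion step concrete, here is a minimal sketch of how a term ends up as an HTTP IRI. The `CONTEXT` dict is a toy fragment written for illustration, not a copy of the real jsonldcontext.jsonld, and `expand_term` is a hypothetical helper that covers only a small subset of the JSON-LD expansion rules.

```python
# Toy context fragment (an assumption for illustration); the real
# jsonldcontext.jsonld is much larger.
CONTEXT = {
    "schema": "http://schema.org/",
    "Person": {"@id": "schema:Person"},
    "name": {"@id": "schema:name"},
}


def expand_term(term, context):
    """Expand a term or compact IRI to a full IRI (simplified rules)."""
    # Look up the term definition; fall back to the term itself.
    definition = context.get(term, term)
    if isinstance(definition, dict):
        definition = definition["@id"]
    # Resolve a compact IRI such as "schema:Person" against its prefix.
    prefix, sep, suffix = definition.partition(":")
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    # Already an absolute IRI (or an unknown term): return unchanged.
    return definition


print(expand_term("Person", CONTEXT))
```

A client that dereferences the result will therefore request the http:// URL, because that is what the prefix in the context says.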

Although I am sympathetic to your request, and to TBL's Linked Data principles, I cannot see that much can be done at this point, for a couple of reasons.

Firstly, the purpose of the jsonldcontext file is to provide context (mapping of shortcut terms to IRIs) within a JSON-LD document. Although all terms within the vocabulary are referenced within this file, it is not intended as a downloadable definition of those terms. That purpose is satisfied by the downloadable Vocabulary Definition Files where most efforts to account for vocabulary coverage and http/https representation have been concentrated.

Secondly, and in this case the key reason, is the way that normal access to the context file is provided. A response to a request to https://schema.org, either directly or via a redirect from http://schema.org, will contain the following Link headers:

Access-Control-Expose-Headers: Link
Link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"

This is as per the JSON-LD 1.1 specification.
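As a rough illustration of what a client does with that header, the sketch below resolves the Link target against the requested URL. `find_context_link` is a hypothetical helper with a deliberately simplified Link-header parser, not what any particular JSON-LD library does, and it makes no network request.

```python
import re
from urllib.parse import urljoin


def find_context_link(link_header, base_url):
    """Return the absolute URL of the first rel="alternate" link typed
    application/ld+json, or None if there is no such link."""
    # Split on commas that start a new <target>; simplified parsing.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part, re.DOTALL)
        if not match:
            continue
        target, raw_params = match.groups()
        params = {}
        for key, value in re.findall(r';\s*([\w-]+)\s*=\s*"?([^";]*)"?', raw_params):
            params[key.lower()] = value.strip()
        is_alternate = "alternate" in params.get("rel", "").split()
        if is_alternate and params.get("type") == "application/ld+json":
            # Resolve a relative target against the URL that was requested.
            return urljoin(base_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_context_link(header, "https://schema.org"))
```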

For robustness and performance reasons the Schema.org site consists of a prebuilt set of static pages, with no processing of requests. This means, using link headers, only one version of the context file can be referenced and therefore only one is created.

As indicated, with the current architecture of the site, it would be difficult to satisfy your request without potentially compromising robustness and performance. To that end I am closing this issue whilst noting its contents for a time if or when the architecture of the site is reviewed again.

This means, using link headers, only one version of the context file can be referenced and therefore only one is created.

I think that's perfectly fine.

it would be difficult to satisfy your request without potentially compromising robustness and performance.

I don't believe that's the case.

As I mentioned in the first comment:

Users can also explicitly specify https://schema.org/docs/jsonldcontext.jsonld as their @context.

Although we probably should prefer the behavior that https://schema.org links to the HTTP version at https://schema.org/docs/jsonldcontext.jsonld, adding a separate jsonld file (maybe jsonldcontext-https.jsonld) that users can explicitly set as their context would be helpful.

My request is simply to add another static file, jsonldcontext-https.jsonld, that could be hosted at any path, although https://schema.org/docs/jsonldcontext-https.jsonld makes sense, that users could then reference explicitly with a "@context": "https://schema.org/docs/jsonldcontext-https.jsonld" field in their JSON-LD document. This may be somewhat of an anti-pattern to discovering the context from a Link header, but it is perfectly legal via the spec, and if users rely on dereferencing the vocabulary via HTTPS, it may be worthwhile for them.

As it's simply another file to be generated at "compile/render-time" when the rest of the static files are generated, I don't see how it would have any negative impact on the site.

The only downside would be maintaining it and generating it properly on each static render of the site, but that seems like a small issue. I suspect it could even be achieved with sed 's|http://schema.org|https://schema.org|g' jsonldcontext.jsonld > jsonldcontext-https.jsonld. I also believe that enough users already dereference the terms in this way, or would like to, that it warrants being added as a feature.
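The same substitution could be done on the parsed JSON rather than on raw text, so that only string values beginning with the schema.org HTTP prefix are touched. This is only a sketch: `to_https_context` is a hypothetical helper, and the sample input is a toy fragment rather than the real context file.

```python
import json

HTTP_PREFIX = "http://schema.org/"
HTTPS_PREFIX = "https://schema.org/"


def to_https_context(context_text):
    """Return a copy of a JSON-LD context document with schema.org
    IRIs rewritten from http:// to https://."""

    def rewrite(value):
        # Rewrite strings that start with the HTTP prefix; recurse
        # into dicts and lists; leave everything else unchanged.
        if isinstance(value, str) and value.startswith(HTTP_PREFIX):
            return HTTPS_PREFIX + value[len(HTTP_PREFIX):]
        if isinstance(value, dict):
            return {key: rewrite(item) for key, item in value.items()}
        if isinstance(value, list):
            return [rewrite(item) for item in value]
        return value

    return json.dumps(rewrite(json.loads(context_text)), indent=2)


# Toy input (an assumption for illustration), not the real context file.
sample = '{"@context": {"schema": "http://schema.org/", "name": {"@id": "schema:name"}}}'
print(to_https_context(sample))
```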

@alexkreidler Producing such a file is not the issue. As you point out it probably could be achieved with a simple sed script, which you could even run locally.

My concern is proliferating non-standard routes to partial views of the data that define the vocabulary, thereby introducing dependencies that could easily break in the future. This is especially so when there are already supported routes to the same data in a more comprehensive format.

Based upon the JSON-LD 1.1 specification, we provide a standard way for applications to obtain a link to the JSON-LD context document by the use of Link Headers. This approach insulates the provision of the document from any future architectural changes to the underlying Schema.org code and site.

Obviously anyone can look into where the link is pointing, bypass the interpretation and request that file URL directly. However, that route is not ideal as it could easily break in the future. What you are suggesting would build upon that non-ideal, temporary solution.

If as you imply this is a significant problem for many users, who cannot make use of the currently available vocabulary definition files, their concerns should be fed into the broader issue around moving the canonical URI base from http: to https:.
