Today, creating a cross reference in ReSpec requires Editors to manually look up the id of a term by going into another spec and copying the URL. This means that Editors need to write, for example:
<a data-cite="HTML/webappapis.html#eventhandler">EventHandler</a>
For most Editors, this can be quite labor-intensive and error-prone. Also, the id being referenced might be removed or changed, causing cross references to break.
Ultimately, spec Editors should be able to just write:
<a>EventHandler</a>
And ReSpec, together with some hints, should know what the user means (i.e., the EventHandler typedef, as defined in the HTML specification).
This project aims to make it easy for Editors to link to terms defined in other specs. ReSpec should be able to work out by itself where EventHandler is defined and automatically add the link (and a citation of the linked spec).
This feature will extend the present data-cite attribute. Initially, this feature will link to dfn and IDL fragments only (later stages may support headings and other components).
In case ReSpec is not able to find a reference, or if there is some ambiguity, it will inform Editors so they can take appropriate action.
<a>EventHandler</a>
<!-- really mean the IDL -->
<a data-lt="EventHandler">event handler</a>
<!-- all references are normative by default. -->
<!-- explicit non-normative references: -->
<a class="informative">EventHandler</a>
<a class="informative" data-lt="EventHandler">event handler</a>
<!-- shorthand syntax. (in future) -->
The {{EventHandler}} typedef.
Keep in mind that EventHandler (type = typedef) is not the same as event handler (type = dfn).
The above will also be applicable to <dfn> (except the shorthand syntax).
All references are normative by default, unless they are nested inside a non-normative parent (such as a .note, .ednote, figure, or .example) or carry the .informative class themselves.
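The rule above can be sketched as a walk up the ancestor chain. This is a minimal sketch, assuming a hypothetical `DocNode` shape (tag name, classes, parent link) in place of real DOM elements; the container list covers only the examples named above, and `isNormative` is an illustrative name, not ReSpec's API.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface DocNode {
  tag: string;
  classes: string[];
  parent?: DocNode;
}

// Containers named in the proposal as non-normative (not an exhaustive list).
const NON_NORMATIVE_CLASSES = ["note", "ednote", "example"];
const NON_NORMATIVE_TAGS = ["figure"];

function isNonNormativeContainer(node: DocNode): boolean {
  return (
    NON_NORMATIVE_TAGS.includes(node.tag) ||
    node.classes.some((c) => NON_NORMATIVE_CLASSES.includes(c))
  );
}

// A reference is normative unless it has .informative itself
// or sits inside a non-normative container.
function isNormative(ref: DocNode): boolean {
  if (ref.classes.includes("informative")) return false;
  for (let n = ref.parent; n; n = n.parent) {
    if (isNonNormativeContainer(n)) return false;
  }
  return true;
}

// Example tree: <body> containing a .note, with three links.
const body: DocNode = { tag: "body", classes: [] };
const note: DocNode = { tag: "div", classes: ["note"], parent: body };
const plain: DocNode = { tag: "a", classes: [], parent: body };
const inNote: DocNode = { tag: "a", classes: [], parent: note };
const marked: DocNode = { tag: "a", classes: ["informative"], parent: body };

console.log(isNormative(plain), isNormative(inNote), isNormative(marked)); // true false false
```

Walking ancestors means a link deep inside an example block is still treated as informative without any extra markup.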
In case of ambiguities, the spec writer should be able to resolve it by providing additional information.
<a>URL parser</a>
<!--
is ambiguous, as defined at:
- https://www.w3.org/TR/appmanifest/#dfn-parse
- https://html.spec.whatwg.org/multipage/infrastructure.html#url-parser
- https://url.spec.whatwg.org/#concept-url-parser
-->
The following markup may be used to provide additional information:
<!-- following are equivalent in terms of what the resultant link is -->
<!-- each says: look for "URL parser" in spec with shortname `url` -->
<a data-cite="url">URL parser</a>
<a data-cite="url" data-lt="URL parser">URL parsing</a>
<p data-cite="url">
<a>URL parser</a> <!-- unless a local dfn for URL parser exists -->
</p>
<!-- overall markup pattern -->
<a>TERM</a>
<a data-cite="SPEC">TERM</a>
<a data-cite="SPEC" data-lt="TERM">ALT TERM</a>
Using data-cite to provide more information
The data-cite attribute can be defined in the following ways:
(In order of increasing precedence (locality) and decreasing risk of ambiguity; each level of higher precedence overrides the lower ones.)
body: a space-separated list of specification ids (as per SpecRef or W3C short names for specs). Terms throughout the entire spec will be searched for in these specs. The risk of an unresolved ambiguity is highest here.
<body data-cite="spec1 spec2 spec3">
section: a space separated list of short names. The terms in this section and its subsections will be resolved in these specs (unless an override is given in a subsection).
<section data-cite="spec1 spec2">
<a>TERM</a> is searched for in spec1 and spec2,
but not in spec3, which is defined at a level of lower precedence.
</section>
That is, data-cite can be defined at any level (<p> and <span> are also valid), with the most local data-cite overriding those above it: the data-cite of the closest ancestor (or of the element itself) is used for the TERM.
a or dfn: a single specification short name in which the current element's term will be searched for. The risk of an ambiguity is lowest here.
<a data-cite="spec">TERM</a>
An empty data-cite shall be used to denote a local reference explicitly. Otherwise, all terms whose definitions can't be found locally shall be looked up externally. The closest ancestor with an empty [data-cite] will be considered for local references.
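A sketch of the "closest data-cite wins" lookup, including the empty-value case. It assumes a hypothetical `CiteNode` with an optional `dataCite` string and a `parent` link; `resolveCiteScope` and its three result kinds are illustrative names, and the sketch only handles space-separated short names (not `spec#hash` values).

```typescript
// Hypothetical node shape: only the data-cite value and the parent link matter here.
interface CiteNode {
  dataCite?: string; // undefined = no data-cite attribute
  parent?: CiteNode;
}

type CiteScope =
  | { kind: "specs"; specs: string[] } // closest data-cite lists spec short names
  | { kind: "local" } // closest data-cite is empty: explicit local reference
  | { kind: "unscoped" }; // no data-cite on the element or any ancestor

// Walk from the element itself upwards; the first data-cite found wins.
function resolveCiteScope(el: CiteNode): CiteScope {
  for (let n: CiteNode | undefined = el; n; n = n.parent) {
    if (n.dataCite === undefined) continue;
    const specs = n.dataCite.trim().split(/\s+/).filter((s) => s.length > 0);
    return specs.length === 0 ? { kind: "local" } : { kind: "specs", specs };
  }
  return { kind: "unscoped" };
}

// <body data-cite="spec1 spec2 spec3"><section data-cite="spec1 spec2"><a>TERM</a>
const bodyNode: CiteNode = { dataCite: "spec1 spec2 spec3" };
const sectionNode: CiteNode = { dataCite: "spec1 spec2", parent: bodyNode };
const link: CiteNode = { parent: sectionNode };

console.log(JSON.stringify(resolveCiteScope(link))); // {"kind":"specs","specs":["spec1","spec2"]}
```

Because the walk stops at the first data-cite it meets, spec3 on body is never consulted for links inside the section, matching the precedence order above.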
Let term be "request". It is defined in the following specs as:
service-workers: dict-member, dfn
webusb: dict-member
fetch: dfn, interface, dfn
Let specs be the value defined by the closest parent's data-cite.
If specs is null, the result is ambiguous.
If specs is "webusb", the result is unambiguous.
If specs is "webusb service-workers", the result is ambiguous (defined in both).
If specs is "service-workers", the result is ambiguous (defined twice).
If specs is "fetch", the result is ambiguous (defined thrice).
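The cases above can be checked mechanically: filter the known definitions by the specs in scope and count what is left. A minimal sketch; `Definition`, `resolve`, and the in-memory list are hypothetical stand-ins for data that would come from the web service.

```typescript
// Hypothetical record: which spec defines the term, and as what type.
interface Definition {
  spec: string;
  type: string;
}

// The definitions of "request" listed above.
const requestDefs: Definition[] = [
  { spec: "service-workers", type: "dict-member" },
  { spec: "service-workers", type: "dfn" },
  { spec: "webusb", type: "dict-member" },
  { spec: "fetch", type: "dfn" },
  { spec: "fetch", type: "interface" },
  { spec: "fetch", type: "dfn" },
];

type Resolution =
  | { status: "unambiguous"; def: Definition }
  | { status: "ambiguous"; candidates: Definition[] }
  | { status: "not-found" };

// specs === null means no data-cite context, so every definition is a candidate.
function resolve(defs: Definition[], specs: string[] | null): Resolution {
  let candidates = defs;
  if (specs !== null) {
    const allowed = specs;
    candidates = defs.filter((d) => allowed.includes(d.spec));
  }
  if (candidates.length === 0) return { status: "not-found" };
  if (candidates.length === 1) return { status: "unambiguous", def: candidates[0] };
  return { status: "ambiguous", candidates };
}

console.log(resolve(requestDefs, ["webusb"]).status); // unambiguous
console.log(resolve(requestDefs, ["fetch"]).status); // ambiguous
```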
Only retrieve terms that have the export attribute
For example, "utf-8 encode" is defined in https://encoding.spec.whatwg.org/#utf-8-encode and https://html.spec.whatwg.org/multipage/infrastructure.html#utf-8-encode, but the Encoding spec is clearly the rightful candidate here.
If we only query exported terms, we can reduce the chances of an ambiguity significantly (possibly eliminating it altogether).
Make use of the specs given in data-cite, as explained above.
Current specs take precedence over snapshots
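One way to apply this rule is to drop snapshot candidates whenever a current candidate exists. This sketch assumes a hypothetical `status` field on each candidate and uses placeholder URLs; whether the service exposes such a field is not settled in this proposal.

```typescript
// Hypothetical candidate record with a status field.
interface Candidate {
  spec: string;
  uri: string;
  status: "current" | "snapshot";
}

// Drop snapshots when at least one current candidate exists;
// otherwise keep everything so snapshot-only terms still resolve.
function preferCurrent(cands: Candidate[]): Candidate[] {
  const current = cands.filter((c) => c.status === "current");
  return current.length > 0 ? current : cands;
}

// Placeholder URLs for a term defined in both a current version and a snapshot.
const cands: Candidate[] = [
  { spec: "foo", uri: "https://example.org/foo/#bar", status: "current" },
  { spec: "foo", uri: "https://example.org/TR/foo/#bar", status: "snapshot" },
];

console.log(preferCurrent(cands).length); // 1
```

Falling back to the full list when no current candidate exists avoids turning a resolvable snapshot-only term into a "not found" error.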
Make use of IDL definitions
Consider "Request" being used as:
<pre class="idl">
partial interface Request {
// Other stuff
};
</pre>
<p>The <dfn data-cite="fetch">Request</dfn> object is defined in fetch.</p>
Here, Request is used in IDL (as an interface), not as a dfn. Hence, the ambiguity is resolved to https://fetch.spec.whatwg.org/#request.
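The Request example can be expressed as a filter on anchor type: a term used inside a pre.idl block should only match IDL anchors. This is a sketch; the `IDL_TYPES` set below is an illustrative subset, since the full list is not spelled out in this proposal, and `filterByUsage` is a hypothetical name.

```typescript
// Illustrative subset of IDL anchor types (not the proposal's full IDL_TYPES list).
const IDL_TYPES = new Set<string>([
  "interface",
  "dictionary",
  "dict-member",
  "typedef",
  "attribute",
  "method",
  "enum",
]);

interface TypedDef {
  spec: string;
  type: string;
}

// Keep IDL anchors for terms used inside WebIDL, and non-IDL anchors otherwise.
function filterByUsage(defs: TypedDef[], usedInIdl: boolean): TypedDef[] {
  return defs.filter((d) => IDL_TYPES.has(d.type) === usedInIdl);
}

// With data-cite="fetch", the term still matches three anchors (dfn, interface, dfn).
const fetchDefs: TypedDef[] = [
  { spec: "fetch", type: "dfn" },
  { spec: "fetch", type: "interface" },
  { spec: "fetch", type: "dfn" },
];

// "Request" appears in a partial interface, so only the interface anchor remains.
console.log(filterByUsage(fetchDefs, true));
```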
In the end, if an ambiguity can't be resolved, an error will be reported. The author may then fall back to a manual data-cite with a fragment, like data-cite="spec#hash".
We need to send a {term, specList} pair for each term. The following format may be used for the request:
POST /xrefs
Content-Type: application/json
Accept: application/json
{
"keys": [
{
"term": "foo bar",
"specs": ["spec1", "spec2"],
"types": ["dfn", "interface"]
},
{
"term": "baz"
}
]
}
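The request body above can be typed and assembled on the client. This sketch assumes the field names shown in the example; `buildRequest` is a hypothetical helper, and removing duplicate keys before sending is a design choice of the sketch, not part of the proposal.

```typescript
// Types mirroring the request body above; specs and types are optional.
interface XrefKey {
  term: string;
  specs?: string[];
  types?: string[];
}

interface XrefRequest {
  keys: XrefKey[];
}

// Build one request for all references in a document. Duplicate keys
// (same term, specs and types) are sent only once.
function buildRequest(keys: XrefKey[]): XrefRequest {
  const unique = new Map<string, XrefKey>();
  for (const key of keys) {
    const id = JSON.stringify([key.term, key.specs ?? [], key.types ?? []]);
    if (!unique.has(id)) unique.set(id, key);
  }
  return { keys: Array.from(unique.values()) };
}

const payload = buildRequest([
  { term: "foo bar", specs: ["spec1", "spec2"], types: ["dfn", "interface"] },
  { term: "baz" },
  { term: "baz" }, // a second reference to the same term
]);

// This string would be sent as the POST /xrefs body.
console.log(JSON.stringify(payload, null, 2));
```

Sending a single deduplicated batch keeps the number of HTTP requests at one per document, which matches the idea of one POST carrying all cross references.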
Some requirements on the API:
- If types belong to IDL_TYPES, only then should the search be case sensitive.
- specs and types are optional but recommended attributes for each term.
- The linking_text attribute in Shepherd data should be treated the same as the title attribute, and be available for search as term.
We expect a JSON response of the form:
{
"data": {
"baz": [
{ uri: "#baz", type, spec: "foo", for: [], normative: true }
],
"bar": [
{ uri: "webappapis.html#bar", type, spec: "html", for, normative },
{ uri, type, spec, for, normative }
],
"biz": []
}
}
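On the client, each entry list in data maps to one of three outcomes: not found, ambiguous, or a single match that becomes a data-cite value. A minimal sketch, assuming the response shape above; how uri is joined to spec (a leading "#" is appended directly, anything else is joined with "/", mirroring data-cite="HTML/webappapis.html#eventhandler") is an assumption, as are the names `toDataCite` and `outcomeFor`.

```typescript
// Types mirroring the response shape above.
interface XrefEntry {
  uri: string;
  type: string;
  spec: string;
  for?: string[];
  normative: boolean;
}

interface XrefResponse {
  data: Record<string, XrefEntry[]>;
}

type Outcome =
  | { kind: "resolved"; dataCite: string }
  | { kind: "ambiguous"; count: number }
  | { kind: "not-found" };

// Assumption: a uri starting with "#" is a fragment of the spec's main page;
// any other uri is a page path joined to the spec with "/".
function toDataCite(entry: XrefEntry): string {
  return entry.uri.startsWith("#")
    ? `${entry.spec}${entry.uri}`
    : `${entry.spec}/${entry.uri}`;
}

function outcomeFor(res: XrefResponse, term: string): Outcome {
  const entries = res.data[term] ?? [];
  if (entries.length === 0) return { kind: "not-found" };
  if (entries.length > 1) return { kind: "ambiguous", count: entries.length };
  return { kind: "resolved", dataCite: toDataCite(entries[0]) };
}

// Sample response; type values and the second "bar" entry are placeholders.
const res: XrefResponse = {
  data: {
    baz: [{ uri: "#baz", type: "dfn", spec: "foo", for: [], normative: true }],
    bar: [
      { uri: "webappapis.html#bar", type: "dfn", spec: "html", normative: true },
      { uri: "#bar", type: "dfn", spec: "other", normative: true },
    ],
    biz: [],
  },
};

console.log(JSON.stringify(outcomeFor(res, "baz"))); // {"kind":"resolved","dataCite":"foo#baz"}
```

With the sample data, "bar" comes back as ambiguous with two candidates and "biz" as not found, which is where ReSpec would inform the Editor.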
- <a>TERM</a> is found.
- /xrefs is queried with the TERM as term and the specs from the element's closest parent's (or its own) data-cite attribute.
- data-cite will be converted to (or added as) data-cite=spec#uri. This will then be handled by ReSpec as it presently is.
- IDL_TYPES, DFN_TYPES
- data-for (example {{Event.preventDefault()}})
- scope field in Shepherd data?
- IDL_TYPES terms and case insensitive otherwise)
With the web service set up, we can create a search UI on top of it in ReSpec (similar to the specref search interface).
One can search for a term (and optionally specify the specs in which to search) and get a list of matching terms.
Even better would be a "copy" button on each term in the results, letting the user copy the markup required to add that reference. This would provide an easier workflow when there are ambiguities that can't be resolved by providing more information.
The list of terms defined externally should be auto-generated.
- data-cite and data-lt in ReSpec.
- data-cite https://github.com/w3c/respec/pull/1723
- data-lt into account https://github.com/w3c/respec/pull/1736/
- <a data-cite="SPEC">TERM</a> is given, make use of it to convert to <a data-cite="SPEC#frag(TERM)">TERM</a>. If it fails, fall back to SPEC only. https://github.com/w3c/respec/pull/1723
- <dfn> https://github.com/w3c/respec/pull/1733
- [[SPEC]] to get context for specs https://github.com/w3c/respec/pull/1751
- data-cite="webidl" to nearest <section> if it has pre.idl
- data-link-for https://github.com/w3c/respec/pull/1765
- {{foo}} to valid cross references. https://github.com/w3c/respec/pull/1765
Abandoned:
Can we work with CSSWG people so that Shepherd can provide a filter API?
Or how about a command-line tool that inserts/updates cross-reference data from Shepherd into the target file?
Also, this is an interesting note... From SVGWG:
Shepherd is a test suite manager that includes issue tracking, etc. The functionality of Shepherd has been replaced by Github and the SVG WG will not be using Shepherd.
Edit: It seems this only refers to the issue tracking feature.
Can we work with CSSWG people so that Shepherd can provide a filter API?
Yep, we are already on it :) It may be that we don't end up using Shepherd at all, but just BikeShed's data (which is based on Shepherd's data).
Or how about a command-line tool that inserts/updates cross-reference data from Shepherd into the target file?
We will see how to best contribute to BikeShed's data - and figure out how to best get ReSpec data into BikeShed's data.
BikeShed's data 1) still requires a web service, as it's still big, and 2) is in its own format. I think getting raw Shepherd data in JSON will be easier, especially with caching.
The current plan is for @sidvishnoi to dig a bit deeper into the data, sizes, formats ... and into the problem itself. He is planning to have a full outline for us to review on the 1st of June (he is currently heads down doing his final exams).
Related discussion https://github.com/tobie/specref/issues/467
Note that Shepherd already has all the anchor data in a MySQL database and it gets updated frequently throughout the day. It would be simple to add to Shepherd's existing API to only return specific anchor data for a given set of linking texts (e.g., ReSpec could make a single HTTP request with a list of all cross references for the current spec). No need to set up yet another database or try to scrape data that has already been scraped from the primary database.
I'd be happy to implement the Shepherd API side, just let me know what you need.
Thank you @plinss. I'll get back on this as soon as my exams are over and let you know :)
@plinss, @saschanaz, @sidvishnoi has updated the proposal. Would you mind having a look?
@plinss, we are going to try to build a prototype using static data first. In the proposal above, please see the "The Web Service (Shepherd)" section. We are going to try to work out exactly what fields we need, but would like your early input on whether something like what we are proposing there is possible.
The proposed API looks fine, and should be easily doable. You'll probably also want to be able to send the anchor type information (and allow the client to specify the anchor type) to further reduce ambiguity, e.g. query for an element, vs an attribute, vs ...
Also, I expect some queries will have a large number of search terms; you might want to have a POST method as well, containing a JSON payload of search terms (and options).
POSTing JSON sounds like a great idea, probably better than the GET approach entirely. Much nicer grouping too. Is there any precedent for the data structure to send, or should we roll our own?
Go ahead and roll your own data structure
Closing as the only task left in this is now at https://github.com/w3c/respec/issues/2560