Go: proposal: x/tools/cmd/godoc: GORDO enriched Go documentation format.

Created on 3 Dec 2019  ·  13 Comments  ·  Source: golang/go

Proposal: GORDO enriched Go documentation format.

Author: Ohir Ripe [Wojciech S. Czarnecki]

Last updated: 2020/01/24

Discussion at https://golang.org/issue/35947

Related to: #7873, #16666, #35896, #18342, #25444 and other "rich format please" issues.

Abstract

_GORDO_ (dʒɔrˈdo) stands for GO Rich DOcs

This proposal attempts to make the godoc ecosystem robust enough to serve as a single documentation method, one that can also cover end-user programs and production services.

Background

The current state of Go's source documentation processing is good enough for documenting single _implemented things_, i.e. functions, variables, constants. It falls short if one must convey a new idea, a non-obvious implementation of an algorithm, or even just describe a sequence of events (no lists, sadly).

The godoc heuristic does not allow keeping _overall_ (_package_) docs close to the source, because parts of docs from different files are merged in the lexical order of the source filenames. This makes it almost impossible to document a chunk of API in the very file that defines it. (_This proposal tackles this with "refid" identifiers that can be put on documentation parts and then used to provide merging order and in-text references._)

Proposal

I propose using lightweight annotations that allow plain text documentation to have styling and structure hints added by the author. Gordo annotations use 11 non-ASCII characters that can be entered as ASCII digraphs led by a semicolon:

```
 ┌───────────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬───┐
  character:    ˘    ´    ¨    ˉ    °    «    »    þ    ¶    §    •  esc
    digraph:   ;b   ;/   ;'   ;-   ;.   ;[   ;]   ;t   ;p   ;s   ;l   ;;
 └───────────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴───┘
```
>(_Users accustomed to chords may configure translation via a GORDOIC environment variable. See previous revisions for an elaborate description of the available entry methods_.)

Translation is done by `gofmt`; `godoc` then recognizes and interprets these 11 characters according to the specification laid out below.
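
The translation step can be sketched in Go as follows. This is only an illustration under assumptions: the name `translateDigraphs` and the lookup map are hypothetical, the mapping is read off the table above, and the doubled semicolon is handled as the escapes section describes (a `;;` becomes a dismiss and the digraph right after it is copied untranslated). It is not code from gofmt.

```go
package main

import (
	"fmt"
	"strings"
)

// digraphs maps the second byte of a ";x" digraph to its gordo character,
// as listed in the proposal's table. The ";;" escape yields a dismiss.
var digraphs = map[byte]string{
	'b': "˘", '/': "´", '\'': "¨", '-': "ˉ", '.': "°",
	'[': "«", ']': "»", 't': "þ", 'p': "¶", 's': "§", 'l': "•",
	';': "°",
}

// translateDigraphs is a hypothetical helper that rewrites ASCII digraphs
// into gordo characters. Bytes that are not part of a digraph pass through.
func translateDigraphs(src string) string {
	var out strings.Builder
	for i := 0; i < len(src); i++ {
		if src[i] != ';' || i+1 >= len(src) {
			out.WriteByte(src[i])
			continue
		}
		repl, ok := digraphs[src[i+1]]
		if !ok {
			out.WriteByte(src[i]) // a plain semicolon
			continue
		}
		out.WriteString(repl)
		i++ // consume the second byte of the digraph
		if src[i] != ';' {
			continue
		}
		// It was the ";;" escape: copy the next digraph, if any, verbatim.
		if i+2 < len(src) && src[i+1] == ';' {
			if _, isDigraph := digraphs[src[i+2]]; isDigraph {
				out.WriteString(src[i+1 : i+3])
				i += 2
			}
		}
	}
	return out.String()
}

func main() {
	fmt.Println(translateDigraphs(";'bold;' and ;/italics;."))
	fmt.Println(translateDigraphs(";;;."))
}
```

A byte-wise scan is enough here because the semicolon is ASCII and can never occur inside a multi-byte UTF-8 sequence.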


### styling

° degree °escape || back to normal aka "dismiss" char
´ acute ´italics´ ´italics° 𝑖𝑡𝑎𝑙𝑖𝑐𝑠
¨ diaeresis ¨bold¨ ¨bold° 𝐛𝐨𝐥𝐝
˘ breve ˘ibold˘ ˘bold+italics° 𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔
ˉ macron ˉfixedˉ ˉfixedˉ fixed width span
«» guillemets «notable or related text» p͜a͜y͜ ͜a͜t͜t͜e͜n͜t͜i͜o͜n͜ span

An emphasis (styled text) begins after an acute, diaeresis, or breve character that is not followed by a degree. It ends at the next breve, acute, or diaeresis, which is either the start of another emphasis or the stop of this one. It also ends at a macron, at a left guillemet, or at a degree "dismiss" character. The 'fixed' and 'notable' spans begin and end only with their respective special characters, so the other three emphases can be used inside them. An empty line ends all running emphases and spans.

> Editing software may apply styles while keeping the syntax visible.  In the final form a style is applied and syntax characters are hidden.
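
As a rough illustration of the emphasis rules in this section, the sketch below renders one paragraph to HTML. Everything in it is an assumption made for the example: the name `renderEmphasis`, the choice of HTML tags, and the simplifications (spans are passed through unrendered, a mark directly followed by a degree is not treated specially, and the escape rules are ignored). It is a reading of the rules, not a reference implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// emphasisTags maps each emphasis mark to the HTML that opens and closes it.
// HTML is an assumed output format chosen for this sketch.
var emphasisTags = map[rune][2]string{
	'´': {"<i>", "</i>"},
	'¨': {"<b>", "</b>"},
	'˘': {"<b><i>", "</i></b>"},
}

// renderEmphasis renders the three emphases of a single paragraph.
func renderEmphasis(paragraph string) string {
	var out strings.Builder
	var open rune // mark of the running emphasis, 0 when none

	closeOpen := func() {
		if open != 0 {
			out.WriteString(emphasisTags[open][1])
			open = 0
		}
	}

	for _, r := range paragraph {
		tag, isMark := emphasisTags[r]
		switch {
		case isMark && r == open: // this emphasis' own stop
			closeOpen()
		case isMark: // start of an emphasis, ending any other running one
			closeOpen()
			out.WriteString(tag[0])
			open = r
		case r == '°' && open != 0: // dismiss: ends the emphasis, prints nothing
			closeOpen()
		case r == 'ˉ' || r == '«': // a span delimiter also ends the emphasis
			closeOpen()
			out.WriteRune(r)
		default: // ordinary text, including a degree with nothing to dismiss
			out.WriteRune(r)
		}
	}
	closeOpen() // the end of the paragraph ends everything still running
	return out.String()
}

func main() {
	fmt.Println(renderEmphasis("´italics´ and ¨bold° text"))
}
```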


### accessibility

For screen-reader use, the document author can make a style convey a semantic hint.
ARIA labels are introduced as a short list whose items start with bullet-style digraphs.

In this document styles mean:

•´ cited from other text´
•¨ endpoint name¨
•˘ call parameter˘
•ˉ codeˉ

> Sighted users will see this rendered as a bulleted list with styled items; users who cannot see will hear either the label text or an audible hint when the reader enters a labelled region. Note that regions are marked **in the source**, hence accessibility tools will be more useful at the terminal, too.

### the refid

A short string identifier that can be attached to a section, paragraph, or _quotable_ span:

§ section quotable section head §(refid)
¶ pilcrow quotable paragraph lead ¶(refid)
» rguillemet « quotable span »(refid)

Refid strings identify parts of the main documentation that can then be referenced elsewhere. A refid-tagged part can be quoted, linked to (in html output), and searched for by the `go doc` tool. Refids should not resemble godoc-searchable identifiers of the package's code, as the `go doc` tool should be able to display the part of the documentation pointed to by a refid. Refids should be short but informative.

### structure

«' lguillemet quote here a text span, heading or item:
«'refid' 'quote in apostrophes'
«"refid" "quote in double quotes"
«(refid) use no quote characters

The `«"refid"` _quote an internal link_ token always outputs its target's text between the quotation marks seen after the `«`, or without any if the parenthesized `«(refid)` form was used. Console output always prints the refid in parentheses after the quotation; the *Html* version outputs the quoted text as a link to the place of origin instead. E.g. the source of:

Annolex Editor  §(Sect 2)
... Please read «"Sect 2" for the primer.

should output on the console:

Annolex Editor (Sect 2)
... Please read "Annolex Editor" (Sect 2) for the primer.

but in html it is expected to output a link:

✻ Annolex Editor
... Please read "͟A͟n͟n͟o͟l͟e͟x͟ ͟E͟d͟i͟t͟o͟r" for the primer.
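
The console rendering of the example above can be sketched as a two-pass resolver. The function name `renderConsole` and both regular expressions are assumptions of this sketch, and it only covers section heads tagged with `§(refid)` and the double-quoted `«"refid"` form.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// headRe matches a heading line that ends in a §(refid) tag.
	headRe = regexp.MustCompile(`^(.*?)\s*§\(([^)]+)\)\s*$`)
	// quoteRe matches the «"refid" quote token.
	quoteRe = regexp.MustCompile(`«"([^"]+)"`)
)

// renderConsole is a hypothetical helper producing console-style output.
func renderConsole(lines []string) []string {
	// Pass 1: collect the text of every refid-tagged heading.
	targets := map[string]string{}
	for _, l := range lines {
		if m := headRe.FindStringSubmatch(l); m != nil {
			targets[m[2]] = m[1]
		}
	}
	// Pass 2: print headings with their refid and expand quote tokens.
	out := make([]string, 0, len(lines))
	for _, l := range lines {
		if m := headRe.FindStringSubmatch(l); m != nil {
			out = append(out, m[1]+" ("+m[2]+")")
			continue
		}
		l = quoteRe.ReplaceAllStringFunc(l, func(tok string) string {
			id := quoteRe.FindStringSubmatch(tok)[1]
			text, ok := targets[id]
			if !ok {
				return tok // unknown refid: leave the token untouched
			}
			return `"` + text + `" (` + id + `)`
		})
		out = append(out, l)
	}
	return out
}

func main() {
	src := []string{
		"Annolex Editor  §(Sect 2)",
		`... Please read «"Sect 2" for the primer.`,
	}
	for _, l := range renderConsole(src) {
		fmt.Println(l)
	}
}
```

Two passes are used so that a quote may appear before the heading it refers to.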

### lists

• bullet • bulleted list item
•a a) lettered list item
•1 1. numbered list item
þ thorn see: link/url list item

 + List items need to be given without blank lines in between.
 + A list ends at an empty line, as does any other gordo-introduced styling.
 + List items are recognized as such even if user-indented.
 + Console output imposes uniform indentation of lists.
 + Gofmt may impose uniform indentation of consecutive list items in the source.
   (_Other gordo processors may allow for nesting though_).
 + A list item start (bullet or thorn) is recognized as such only if placed as the first printable character in a line and followed by a space.


### external links

»þ « link description »þ // text description of
þ somesite.tld/path/tolink // an url listed below

External links are introduced via the **«** note ending in a **»þ** digraph. The URL path, without protocol, must be given as a URL list item (**þ**) in the last line of the paragraph. This line can be indented. Up to three **»þ** references can be present in a single paragraph; all their respective URL paths are then given in separate lines below:

in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.
þ www.ieee-itss.org/oj-its
þ www.ivt.ethz.ch

The final form of the output, including the hypertext protocol used, is defined by the gordo processor. This specification only mandates that the plain text renderer, if used at all, removes gordo special characters and any superfluous space left after this removal, including spaces following the **«** of a notable or link description span. Also, links rendered under the sentence should be given a numerical index and be prefixed with the protocol:

in our IEEE-ITSS Open Journal¹ and also on our faculty² site.
¹ https://www.ieee-itss.org/oj-its
² https://www.ivt.ethz.ch
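
A plain text renderer for such a paragraph could look roughly like the sketch below. The name `renderLinks`, the regular expression, and the hard-coded `https://` prefix are assumptions of this example (the proposal leaves the protocol to the gordo processor); only the three superior numbers named in the proposal are used as indices.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkRe matches a link description span: « description »þ
var linkRe = regexp.MustCompile(`«\s*([^«»]*?)\s*»þ`)

// superiors are the indices for up to three links per paragraph.
var superiors = []string{"¹", "²", "³"}

// renderLinks is a hypothetical plain text renderer for one paragraph.
func renderLinks(paragraph []string) []string {
	var text, urls []string
	for _, l := range paragraph {
		trimmed := strings.TrimSpace(l) // url lines may be indented
		if strings.HasPrefix(trimmed, "þ ") {
			urls = append(urls, strings.TrimSpace(strings.TrimPrefix(trimmed, "þ ")))
		} else {
			text = append(text, l)
		}
	}

	n := 0 // number of links indexed so far
	for i, l := range text {
		text[i] = linkRe.ReplaceAllStringFunc(l, func(tok string) string {
			desc := linkRe.FindStringSubmatch(tok)[1]
			if n >= len(urls) || n >= len(superiors) {
				return desc // no url left to pair with: keep the description
			}
			n++
			return desc + superiors[n-1]
		})
	}

	// List the indexed urls under the text, prefixed with the protocol.
	for i := 0; i < n; i++ {
		text = append(text, superiors[i]+" https://"+urls[i])
	}
	return text
}

func main() {
	src := []string{
		"in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.",
		"þ www.ieee-itss.org/oj-its",
		"þ www.ivt.ethz.ch",
	}
	for _, l := range renderLinks(src) {
		fmt.Println(l)
	}
}
```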

> Gordo processor can be configured on public www sites to render external links as indexed plain text urls to prevent link-spam.

### table of contents, in order

A manual TOC is introduced either by a heading that starts with the "TOC" string, or by one that has the "toc" refid set:

TOC — Table of Contents
Sisällysluettelo §(toc)

Manual TOC entries, in the form of •§ or •¶ digraphs followed by a refid, are used to provide a display order. This allows documentation parts to be written close to the relevant code. Any section or paragraph not listed in a manual TOC is added at the end of the generated TOC under the "Misc" top level heading.

•§ refid // a section head, at the main level
•¶ refid // a paragraph lead, at a subsection level
•¶ "with spaces" // use quotes if refid contains space

The rest of the line after refid is __reserved__ for documentation housekeeping.
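
The ordering idea can be sketched as below: parse the refid from each TOC line, then emit the documentation chunks in TOC order and collect the unlisted ones for the "Misc" heading. The `Chunk` type and the function names are hypothetical, and the section/paragraph level of an entry is ignored in this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical refid-tagged piece of documentation.
type Chunk struct {
	Refid string
	Text  string
}

// parseTOCLine extracts the refid from a "•§ refid" or "•¶ refid" line.
// Whatever follows the refid is ignored, as that space is reserved.
func parseTOCLine(line string) (refid string, ok bool) {
	line = strings.TrimSpace(line)
	var rest string
	switch {
	case strings.HasPrefix(line, "•§"):
		rest = strings.TrimPrefix(line, "•§")
	case strings.HasPrefix(line, "•¶"):
		rest = strings.TrimPrefix(line, "•¶")
	default:
		return "", false
	}
	rest = strings.TrimSpace(rest)
	if strings.HasPrefix(rest, `"`) { // quoted refid, may contain spaces
		end := strings.Index(rest[1:], `"`)
		if end < 0 {
			return "", false
		}
		return rest[1 : 1+end], true
	}
	fields := strings.Fields(rest)
	if len(fields) == 0 {
		return "", false
	}
	return fields[0], true
}

// orderByTOC returns the chunks listed in the TOC, in TOC order, and the
// remaining chunks (destined for "Misc") in their original order.
func orderByTOC(toc []string, chunks []Chunk) (listed, misc []Chunk) {
	byID := make(map[string]Chunk, len(chunks))
	for _, c := range chunks {
		byID[c.Refid] = c
	}
	seen := map[string]bool{}
	for _, id := range toc {
		if c, ok := byID[id]; ok && !seen[id] {
			listed = append(listed, c)
			seen[id] = true
		}
	}
	for _, c := range chunks {
		if !seen[c.Refid] {
			misc = append(misc, c)
		}
	}
	return listed, misc
}

func main() {
	var toc []string
	for _, l := range []string{"•§ intro", `•¶ "with spaces"`, "not a toc line"} {
		if id, ok := parseTOCLine(l); ok {
			toc = append(toc, id)
		}
	}
	chunks := []Chunk{{"with spaces", "B"}, {"extra", "C"}, {"intro", "A"}}
	listed, misc := orderByTOC(toc, chunks)
	fmt.Println(listed, misc)
}
```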

The TOC list need not be consecutive. It is ok to have subheadings or even paragraphs of text between parts of the list. (_E.g. to have the TOC divided by "experimental", "staged", "stable", and "deprecated" headings. Then the docs maintainer may simply move a toc line between sections to mark its current status_.)

#### The TOC imposing order on dispersed chunks of documentation is the crux of this proposal

> With this implemented a documentation maintainer can be a separate role, and her edits go to the single file while many individual developers may write docs for their code only. Structure, distinguished spans and refids all are means for that ultimate goal.  Styling is just a useful byproduct. One that completes the professional documentation process.

### docs housekeeping

> _This should be the subject of another proposal but is provided here to explain the reserved space of the toc-line._

During gofmt processing of the file that contains the TOC, toc lines are amended with a relative path to the file where the refid was declared, a hash of the code, and a hash of the related doc-comment. These hashes and paths are then checked by the **local** godoc instance. If the freshly computed hash of the code does not match the one in the toc, **and** the freshly computed hash of the doc-comment still matches, it is a strong signal that the documentation diverged from the code (the code was edited but its documentation was not). Generated output may then inform the reader that the documentation is possibly outdated.
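
The staleness check described above reduces to comparing two pairs of hashes. In the sketch below the `tocRecord` type, the function names, and the use of SHA-256 are assumptions; the proposal does not name a hash function or a record layout.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tocRecord is a hypothetical form of the data stored on a toc line.
type tocRecord struct {
	Path     string // relative path to the file declaring the refid
	CodeHash string // hash of the code when the toc line was written
	DocHash  string // hash of the doc-comment at that time
}

// hashOf returns a hex-encoded SHA-256 digest (an assumed choice of hash).
func hashOf(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// possiblyOutdated reports whether the code changed while its doc-comment
// stayed the same, which is the signal described in the proposal.
func possiblyOutdated(rec tocRecord, code, doc string) bool {
	return hashOf(code) != rec.CodeHash && hashOf(doc) == rec.DocHash
}

func main() {
	rec := tocRecord{
		Path:     "server/api.go", // illustrative path only
		CodeHash: hashOf("func Serve() {}"),
		DocHash:  hashOf("Serve starts the server."),
	}
	// The code was edited, the documentation was not.
	fmt.Println(possiblyOutdated(rec, "func Serve(addr string) {}", "Serve starts the server."))
}
```

When both hashes differ the check stays silent, since the documentation was at least touched together with the code.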

### toc-bar

A lone section heading with a refid of "toc-bar C" will output the (html) TOC as a block separated by the character C. E.g. §(toc-bar ⬩) for this document would produce:

> ⬩ [Abstract](#abstract) ⬩ [Background](#background) ⬩ [Proposal](#proposal) ⬩ [Rationale](#rationale) ⬩ [Compatibility](#compatibility) ⬩ [Implementation](#implementation) ⬩ [Open issues](#issues) ⬩ [post scriptum](#ps) ⬩

Order of the bar items is set by the §(toc) section.

### console -toc

TOC and "toc-bar" sections are elided from the `go doc -all` output. The separate `-toc` flag lists all refids, and these refids can be used to select the appropriate part of the main documentation to show. Refids of places are normally printed in parentheses on the console, so the user can follow them in the next invocation of the `go doc` tool. Where the output format allows for hypertext (linking), the manual TOC entries should be displayed though.


### escapes

 + A doubled semicolon lead is always translated to a single dismiss that
  immediately disables translation of the next digraph:
   `;;;; => °;;`,  `;;;. => °;.`
 + Any special character doubled is ordinary: `As bolded ¨under 20°°C¨`
 + One or more special characters following a dismiss character are ordinary:
   `single macron: °ˉ`, `a digraph °»þ`, or `superiors °¹²³`.
 + The "escape" function of dismiss character has higher priority than "end of style":
   `¨bolded °«¶ digraph¨`
 + Degree character that has nothing to dismiss or escape is ordinary.
 + Degree character does not output if it has already been used to dismiss or escape.

Of **all possible** gordo "specials":

```
° ´ ¨ ˘ ˉ « » • þ ¶ § ´ ¨ ˘ ˉ • þ
;. ;/ ;' ;b ;- ;[ ;] ;l ;t ;p ;s ¹ ² ³ ¦ ¤ …
•1 •a •¶ •§ «' «( «" «. »þ ¶( §( •´ •¨ •˘ •ˉ »(
```
only guillemets and superior numbers must be escaped, and the _degree_ if styled. Other escapes are unlikely to be needed except for gordo-related docs.

The characters • ¤ … þ need an escape only if they come first in a line and are followed by a space. Section and paragraph signs outside of their digraphs are ordinary. The Icelandic þ may never come before a space, and the Old English script is not common in technical docs. Gordo digraphs are not used in natural languages either, and none of the ASCII digraphs is valid Go code. That leaves the styled degree, guillemets, and superior numbers ¹²³.

The __«.__ digraph itself is an escape for a _notable_ span that must start with one of __"'(__. Use two dots for a span that should begin with a dot: «.. dot-led notable span».

Rationale

Documentation that can be styled even with only bold and italics, and that can be structured to fit the domain, may help package authors to be more precise and unambiguous, and help documentation consumers to avoid misunderstandings. Today, Go packages of just middle complexity often resort to external descriptions of their algorithms and API.

This is not because their authors love using yet another doc tool and are eager to take on the chore of keeping it synchronized. It is the lack of godoc capabilities that restricts godoc use to the standard libraries, or at best to general-purpose Go libraries consumed by other Go code. For lack of even rudimentary emphases, godoc-compliant __documentation sources cannot be used__ to create user-facing documentation if said user is not expected to be a Go programmer.

This needs to change, as Go now is used to build really huge systems. End-users — admins and api-consuming developers — need documentation that is easy to browse and reflects all changes made to the just staged product.

Gordo allows _package level_ documentation to be kept close to the code it describes and gives the author more control over its shape and the placement of its parts. This should make it easier to maintain well-structured documentation that is placed in the most relevant file and updated as the related code changes.

Compatibility

Gordo uses no semantic constructs that can be mistaken for technical text written in any language, natural or formal. Out of all gordo "specials" only a few seldom-used non-ASCII characters (degree, guillemets and three superscript numbers) may need to be escaped.

Nonetheless, as this proposal extends the documentation source syntax and the methods for parsing it, there is a minuscule but non-zero possibility that the gordo translation step may alter the visible html output of some existing documentation.

Even if this happened, such a change would likely show up only in the font decoration or size and would not affect the meaning.

Implementation

Enabling gordo annotations would need support from both gofmt and godoc. While implementing basic formatting could be trivial, the real power of the proposed format and methods lies in the ability to make documentation both easy to skim at the console and usable as an interactive manual in the browser. The latter also needs working internal links between "quotable" and "quote" places. Implementing this might need more resources, as might implementing the toc-based documentation checks. But this work may benefit the Go ecosystem as a whole and allow us to keep a single source of truth for both an external (e.g. grpc) API and the code implementing it.

post scriptum

Someone whom I respect confessed recently:

_I remember thinking that changing fmt.printf to fmt.Printf in my code was ugly, or at least jarring: to me, fmt.Printf didn’t look like Go, at least not the Go I had been writing. [...] I got used to it, and now it is fmt.printf that doesn’t look like Go to me._

Gordo may look unusual at first sight, but I hope its syntax will soon be regarded as comfortable. Unlike the styling syntax of _markdown_ and other markups used only to generate html, gordo stylings are barely noticeable in the source unless the reader is wilfully scanning for formatting hints. Structure annotations are the converse: they are concise but stand out on the console.


Revisions

  • r2 [16 December 2019]
    • make ¹²³ as styling surrogates with default GORDOIC=us map enabling many national layouts' users to type gordo styling without learning new chords.
    • fix section/paragraph swap (US/EU differences kicked in)
    • explain that authors need almost no characters escaping
    • escape by prefix, so parser need not to look back
    • add unix xmodmap for us-ansi layout users
    • explain functionality of a toc section
    • degree is a _dismiss_ by itself now
    • more elaborate Rationale
    • concise chords table
    • post scriptum added
  • r3 [23 January 2020]
    • Add accessibility section (related: #36685, #22171)
  • r4 [24 January 2020]
    • Promote ascii digraphs to be a main entry method.
    • Remove most of the text related to entry methods and keyboard.
    • Add stress to the "ordering by toc" importance


Proposal Tools

All 13 comments

There are tons of readily available lightweight markup syntaxes (markdown, asciidoc, reStructuredText, Textile, ...). Why are you proposing yet another markup language?

This seems hard to type. And having to type different things on different operating systems that are translated to various symbols (with a per os GORDO environment variable) seems like a bad idea.

Also using accents in formatting does not make the documents very readable in my personal opinion.

Go docs are meant to be unobtrusive plain text. Obscure Unicode markup does not count as plain text. When reading your example, I _did_ notice the "gordo annotations", but I thought something was wrong with the browser's text rendering. That's not a good thing for documentation.

If we add any more support, it is most likely going to be using a very limited subset of Markdown, like maybe just adopting one bullet list syntax. Even that is still a ways down the priority list though.

@rsc

> Go docs are meant to be unobtrusive plain text.

Gordo is meant to keep Go docs unobtrusive plain text.

> Obscure Unicode

All characters used in gordo came with DEC's brand new _VT100_ terminal unit in the year 1983. Thirty-six years ago. I used this set in 1989 software, and these characters were available on the dated daisy wheel printers my first client then had.

> Obscure

Used daily with latin letters by a billion people or more.

> Unicode markup does not count as plain text. When reading your example, I _did_ notice the "gordo annotations", but I thought something was wrong with the browser's text rendering.

These will not render in the browser. These might be visible in the source and there they are the least obtrusive. Click through the raw button, please.

> If we add any more support, it is most likely going to be using a very limited subset of Markdown,

Do **bold**, _italics_, **_bold-italics_** and lists introduced by _significant whitespace_ really allow one to make better sense of the words than ¨´˘ with a space under?


@taruti

> There are tons of readily available lightweight markup syntaxes (markdown, asciidoc, reStructuredText, Textile, ...). Why are you proposing yet another markup language?

Because other markups are obtrusive for anyone who reads them in the source.

markdown source:
this version uses the [**Atkin**](https://fylux.github.io/2017/03/16/Sieve-Of-Atkin/) sieve
instead of previously used [**Pritchard's wheel**](https://link.springer.com/article/10.1007/BF00264164) one.

gordo source:
this version uses the «¨Atkin¨»þ sieve instead of previously used «¨Pritchard's wheel¨»þ one.
    þ fylux.github.io/2017/03/16/Sieve-Of-Atkin/
    þ link.springer.com/article/10.1007/BF00264164

markdown renders:
this version uses the Atkin sieve
instead of previously used Pritchard's wheel one.

gordo renders:
this version uses the Atkin sieve instead of previously used Pritchard's wheel one.

> This seems hard to type. And having to type different things on different operating systems that are translated to various symbols (with a per os GORDO environment variable) seems like a bad idea.

Please re-read. For my part, I will try to edit this section so that it is not understood as exactly the opposite.

> This seems hard to type.

It is the user's choice how to type gordo. The example provided in the proposal even shows how to type it using only ASCII characters, just like markdown.

> that are translated to various symbols

No. The opposite!

Various characters of the user's choice are translated to the fixed set of eleven "gordo" characters.

The author types whatever keystrokes she wants and whatever she finds convenient or available on her national keyboard layout, considering the IDE or editor she uses.
It is the target (canonical) set of 11 characters that does not change.
The GORDO table sets the input; the output is fixed and the same on all OSes and in all editors.

> Also using accents in formatting does not make the documents very readable in my personal opinion.

It depends on what one wants to focus on. If it is the markup a reader needs to analyse, then yes: single dots or rings at the top of the line need special attention.

Note, though, that for all readers but the author, the less noticeable the markup is, the better.

We (me at least) work with source documentation laid out with fixed-width fonts on screens of certain capacity. The html version is important before - lets us read faster and assess quality better. When I work with others' source, in my vim I have marked parts of the docs (source) four to six keystrokes away.

Gordo aims to be unobtrusive in the source, so as to be as readable on the terminal as on the web while keeping the web version searchable and interactive, in a way.

I didn't say anything about **bold**, _italics_, **_bold-italics_**.
In general we don't want markup in doc comments.
I said we might recognize bullets.

Based on the discussion above and the reactions to the original proposal, this seems like a likely decline.

@rsc A bullet list would be useful, but what is proposed here is way too complex.

@ngrilly Could you elaborate more on the "too much" complexity, please?

For the styling part I see simple substitutions. The most complex part would be to gather toc references and then produce output in order. But IMO it is the right price for keeping chunks of documentation right in the files they describe.

The gofmt "complexity"/price is confined to the simple substitutions as well — just to allow both US-English, and other languages users to use ascii digraphs instead of chords.

@rsc,
Note that now there is no other way to impose order but having a single giant doc.go. Lexical sorting of api is good for indexing libraries. Services' api more often than not needs to be described in order.

I maintain my original claim that in its current state godoc, simple and useful for general-purpose library code, is not enough for the vast area of today's Go usage.

Dismissing, without a real discussion, a proposed way to have documentation kept by the code, ordered, and readable both in the source and in the web browser stands, in my opinion, firmly against the advertised meritocracy of the proposal consideration process.

I consciously did not announce this proposal on the general list, in hope of a discussion on the merits with the team here. I was apparently wrong in that hope.

> Note that now there is no other way to impose order but having a single giant doc.go. Lexical sorting of api is good for indexing libraries. Services' api more often than not needs to be described in order.
>
> I sustain my original claim, that in its current state godoc — simple and useful for the general-purpose library code — is not enough for the vast area of today's Go usage.

That is a defensible position.

But substantial amounts of this proposal are about styling text. See the comments above. "Go docs are meant to be unobtrusive plain text. Obscure Unicode markup does not count as plain text." And "In general we don't want markup in doc comments. I said we might recognize bullets." There simply isn't any support for styling text in godoc comments. It's a solution for a problem that doesn't exist.

As far as imposing some order on godoc, see #18342 and #25444. We already have accepted ideas for improving the situation. Someone needs to complete the implementation and get it into the sources. Then let's see where we are.

@ianlancetaylor: I am aware of previous work, or rather attempts at it, in this area. Both old, both abandoned (if not silently refused). Mine is a holistic proposal, not a patch here and there. The amount of work needed to patch an urgent need (e.g. adding bullets) is not substantially less than that for adding a complete feature, especially in the long run.

> some order on godoc

We need no "some" order. We need an exact ordering that is also easy to maintain over long spans of time. This proposal's "ordering by toc" allows the docs maintainer to rearrange documentation without needing to touch code sources; writing and maintaining documentation for an API chunk there is a task for the developer who actually takes care of that code. No other proposal I saw allows for such a separation of concerns.

> substantial amounts of this proposal are about styling text.

Excuse me: in this proposal the styling section counts three sentences (83 words) and 512 chars in a 4x6 table. (_Substantial amounts of this proposal relate to keyboard usage, though. Mostly as my overreaction to the perceived (and voiced) concerns regarding whether non-ASCII characters can somehow be entered and displayed at all by ASCII keyboard users._)

I maintain that we need at least one form of emphasis in text meant for the "web" users.
Be it bold or italics, it does not matter. We need this because a good API often mandates using plain English words as an endpoint label or as a field descriptor. While a native English speaker is able to discern these from a sentence's parts with ease, people who learnt English during their college years can be confused. Update: see also the "accessibility" section added to the proposal.

> unobtrusive plain text

This proposal is all about unobtrusive plain text that is readable in the source files.
I would like to stress again, that "plain" for 2/3 of world's population does not equal "american standard".
_Note also, that most developed countries' governments impose that software they pay for comes with documentation in their country's language._


Invites: @dsnet, @jimmyfrasche, @griesemer

Note that an accessibility section was added to the proposal.
Update (r4): all text relating to configuring keyboards has been removed. (@kortschak, @bradfitz)

Having emphasis added enables us to produce documentation accessible by blind persons not only in the browser but also in the terminal.

I'm going to restate and emphasize "Obscure Unicode markup does not count as plain text."

Go already supports documentation in any language. That is not what this issue is about.

I feel like there are at least three things here:

1) Ordering of documentation, which could be nice to support in some way #18342 #25444
2) Whether godoc API documentation should have richer formatting (lists, emphasis etc)
3) What should be used for that formatting.

Personally I think that if more formatting is added to godoc using a subset of widely available markup languages is the best way for this, e.g. a subset of Markdown. Many Go programmers are already familiar with Markdown as it is used in many places on the web.

However the custom GORDO symbols + digraphs + escaping does not seem like a good solution for this from my perspective.

There is no change in consensus here, only additional argument made in favor by the original reporter. Declined.
