We want to get ourselves better docs than those we have on the GitHub wiki at the moment.
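We need to decide which documentation system to use, where to host it, and so on. Let's use this issue to map the possibilities we are aware of. I will start:
- Sphinx: documentation system based on reStructuredText or Markdown, which can be hosted for free on ReadTheDocs. Example docs: https://editgroups.readthedocs.io/en/latest/
- Gitbook: based on Markdown, it looks like it can be hosted on gitbook.com. Example docs: https://devdocs.foodsharing.network/
- I really like Django's docs - not sure if the framework is reusable though: https://docs.djangoproject.com/en/3.0/topics/
I think our docs should be:
- versioned: one sub-site per version, ideally without duplicating content too much to keep things maintainable?
- translated: it should be easy for people who currently contribute on Weblate to also translate the docs.
Any other requirements we should have? Which existing docs should we take inspiration from?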
Here is my copy/paste from what I added into the EOSS 2020 1st Quarter Roadmap doc:
Doc Platform Evaluation criteria (a start) -
Hmm, there might be something else to say here about Sphinx:
We could embed comment blocks (i.e., docstrings) inside the model code and use Sphinx to dynamically generate the model documentation by parsing these comments. This helps to ensure that the model documentation remains up-to-date.
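To make the docstring idea concrete, here is a minimal sketch that uses only Python's standard `inspect` module rather than Sphinx itself (Sphinx's autodoc extension does this far more thoroughly). The helper `module_to_markdown` and the example `geometry` module are hypothetical; the point is only that the rendered page is derived from the docstrings, so editing a docstring updates the docs.

```python
import inspect
import types


def module_to_markdown(module: types.ModuleType) -> str:
    """Render each public function defined in `module` as a Markdown section.

    The section body is the function's docstring, so the generated page stays
    in sync with the code whenever the docstring is updated.
    """
    sections = []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(func) or "(undocumented)"
        sections.append(f"## `{name}{inspect.signature(func)}`\n\n{doc}\n")
    return "\n".join(sections)


# Usage: build a tiny module in memory and document it.
geometry = types.ModuleType("geometry")
exec(
    "def area(width, height):\n"
    '    """Return the area of a rectangle."""\n'
    "    return width * height\n",
    geometry.__dict__,
)
print(module_to_markdown(geometry))
```

Note that this relies on Python's runtime introspection; for Java code the equivalent role is played by Javadoc, as the next comment points out.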
Hi all,
From what I've seen, Sphinx is a documentation tool for Python code. If you want to apply it to Java code you would need javasphinx, which is deprecated and not maintained.
GitBook is a cloud service and you have to pay.
Django's docs are based on reStructuredText + Sphinx, which is still targeted at Python code.
I think a good solution for documentation would be to use the Apache Flink documentation system [1], which is based on Jekyll [2] + standard Javadoc. It translates Markdown pages to HTML; I find it very easy to use and powerful, and it can be easily internationalized. What do you think?
[1] https://github.com/apache/flink/blob/master/docs/build_docs.sh
[2] https://jekyllrb.com/
In this issue I was not really thinking about code documentation, although that sort of auto-generated docs can be useful too. I was thinking about docs that are fully written up separately from the code. Although these sorts of docs can also refer to some particular code parts (classes, methods…) they seem to be less dependent on the actual language. For instance ReadTheDocs (with Sphinx or Markdown) is used in many projects which are not written in Python.
Thanks for the suggestion of the Apache Flink documentation system, it does look nice indeed! It seems to be versioned - but what about localization? I can only see a Chinese version, but it does not seem to translate any of the content itself, just the headings…
In Flink every Markdown file has a _zh copy that contains the Chinese translation. This way you can add all the languages you want: you just have to write a proper config file for each language and that's it.
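Under the layout described above (one Markdown file per page, plus a translated copy with a `_zh` suffix), spotting untranslated pages becomes a simple file-listing check. A minimal Python sketch, assuming the `<page>_zh.md` naming taken from this comment (Flink's actual file naming and configuration may differ); `missing_translations` is a hypothetical helper:

```python
from pathlib import PurePosixPath


def missing_translations(paths: list[str], suffix: str = "_zh") -> list[str]:
    """Return source pages that have no translated sibling named `<stem><suffix>.md`.

    The suffix is a parameter because the naming scheme is an assumption.
    """
    existing = set(paths)
    missing = []
    for p in sorted(paths):
        path = PurePosixPath(p)
        # Only consider Markdown sources, and skip files that are themselves translations.
        if path.suffix != ".md" or path.stem.endswith(suffix):
            continue
        translated = str(path.with_name(f"{path.stem}{suffix}{path.suffix}"))
        if translated not in existing:
            missing.append(p)
    return missing


pages = ["docs/index.md", "docs/index_zh.md", "docs/setup.md", "docs/faq.md", "docs/faq_zh.md"]
print(missing_translations(pages))
```

One limitation of the per-file approach shows up here: such a check can tell that a translation is missing, but not that an existing one has fallen out of date after the source page changed.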
Another possibility is R markdown, which @msaby used to write great course material in French:
https://msaby.gitlab.io/formation-openrefine-BULAC/
But R markdown does not seem to have much localization support (only for its own strings, not for the content written with it).
I think it is worth thinking about using two separate systems for user documentation and developer documentation. The latter does not need to be translated, I would say, and might need tighter integration with (or generation from) the code. For user documentation, we are free to use basically anything, even tools initially designed for other programming languages. Because of that, Sphinx still seems a pretty strong candidate to me: it seems to have good localization support, ReadTheDocs is available for hosting, and it is generally well established. Docusaurus looks good too, with native support for localization and versioning.
Ideally, localization could go through Weblate too, so that translators can translate everything from a single location.
Another platform: Wikibooks or Wikiversity, for instance this book which just started https://fr.wikibooks.org/wiki/OpenRefine
I don't think this is very well suited to host official documentation though.
Hi all. To help us make a decision about what documentation platform/product/method we should move to I've set up a Google sheet
I've added all the options mentioned here so far, and columns for each of the criteria listed. Feel free to add more options or more criteria. Any help in completing the details for each row is welcome, but I will be working through the list and looking for any other likely alternatives over the next week, and hopefully we can then make a reasonably swift decision.
On another project I'm on, we've decided to use GitBook. It is free for non-profits; you just need to write to them. It offers two ways to contribute to documentation: via GitHub or via their editor (which we thought would allow more of our members to contribute).
So, in the end, we've chosen Jekyll? Can we close this ticket with the final decisions?
@fpompermaier no decisions made - where did you get the idea that we had chosen Jekyll?
> @fpompermaier no decisions made - where did you get the idea that we had chosen Jekyll?
I saw the announcement on Twitter [1][2].
[1] https://twitter.com/OpenRefine/status/1236970180787535874
[2] https://github.com/OpenRefine/openrefine.github.com
That relates to the use of Jekyll to deliver the OpenRefine website (openrefine.org), which has been the case for some time, and is not in any way related to this issue.
@fpompermaier this shows yet another reason why we would like you to be more intimately involved with the project, preferably hacking on OpenRefine on a daily volunteer basis :-) (just joking)
I have been adding on to the documentation excel sheet @ostephens supplied.
Based on the research I have done on the options there, here are my top 3 choices (in no particular order):
- Sphinx on ReadTheDocs
  - example: Django documentation, which @wetneb said he liked
  - short description: static documentation generator based on reStructuredText or Markdown, which can be hosted for free on ReadTheDocs
  - other comments: seems to have a slightly steeper learning curve
- Docusaurus v1 on GitHub Pages
  - example: KaTeX documentation
  - short description: static documentation generator that converts Markdown to static HTML files and publishes them
- Gitbook
  - example: https://docs.realm.io/sync/
  - short description: rich content and rich text editor that allows teams to document everything from products to internal knowledge bases and APIs
  - other comments:
    - learning curve is not very steep because it provides a rich text and content editor
    - no official Mermaid support, but unofficial plugins exist
    - personal nit: weird that the boxes actually lead to a new page when clicked on, because I expected them to expand instead
From what I found, it appears that all of these options satisfy the requirements we have on the spreadsheet (localisation, versioning etc.). I haven't thought more deeply about which would be easier to integrate with OpenRefine yet though.
On a side note, I think having the user and developer documentation on the same documentation system might be easier to manage, and easier for people who are interested in both to navigate as well.
Any thoughts on this? Should we choose from what we have listed here and on the spreadsheet, or still continue to seek out other possible options?
Hi,
I have been following this conversation for a while and have taken a broad look at the options provided in the excel sheets.
I suggest we migrate our documentation to Hugo. With Hugo we would also be able to publish our docs from the same repository without much extra work.
Short description of Hugo:
Resource links:
- Hugo: https://gohugo.io
- Docsy: https://github.com/google/docsy
- Docsy example template: https://example.docsy.dev/docs/
Some sites made with Hugo:
- https://kubernetes.io
- https://thanos.io
- https://getaether.net
- and many more
@kushthedude thanks very much for this suggestion - what would be great is to get this into the spreadsheet along with the other options so we can see all the information together
I have helped to add @kushthedude's recommendation into the spreadsheet. I think it does satisfy what we are looking out for, and apparently has the reputation for being a very fast static site generator. It is definitely worth considering as well!
I was curious about how translations work in Docusaurus, which has an integration with Crowdin, so I started this thread on Weblate's mailing list to see if we could use Weblate instead:
https://lists.cihar.com/hyperkitty/list/[email protected]/thread/GINZJBVKU6ASC2M4ELNQCR3A7H6CLT7T/
In short: there are tools such as po4a which can be used to translate Markdown files via standard localization formats (such as GNU gettext). Those tools could potentially be used to add translation support to documentation frameworks which do not support translation natively (although we should look into the details: adding another moving part might not be as good as native translation support).
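A rough illustration of what "translating Markdown via gettext" means: the page is cut into translatable blocks, each of which becomes a `msgid` that translators fill in on a platform such as Weblate. The Python sketch below is not po4a and does not reproduce its behaviour; it splits only on blank lines, omits the PO header entry, does not merge duplicate strings, and ignores code fences, tables and front matter. `markdown_to_pot` and `escape_po` are hypothetical helpers.

```python
def escape_po(text: str) -> str:
    """Escape a string so it can sit inside a double-quoted PO field."""
    # Backslashes must be escaped first, before quotes and newlines add new ones.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def markdown_to_pot(markdown: str, source_name: str) -> str:
    """Emit one gettext entry per blank-line-separated block of a Markdown page."""
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    entries = []
    for i, block in enumerate(blocks, start=1):
        entries.append(
            f"#. block {i}\n"  # extracted comment: position of the block in the page
            f"#: {source_name}\n"  # reference comment: the source file
            f'msgid "{escape_po(block)}"\n'
            'msgstr ""\n'  # left empty for translators to fill in
        )
    return "\n".join(entries)


page = '# Installing\n\nDownload the "kit" and unzip it.'
print(markdown_to_pot(page, "install.md"))
```

Rebuilding the translated page is the reverse step: replace each block with its `msgstr`, falling back to the original text when no translation exists yet.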
Thanks @wetneb
Following on from the document I posted earlier today on the discussion groups (https://docs.google.com/document/d/1VtzXjSqrliAp1R8Xy3yARXIiN5NuV0dNf4apvJz4rl8/edit?usp=sharing) I'm seeing the key use of the documentation platform being discussed here as for:
I think support for versioning (keeping documentation in line with released versions) and translation is essential for whatever solution we choose to deliver our technical and product reference. I also think we need to choose a solution that doesn't require us to do massive development on our part.
While Hugo looks great, I think it will require more investment from us and mean we need to have some level of expertise in Go which isn't part of our current stack. While I see that we could use this for the OpenRefine website as well, I'm in favour of keeping these separate at the moment. (whereas I could definitely see us integrating How Tos and Tutorials into our website).
I'd like to work out exactly how we'd deliver documentation into whichever solution we choose - I think we need to keep the 'reference' documentation (technical and product) close to the code so we can ensure it is updated when code is changed.
Based on what I've outlined above I'm currently favouring Docusaurus v1 (with the assumption we'd migrate to v2 once it supports the same level of functionality for versions/translations)
_So given this was also on @joanneong's list, I think that's two votes for Docusaurus._
Other views?
NB I've created a template for planning work on the different bits of documentation - my plan is to have a Github issue for each, but maybe this is a good place to post the links to:
_[edited to correct link to Technical reference document]_
My plan was to start populating them, but I haven't managed to get there yet. I'll probably do this on Monday now, although I've opened them for editing if people want to add anything.
I agree with @ostephens on the same points and share the same concerns. I use Hugo and like it, but I definitely would not recommend going with it for OpenRefine. It's made for static "sites" and is not focused on "documentation", which is why I like Docusaurus v1 and agree with a follow-on migration to v2, where the hope is that translation support gets even better (it's one of their goals for v2 as well).
The only thing that I worry about with Docusaurus is the choice of Content Searching and how well it works. That's super important for our users: with our current GitHub wiki, it's been a pain to show users how to search our wiki easily (they have to use GitHub's own search and know where to click to limit results to just our wiki, which is not intuitive). Docusaurus' search seems super easy to use, but I worry about the indexing choices and their pros and cons. I guess we'll see and sort that out later.
Anyways, 👍 for Docusaurus
I'll comment with my own ideas about @joanneong's short-list.
_(edit: I am a bit harsh about Sphinx here - I only allow myself that because I am the one who brought it up in the first place)_
The main problem with Sphinx (I think) is the fact that it uses reStructuredText. It is possible to use Markdown as well but to the expense of some features (I think).
In my opinion, RST is not really newcomer-friendly, since it is a bit of a niche format. And to be honest it's also quite ugly. Look at the syntax to add hypertext links:
This is `an example <https://openrefine.org/>`_.
Who came up with this horror? Why on earth an underscore at the end? Just why? Compare that to Markdown:
This is [an example](https://openrefine.org).
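The difference is purely syntactic, which also means RST links of this kind map mechanically to Markdown. A minimal Python sketch, assuming only the inline form shown above (`` `text <URL>`_ ``, or `__` for anonymous links); it does not handle reference-style links, roles or directives, and `rst_links_to_markdown` is a hypothetical helper. For a real migration, a dedicated converter such as pandoc would be the safer choice.

```python
import re

# Matches the RST inline hyperlink form `link text <URL>`_
# and the anonymous variant with a double underscore, `link text <URL>`__.
RST_LINK = re.compile(r"`([^`<]+?)\s*<([^>]+)>`__?")


def rst_links_to_markdown(text: str) -> str:
    """Rewrite RST inline hyperlinks as Markdown [text](URL) links."""
    return RST_LINK.sub(lambda m: f"[{m.group(1)}]({m.group(2)})", text)


print(rst_links_to_markdown("This is `an example <https://openrefine.org/>`_."))
```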
The good thing with Sphinx is that it has built in support for localization with GNU Gettext. But ease of use is important too - if the system is too hard to use, there will not be much content to translate…
> The only thing that I worry about with Docusaurus is the choice of Content Searching and how well it works.
Indeed! I am also concerned by how Docusaurus seems to be designed to work with specific commercial vendors on various aspects:
Even if we can get freebies from these commercial vendors as an open source project, I am not super happy with being locked in to use these providers.
I have spent quite some time trying to figure out if we can avoid going through Crowdin for localization. It looks like we could use po4a, a generic tool which can be used to translate a set of Markdown files (and many other formats) using existing translation tools such as GNU gettext. This means that it could be possible to translate the docs on Weblate directly, just like the UI strings. There are also plans to add native support for translating Markdown files to Weblate.
For search, I assume it should be possible to set up a Google custom site search anyway.
Another issue is that it is primarily designed to serve websites and docs at the same time. There are ways to use Docusaurus for docs only, by adding a redirect to /docs/ from the index page, though.
Two problems for me:
So what? I think we can go for Docusaurus, with the following setup:
Given that Markdown is pretty widespread, we are not risking too much by doing this migration: it should be possible to migrate to another system to render Markdown (Jekyll, Gitbook, or even going back to the GitHub wiki if we feel nostalgic).
If we are going for Docusaurus, how are we planning to do so?
Will we adapt to a theme or will we create a custom theme for OpenRefine?
It would be nice! But not a prerequisite, so we can get other things moving without that.
Are we going to create a separate repository, or are we going to publish the website from the OpenRefine repository only? (We can move the documentation under a docs folder and generate the Markdown dynamically if we don't want an additional repo.)
I think there is general agreement that the source of the docs should live in the same repository as the code (OpenRefine/OpenRefine), so that we can encourage contributors to update the docs as they change the behaviour of the tool.
What will the tentative timeline be, and what is the priority of the documentation site?
I would like the system to be running ASAP. It is high priority as this is a prerequisite for people to engage in meaningful documentation efforts. If there is good agreement in the team around the architecture, I could set things up next week.
For hosting, I prefer Netlify to GitHub Pages, as with Netlify we get custom PR builds, external content checking and various other features.
Sure, happy to try that too.
> I would like the system to be running ASAP. It is high priority as this is a prerequisite for people to engage in meaningful documentation efforts. If there is good agreement in the team around the architecture, I could set things up next week.
I will be happy to help to set-up the external-source generation script and the initial layout for the website. After which we can go to importing the docs.
> If we are going for Docusaurus, how are we planning to do so?
@kushthedude the point of setting up the various project documents I linked to was help with this sort of planning - ideally get some level of process and organisation around the project.
If Google Docs isn't the right place to do this, that's fine, but I don't think GitHub issues really give the structure we need - for me the plan should tell us what we are going to do, and then we have issues for the tasks (e.g. "setup external-source generation script").
I'd really appreciate it if we can get some structure around this - the docs I posted above were my attempt to do this:
- Overall project document
- Technical reference (aimed at developers)
- Product reference (aimed at users)
- Project documentation (aimed at developers)
I'd really stress that "documentation" isn't a single thing - so we need to think about what we are migrating, what is on Docusaurus (or whatever platform we choose), what other requirements we have etc.
I know we are all eager to get this project moving, and I don't want to slow us down, but I also want us to do it "right"
Owen,
I think we can use GitHub Projects to help provide additional structure for our planning and link to issues.
I've set one up quickly (Kanban Automated template). Can you check that you have access to it: try clicking on a + sign to add a quick note in "To do", and check that you can see the blue "Manage" links at the bottom of the board columns. If you can't, I need to fix the permissions, so let me know.
We can also create more buckets in the Kanban board swimlane if that makes sense. For example, "In progress" should probably be broken into 3 buckets: "Phase1", "Phase2", "Phase3".
We can also have more than one Project for the documentation (to break things up at a very high level if you want), but I think that labels and milestones will suffice with an overlay of Project cards.
This works fairly well for planning purposes, like what Antonin has for Wikidata and what I had for the UI improvements.
I think having a call of about 2-3 hours will help to sort this out for everyone, and I'm happy to help us get organized.
Forgot the link! https://github.com/OpenRefine/OpenRefine/projects/8
@ostephens are you suggesting we discuss things in Google Docs instead of here?
I think your docs are a great start but I did not understand them as something that was meant to be a discussion platform - I was expecting you to fill in the gaps with your proposed approach to the problem.
If it is instead meant to be a discussion platform, we can lock this issue and point people there (but I am not sure how easy it is to track who wrote what, and map Google accounts to GitHub usernames…)
My personal experience with GitHub issues and projects is awful, whereas it is awesome with Jira.
Moreover, I feel very comfortable with the process that has been gradually established (over many years) within the Flink community.
I hope this helps.
@wetneb I don't mean to pass discussion over to Google docs, but the question asked was:
> If we are going for Docusaurus, how are we planning to do so?
The implementation plan can be discussed, but should be documented IMO - so rather than leap from discussion to implementation I'd like discussion -> plan -> implementation
The google docs were an attempt to be able to do the "plan" stage. But I'm happy to look at alternative approaches - I just don't think making this discussion longer and longer is the solution :)
Ok, it looks like someone needs to step up and write plans in these google docs then. Happy to do that if it can get gears moving.
I don't necessarily think that's down to one person to do - I've started, but didn't get as much done this week as I'd hoped
No problem Owen, I'll try to help this week-end.
@wetneb @ostephens TIP: Docusaurus v2 does have a "Docs Only" mode and a "Blog Only" mode.
> No problem Owen, I'll try to help this week-end.
I will also try to help update the information over the weekend. Hope to see an OpenRefine docs site soon 🤩
@antoine2711 why did you unpin this?
I think it is good to keep this visible, this should help contributors get involved in the design of the new documentation architecture.
I'm supportive of @ostephens "plan, then implement" approach. Have we decided on Docusaurus?
I don't have strong opinions about markup languages or tooling. I'm happy to have the tech writers, etc who'll be contributing the bulk of the content have the bulk of the say in the decision.
I think the discussion about the platform has slowed down now. We have way more votes for Docusaurus than any other platform, so let's go for Docusaurus.
Here is a document to flesh out the category structure of the product reference:
https://docs.google.com/document/d/1erdK-IsYdMRRDJotIlZII_xNYWfLXuJqADdAEGgrhOo/edit?usp=sharing
Feel free to add comments and edit to reach something consensual.
Since this ticket is mostly about the choice of platform, we can now close this (after #2513).
We can have other tickets to migrate existing content from the wiki, and write up new content.