Incubator-mxnet: [RFC] Documentation of MXNet

Created on 11 Oct 2016 · 17 Comments · Source: apache/incubator-mxnet

There have been some complaints about the documentation quality of MXNet. We actually have a lot of documents, but they are a bit scattered and can be difficult to locate. In this issue, we attempt to set up a framework so that we can better re-organize the existing documents and provide guidelines for people to contribute more documents.

Please feel free to add your comments and opinions below to help make the documentation better.

Main TODOs

  • [ ] Introduction & Tutorial
  • [ ] How To (assigned to @winstywang)

    • [ ] Re-organize existing examples (and make sure they run correctly?)

    • [ ] Add missing how-tos

  • [ ] API

    • [ ] [doctest](https://docs.python.org/2/library/doctest.html) for operators (assigned to @pluskid) --- see PR #3513

    • [ ] Symbolic and NDArray op: add brief examples, see also comments below

    • [ ] Add API Ref for other components (see below)

  • [ ] FAQ
  • [ ] UI Bug

    • [ ] Menu not shown on iOS, so we can only see the front page

Organization


The organization will be similar to the existing system.

  • Introduction
  • Tutorials
  • How To / Cookbook / Usecase
  • API / References
  • FAQ
  • Developer's Guide

Guidelines for Documents

Introduction

The goal of _Introduction_ is to give an overview of MXNet and to serve as a portal for getting started (download & installation).

  • The same contents as the README?
  • With links to _Download and Installation_, and other language bindings.

Tutorials

Each tutorial should be a detailed, step-by-step document on one topic. A tutorial is the entry point for a new user with no prior experience of MXNet. It should

  • Be self-contained and run correctly; a user will quit if the first example they try does not run.
  • Be simple and easy to run (e.g. even without CUDA).
  • Cover many useful components of MXNet.
  • Show the expected outputs so users can compare them with their own runs.
  • Give step-by-step explanations, with brief descriptions of components (e.g. DataIter) and links to more detailed documentation.

IPython notebooks might be a good format for detailed runnable code with documentation. But we should add the notebooks to our regression tests, using tools (e.g. runipy) that can run notebooks from the command line.
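A minimal sketch of what a notebook regression check could look like, assuming the notebooks are stored as nbformat 4 JSON and contain plain Python only (no IPython magics). This is a standard-library stand-in for illustration, not how runipy works; `run_notebook_code` and `EXAMPLE_NOTEBOOK` are hypothetical names.

```python
import json


def run_notebook_code(notebook_json):
    """Execute every code cell of an nbformat-4 notebook in one shared namespace."""
    notebook = json.loads(notebook_json)
    namespace = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat allows the source to be stored as a list of lines
            source = "".join(source)
        try:
            exec(compile(source, "<cell %d>" % index, "exec"), namespace)
        except Exception as error:
            # Report which cell broke so the failing tutorial step is easy to find
            raise RuntimeError("cell %d failed: %r" % (index, error)) from error
    return namespace


# Hypothetical two-cell notebook, used only for illustration.
EXAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["x = 2\n", "y = x * 21\n"]},
    ],
})

if __name__ == "__main__":
    final_namespace = run_notebook_code(EXAMPLE_NOTEBOOK)
    print(final_namespace["y"])
```

A real setup would more likely delegate execution to a notebook runner so that magics and rich outputs are handled; the point here is only that a failing cell should fail the regression run.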

How To

HowTos target users who have a basic understanding of MXNet and want to do a specific thing with it. They are organized by task. We have a rich examples repository that we could re-use.

For runnable examples, the documentation can be a README.md in that specific folder.

APIs

Organizations

  • Overview: an index of the components and what they do.
  • Operators: general operators
  • Loss Layers: XXXLoss or XXXOutput
  • Module API and legacy Model API
  • Optimizers
  • Callbacks
  • Initializers
  • Eval Metrics
  • Data Iters
  • Plugin
  • NDArray
  • Internals

    • Executors

    • KVStore

    • Context

    • Monitor

Each component ideally should have

  • A summary of what it is
  • A list of existing classes that users can use
  • A description of the interface and, optionally, how to write a user-customized one (callback, eval metric, etc.); this can link back to HowTos

It would be very helpful if each API function had a brief example of how it is called, especially the automatically generated functions. The Concat operator is a good example: the documentation says it takes data as Symbol[] and num_args, so people not familiar with the calling convention will find it hard to figure out how to use this operator. Having a brief example greatly improves the documentation. It does not need to be fully runnable; a brief line like

cat = mx.sym.Concat(a, b, dim=dim)

is already useful. Currently, extra docs for Python operators can be attached via symbol_doc.py. Some examples of good API references with brief examples: Python, numpy, Lasagne, Keras, etc.
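One way to keep such brief examples from going stale is to put them in the docstring and check them with doctest, as the TODO list proposes for operators. The sketch below assumes a hypothetical pure-Python `concat` on nested lists as a stand-in; it is not the MXNet operator and does not use symbol_doc.py.

```python
import doctest


def concat(*arrays, dim=0):
    """Join 2-D lists along an axis (hypothetical stand-in, not an MXNet operator).

    Examples
    --------
    >>> a = [[1, 2], [3, 4]]
    >>> b = [[5, 6], [7, 8]]
    >>> concat(a, b, dim=0)
    [[1, 2], [3, 4], [5, 6], [7, 8]]
    >>> concat(a, b, dim=1)
    [[1, 2, 5, 6], [3, 4, 7, 8]]
    """
    if dim == 0:
        # Stack rows: append every row of every input in order.
        return [list(row) for array in arrays for row in array]
    if dim == 1:
        # Join columns: concatenate the i-th rows of all inputs.
        return [sum((list(row) for row in rows), []) for rows in zip(*arrays)]
    raise ValueError("dim must be 0 or 1 for 2-D inputs")


def check_docstring_examples():
    """Run the examples in concat's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(concat.__doc__, {"concat": concat}, "concat", None, 0)
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(check_docstring_examples())
```

The same docstring serves as both the rendered API example and the test, so a change in calling convention that breaks the example is caught by the test run.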

FAQ

  • fix_gamma
  • How to automatically resume training from saved snapshot
  • How to set stage-wise learning rate
  • Why GPU is slower than CPU
  • Why multi-GPU data parallelism is slower than single-GPU
  • How to get intermediate outputs of a network
  • ...

Developer's Guide

Existing system design doc, NNVM, etc.

We also need a doc with instructions for making a release:

  • run regression test
  • doc changelog
  • tag
  • ...

Call for Contribution

All 17 comments

It would be great if you can do the examples in R too!! :)

If possible the examples should be run as part of the test-suite so that we can make sure that they will always work.
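A rough sketch of how examples could be wired into a test run, assuming each example is a standalone script that exits with a non-zero status on failure. The names `run_examples` and `demo`, and the demo file names, are hypothetical and not part of MXNet.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_examples(example_dir, timeout=60):
    """Run each *.py file directly under example_dir; return {filename: passed}."""
    results = {}
    for script in sorted(Path(example_dir).glob("*.py")):
        completed = subprocess.run(
            [sys.executable, str(script)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        # An example counts as passing when it exits with status 0.
        results[script.name] = completed.returncode == 0
    return results


def demo():
    """Create one passing and one failing example in a temp folder and run both."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "good_example.py").write_text("print('ok')\n")
        Path(tmp, "broken_example.py").write_text("raise SystemExit(1)\n")
        return run_examples(tmp)


if __name__ == "__main__":
    for name, passed in demo().items():
        print(name, "PASS" if passed else "FAIL")
```

A test suite could then assert that every value in the returned dictionary is true, so a broken example fails the build instead of being discovered by a new user.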

The MXNet docs should be put on a single website, and its table of contents should be organized so that users can search and query it conveniently. We can learn from the docs of other DL libraries.

The following docs look very good to me:
tensorflow : https://www.tensorflow.org/
lasagne: https://lasagne.readthedocs.io/en/latest/
PaddlePaddle: http://www.paddlepaddle.org/
keras: https://keras.io/

For the API docs, we can take the torch nn convolution page as an example: https://github.com/torch/nn/blob/master/doc/convolution.md

@vchuravy Yes, doctest might be a good idea.

The style of the current mxnet doc website is not very friendly. There is no clear separation between input variables and output variables. I suggest using a default style such as the one Keras uses in the final doc.

There are Amazon folks working on this. Better to coordinate. @sandeep-krishnamurthy

@pluskid: Great set of points. Doing this would greatly improve user experience for MXNet users.
I agree with most of your suggestions on organizing the content and adding new content. Some other people at Amazon and I are working on a proposal to improve the docs page navigation and the organization of content.

Most of our current plan overlaps with your suggestions. I will post a follow-up here with more details (combining our ideas and people's suggestions in this thread) by end of day Friday.

At a high level, I would like to take this forward in multiple steps, as below:

  • [ ] Present the proposal for re-organization of mxnet docs.
  • [ ] Make required changes for high level re-organization of contents.
  • [ ] Improve Installation set up guide
  • [ ] Improve Tutorial section
  • [ ] Improve "How to" section. Add contents on applications, training etc. pulling from mxnet/examples.
  • [ ] Re-organize "API/Python" section.
  • [ ] Re-organize "API/Scala", "API/Julia", "API/R" sections.
  • [ ] Re-organize "FAQ" section
  • [ ] Create "Developers Guide" section. It might be a good idea to have "Community" with sub-options "Committer", "Issues", "mailing list"?
  • [ ] Explore usage of doctest.
  • [ ] Explore integration test set up on examples. Explore running IPython notebooks.

Can we create separate issues for each of the above tasks and track them?

Do let me know your thoughts and suggestions, team.

Top Level

  • Get Started
  • Tutorials
  • How To
  • API
  • Deep Learning Concepts
  • Architecture
  • Community
  • Fork me on Github ribbon (A cross ribbon at top right corner)

Get Started

  • Introduction
    Objective of this section: cover the features of MXNet, give a line or two comparing it with TensorFlow, and create interest in MXNet.
  • Example
    A simple working example. Should not depend on some large dataset / cuda etc.
  • Setup and Installation guide
    The current installation guide needs a lot of cleanup.
    It would still be better to have a quick installation script that users can run.
    To start with, cover common OSes like Ubuntu / Amazon Linux (RHEL) / Mac.
    Refer - http://torch.ch/docs/getting-started.html#_
  • Recommended Readings
    Link to Tutorials
    Link to How Tos
    Link to API page
    Link to community page

Tutorials

  • Pull out tutorials from API section and move them here.
  • Absorb Julia tutorials here
  • Refer above comment by @pluskid
    We have a rich examples repository that we could re-use. Pull them here. For runnable examples, the documentation can be a README.md in that specific folder.
  • Applications

    • Computer Vision

      • Image Classification
      • Segmentation
      • Detection
      • Neural Art

    • Natural Language Processing

      • Recurrent Neural Networks
      • Convnet Text Classification
      • NCE Loss

    • Speech Recognition

      • Speech LSTM
      • Baidu CTC

    and so on.


Try to add more conceptual details.

How To
Needs to be organized as suggested by @pluskid above.

FAQ
Remove build and install; it is redundant.
A few things can move to the How To section.
Remove the relation to CXXNet, Minerva, Purine2?
Probably this section can then be renamed to "Troubleshooting"?

API

  • Should contain only APIs. Move out tutorials, examples etc. to other appropriate sections.
  • It is preferable to have only one page with a separate tab for each programming language. See the Spark examples here: http://spark.apache.org/examples.html

Deep Learning Concepts
Currently we have common deep learning concepts in the Architecture page. Move them here.

  • Deep Learning Design Notes
    This section will be updated with self-contained design notes on various aspects of deep learning systems, in terms of abstraction, optimization and trade-offs.
  • Programming Models for Deep Learning
  • Dependency Engine for Deep Learning
  • Squeeze the Memory Consumption of Deep Learning
  • Efficient Data Loading Module for Deep Learning
  • Survey of RNN Interface

Architecture
Have MXNet system-specific details here.
Also, have sections on code walkthroughs etc. The objective is to help readers understand the system first, then the architecture of mxnet, and then link them to the appropriate code sections.

Community

  • Link to how to contribute
  • Link to committers guide: details for committers on merge, regression test etc.
  • Link to email-list
  • Link to issues folder
  • Link to "Roadmap" section. We can capture the next big things to be done in mxnet. We can even write a one-liner in the docs and link back to the issue in the issues folder.

I will create separate issues to track these tasks. Do let me know what you guys think about the proposal.

We can iteratively fix section by section.

Submitted PR for

  • High level headers
  • Fork me on github ribbon
  • Rename Package to API

https://github.com/dmlc/mxnet/pull/3526

We should also have a Glossary section: a one-line description and probably a link to more detailed documentation for each term (if available).

"Deep Learning Concepts" seems too verbose.

BTW, can someone who knows CSS/HTML help adjust the font and line spacing for our docs? I think we need a different font and larger line spacing (esp. between title and content).
Reference: https://keras.io/layers/core/

Then maybe "Deep Learning Basics"? Or "Basics"?

I feel "Deep learning concepts" and "Architecture" can be merged. Because they are both describing the mxnet internals.

@VoVAllen I share the same opinion with you. Currently it is quite confusing which part is parameters and which part is return values.

Major re-organization of content in "get_started", "tutorials", "architecture", "community" pages submitted. PR - https://github.com/dmlc/mxnet/pull/3545

We now have a HowTo tag for issues. We will tag issues that look like common practice with it, and periodically go through the tagged issues and summarize the solutions into the HowTo or FAQ section of the doc.

I would suggest the bookdown package for creating tutorials. It looks very nice to me, and you can find great examples such as https://topepo.github.io/caret/index.html
