Nbconvert: Conversion to LaTeX of output of `colSums()` in R could be improved

Created on 5 May 2018 · 6 Comments · Source: jupyter/nbconvert

Example: Given an R data frame `data` with columns `ever_self_employed` (0 missing entries), `log_tot` (712 missing entries), and `treated` (0 missing entries), the output of running

    colSums(is.na(data))

in Jupyter Notebook is converted to

    \begin{description*}
    \item[ever\textbackslash{}\_self\textbackslash{}\_employed] 0
    \item[log\textbackslash{}\_tot] 712
    \item[treated] 0
    \end{description*}

This doesn't look very much like the original output from the notebook.

  1. While \_ is necessary for the variable names with underscores to be rendered correctly in LaTeX, the insertion of \textbackslash{} is not required, and in fact changes the output from what it looked like in the original Jupyter notebook.
  2. The alignment is off: the output should be laid out as a two-column table, with the first column right-aligned and the second column left-aligned.
  3. There are no line breaks between the rows, even though there should be.
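The extra `\textbackslash{}` in item 1 has the shape one would expect if the names were escaped twice. The Python sketch below only illustrates that mechanism: `escape_latex` is a hypothetical helper, and nothing here is taken from the code that actually produced the output, so treat the double-escaping explanation as an assumption rather than a diagnosis.

```python
import re

# Hypothetical helper for illustration only; this is NOT the code that
# generated the notebook output discussed in this issue.
_LATEX_ESCAPES = {"\\": r"\textbackslash{}", "_": r"\_"}


def escape_latex(text: str) -> str:
    """Escape backslashes and underscores in a single pass over the text."""
    return re.sub(r"[\\_]", lambda m: _LATEX_ESCAPES[m.group(0)], text)


once = escape_latex("log_tot")   # what LaTeX needs
twice = escape_latex(once)       # what happens if the result is escaped again

print(once)
print(twice)
```

Escaping once gives the form that LaTeX needs, while escaping the already-escaped string turns the backslash itself into `\textbackslash{}`, which matches the shape of the output quoted above.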

Here is what I propose as a fix for this specific example:

  1. Change the environment from description* to longtable (longtable rather than tabular, so that the list breaks across pages when it is very long, as was necessary in another example I had, which was too long for an MWE).
  2. Explicitly wrap the text in the left column with \textbf{}.
  3. In the formatting part of the notebook, after the line \usepackage{longtable} (which is already in the preamble anyway for pandoc support), add another line setting \setlength{\LTleft}{-1cm plus -1fill} so that the resulting longtable is approximately left-aligned (the default setting centers the longtable, which differs from its alignment in the original notebook).
  4. Use & and \\ to delimit the cells and rows of the longtable explicitly.

Here is the code I have for this specific example which seems to more faithfully reproduce the output from the original notebook:

    \usepackage{longtable} % longtable support required by pandoc >1.10
    % according to answer here: https://tex.stackexchange.com/questions/32726/center-wide-longtable-not-tabular-or-tabularx/32729
    % can be used to avoid the table being too far in the center
    \setlength{\LTleft}{-1cm plus -1fill}

...

    \begin{longtable}{rl}
    \textbf{ever\_self\_employed} & 0 \\
    \textbf{log\_tot} & 712 \\
    \textbf{treated} & 0 \\
    \end{longtable}

Admittedly, the \setlength{\LTleft}{-1cm plus -1fill} setting seems to push the table a little too far to the left in some cases.
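To make the proposed layout concrete, here is a minimal Python sketch that builds the same longtable from a mapping of names to values. The function names `escape_underscores` and `named_vector_to_longtable` are made up for this illustration, only underscores are escaped (other LaTeX special characters are not handled), and every row, including the last, ends with `\\`.

```python
def escape_underscores(name: str) -> str:
    """Escape underscores only; other LaTeX special characters are left alone."""
    return name.replace("_", r"\_")


def named_vector_to_longtable(values: dict) -> str:
    """Render a name -> value mapping as a two-column longtable (hypothetical helper)."""
    rows = [
        r"\textbf{" + escape_underscores(name) + "} & " + str(value) + r" \\"
        for name, value in values.items()
    ]
    return "\n".join([r"\begin{longtable}{rl}", *rows, r"\end{longtable}"])


table = named_vector_to_longtable(
    {"ever_self_employed": 0, "log_tot": 712, "treated": 0}
)
print(table)
```

The `rl` column specification gives the right-aligned bold names and left-aligned values described in the proposal; the `\LTleft` setting still has to go in the preamble separately.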

All 6 comments

@flying-sheep is this actually a conversion, or is that LaTeX produced by the R code?

If it's doing a conversion, it's using pandoc, which we don't have much control over.

@takluyver If it's a conversion (I think it is), should I close this and file it with pandoc?

I've got a feeling that IRkernel creates LaTeX output itself, so it may not be a conversion. Can you have a look at the raw contents of the ipynb file and check which formats it has?
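For anyone wanting to do this check without reading the JSON by hand, a short Python sketch along these lines should work. It assumes the usual version 4 notebook layout (top-level `cells`, each code cell with `outputs`, rich outputs with a `data` bundle keyed by MIME type); `output_mime_types` is a name invented here, and the inline notebook is a stand-in for a real file.

```python
import json


def output_mime_types(notebook: dict) -> list:
    """List the MIME types offered by each output that carries a data bundle."""
    found = []
    for cell in notebook.get("cells", []):
        # Markdown and raw cells have no "outputs" key, hence the default.
        for output in cell.get("outputs", []):
            data = output.get("data")
            if data:
                found.append(sorted(data))
    return found


# Stand-in for json.load(open("notebook.ipynb")); the content is illustrative.
example = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["Some text"]},
    {"cell_type": "code",
     "outputs": [
       {"output_type": "display_data",
        "metadata": {},
        "data": {"text/plain": ["x"], "text/latex": ["y"], "text/html": ["z"]}}
     ]}
  ]
}
""")

for mime_types in output_mime_types(example):
    print(mime_types)
```

If `text/latex` shows up in the list, the LaTeX was stored in the notebook when the cell ran, rather than being generated from another format at conversion time.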

@takluyver Is it possible that this should be filed with IRkernel then? I just want to know what will be most helpful.

Anyway, I think this is the code for the relevant output in the .ipynb file, but I'm not 100% certain of my understanding of the JSON structure of the notebook file.

    {
     "data": {
      "text/html": [
       "<dl class=dl-horizontal>\n",
       "\t<dt>ever_self_employed</dt>\n",
       "\t\t<dd>0</dd>\n",
       "\t<dt>log_tot</dt>\n",
       "\t\t<dd>712</dd>\n",
       "\t<dt>treated</dt>\n",
       "\t\t<dd>0</dd>\n",
       "</dl>\n"
      ],
      "text/latex": [
       "\\begin{description*}\n",
       "\\item[ever\\textbackslash{}\\_self\\textbackslash{}\\_employed] 0\n",
       "\\item[log\\textbackslash{}\\_tot] 712\n",
       "\\item[treated] 0\n",
       "\\end{description*}\n"
      ],
      "text/markdown": [
       "ever_self_employed\n",
       ":   0log_tot\n",
       ":   712treated\n",
       ":   0\n",
       "\n"
      ],
      "text/plain": [
       "ever_self_employed            log_tot            treated \n",
       "                 0                712                  0 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }

Yup, that looks like it. That means that the LaTeX is being produced by the kernel. The repr repo is most likely to be the relevant one:

https://github.com/IRkernel/repr

If you want to file an issue there, feel free to link to it and close this one.

Done.

Thank you for your help!
