For some reason the second bullet list (the one in the K-Means section) is not converted to LaTeX in this file: https://github.com/scikit-learn-contrib/hdbscan/blob/5b1ed1dfef216c4083567061f533a47609c4e3b2/notebooks/Comparing%20Clustering%20Algorithms.ipynb
Instead the section is converted to the following LaTeX code:
So, in summary, here's how K-Means seems to stack up against out
desiderata: * \textbf{Don't be wrong!}: K-means is going to throw points
into clusters whether they belong or not; it also assumes you clusters
are globular. K-Means scores very poorly on this point. *
\textbf{Intuitive parameters}: If you have a good intuition[...]
The rest of the bullet lists in the notebook are converted correctly, and they seem to be formatted in exactly the same way as the one that fails.
Try adding two spaces at the end of the line immediately preceding the list to indicate a line break, as pandoc's markdown expects. Does that solve your problem?
Two newlines (one empty line) before the list seems to do the trick. Pretty confusing though because without the empty line, it is still rendered fine in the HTML view.
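In case it helps anyone else, here is a minimal sketch of applying the empty-line workaround automatically to markdown text before export. The function name `ensure_blank_line_before_lists` and the regular expression are my own illustration, not part of nbconvert or pandoc, and the list detection is deliberately simple (it ignores wrapped list items and would also match a `* * *` horizontal rule).

```python
import re

# Matches a markdown bullet or numbered list item, e.g. "* x", "- x", "1. x".
# A line starting with "**bold**" does not match, because the marker must be
# followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[*+-]|\d+[.)])\s+\S")


def ensure_blank_line_before_lists(markdown_text):
    """Insert an empty line before a list that directly follows other text.

    Hypothetical helper: lines inside ``` fences are left untouched.
    """
    fixed = []
    in_fence = False
    for line in markdown_text.split("\n"):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        previous = fixed[-1] if fixed else ""
        starts_list = (
            not in_fence
            and LIST_ITEM.match(line) is not None
            and previous.strip() != ""
            and LIST_ITEM.match(previous) is None
        )
        if starts_list:
            fixed.append("")  # the empty line that pandoc needs
        fixed.append(line)
    return "\n".join(fixed)


cell_source = (
    "So, in summary, here's how K-Means seems to stack up:\n"
    "* **Don't be wrong!**: ...\n"
    "* **Intuitive parameters**: ..."
)
print(ensure_blank_line_before_lists(cell_source))
```

The same function could be run over the `source` of each markdown cell in the notebook before converting, though editing the cell by hand is just as easy for a single list.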
I'm guessing that's because we're using pandoc for LaTeX export, and it generally has more consistent behaviour than the markdown spec requires around blank lines and lists. There is wide variation in how markdown is actually implemented.
However, I should mention, I actually get perfectly well formatted LaTeX on the linked notebook (which doesn't have the two newlines).
So, in summary, here's how K-Means seems to stack up against out desiderata:
* **Don't be wrong!**: K-means is going to throw points into clusters whether they belong or not; it also assumes you clusters are globular. K-Means scores very poorly on this point.
* **Intuitive parameters**: If you have a good intuition for how many clusters the dataset your exploring has then great, otherwise you might have a problem.
* **Stability**: Hopefully the clustering is stable for your data. Best to have many runs and check though.
* **Performance**: This is K-Means big win. It's a simple algorithm and with the right tricks and optimizations can be made exceptionally efficient. There are few algorithms that can compete with K-Means for performance. If you have truly huge data then K-Means might be your only option.
\begin{itemize}
\tightlist
\item
\textbf{Don't be wrong!}: We inherited all the benefits of DBSCAN and
removed the varying density clusters issue. HDBSCAN is easily the
strongest option on the 'Don't be wrong!' front.
\item
\textbf{Intuitive parameters}: Choosing a mimnimum cluster size is
very reasonable. The only remaining parameter is \texttt{min\_samples}
inherited from DBSCAN for the density based space transformation.
Sadly \texttt{min\_samples} is not that intuitive; HDBSCAN is not that
sensitive to it and we can choose some sensible defaults, but this
remains the biggest weakness of the algorithm.
\item
\textbf{Stability}: HDBSCAN is stable over runs and subsampling (since
the variable density clustering will still cluster sparser subsampled
clusters with the same parameter choices), and has good stability over
parameter choices.
\item
\textbf{Performance}: When implemented well HDBSCAN can be very
efficient. The current implementation has similar performance to
\texttt{fastcluster}'s agglomerative clustering (and will use
\texttt{fastcluster} if it is available), but we expect future
implementations that take advantage of newer data structure such as
cover trees to scale significantly better.
\end{itemize}
What version of pandoc are you using?
I'm using pandoc 1.19.2.1 (from the Arch Linux repositories).
$ pandoc --version
pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.5, texmath 0.9, skylighting 0.1.1.4
Installed using Homebrew on Mac OS X