While writing docs with Documenter.jl, I noticed that Markdown.parse doesn't handle HTML tags properly
(xref: https://github.com/JuliaDocs/Documenter.jl/issues/176).
Most Markdown parsers support this feature, so I think Base.Markdown should as well.
For example, the two consecutive hyphens in an HTML comment are converted to an en dash, as follows:
julia> Markdown.parse("<!-- comment -->")
<!– comment –>
CC: @MichaelHatherly
julia> versioninfo()
Julia Version 0.5.0-rc1+0
Commit cede539* (2016-08-04 08:48 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
I fear that we're going to end up having a full HTML parser in Base :|
Nice! Which JavaScript library are you going to use?
I fear that we're going to end up having a full HTML parser in Base :|
Yeah, that's not something I'd like to end up happening. Most Markdown parsers seem to just use some regex monstrosities to catch raw HTML, which appears to work all right.
Reminds me of this: http://stackoverflow.com/a/1732454
How about using CommonMark? It already has the libcmark library that supports HTML tags.
Does CommonMark support some form of table syntax yet @bicycle1885? From the last time I looked through the spec I didn't come across anything.
I think it would probably be a good idea to wrap libcmark anyway (at some point) even if it's just to make it easier to check how much of CommonMark we actually adhere to, which is most likely not much at the moment.
No, CommonMark seems to be very conservative about adding extensions like table syntax. I'm not sure, but I think we could do some preprocessing to convert the table syntax extension to HTML tables before passing the string to libcmark.
I fear that we're going to end up having a full HTML parser in Base :|
In CommonMark, the intention is that it should be possible to recognise HTML using simple rules, i.e. without a full parser. You don't have to do any processing on it, so it's not too tricky to match up < and > characters and avoid escaping that section.
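As a rough illustration of the "simple rules" idea above, here is a minimal Julia sketch (assuming Julia 1.x syntax). The names RAW_HTML, escape_text and escape_outside_html are hypothetical and are not part of Base.Markdown or libcmark. The regex is much looser than the actual CommonMark raw-HTML rules: it only recognises comments and simple tags, and it would mishandle a > inside an attribute value.

```julia
# Hypothetical helper, not part of Base.Markdown: escape text for HTML output,
# but copy anything that looks like a raw HTML comment or tag through verbatim.
# Assumption: this pattern is a simplification, not the CommonMark grammar.
const RAW_HTML = r"<!--.*?-->|</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>"s

# Escape the three characters that are special in HTML text ("&" first).
escape_text(s) = replace(replace(replace(s, "&" => "&amp;"), "<" => "&lt;"), ">" => "&gt;")

function escape_outside_html(src::AbstractString)
    out = IOBuffer()
    pos = firstindex(src)
    for m in eachmatch(RAW_HTML, src)
        # Escape the plain text before the match, then emit the match unchanged.
        print(out, escape_text(SubString(src, pos, prevind(src, m.offset))))
        print(out, m.match)
        pos = m.offset + ncodeunits(m.match)
    end
    # Escape whatever plain text follows the last match.
    print(out, escape_text(SubString(src, pos)))
    return String(take!(out))
end
```

With this sketch, escape_outside_html("a < b <!-- comment --> <em>x</em> & y") returns "a &lt; b <!-- comment --> <em>x</em> &amp; y": the lone < and the & are escaped, while the comment and the tags pass through untouched. A real parser could presumably skip the same spans when applying other inline rules, such as the dash conversion, but that part is not shown here.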
I currently use this NodeJS hack as a workaround for generating AWS documentation: https://github.com/samoconnor/AWSCore.jl/blob/master/src/HTML2MD.jl
It would be nice if HTML in markdown just worked.