Crystal: Improve Markdown implementation - Or: Should we separate compiler tools?

Created on 23 Jun 2017  ·  91Comments  ·  Source: crystal-lang/crystal

The Markdown parser and renderer is still very basic and misses important features like for example raw html.

I suggest to implement the CommonMark specification which is very reasonable and the de-facto standard reference for Markdown implementations. There is an entire test suite jgm/CommonMark to validate implementations. Currently Markdown.to_html passes only 137 of 621 (22 %).

It would also be nice if the Markdown parser/renderer can be easily extended to add support for more specialized features like Github Flavoured Markdown (e.g #2217)

help-wanted refactor stdlib

Most helpful comment

I think this is all just a big mistake. Markdown in docs has a very tiny priority. The next big steps are getting parallelism tried out and done well and Windows, not focusing on extracting tools outside of the compiler, which is transparent for users and probably a maintenance burden for us and Crystal distributors.

But improving Markdown is already here without a cost at all (I already sent a PR) and brings an immediate improvement without breaking any API (the main Markdown.to_html method is still there).

If I never continued commenting on this issue this entire "let's wait and extract compiler tools" discussion wouldn't even exist.

But do as you wish.

All 91 comments

GitHub Flavored Markdown (GFM) has the spec now.

Yes, that's the CommonMark spec plus a few extensions, so GFM is essentially a superset of CommonMark. These extensions are:

  • Tables (#2217)
  • Task List Items (#3459)
  • Striketrough
  • Extended Autolinks
  • Disallows Raw HTML

As asterite and ysbadden stated it is probably best if the default implementation does not include extensions.

There is already a Crystal implementation for CommonMark at https://github.com/ujifgc/crmark, though it is a direct port of the JavaScript reference implementation, so it can certainly be optimized in many ways.

Instead of implementing a parser in Crystal we could also use the existing C implementation. It supports custom extensions as well, though I have no idea how to incorporate them.

It would be good to adopt the full CommonMark specs in the stdlib. But I think it will be better to have a crystal native implementation. That way we could toggle extensions like GFM which many probably want.

If the failing CommonMark specs can be grouped and the design of the parser the reference implementation can be ported then we might be in a better shape to improve the markdown implementation as you suggest.

The markdown parser is mainly used to generate the docs from crystal code. And it is slightly used in the playground. That is basically why the current implementation didn't need all markdown features.

What do you mean with grouping the failing specs? The spec file is essentially a literate description (HTML at spec.commonmark.org) with embedded examples from which the test cases are extracted. So the tests are part of the hierarchical document structure. The default test runner for example can be limited to specs from specific sections with parameter --pattern.

I would prefer a Crystal implementation as well. Seems like I was mistaken about extensibility of cmark. There is however an open PR to include extensions. Github have forked cmark to implement their syntax additions.
@ysbaddaden has already created a shard with bindings to cmark: ysbadden/crystal-cmark.
So it would probably be very easy to replace the current implementation with cmark. But that would not give extensibility, it would probably still be difficult even if cmark supported it.

Besides the already mentioned projects, there are also huacnlee/remarkdown, an extension of stdlib's Markdown with support for GFM, and icyleaf/markd a WIP markdown parser in a very early stage.
/cc @icyleaf @huacnlee

I created markd just to validate how to write a CommonMark and better support extensions. In my other project named wasp is a static site generator. It requires powerful markdown parser for the users.

cmark is not the best solution to wasp, and same to crystal community.

BTW, PL#4496 is another discussion.

I'm sure that GFM should be default. Because it will be default everywhere.

What makes you sure of this? Even Github understands GFM as a set of extensions to the standard CommonMark. A default implementation should use the most basic common denominator. It is a better approach to add certain extensions to the basic set when they are necessary, instead of removing them when they are not needed.
However, we should consider including GFM into the stdlib and make it easily accessible (maybe as Markdown::GFM).

Just because any modern language/framework documentation browser based on GFM. But Yes, GFM as option in stdlib is OK.

Don't you guys think that two markdown parsers are a little bit of overkill? I personally would advocate that none should be embedded in Crystal at all. Unless I'm losing some details, I really see no need for markdown parsers when you could simply implement it as a shard and import as needed.

@kazzkiq crystal docs depends on Markdown.

I've said it before and I'll say it again: If the only reason the markdown parser is in the stdlib is so that it can be used in crystal doc then the solution is to develop a method to separate it into a shard and still have the compiler depend on it. This could very well include simply vendoring sourcecode into src/compiler/vendor and treating it as compiler sourcecode.

@straight-shoota in https://github.com/crystal-lang/crystal/issues/4613#issuecomment-310908551 i meant that if the failing specs can be grouped by features, then this issue is more actionable / can be tackled down based on that list of features. Maybe that list is just the TOC of http://spec.commonmark.org/0.27/ (now that I've opened the link).

That been said, PRs are welcome to move the crystal stdlib markdown module towards CommonMark.

If the module migrate to a separate shard that is another story. Let's first have a commonmark native implementation we are all happy about.

The current implementation probably needs some architectural changes to properly support all of CommonMark and to make it extensible.
For example, it is recommended to use a two-stage parser for block nodes and inline nodes.

As a side node: documentation flags (#3519) could also be implemented as an extension to the default markdown class and can then be used by the docs generator.

Now markd is 100% compliant to Commonmark, and pass all specs.

Here is the result of a sample markdown file parse at MacBook Pro Retina 2015 (2.2 GHz):

Crystal Markdown   3.28k (305.29µs) (± 0.92%)       fastest
           Markd 305.36  (  3.27ms) (± 5.52%) 10.73× slower

parse cost time:

preparing input: 1.218ms
block parsing: 1.685ms
inline parsing: 2.187ms
renderering: 1.472ms

note: preparing input is only process the source as String not File.

Wow, that's great! A performance loss is certainly to be expected, but I suppose there is room for optimization...?
What "sample markdown" did you use for this benchmark?

@straight-shoota Updated the result, i found and used a complete commonmark source as a sample to benchmark. 🤣

Add ujifgc/crmark, It was best with commonmark support for now.

     crystal markdown   3.06k (327.25µs) (± 1.25%)       fastest
                markd 278.73  (  3.59ms) (± 1.12%) 10.96× slower
crmark in :commonmark 635.85  (  1.57ms) (± 1.43%)  4.81× slower
crmark in :markdownit 118.54  (  8.44ms) (± 4.52%) 25.78× slower

I can't install ysbaddaden/crystal-cmark, it missed the benchmark:

$ shards
Updating https://github.com/icyleaf/markd.git
Updating https://github.com/ujifgc/crmark.git
Updating https://github.com/ysbaddaden/crystal-cmark.git
Using markd (f58ed78fd0cdcc6e9dd274ac9f8696bc778dea84)
Using crmark (457c602725834429cd40544fbcaa505034637c8b)
Installing common_mark (0.1.0)
Postinstall cd ext && make
Failed cd ext && make:
/bin/sh: line 1: cd: ext: No such file or directory

Would you care comparing the performance to crystal-cmark (libcmark bindings)? And maybe some implementations in other languages (cmark and commonmark.js are reference implementations)?

@icyleaf use branch: master on common_mark, the latest 0.2.0 release isn't tagged and it's trying to use 0.1.0.

Simple benchmark, each ran 10 times. source code

bm_crystal_builtin average cost 8.6305ms, min 7.899ms, max 9.948ms
bm_crystal_crmark average cost 18.2027ms, min 17.303ms, max 19.282ms
bm_crystal_markd average cost 14.9459ms, min 14.08ms, max 15.783ms
bm_node_commonmarkjs average cost 114.2585ms, min 109.916ms, max 119.234ms

Updated and add common_mark (thanks @RX14).

     crystal markdown   2.69k (372.18µs) (± 0.93%)       fastest
                markd 257.93  (  3.88ms) (± 1.18%) 10.42× slower
crmark in :commonmark 640.94  (  1.56ms) (± 1.53%)  4.19× slower
crmark in :markdownit 123.85  (  8.07ms) (± 3.67%) 21.69× slower
        crystal-cmark   1.86k (536.57µs) (± 3.64%)  1.44× slower

without built-in markdown

                markd 249.45  (  4.01ms) (± 3.86%)  6.99× slower
crmark in :commonmark 511.05  (  1.96ms) (± 2.72%)  3.41× slower
crmark in :markdownit 111.65  (  8.96ms) (± 2.43%) 15.61× slower
        crystal-cmark   1.74k (573.86µs) (± 3.68%)       fastest

I'd say we move the Markdown class from the std inside the compiler's source code and make it private to it. If you want to use markdown, use a shard. As @RX14 says, the only reason we have Markdown in the compiler's source code is because we use it for docs. If we later find out we like a shard, the compiler can depend on it.

Thoughts?

The current markdown implementation is insufficient in many ways, even for the job of producing the docs. Therefore I'd like to see an improvement to the markdown employed by the compiler, be it in stdlib or from an external shard (I don't know about the practical application of this).

I'd question if it would be worth sticking around with the current implementation if it gets hidden in the compiler's source. It's not good enough for this purpose and it's unlikely to get improved if nobody uses it except from the compiler.

@straight-shoota The current std is documented with it and I didn't find it lacking. We could perfectly support a subset of it, like for lists, links and codeblocks.

Yes, for the current stdlib documentation it is sufficient. But I've run into problems several times documenting my own code, mostly cause by missing support for raw HTML. While this shouldn't be needed for most documentation purposes, there are occasions where e.g. a table would help a lot. This is a common feature used in many API documentations.
Plus, the markdown renderer is not only used for the inline API documentation but also to render README.md. For this purpose it feels like a necessity to have full support of Common Mark and don't settle for anything short of that.

@straight-shoota surely since we all agree it's substandard, surely the best thing to do is to limit it's usage to to docs tool. If someone wants to implement all of commonmark properly and completely, i'm sure we can vendor in that shard and use it in the compiler.

Go documentation is excellent and they don't have markdown. They do have pre blocks. With that and spaces you can write tables. Like this:

header1    header2    header3
    1.0        2.0       3.0

I still think simply saying "The docs use a limited subset of markdown" is fine.

@RX14 What about icyleaf/markd? It seems to be quite complete. I haven't taken an in-depth look at the implementation details, but it would be worth taking it into consideration.

@asterite pre "tables" are very limited, don't work well with variable screen sizes. I wouldn't settle for that if it isn't hard to get "real" tables.
Though it's to true that great documentation is about the content, not fancy features. Still, it doesn't hurt to have some basic tools to make it better.

Oooh... markd looks good! We could try using it in the compiler, specially because it's written in pure Crystal so no more dependencies needed for omnibus-crystal. The performance loss doesn't matter much, and it could always be improved later.

I am extremely impressed by markd (cc @icyleaf). It's well speced and seems to be well-written. I'm certain we could vendor a release of that in and use it for the compiler.

I'm impressed too. I benchmarked it and the slow part is using Regex, but I think that's unavoidable. But still, parsing 1000 times Crystal's Readme.md takes about 0.8 seconds. So maybe generating docs for the whole Crystal std will take 1 second more, I don't know, and that's totally acceptable. And for other smaller projects the difference will be less.

I think we can use shards in the compiler. Before compiling the compiler we should execute shards. If someone wants to create a PR for this, I'm all in for it :-)

@icyleaf Great work!

It should be possible to improve performance by replacing regex with a dedicated parser. That's certainly in the game, but will require some work. And it should be good to go without that.

Ill work on this.

I don't think the compiler should depend on any crystal code outside crystal-lang organization.

If @icyleaf gives it consent the shard could be moved / forked. Maybe the Markdown namespace can be used.

I am not sure about using shard in the compiler as well yet. If we want to split in different repos we could use git modules. That might imply not making it a shard per-se, but it is a tool that wont affect that much the architecture IMO

Another idea would be to move the documentation generator to a completely separate project (and binary). I was wondering how Elixir did it for docs, because they also use markdown, and it turns out it's a separate project indeed: https://github.com/elixir-lang/ex_doc (last commit is from José Valim, so it's also a core project). The doc generator needs the compiler as a dependency, but that's fine. And it's a less critical tool than the compiler itself.

This also means the compiler doesn't need to know about docs, nor use ECR. We could probably keep extracting these tools to different binaries. The only downside is that every binary will be kind of big, because the compiler's code is big. Maybe 20MB each executable. Maybe not a big problem.

In fact it seems ex_doc can depend on different markdown implementations, which is also great because no more "But I want to use this markdown implementation because it has feature X".

The playground also depends on markdown. We would need to extract that also. That is 2 or 3 times the compiler then. I'm ok with that. brew formula is 60MB so we are speaking of 100MB probably.

The downside of that is that sometimes bugs/breaks are harder to detect across different repos.
i.e: the language docs would be generated with the stable or with head version? why there would be a release of the docs tool if it's not for the new release of the compiler? Eventually it will make sense for sure, but right now it will be 1:1 versions I think.

If we extract that to different repos let's keep the git history 🙏 on the new repos.

:-1: for extracting the tools to different repos. The insides of the compiler are still very unstable and in-flux. Extracting to multiple repos is likely to just complicate refactoring.

I don't see the problem with depending on a shard, personally. We fix the git sha in the shard.lock which means all changes to the code in the shard have to be explicitly okayed inside this repo. I can see the argument that it complicates the build process unnecessarily - the solution there is to use git subtree and vendor the repo in manually without shards. If someone wants to go through the shard code with a fine-tooth comb each time we vendor an update in to see if there's any backdoors that's fine by me. Other than that, I'm not quite sure what the actual technical issue is.

Here's another idea: we move the tools (I'm thinking about the doc generator and the playground) to other repos, but make these tools depend on the crystal binary, not the compiler's source, or at least just the syntax part of the compiler, which is very lightweight and changes less.

For the docs, this means the compiler provides the documentation in JSON format. Then any tool can process this JSON and generate HTML/PDF documentation. This is in fact tracked by #2772 . The only thing is that the doc generator currently formats and highlights crystal code, so it must depend on the lexer and formatter (well, this is not a requirement, but it's nice). I made a program that depends on the compiler's syntax and formatter, and its size, when compiled with --release, is 1MB. So I think that's more than acceptable.

I checked the syntax highlighter, it didn't change since July 2016, which means the lexer didn't change since then, and I don't think it will change. And even if it does, the tool can be updated after a new release.

For the playground this means that you can feed the compiler some text, and it will modify it so that it will return it instrumented. Right now this is what happens, for example this code:

a = 1
a

becomes this:

a = _p.i(1) do
  1
end
_p.i(2) do
  a
end

Then the playground defines a source file like this:

require "compiler/crystal/tools/playground/agent"

class Crystal::Playground::Agent
  @@instance = Crystal::Playground::Agent.new("ws://localhost:#{port}/agent/#{session_key}/#{tag}", #{tag})

  def self.instance
    @@instance
  end
end

def _p
  Crystal::Playground::Agent.instance
end

and combines both sources so that the playground agent sends results through a web socket.

We can either make the compiler just provide the first tool, the one that "instruments" the code. Or in fact we can leave it without it and require the tool to do it, because the tool can require just the syntax part of the compiler and do the transformation (right now it's a file of about ~200 lines).

This means the compiler won't have this logic. But it will also not have an embedded HTTP::Server in it. We get rid of that dependency and the compiler's size will shrink, and compile times will also decrease. And we can remove those ugly flag?(:without_openssl) and flag?(:without_zlib) from all over http's source code, thus simplifying the whole code and the build process.

Maybe a new release will break this tools if they depend on the compiler's source. It's just a matter of updating that and releasing a new version (just like every other tool or shard does), which is much less critical than releasing a new compiler version. And the best thing is that these tools can evolve independently of the compiler, and releases can be more often, because they are less critical in their nature.

Also, the compiler gets rid of all html, css, and js files, which probably don't belong inside a compiler.

How would this change the interface with the user? How would we package these external binary tools, and how would the user call them?

We don't. We don't include these tools in the compiler. If a user wants them, they can install them through the corresponding projects. This is how ex_doc for Elixir works, it doesn't come with the base installation of Elixir. And I think that's good. Sometimes you just want to write an app and don't need a doc generator, nor need the playground.

I really like about Crystal that the compiler already brings most important tooling with it. All batteries included. That's really very helpful, especially for beginners.

This can also be achieved if these tool source code is separated from the compiler: the binaries could just be distributed with the crystal package. And they can be integrated into the compiler in the same way crystal deps is just a wrapper call to shards install.

Yes. It's an independent problem. What I'm saying is that having these tools as separate projects will simplify things, not make them worse. Probably the crystal formula from homebrew can depend on crystal-doc and crystal-playground if we wanted too, and for linux it could be a meta-package that contains the other tools.

@ysbaddaden already suggested this change some time ago and I thought it was worse, but now I'm convinced it's for the best.

Extracting features from the monolitic compiler doesn't have to change the user interface. We already embed Shards into the official packages, and can call it through crystal deps. Extracted tools could still be compiled and distributed in the official packages, and called in the same vein as they are today.

The benefit is having a set of smaller projects that are easier to delve in, less scary than digging into src/compiler, faster to compile, each with their own set of issues and improvements, along with the ability to rely on external Shards (thought this shouldn't be abused).

Yeah, this proposal looks good. crystal-doc and crystal-playground are not incredibly different to type than what they are currently. The only issue i have is one of discovery, will we put references to these external tools in crystal --help?

I understand that crystal-doc etc. wouldn't usually be used directly but through crystal doc instead. This way the user has a common interface for all the tools. And of course crystal --help should list all commands, if they are directly included in the same binary or external.

I think how the tool is invoked is not important. In Elixir is mix docs, in Rust it's rustdoc, in Ruby it's rdoc, etc.

In fact, I'd like to remove crystal deps as an alias of shards. It's very confusing. Sometimes shards would say its name (shards) when running crystal deps, so eventually you'll know what shards is about. And I think by now everyone (that uses Crystal) knows about it.

We shouldn't keep crystal docs as an alias for another tool.

For example:

$ crystal deps
Missing shard.yml. Please run 'shards init'

@asterite I was thinking the same thing recently but I didn't want to bring it up in this thread. We should deprecate for 1 release then remove. Or just print "use shards" but not invoke it for 1 release.

Yes, crystal deps should be removed because it's a different tool called shards.
But I'd still think it would be great to have compiler tools - even from external binaries - transparently integrated into a single CLI. The user doesn't even have to know he is actually running crystal-doc.

I don't like moving crystal doc to a separate repository because that can limit development of advanced features in it, and/or require unnecessary work on making the compiler tailored for it. It's just not time for that yet. Other than that, sounds great. Maybe worth doing closer to 1.0?

Having the compiler depend on a separate repository (markd) is, I think, a dismissed option at this point. In addition to what was said before, it has a blocking technical reason behind it -- it makes it impossible to introduce breaking changes in the compiler.

Vendoring the code of that repository (copying and periodically syncing it without allowing direct changes to it) would also fail due to the same problem.

I'd like to see Markd integrated with the standard library, but maybe it's not ready for that, and maybe we don't want Markdown in standard library anyway.

In IRC @RX14 suggested another option -- copying and periodically syncing the source code of Markd to this repository, and directly editing (a.k.a. forking) it only if absolutely necessary. I argued against it (what about maintenance and what about distro packaging), but the concerns turned out to be minor. It is a simple enough option that would work.

I still don't mind the splitting option, for the record, I just think it might be a bit too much work.

Maybe we can create a shard crystal-lang/crystal-utils to make compiler std more lightweight.

Just as another point of data, rust uses cargo to build it's compiler out of modules. Those modules frequently depend on crates outside of the tree, and as far as I can see there's no vendoring. I think using shards will be fine after 1.0, when we can guarantee that we don't have to fix the shard code halfway though. Before that though, why not simply copy the release of markd we want into the compiler source and then edit it if we have breaking changes to the compiler?

I think it's wiser to use a markd shard forked in crystal-lang/markd directly.

This way we can be confident about breaking changes and be able to send/pull commits from the original project.

It would really be nice to get this going forward. The limitations of the current markdown parser are holding back improvements of the API docs. It would be helpful in many places to use tables and sometimes direct HTML markup.

The main problems are:

  • the parser is rather simplistic and doesn't support features like numbered lists
  • the parser isn't likely used much outside the compiler, those wanted a full markdown support use external shards like markd
  • the presence of markdown in the stdlib is questionable. I don't know language doing this, and why choosing this markdown vs another flavored markdown vs restructured text or any other markup language?

Solutions

  • Parsing markdown is only used in the doc generation, which is rendered as HTML. Using a JS library like marked which parses markdown to render HTML will do the job.

  • using an external shard can also be considered, but the opinions on how to do this are mixed. Furthermore, as said above, I dont't think a markdown parser is in its right place in the stdlib.

Maybe one step we could agree on for now is to remove Markdown from the stdlib and change it to a private internal implementation for the compiler (as proposed in https://github.com/crystal-lang/crystal/issues/4613#issuecomment-328351783).

We all agree that the current Markdown implementation is far from complete and lacking in many ways. It does not accord to the quality usually expected of stdlib libraries. It's better to provide no Markdown library than an ineffectual one.

We could simply advise to use markd instead. While not essential, it would probably be a neat corporate action to transfer markd to crystal-lang org. This would express official support for the replacement of the former stdlib library. I think this would be a win for everyone. What do you think @icyleaf ?

Sure, I will to contribute mard to Crystall community. 🖖

--
/**

  • About me
    *
  • @author icyleaf
  • @url http://icyleaf.com
  • @version always beta
    */
    2018年12月3日 +0800 06:33 Johannes Müller ,写道:

Maybe one step we could agree on for now is to remove Markdown from the stdlib and change it to a private internal implementation for the compiler (as proposed in #4613 (comment)).
We all agree that the current Markdown implementation is far from complete and lacking in many ways. It does not accord to the quality usually expected of stdlib libraries. It's better to provide no Markdown library than an ineffectual one.
We could simply advise to use markd instead. While not essential, it would probably be a neat corporate action to transfer markd to crystal-lang org. This would express official support for the replacement of the former stdlib library. I think this would be a win for everyone. What do you think @icyleaf ?

You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

So we recently moved Markdown from a public place to an internal compiler source.

We are now in this state (~ is "meh", - is negative, + is positive):

  • (~) there's no Markdown in the standard library
  • (-) the docs command uses a buggy, incomplete Markdown implementation

We can choose one of the following:

1. Do nothing

  • (~) no Markdown in std but people can use markd or common-cmark for Markdown in their code
  • (-) docs doesn't support the full markdown spec and it's buggy

2. Copy markd to an internal compiler location

I recently send a PR to markd with some optimizations. It's now about 2.5x times slower than crystal-cmark, which I think is acceptable.

markd passes the official Markdown spec suit.

  • (~) there's still no Markdown in the standard library but people can use markd or common-cmark for that in their projects
  • (+) docs supports the entire Markdown spec

3. Copy markd to the standard library

  • (+) there's Markdown in the standard library. I believe this is good because having it here means everyone will probably use it. You won't end up with a project that, through transitive dependencies, depend on multiple markdown implementation. Plus the entire spec already passes and its performance is pretty good. It has options for GitHub flavored markdown and many others. And through usage we can continue improving it.
  • (+) docs supports the entire Markdown spec

What do we do?

I prefer the last option.

In options 2 and 3 we'd move markd to the compiler's source and it means running more specs. I think what we can do is place these libraries inside a spec/lib/... directory and run those specs separately from the std. In fact we could do that for JSON, YAML, CSV, XML and many others: treat them as shards/libraries, even though they are inside the standard library. Then we'd split the spec runs in CI for each of those libraries, keeping compilation times and memory consumption low.

Otherwise, we could start with option 2 and if it turns out to work really well we could promote it to option 3.

I understand that we could continue discussing whether we want the compiler to depend on shards and how to do that. But right now the state is not good (bad docs and no Markdown) so we can improve that situation first and then take care of how we organize the compiler.

👍 for option 3
🥇if the main api with respect to what we used to have in the std is not changed

Just to remove some bias from the ++ third option, I'd like to mention some other metrics 😄

1.

  • (+) No added complexity. 🤷‍♂
  • (+) Markd can iterate independently of Crystal releases.

2.

  • (~) A little bit more complexity for the compiler, but don't need to maintain and spec it separately.
  • (+) Markd can iterate independently of Crystal releases and only sync the code from the shard occasionally.

3.

  • (-) More work for core team to maintain Markdown in the stdlib when it could just be developed independently.
  • (~) Markd releases are tied to stdlib releases, so new features might not propagate as quickly as they could.

IMO I don't think 1 is a good option. 2 would be acceptable. It would essentially add a dependency on a shard, but with the source code vendored into the Crystal repo.

I do prefer 3, though. Even if it's not strictly required to have Markdown available as a tool in stdlib, I think it's a great feature. I remember when I started working with Crystal, I was pleasently surprised to see a Markdown implementation in the stdlib - apart from other useful tools (it was only later that I learned how buggy it was. But the intention counts 😆 ).
The API for Markdown is pretty slim and there are no huge demands for customizability, so it's doesn't add a lot of edges.

I'd definitely support compartmentalizing individual libraries inside the standard library.

🥇if the main api with respect to what we used to have in the std is not changed

Definitely! That's something I thought too. Or at least the main API which is just to turn some text into HTML (I think custom renderers will have to change, but mainly because the old std code didn't take some things into account).

I do prefer 3, though. Even if it's not strictly required to have Markdown available as a tool in stdlib, I think it's a great feature.

Cool! I'll try to work on this then when I have time. It should be relatively simple: just copy the code and rename some things. The hardest part would be to fit the current docs renderer, which has some custom logic to highlight code and link types and methods, into markd's renderer. But it shouldn't be that hard.

Regarding "Copy markd to the standard library":
Might be nice to preserve all the Git commits. Or it might not be, maybe there are some caveats.

The result: https://github.com/crystal-lang/crystal/compare/master...oprypin:markd (see at the bottom)

Here's how to do it:

git remote add markd https://github.com/icyleaf/markd
git fetch markd master
git checkout -b markd-move FETCH_HEAD
git rm -r .circleci .vscode .gitignore .travis.yml shard.yml 
mkdir -p spec/lib/markd
git mv spec/fixtures spec/*.cr spec/lib/markd/
git mv *.md LICENSE src/markd/
git commit -m "Move files to prepare merging into Crystal stdlib"

git remote add origin https://github.com/crystal-lang/crystal
git fetch origin master
git checkout -b markd FETCH_HEAD
git merge --allow-unrelated-histories markd-move -m "Merge https://github.com/icyleaf/markd"

What's wrong with git subtree?
It would be good to not link markd versions with Crystal releases.
This isn't a good reason enough to expose a third party library just because it's used by a compiler tool.
One can choose a library version, update it (and gain features/security patches) independently of the Crystal language.

Git subtree would be option 2 (technically, it could also be used with 3, but I don't think that makes much sense). Moving it entirely into stdlib has the benefit of having it available to every Crystal program.

What would be the API? If it's the same as markd (Markd::), it will conflict with the one installed with shard.yml. If different, a fork will likely be needed.
It is not an option to force every users bumping the library version when bumping the Crystal compiler version too.

@j8r markd is already feature complete and passes all specs. There haven't been recent commits that fix stuff in the library, just optimizations (it seems) or new features. So I think moving this to the standard library is good:

  • it's unlikely to change that often
  • it's available to all users without using an extra shard
  • having it in std means it's more likely to be used and noticed and so more bugs and performance optimizations could be solved/added

@j8r I agree that having libraries tied to Crystal's stdlib release schedule can be problematic. But as @asterite said, it's a huge issue regarding Markdown. There are already other libraries in the stdlib (HTTP, SSL for example) that would benefit much more from having individual releases.

In the future we might eventually ending up with the stdlib consisting of essentially a set of standard shards that come installed with crystal by default, but individual projects can use different versions of them (installed in local lib). We don't have that setup now, and I don't think it's urgent before 1.0. But we'd like to have markdown support in stdlib, so the best solution currently is to merge markd.

Ok that may be fine if you consider the markdown implementation very stable. In fact, the main issue I find is to put everything possible (I exaggerate) in the crystal/crystal monorepo.

Can we still keep in its own repo, and include it with subtree (or whatever) in the stdlib (option 3)?
This would be a shame that @icyleaf, the markd creator, won't be able to review PRs and contribute to the stdlib's markdown.
Furthermore, the owners will have to backport changes upstream and downstream.
For the record, Golang and Rust do this division in some extent (not with subtree, though).

This would be a shame that @icyleaf, the markd creator, won't be able to review PRs and contribute to the stdlib's markdown

Why not?

Furthermore, the owners will have to backport changes upstream and downstream

I imagine markd will be moved to the std and then it wouldn't make sense for markd to continue existing beside the std.

Can we still keep in its own repo, and include it with subtree (or whatever) in the stdlib (option 3)?

This only affects Crystal std/compiler developers, so just a bunch of people compared to the entire Crystal userbase. So I personally don't see this decision as having a lot of importance and it could always be tackled later.

Why not?

Because he won't be the owner of the repo anymore, just another Crystal contributor.

I imagine markd will be moved to the std and then it wouldn't make sense for markd to continue existing beside the std.

Not really. It would also be harder to contribute than if markd lives in its own repo. Review would be longer, CI too, the crystal-lang/crystal have more setup requirements.
Wanting a custom markdown parser to build our own flavor, or implement an existing one, will require to fork the whole Crystal repo (EDIT: or advanced git-fu).

If there is a chance for the markdown module to be available directly in the std-lib I would like to avoid having a release without it. Right now 0.31.0 might come out without a markdown module.
That will cause dependencies to move out of the std lib since there is no deprecation in place.

While we settle how/if do this, should rollback #8115 to avoid reverting a breaking change in a release?

@j8r You're making generally valid points and more separation might be a good path for the future. But currently, we have a monolith repository. And I think this is fine for now and we don't need to change this setup. That would require a lot of effort and simply isn't necessary at the moment.

Besides, your argument only targets markdown, because the markd implementation is currently a separate shard, suggested to be moved into stdlib. It has not been a part of the stdlib before now, but the same is true for to many other standard libraries. The only difference is, they're currently already in stdlib. I don't think it makes sense to go a separate way right now just for markdown.

@bcardiff Yes, we should avoid breaking markdown in 0.31.0. But I suppose, if we settle on @asterite's proposal, Markd could be imported rather quickly. If we target for a release at the beginning of October, this should work out.

Another way to see it: we can import markd into the standard library and rename it to Markdown. markd can still exist and you can use it instead of that if you really want to. Markdown can be seen as the markdown implementation provided with the std, whose updates follow the Crystal release cycle and contributing to it is "slower" than contributing to markd (which might not be true because here we are many while in markd it's just one). It just happens that right now Markdown has the same implementation as markd, or uses that under the hood.

I think I won't have time to do this, so if someone can take over I'll be very grateful 🙏

If someone wants to tackle this, they will need to:

  • copy markd by moving it to the standard library under src, but specifically the code in this PR. Include the specs and put them in spec/src/markdown (I think the specs are very simple and short so we can include them there). Make sure to put the helpers in that same file as private methods.

    • rename Markd to Markdown

    • avoid reopening HTML and overriding decode_entities and others

    • implement the custom doc renderer by somehow subclassing Markd::HTMLRenderer: the crystal docs renderer will highlight code and do some links in code sections

    • remove the old internal Markdown module

@j8r It's alright to move markd to crystal-lang organization, I believe this move will make markd and
crystal both better :)

Actually, let me try once. more today. Yesterday I bumped into a compiler bug that was recently reported and I desisted, but maybe I can find a workaround.

Which bug was it? (maybe someone can have a try at it while you have fun with markd?)

Ah, nevermind the bug. I somehow copied the text from https://github.com/crystal-lang/crystal/issues/8163#issuecomment-529963516 into the markdown code and it was exploding because of that. No idea how that happened.

Later today I'll submit the markd inside std PR. It won't have the git history, sorry. But the history can be kept in markd's repo.

I have the code but I won't send the PR. It seems Brian and Juan agreed that it might not be good to have Markdown in the standard library and that we should move crystal docs to an external tool where we could use external shards there.

So for now it might be better to just do nothing here.

We were iterating with @waj on what to do here, our proposal would be:

  • fork icyleaf/markd to crystal-lang/crystal-markdown

    • keep @icyleaf as maintainer of crystal-lang/crystal-markdown

    • rename markd to use the top level Markdown namespace that was available before

    • check that the main api from Crytstal 0.30.0 is honored by crystal-lang/crystal-markdown

  • encourage users to include crystal-lang/crystal-markdown as a dependency if needed in the Crystal 0.31.0 changelog
  • Delay for 0.32.0 the discussion on how to use crystal-lang/crystal-markdown in the compiler
  • Leave the docs tool as is for 0.31.0, with it existing bugs

This way we avoid copying the whole shard, allow @icyleaf to keep iterating, offer migration from Crystal 0.30 to 0.31, aim for reducing the std-lib size and maybe split the compiler.

fork icyleaf/markd to crystal-lang/crystal-markdown

What's the purpose of that? Why can't markd continue existing as a shard on its own? Why do we need to become its owners?

Avoiding dependencies outside the organization is a safe policy for avoiding them to disappear.
Is easier to keep permissions for the core-team.

An option for the future would be to have the markdown library as a dev dependency (that could be bundled, or not, in the release).
In fact, generating the API docs is optional, and it could even be possible to generate docs with no markdown.
This would prevent the chicken and egg issue @ysbaddaden was pointing out if the library becomes a hard requirement.

In the future the docs tool might even get out of the compiler directly. Yet bundled in the distribution packages. Doing that could also remove the chicken and egg situation of shards used in the compiler if needed.

Again, I want to focus on what the users should use as a replacement of the moved Markdown in order include that in the release 0.31.

We can focus on how to improve the docs with better markdown support later.

@bcardiff If we want to postpone the decision to after 0.31.0, I suggest to hold off on moving markd to crystal-lang/crystal-markdown as well. That's a change that looks good and might be done anyway, but maybe we come to a different solution. I'd rather avoid it until the immediate goal is settled.
Postponing would also mean to revert the removal of stdlib's `Markdown' for 0.31.0.

That would be a full postpone. :-) But I'm fine with it.

While we settle how/if do this, should rollback #8115 to avoid reverting a breaking change in a release?
https://github.com/crystal-lang/crystal/issues/4613#issuecomment-529927441

I think this is all just a big mistake. Markdown in docs has a very tiny priority. The next big steps are getting parallelism tried out and done well and Windows, not focusing on extracting tools outside of the compiler, which is transparent for users and probably a maintenance burden for us and Crystal distributors.

But improving Markdown is already here without a cost at all (I already sent a PR) and brings an immediate improvement without breaking any API (the main Markdown.to_html method is still there).

If I never continued commenting on this issue this entire "let's wait and extract compiler tools" discussion wouldn't even exist.

But do as you wish.

Was this page helpful?
0 / 5 - 0 ratings