Describe the bug
In situations where a document has repeated headers, the ID is duplicated which is a big no-no for HTML validations (only one instance of an ID is allowed).
This is extremely common in changelogs where H3 headers are Added/Changed/Removed/Fixed etc. It flat out screws up accessibility as screen readers would likely ignore duplicate IDs and will cause formatting issues when an ID clashes with another outside the scope of the markdown render.
You cannot use IDs unless they are unique, so consider using "class" instead as a quick fix, and consider applying a configurable prefix to avoid clashing ('md-' perhaps).
To Reproduce
Steps to reproduce the behavior:
Create duplicate headers such as in this demo. Then copy rendered HTML into W3's HTML5 validator to see it fail validation
Expected behavior
No conflicts/render or screen reader issues.
Hi @alystair thanks for the bug report!
You can disable emitting id for headings with an option like so:
var options = { headerIds: false };
var str = `# test heading
Some words
## Subheading
Blahblah
## Subheading`;
var html = marked(str, options);
Was this behavior working in a previous version of marked and broke in the latest release?
I've only started using markedjs in the last month, but and I've always loaded it via your CDN link https://cdn.jsdelivr.net/npm/marked/marked.min.js
Definitely consider changing the default behaviour.
We are trying to make marked spec compliant so we are most likely going to remove the headers eventually since they are not part of any markdown spec.
I changed the title to reflect that this is not a bug we introduced but rather a proposal to change the default options so that marked is complaint with the spec.
This would be a breaking change so the soonest release this could land in would be 0.5.0
@UziTech I can implement this. Any reason to delay or should I change this soon?
I think the idea behind adding header IDs is so you can link to each header in a markdown document with a hash (i.e. # heading 2 can be linked to by using "https://site.com/page#heading-2"). Using classes instead of ids would remove this linking functionality.
Although it isn't in the spec it looks like GitHub adds a "-[number]" to the id if there are duplicates (demo)
Good point! Now the question is, do we implement this to match GitHub even though it isn't in the GFM spec?
My guess is they left it out of the spec because they didn't want to commit to it yet 🤷♂️
The reason it's true by default was to maintain consistency for users. Originally the default for Marked was to have the header ids with no way to turn them off. Making false the default could cause users to need to update their code.
@alystair is correct re id usage in HTML, which was a big reason for wanting to get out of the header id business; thereby, leaving it to developers to implement via extension and work with. I believe @UziTech is also correct in the history and why id is being used and why a post-processor like GitHub uses is probably a more viable approach.
@styfle why headerIds: false doesn't work? I have "marked": "^0.3.19"
Why this option is enabled in first place? It make no sense for me.
Who need this ID's?
Got it, it works with 4!
Why this option is enabled in first place?
@canyoudev See this comment.
why headerIds: false doesn't work?
The feature to disable heading ids wasn't implement until v4.0.0 as you have discovered.
This is also indicated in the options documentation.
Also note, I have found an implementation of heading IDs that we might want to adopt called github-slugger (probably want to use this as a reference instead of a new dependency).
There is another problem with header ids. The non-ASCII characters becomes - instead of understandable strings. I see the two options with this:
privet for привет)Also, I see another problem. What if I put an image into heading?
# Title 
Or if I put custom HTML into heading? What will be header id?
There's a workaround in
https://github.com/markedjs/marked/issues/919#issuecomment-325099458
const renderer = new marked.Renderer();
renderer.heading = function(text, level) {
return `<h${level} id="${text}">${text}</h${level}>`;
};
marked(mdString, { renderer: renderer });
This is easy to replace id to something like nanoid
But I think will be better to use github-slugger if we find a solution for problem with non-text into heading.
Good point, I created #1354
Outsourcing can be good for this as long as we can do a slower migration path e.g. keep the old style lexer override until we modify each and every comment that has references to existing id's in the DB. This definitely can be user content breaking across all sites that reference us but I realize that there are quite a few issues with auto "bookmark"ing and it could use some improvement.
@styfle
I wanted to get some early feedback on this
I'll see if I have a spare few to test the PR on our dev and see how much content really needs to be modified. Hopefully the outsourcing doesn't change their style output either because one DB traversal for this is enough imho. :)
After much discussion in the previous PR, I created a new one to fix this issue: #1401
Most helpful comment
I think the idea behind adding header IDs is so you can link to each header in a markdown document with a hash (i.e.
# heading 2can be linked to by using "https://site.com/page#heading-2"). Using classes instead of ids would remove this linking functionality.Although it isn't in the spec it looks like GitHub adds a "-[number]" to the id if there are duplicates (demo)