Nim: Doc anchors invalid

Created on 7 Oct 2018  路  27Comments  路  Source: nim-lang/Nim

Example page:

https://nim-lang.org/docs/strutils.html

The anchor for toHex is:

~
toHex%2CBiggestInt%2CPositive
~

and the link for toHex is:

~~~

toHex%2CBiggestInt%2CPositive

~~~

the link in this case is not valid, because it has not been properly encoded. It
should be like this:

~~~

toHex%252CBiggestInt%252CPositive

~~~

because of this none of the links actually work. I say the root cause here is
the anchor - by using a reserved URI character you are asking for trouble. A
better anchor id would be:

~
toHex-BiggestInt-Positive
~

or here is how Ruby does it:

~
toHex-2CBiggestInt-2CPositive
~

References

http://wikipedia.org/wiki/Percent-encoding#Percent-encoding_the_percent_character

Easy Showstopper Stdlib

Most helpful comment

because of this none of the links actually work

Are you sure? Because all the links are working fine here.

All 27 comments

because of this none of the links actually work

Are you sure? Because all the links are working fine here.

I'm also on Firefox and it works.

The % you see are produced by the url-encoding process so re-encoding them as %25 makes no sense (to me, at least).

@LemonBoy the root cause is that the anchors have been erroneously encoded:

~
toHex%2CBiggestInt%2CPositive
~

the fix is to either stop encoding the anchors:

~
toHex,BiggestInt,Positive
~

or properly encode the links:

~
toHex%252CBiggestInt%252CPositive
~

I see. Paging @kaushalmodi here since he checked the generated HTML validity.

or properly encode the links:

I kept misreading that sentence and thought you meant the anchors. My bad.

toHex%252CBiggestInt%252CPositive

That's wrong encoding..you are encoding the % meant for encoding the ,. So your suggestion will decode to toHex%2CBiggestInt%2CPositive.

In current links, %2C decodes to ,.

Hmm, so the links work fine on Firefox and Chrome.

But Edge is being oversmart and is re-encoding the % in "%2C" with "%25". So Edge is corrupting the links to become "%252C". Need to investigate ..

@kaushalmodi ideally the anchor would be like this:

toHex,BiggestInt,Positive

and the link would be like this:

toHex%2CBiggestInt%2CPositive

however someone has erroneously encoded the anchors, like this:

toHex%2CBiggestInt%2CPositive

this is the root cause, because if you want to link to this now, you have to
deal with the reserved character %. if you insist on leaving the anchors as
is, you must encode the percent sign or you are in violation of RFC3986, as
previously commented.

There is no good reason to encode the anchors as has been done, as a comma
is not a reserved character with respect to an anchor.

Thanks for the details. Can you help me confirm that I understood properly:

  • By anchors, you mean the id attributes in a tags? And they shouldn't contain % encoding?
  • By links, you mean the href attributes in a tags? And they should be encoded?

@kaushalmodi

  1. your definitions are correct

  2. anchors can contain any character, as long as it is properly escaped. in the sense of an HTML attribute value - % is not special - but it is not a good idea to use it because youd then have to encode any links to that anchor - further - even if you DID happen to want to use a reserved character - like " or & - you would NOT percent encode those even then - you would HTML encode those like " &

  3. if you are linking to something that has no reserved characters, then you dont have to do anything - you can just do #whatever-the-anchor-is - its only when you decide to use reserved characters in your anchor, like % - then at that point you would need to percent encode links to that anchor

  4. simple solution - either stop using reserved characters with your anchors % - or if so then encode then with the links

https://stackoverflow.com/questions/5320177/what-values-can-i-put-in-an-html-attribute-value

PS: please dont take this the wrong way - but this is really basic HTML - easily searchable how to do this stuff online - it makes me sad to have to explain this stuff :(

PS: please dont take this the wrong way - but this is really basic HTML

I am just trying to help fix this. If I don't understand anything, I will ask. It's as simple as that.

I am not a web dev.

@kaushalmodi it seems odd then that you would be in charge of checking the generated HTML validity - if you dont know HTML

@Araq So it looks like we need another special $ var for nimdoc.cfg that encodes only the % chars in the id attr.

Earlier, we had $itemSym for the id attr in nimdoc.cfg. As the unencoded % chars were causing problems with HTML validation, I thought that encoding that attr with $itemSymOrIDEnc would fix that.

It looked like that fixed that becauses htmltest tool passed, and the links were functional on browsers like Firefox and Chrome. I should pull out the IE next time too.

So now we need a $itemIDEnc that encodes just the % chars in the id attr. Should I add that?


@cup

It seems odd then that you would be in charge of checking the generated HTML validity?

I am not "in charge".

I was simply using a tool to check the validity, and made fixes until I saw the checks pass. I was simply trying to fix all the broken links in the doc (which I did). I wish your HTML expertise would have come of help as I was slogging through that.

Now it would be nice if you can just help fix this instead of making unnecessary comments. Better yet, say that you are working on this and submit a PR. Or help me do it.

Ref: https://github.com/nim-lang/Nim/issues/9097#issuecomment-425528039

The uri module probably needs an encodeId which is different from encodeUrl. The former would encode just the % chars. Does that proposal sound good?

I hope to then use the same proc in both docgen and rstgen so that the links between theindex and rest of the docs match.

@kaushalmodi that does not sound good. if given this

~
toHex%2CBiggestInt%2CPositive
~

encodeUrl should produce this

~
toHex%252CBiggestInt%252CPositive
~

for example with javascript:

~
encodeURIComponent('toHex%2CBiggestInt%2CPositive')
"toHex%252CBiggestInt%252CPositive"
~

if nim encodeUrl is not doing that - then it needs to be fixed - not to add another function

I believe that there is a misunderstanding..

  • The original link is toHex,BiggestInt,Positive.
  • encodeUrl is then used to encode that and put the same resultant string in both id and href attributes (in different relevant a tags of course).
  • So we ended up with the same toHex%2CBiggestInt%2CPositive in both id and href.

encodeUrl should produce this

toHex%252CBiggestInt%252CPositive

What you are asking for (correct me if wrong) is to double-encode toHex,BiggestInt,Positive only for the id attr i.e. "toHex,BiggestInt,Positive".encodeUrl.encodeUrl. Now that does not look right to me.

So I was proposing a separate encodeId that encodes just %.

@cup To clear this up, let's start from scratch ( https://github.com/nim-lang/Nim/issues/9097 ).

Take one of the links in nim docs, like: https://nim-lang.github.io/Nim/system.html#+%,IntMax32,IntMax32.

  1. How would the id tag linking to +%,IntMax32,IntMax32 look like in system.html?
  2. How would the links (href) linking to that id look like?

My solution was to set the id to %2B%25%2CIntMax32%2CIntMax32 and the href to https://nim-lang.github.io/Nim/system.html#%2B%25%2CIntMax32%2CIntMax32 (which as we now know is incorrect for the id).

So the id should now become +%25,IntMax32,IntMax32? If so? We are now encoding just the % for id, right?

@kaushalmodi here is what needs to be done

  1. start with string toHex,BiggestInt,Positive

  2. put that exact string - no encoding - as the ID

  3. run that string through encodeUrl to produce toHex%2CBiggestInt%2CPositive

  4. put that string as HREF

OR - if you want to do it the hard way (DONT DO THIS):

  1. start with string toHex,BiggestInt,Positive

  2. run that string through encodeUrl to produce toHex%2CBiggestInt%2CPositive

  3. put that string as the ID

  4. run that string through encodeUrl to produce toHex%252CBiggestInt%252CPositive

  5. put that string as HREF

@cup

See my earlier comment. The main reason why this encoding was done was because we have URLs with literal % chars: https://nim-lang.github.io/Nim/system.html#+%,IntMax32,IntMax32.

put that exact string - no encoding - as the ID

Now if we do <a id="+%,IntMax32,IntMax32">, wouldn't that be a problem?

@kaushalmodi if you start with this string:

~
+%,IntMax32,IntMax32
~

before you set it as the ID - you need to check for reserved characters

~
" &
~

http://stackoverflow.com/questions/5320177/can-i-put-in-an-html-attribute-value

since this string has none - it can be passed as is to the ID. then before
you can set it as the HREF - you need to check for reserved characters

~
! * ' ( ) ; : @ & = + $ , / ? # [ ]
~

http://wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters

since we have a match - we need to percent encode before we can set it as the
HREF:

~
%2B%25%2CIntMax32%2CIntMax32
~

@cup

Thanks.

I have made your suggested fix.

  • It passes htmltest.
  • Links still work on Firefox.

Can you try this out on IE: https://peaceful-stonebraker-57010c.netlify.com/ ?

Once you give a go, I'll submit the PR.

@kaushalmodi works :)

however i would like to review the PR before its merged - if that is acceptable

@cup Awesome. Sure (though you will be surprised seeing how simple the PR is :P)

@kaushalmodi this still seems to be broken:

http://nim-lang.org/docs/strutils.html

@cup Which specific links are broken? Also can you confirm that on the devel docs instead of stable docs? I don't know if this fix got applied to stable docs.

Devel version of strutils docs: https://nim-lang.github.io/Nim/strutils.html

@kaushalmodi ah - thanks for clearing that up - here is result:

devel docs | stable docs
-----------|------------
fixed | broken

i will use devel docs until the stable docs are updated - cheers

Was this page helpful?
0 / 5 - 0 ratings