Crystal: Outdated API doc links in Google search (bad SEO)

Created on 15 Apr 2018 · 37 comments · Source: crystal-lang/crystal

For example, https://www.google.com/search?q=crystal+lang+namedtuple
finds https://crystal-lang.org/api/0.21.1/NamedTuple.html
(current version is 0.24.2).

This is bad because:

1) people see old docs
2) the links from different versions fight each other for dominance instead of joining forces.

Python had this problem for a while; they seem to have solved it by adding a canonical link to all their pages. Technically this is perhaps not the intended use of the tag, but it has definitely worked.
For example, https://www.google.com/search?q=python+socket+doc
finds the "latest" page
because https://docs.python.org/3.5/library/socket.html
contains <link rel="canonical" href="https://docs.python.org/3/library/socket.html" />

Proposed solution: edit every existing Crystal doc page in storage and add this tag.
For example,
https://crystal-lang.org/api/0.21.1/BigInt.html

--- a/BigInt.html
+++ b/BigInt.html
@@ -6,6 +6,7 @@
   <link href="css/style.css" rel="stylesheet" type="text/css" />
   <script type="text/javascript" src="js/doc.js"></script>
   <title>BigInt - github.com/crystal-lang/crystal</title>
+  <link rel="canonical" href="https://crystal-lang.org/api/latest/BigInt.html" />
 </head>
 <body>

I'm not entirely sure if this will work the same, because in Python the /latest/ (/3/) page actually exists as an alias, and is not a redirect. So maybe that would need to be changed as well.
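The per-page change in the diff above could be scripted roughly as follows. This is a minimal Python sketch, not the tooling actually used for crystal-lang.org. It assumes two things: API pages live under `/api/<major>.<minor>.<patch>/`, and `/api/latest/` serves (or redirects to) the newest version. The function names are hypothetical.

```python
import re

# Matches the version segment of an API page URL, e.g. "/api/0.21.1/" (assumed layout).
VERSIONED = re.compile(r"/api/\d+\.\d+\.\d+/")
HAS_CANONICAL = re.compile(r'<link\s+rel="canonical"')


def canonical_url(page_url: str) -> str:
    """Map a versioned API page URL to its /api/latest/ counterpart."""
    return VERSIONED.sub("/api/latest/", page_url, count=1)


def add_canonical(html: str, href: str) -> str:
    """Insert <link rel="canonical"> just before </head>; no-op if one exists."""
    if HAS_CANONICAL.search(html):
        return html
    tag = f'  <link rel="canonical" href="{href}" />\n'
    # A lambda replacement avoids any backslash/group escaping in `tag`.
    return re.sub(r"</head>", lambda m: tag + m.group(0), html, count=1)
```

For example, `canonical_url("https://crystal-lang.org/api/0.21.1/BigInt.html")` gives `https://crystal-lang.org/api/latest/BigInt.html`, which matches the tag in the diff. `add_canonical` checks for an existing tag first, so running it twice over the same page is harmless.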

It may be worth doing an experiment first: edit only one known badly linked page and see how it evolves in the search results.

Yeah, that's really an issue. The proposed solution should work, although it does not exactly fit the intended purpose of canonical links according to RFC 6596. But it can be used for this and I don't think there is a reasonable alternative. Google webmaster docs even mention that not only duplicated but also similar pages can be consolidated using a canonical link.

What about showing a small notice at the top of the page when you are not on the newest version or master?
So maybe something like this:
(screenshot: outdated.png)

That would certainly be helpful but retroactively introducing manual edits to pages is not an easy task. The change I'm suggesting (add the same item to ALL pages) is the simplest possible operation of that type, and it can be added to newly generated pages immediately.

And such a message would not help to remove outdated API versions from search engine results (at least not much).

It actually would, because the latest version would automatically have the most links pointed to it.

Regardless, we could do both, it's just harder.

Besides adding a canonical base URL, an option to add custom JS would allow some tweaks, such as adding a banner (as other languages do), analytics, or an edit link, depending on the project and where the docs are hosted. WDYT?

@bcardiff Why not extend the idea to custom templates?

I don't think such complete customization is either necessary or particularly useful. Having the ability to inject some code into each page (for analytics etc.) should be sufficient.

If you need full customization, it's relatively easy to just create a custom HTML generator which uses the exported JSON data.

This is outside the scope. We don't even necessarily need any code changes to introduce the modification. Please just start with the experiment and manual changes :|

@Sija custom templates would either a) require rebuilding the doc generator, since the templates are .ecr files compiled into the compiler, or b) require switching to templates that are interpreted. Injecting a handmade .js file is enough to cover multiple other scenarios like the ones I listed: edit page, GA, jump to a newer version.

@oprypin and others: I've just manually edited https://crystal-lang.org/api/0.24.1/Array.html, which is the top result on Google for "crystal array", and appended the canonical link <link rel="canonical" href="https://crystal-lang.org/api/latest/Array.html" />. Let's see how the crawlers deal with that.

This is not resolved and the PR should not have been merged.

@oprypin Why not? According to Google docs solution provided here is correct.

@bcardiff @oprypin The experiment seems to have been successful: Google results for "crystal array" now rank https://crystal-lang.org/api/0.24.2/Array.html first, which is the current redirect target of https://crystal-lang.org/api/latest/Array.html

The downside of this approach is, it seems that outdated API docs can't be discovered through Google search at all. This could however be useful in certain circumstances to figure out how a previous API version worked. I don't know if there is a valid solution for this, either. And it's most probably better to have the latest versions be more prominent. We just need to be aware that the backlog gets hidden from search.

I don't mind old versions gone from google search, we just should have a (link to a) version selector in the docs themselves.

@jhass It's not bad per se, but imagine some code using a method or type from stdlib that doesn't exist anymore. If you want to know about that method and don't find it in the current API docs, you'd probably try a web search. And it would be nice if it would eventually show up somewhere.

Although the PR was merged prematurely, it's still changeable since it's only used in master for now.

The use case pointed out by @straight-shoota is important, but I am not sure what a better approach would be right now. Old docs could certainly be changed, reindexed or even regenerated later if needed.

Having a version selector could also be done with an injected JS ;-).

Maybe a future pass through docs generator could improve multiple version handling, or maybe some other integration.

Maybe we could remove the canonical link from outdated versions of the API docs, probably not right when a new version is released but after a cooldown period (for example, until the next release). This way the current version would always point to the latest URL as its canonical location. Older versions should lose importance over time, so they could be allowed to show up in search results, since the current one should hopefully rank higher.
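The cooldown idea can be expressed as a simple policy. The sketch below is hypothetical; the function names and the `cooldown` parameter are illustrative only. Under this policy, a version keeps its canonical link while it is the newest release or at most `cooldown` releases behind it. After that, it loses the tag and can show up in search results again.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn "0.24.2" into (0, 24, 2) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def keeps_canonical(version: str, releases: list[str], cooldown: int = 1) -> bool:
    """Hypothetical policy: True while `version` is the newest release or at
    most `cooldown` releases behind it. Raises ValueError for unknown versions."""
    newest_first = sorted(releases, key=parse_version, reverse=True)
    return newest_first.index(version) <= cooldown
```

With `releases = ["0.24.1", "0.24.2", "0.25.0"]` and the default cooldown of one release, only 0.25.0 and 0.24.2 would keep the tag. Sorting by parsed tuples rather than by string avoids placing "0.9.0" after "0.10.0".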

I think this canonical change has not been applied to all old versions, but it would be good to do so.
Also, please reopen the issue until that is done.
(@sdogruyol)

I am just starting with Crystal and hitting this issue a lot. For example, https://www.google.co.jp/search?q=crystal+ordered+hash returned https://crystal-lang.org/api/0.24.2/Hash.html as the top result instead of 0.26.1.

This seems to be a consistent issue for programming languages. My search for ruby hash just got documentation for v2.0.0. Rails searches often get outdated pages from apidock.com.

I wonder if it would help to avoid the redirect of https://crystal-lang.org/api/latest/Hash.html - that would increase the likelihood of people sharing links for latest and hopefully boost its SEO. I imagine it would also keep any deprecated pages searchable - https://crystal-lang.org/api/#{VERSION}/XXXXX.html

@guycall From a SEO perspective it might be better to keep links pointing to latest. But there is a semantic issue here: Usually, you want to link to a specific API version. In a new release, everything might have changed but that would also break the reference. In some cases, you might want links to always point to the latest version, but that's probably not as common.

I think that simply applying the canonical change to old versions would have a great effect but it still was not done for some reason. Only people with direct access to the host can do it though.

Does someone know what happens in the SEO realm when the canonical responds with a 404? That will happen when types get deprecated, for example.

I am usually hesitant to touch already generated files. But it's on my bucket list to add the canonical link to all pages and also add a plain HTML banner to inform the user that there is a new version of the API.

@oprypin this implies back-porting the canonical change to each version starting from 0.20.0 (the oldest version for which the API docs are available) and regenerating all the docs.

@j8r, no, it really doesn't. Just write a script to add it with regular expressions or something. That's what I meant all along.
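Here is a rough Python sketch of such a script. It rests on three assumptions:
- the docs for each release sit in a directory named after the version (e.g. `api/0.21.1/`);
- every page contains a literal `</head>`;
- the canonical base is `https://crystal-lang.org/api/latest/`.

It also skips pages whose counterpart is missing from the newest version, which sidesteps the 404 question raised above. The function name `backfill` is hypothetical.

```python
import re
from pathlib import Path

HAS_CANONICAL = re.compile(r'<link\s+rel="canonical"')
BASE = "https://crystal-lang.org/api/latest/"  # assumed canonical base


def backfill(api_root: Path, latest_version: str, base: str = BASE) -> list[Path]:
    """Add a canonical <link> to every page under api_root/<version>/.

    Pages without a counterpart in api_root/<latest_version>/ are skipped,
    so no canonical points at a 404. Returns the files that were modified.
    """
    latest_dir = api_root / latest_version
    changed = []
    for page in sorted(api_root.glob("*/**/*.html")):
        version_dir = page.relative_to(api_root).parts[0]
        rel = page.relative_to(api_root / version_dir)  # e.g. Crystal/Macros.html
        if not (latest_dir / rel).exists():
            continue  # type no longer documented in the newest version
        html = page.read_text(encoding="utf-8")
        if HAS_CANONICAL.search(html) or "</head>" not in html:
            continue  # already tagged, or not a normal page
        tag = f'  <link rel="canonical" href="{base}{rel.as_posix()}" />\n'
        page.write_text(html.replace("</head>", tag + "</head>", 1), encoding="utf-8")
        changed.append(page)
    return changed
```

Because already tagged pages are skipped, the script can be re-run after each release without duplicating tags.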

Does someone know what happens in the SEO realm when the canonical responds with a 404?

There's a question on StackExchange, though without a really substantial answer: https://webmasters.stackexchange.com/questions/109449/what-is-the-seo-impact-of-canonical-links-pointing-to-404-pages

But the worst that can happen is that the page won't show up in search results. That's not really an issue since it's outdated anyway.

and also add some plain html banner to inform the user that there is a new version of the api.

This would be a great enhancement!
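A plain HTML banner of that kind could be injected into already generated pages with a few lines of Python. This is only an illustrative sketch. The `outdated-banner` class name, the wording, and the function signature are made up, and styling is left out.

```python
import re
from html import escape


def add_outdated_banner(page: str, version: str, latest: str, latest_url: str) -> str:
    """Insert a notice right after <body> unless this is already the latest version."""
    if version == latest or 'class="outdated-banner"' in page:
        return page  # nothing to do, or the banner is already present
    banner = (
        '<div class="outdated-banner">'
        f"You are reading the API docs for Crystal {escape(version)}. "
        f'<a href="{escape(latest_url)}">Go to the latest version ({escape(latest)})</a>.'
        "</div>"
    )
    # A lambda replacement avoids backslash/group escaping issues in re.sub.
    return re.sub(r"<body[^>]*>", lambda m: m.group(0) + "\n" + banner, page, count=1)
```

Since the banner is static HTML, it works without JavaScript. However, it has to be regenerated or rewritten whenever a new version is released, because the "latest" number is baked in.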

I would guess the version bar on apidock.com helped them a lot with their SEO. Even if a user landed on the wrong version from a Google search, they could easily navigate to the correct one. Hence Google sees longer sessions on apidock.com rather than users hitting the back button to return to Google.

(screenshot of the apidock.com version bar, 2018-09-20)

This obviously doesn't help get the latest version to the highest ranking in Google, but it definitely helps the user.

In crystal-lang/crystal-website#79, @ukd1 suggests weighting pages using a sitemap. I'm not sure how this would play out, but we could try it. It shouldn't be too difficult to set up.

So the canonical change, rather than being applied retroactively, was reverted in https://github.com/crystal-lang/crystal/pull/8348.

This also reverts #5990 which tried an alternative approach to solving the search priority issue using canonical URLs. But this completely removes older versions from search results.

Umm, that's good?


As it stands now, on Google you indeed do not run into any API docs pages between 0.25 and 0.31. I also suspect these pages boost /latest/ strongly enough that currently we're fortunate to almost always find latest docs in searches (0.33 at the moment).

So the confirmed working solution (also used by Python, which is a big deal) is abandoned, and the sitemap idea was started but also seems not used yet.


Also, according to my understanding, sitemaps would not help at all.
https://support.google.com/webmasters/answer/183668

Google does not currently consume the <priority> attribute in sitemaps.

It would not help at best. At worst (though unlikely) it could make things worse.

List only canonical URLs in your sitemaps. If you have two versions of a page, list only the (Google-selected) canonical in the sitemap. If you have two versions of your site (for example, www and non-www), decide which is your preferred site, and put the sitemap there, and add rel=canonical or redirects on the other site.
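If a sitemap is used anyway, the quoted guidance suggests two things: it should list only the canonical (`/api/latest/`) URLs, and `<priority>` can be left out because Google does not consume it. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The page list and base URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[str], base: str) -> str:
    """Return a sitemap that lists only canonical URLs (base + relative page path)."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as a default xmlns
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for rel in sorted(set(pages)):  # de-duplicate and keep output stable
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = base + rel
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Versioned URLs are deliberately absent, which is consistent with the "list only canonical URLs" advice. For that reason, a sitemap alone would not rank versioned pages differently from one another.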

Anything to fix this would be extremely helpful. Just a note: my other pain point is figuring out _when_ something was deprecated/changed/renamed, but that is beyond the scope of this issue.

this completely removes older versions from search results.

Umm, that's good?

I don't think so. It means deprecated and removed features wouldn't show up in search results at all.

If sitemap priority really doesn't do anything and there's no other solution, we might have to return to canonical. That's probably the lesser evil. But if there's any chance, I'd like to find a way to keep old versions in the index.

Maybe you could somehow auto-generate an index of removed symbols that links to the old docs, and then add back the rel=canonical? That way, searches for current APIs would give the current results, but searches for deprecated or removed ones would give the index.
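One way to build such an index is to diff the type lists of all documented versions against the newest one. The Python sketch below makes two assumptions:
- there is a mapping from each version to the set of type names it documents (for example, extracted from the exported JSON data mentioned earlier);
- a type such as `Foo::Bar` is documented at `<version>/Foo/Bar.html`.

The function names are hypothetical.

```python
from html import escape


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "0.24.2" into (0, 24, 2) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def removed_symbols(docs: dict[str, set[str]]) -> dict[str, str]:
    """Map every type missing from the newest version to the last version
    that still documented it."""
    oldest_first = sorted(docs, key=parse_version)
    newest = docs[oldest_first[-1]]
    last_seen: dict[str, str] = {}
    for version in oldest_first:
        for symbol in docs[version]:
            last_seen[symbol] = version  # later versions overwrite earlier ones
    return {s: v for s, v in sorted(last_seen.items()) if s not in newest}


def render_index(removed: dict[str, str], base: str) -> str:
    """Render an HTML list linking each removed type to its last docs page."""
    items = []
    for symbol, version in sorted(removed.items()):
        href = f"{base}{version}/{symbol.replace('::', '/')}.html"
        items.append(
            f'  <li><a href="{escape(href)}">{escape(symbol)}</a>'
            f" (last documented in {escape(version)})</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

The index page itself would live at a stable, unversioned URL, so it could be indexed even while the old versioned pages carry a canonical link to /latest/.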

ping? :runner:

If someone gives me access to where the docs are hosted I can add the canonical stuff.

I'll make a PR to re-add --canonical-base-url. It seems to have been a mistake to remove it.
