Hey LH team! I'd like to propose a new core Lighthouse SEO audit.
The Anchor href audit asserts that hyperlinks are crawlable. It would be part of the SEO category, with some sort of message indicating whether links are crawlable or not. It would not ping the target link to check that it's reachable. When in DevTools, we can link to the failing anchor element.
There is already a link text audit, but that's about the descriptiveness of link text; this audit is about checking that the anchor is crawlable from an SEO perspective.
From some initial checks, it seems most of the popular websites _do_ have anchor tags with `href="#"` or some sort of `javascript:` href, so this audit may impact them. We would like to go through some of these cases and understand the reasoning behind them (e.g. developer convenience, technical limitations) and what some potential remedies could be, e.g. better documentation/evangelism, outreach, or a more relaxed audit.
Search engine crawlers help users find what they're looking for. Flagging to website owners that their links cannot be crawled may lead to fixes and thus improved search engine results for end-users.
Me (@umaar) to develop the audit, @AVGP on docs.
https://support.google.com/webmasters/answer/9112205?hl=en
https://moz.com/learn/seo/anchor-text
Looks like there's already an AnchorElements gatherer, so that should be perfect for this!
What do you think?
Is the check basically that all `<a>` elements have a `href` attribute? With a non-empty value, I guess?
I noticed that axe doesn't have this test, which was kinda surprising to me. 🤔 Here's a thread where they were undecided on whether `<a onclick=...>` should fail: https://github.com/dequelabs/axe-core/issues/139
also this one https://github.com/dequelabs/axe-core/issues/1039
As mentioned in there, there's the `<a name=foo>` case to consider.
And `<a>` with an addEventListener handler attached.
do we have any more info on the common antipatterns this is trying to combat? i see the examples here, but wonder if we know of frameworks that have used these patterns. knowing some real world examples would help inform this audit better.
This is something we would definitely like to add to the SEO audits, as it's a common issue we're seeing and one we advise on in the webmaster guidelines.
We're working with the crawler team to find out what cases we need to cover, but starting with missing or empty href is a solid starting point.
While skip links with no href may be valid from an accessibility standpoint, they could be done with fragments instead, which would remove the reliance on JS. We're considering not failing such links.
This scenario is an issue in content discovery for search crawlers, that's why we're looking at it for the SEO audits section.
Looks like axe had the rule but then removed it, also see the corresponding docs for the href-no-hash rule.
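As a rough illustration of the skip-link idea above: a fragment-only href (e.g. `href="#main-content"`) could be treated as acceptable without any reliance on JS, while a bare `#` would still count as a failure. This is just a sketch with made-up names, not the actual audit logic:

```javascript
// Hypothetical helper (not the real Lighthouse code): accept a pure
// fragment href such as a skip link's "#main-content", while a bare
// "#" still counts as a failure because it points nowhere useful.
function isSamePageFragment(rawHref) {
  return typeof rawHref === 'string' &&
    rawHref.startsWith('#') &&
    rawHref.length > 1;
}

console.log(isSamePageFragment('#main-content')); // true
console.log(isSamePageFragment('#'));             // false
```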
I'm not sure exactly what can be crawled and what cannot, but in the meantime here's a gist of potentially non-crawlable anchors from popular websites. Looks like there are some of the following:

- `href="#"` with `role="button"`, which gets intercepted by JS
- `<a id="top"></a>`
- `<a name="top"></a>`
- `href="javascript:void(0)"`
- `onclick="remove()"`
- `ng-click="remove()"`
- `href="javascript:;"`

That's the "what". Skimming through the gist should give us a better clue as to the "why".
With JS frameworks, the vibe I get is that they'll support outputting regular hyperlinks, but sometimes conventions emerge which do something different.
Thoughts on starting out with an audit which only checks for a missing/empty href? We could then tweak it as we learn what the crawler actually does. We can also do any more research we think would be useful!
Do we have any additional info on how anchors are crawled? When a crawler can't parse a href, does it click the anchor tag to see what URL it lands on?
Shall we make a list of what should pass/fail? Here's a strict starting point, any thoughts?

- `<a href="#top">` pass
- `<a href="mailto:[email protected]">` pass
- `<a href="https://example.com">` pass
- `<a href="foo">` pass
- `<a href="/foo">` pass
- `<a href="#:~:text=string">` pass (text fragments are new and hopefully accepted by crawlers?)
- `<a href="#">` fail
- `<a href="">` fail
- `<a href>` fail
- `<a href="javascript:void(0)">` fail
- `<a href="file:///foo.png">` fail
- `<a onclick="window.location='http://example.com'">` fail
- `<a name="top">` fail? (The `name` attribute on the `a` element is obsolete; consider putting an `id` attribute on the nearest container instead — source)
- `<a id="top">` fail? (should we be recommending that these sorts of IDs are put on elements which are not anchors?)
- `<a>` fail? (it's allowed in the spec, though: "If the href attribute is not specified, the element represents a placeholder hyperlink.")

Depending on what we decide: the anchor element gatherer already returns a `href`, however it's the computed property rather than the attribute, e.g.
- `<a>` → resolves to an empty string `''`
- `href=""` → resolves to the current page, `http://example.com/`
- `href="#thing"` → `http://example.com/#thing`
- `href="#"` → `http://example.com/#`

To make things a bit easier, would it make sense to extend the anchor elements gatherer to return a `rawHref` property which contains the result of `el.getAttribute('href')`?
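To illustrate the difference, here's a quick sketch of how that computed resolution behaves, using the WHATWG `URL` constructor to mimic what the browser does (the `computedHref` helper and base URL are made up for illustration):

```javascript
// Mimic how the browser computes the `href` property from the raw
// attribute, resolving against the document's URL. A missing attribute
// yields an empty string; an empty attribute resolves to the page itself.
const documentUrl = 'http://example.com/';

function computedHref(rawHref) {
  if (rawHref === null) return ''; // no href attribute at all
  return new URL(rawHref, documentUrl).href;
}

console.log(computedHref(null));     // ''
console.log(computedHref(''));       // 'http://example.com/'
console.log(computedHref('#thing')); // 'http://example.com/#thing'
console.log(computedHref('#'));      // 'http://example.com/#'
```

This is exactly why the computed property isn't enough for the audit: `href=""` and `href="#thing"` both resolve to valid-looking URLs even though one of them should fail.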
A few things to consider here:

- The `rawHref` property sounds sensible.
- I think the rules make sense, with the note that the bare `<a>` case should be a fail, IMHO.

Even though people may use it legitimately according to the spec, it invites room for error: e.g. some older framework once created `<a router-link="/something">` links, which would pass if the bare `<a>` were a passing rule. There's no way we can exhaustively catch all possible properties that frameworks might come up with, so I think we'd fail here, and users may choose to ignore the guidance on the grounds of "works as intended" for them.
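Putting the proposed rules together, here's a minimal sketch of the check, assuming the gatherer exposes the raw attribute value (all names here are hypothetical, not the shipped audit):

```javascript
// Hypothetical pass/fail check over the raw href attribute value
// (null = attribute absent). This mirrors the strict list above:
// bare <a>, empty href, "#", and javascript:/file: schemes all fail.
function isCrawlableAnchor(rawHref) {
  if (rawHref === null || rawHref === '') return false; // <a>, <a href>, <a href="">
  if (rawHref === '#') return false;                    // hash-only
  const normalized = rawHref.trim().toLowerCase();
  if (normalized.startsWith('javascript:')) return false;
  if (normalized.startsWith('file:')) return false;
  return true;
}

console.log(isCrawlableAnchor('/foo'));               // true
console.log(isCrawlableAnchor('#top'));               // true
console.log(isCrawlableAnchor('javascript:void(0)')); // false
console.log(isCrawlableAnchor('#'));                  // false
```

Note this only looks at the href; the `onclick`-only and `name`/`id`-only cases fall out as failures because their `rawHref` is null.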
@umaar did we have items to follow up on, or did we end up deciding they weren't worth it and this can be closed? (`<span href="">`, etc.)
Think we're good! The `<span href="">` is nothing we need to act on. The only other thing might be some web.dev docs that I was asking about, but I'm going to have a catchup with Martin this week anyway, so will get some confirmation about that then!
I'll need to figure out if we edit this ourselves, if Kayce helps us, or if Lizzi is the one to ask. I'll find out tomorrow, I think =)
@umaar In a 3rd-party pagination component, the developers chose `<a rel="nofollow">` over the `<button>` tag for the pagination buttons. As a result of the Lighthouse update, our SEO score took a hit.
```
Uncrawlable Link
1  <a rel="nofollow">
2  <a rel="nofollow">
...
```
Sure, but shouldn't we be better off avoiding the warning altogether, e.g. by adding it to the exception list?
> On Mon, Jul 20, 2020, Umar Hansa wrote:
> @decimoseptimo oh, could they add a href? Maybe like `page 2`?
Interesting case. On one hand, the warning is accurate because that invariably is an uncrawlable link. Yet the `nofollow` tells us that you don't care about this link wrt crawling. I think filtering nofollow links from the audit makes sense, and I'm sorry I missed that in the original spec for the audit. 🙏
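A sketch of that filtering, assuming each gathered anchor exposes its `rel` attribute value (names are illustrative, not the actual implementation):

```javascript
// Hypothetical filter: skip anchors whose rel attribute contains the
// "nofollow" token, since the author has opted the link out of crawling.
// rel is a space-separated token list, so split before checking.
function shouldAuditAnchor(anchor) {
  const relTokens = (anchor.rel || '').toLowerCase().split(/\s+/);
  return !relTokens.includes('nofollow');
}

console.log(shouldAuditAnchor({rel: 'nofollow'}));          // false
console.log(shouldAuditAnchor({rel: 'noopener nofollow'})); // false
console.log(shouldAuditAnchor({rel: ''}));                  // true
```

Splitting on whitespace matters here: `rel="noopener nofollow"` should also be filtered, so a plain string equality check against `"nofollow"` wouldn't be enough.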
Don't know if this is the right place to ask this. In May this year, we introduced `<a href=""></a>` into some of our pages to overcome a minor problem with JS overlay focus. Our site visits started to drop soon after that; I'm not sure if this change is the cause. Would appreciate it if someone could advise whether we have impacted SEO with this change. The HTML itself is valid.
BTW, Lighthouse returns an SEO score of 100 even with this anchor tag.