Hey LH team! I'd like to propose a new core Lighthouse SEO audit.
The Anchor href audit asserts that hyperlinks are crawlable. It would be part of the SEO category, with some sort of message indicating whether links are crawlable or not. It would not ping the target link to check that it's reachable. When in DevTools, we can link to the failing anchor element.
There is already a link text audit, but that's about the descriptiveness of link text; this audit is about checking that the anchor is crawlable from an SEO perspective.
From some initial checks, it seems most of the popular websites _do_ have anchor tags with `href="#"` or some sort of `javascript:` href, so this audit may impact them. We would like to go through some of these cases and understand the reasoning behind them (e.g. developer convenience, technical limitations) and what some potential remedies could be, e.g. better documentation/evangelism, outreach, or a more relaxed audit.
Search engine crawlers help users find what they're looking for. Flagging to website owners that their links cannot be crawled may lead to fixes and thus improved search engine results for end-users.
Me (@umaar) to develop the audit, @AVGP on docs.
https://support.google.com/webmasters/answer/9112205?hl=en
https://moz.com/learn/seo/anchor-text
Looks like there's already an AnchorElements gatherer, so that should be perfect for this!
What do you think?
Is the check basically that all `<a>` elements have a `href` attribute? With a non-empty value, I guess?
I noticed that axe doesn't have this test, which was kinda surprising to me. 🤔 Here's a thread where they were undecided on whether `<a onclick=...>` should fail: https://github.com/dequelabs/axe-core/issues/139
also this one https://github.com/dequelabs/axe-core/issues/1039
As mentioned in there, there's the `<a name=foo>` case to consider.
And `<a>` with an addEventListener handler attached.
do we have any more info on the common antipatterns this is trying to combat? i see the examples here, but wonder if we know of frameworks that have used these patterns. knowing some real world examples would help inform this audit better.
This is something we would definitely like to add to the SEO audits, as it's a common issue we're seeing and one we advise on in the webmaster guidelines.
We're working with the crawler team to find out what cases we need to cover, but starting with missing or empty href is a solid starting point.
While skip links with no href may be valid from an accessibility standpoint, they could be done with fragments instead, which would remove the reliance on JS. We're considering not failing such links.
This scenario is an issue in content discovery for search crawlers, that's why we're looking at it for the SEO audits section.
Looks like axe had the rule but then removed it, also see the corresponding docs for the href-no-hash rule.
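As a rough illustration of the skip-link idea above: a fragment-only href (e.g. `href="#main-content"`) could be treated as acceptable without any reliance on JS, while a bare `#` would still count as a failure. This is just a sketch with made-up names, not the actual audit logic:

```javascript
// Hypothetical helper (not the real Lighthouse code): accept a pure
// fragment href such as a skip link's "#main-content", while a bare
// "#" still counts as a failure because it points nowhere useful.
function isSamePageFragment(rawHref) {
  return typeof rawHref === 'string' &&
    rawHref.startsWith('#') &&
    rawHref.length > 1;
}

console.log(isSamePageFragment('#main-content')); // true
console.log(isSamePageFragment('#'));             // false
```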
I'm not sure exactly what can be crawled and what cannot, but in the meantime here's a gist of potentially non-crawlable anchors from popular websites. Looks like there are some of the following:

- `href="#"` with `role="button"`, which gets intercepted by JS
- `<a id="top"></a>`
- `<a name="top"></a>`
- `href="javascript:void(0)"`
- `onclick="remove()"`
- `ng-click="remove()"`
- `href="javascript:;"`

That's the "what". Skimming through the gist should give us a better clue as to the "why".
With JS frameworks, the vibe I get is that they'll support outputting regular hyperlinks, but sometimes conventions emerge which do something different.
Thoughts on starting out with an audit which only checks for a missing/empty href? We could then tweak it as we learn what the crawler actually does. We can also do any more research we think would be useful!
Do we have any additional info on how anchors are crawled? When a crawler can't parse a href, does it click the anchor tag to see what URL it lands on?
Shall we make a list of what should pass/fail? Here's a strict starting point, any thoughts?

- `<a href="#top">` pass
- `<a href="mailto:[email protected]">` pass
- `<a href="https://example.com">` pass
- `<a href="foo">` pass
- `<a href="/foo">` pass
- `<a href="#:~:text=string">` pass (text fragments are new and hopefully accepted by crawlers?)
- `<a href="#">` fail
- `<a href="">` fail
- `<a href>` fail
- `<a href="javascript:void(0)">` fail
- `<a href="file:///foo.png">` fail
- `<a onclick="window.location='http://example.com'">` fail
- `<a name="top">` fail? (The `name` attribute on the `a` element is obsolete; consider putting an `id` attribute on the nearest container instead — source)
- `<a id="top">` fail? (should we be recommending that these sorts of IDs are put on elements which are not anchors?)
- `<a>` fail? (it's allowed in the spec, though: "If the href attribute is not specified, the element represents a placeholder hyperlink.")

Depending on what we decide: the anchor element gatherer already returns a `href`, however it's the computed property rather than the attribute, e.g.
- `<a>` → resolves to an empty string `''`
- `href=""` → resolves to the current page, `http://example.com/`
- `href="#thing"` → `http://example.com/#thing`
- `href="#"` → `http://example.com/#`

To make things a bit easier, would it make sense to extend the anchor elements gatherer to return a `rawHref` property which contains the result of `el.getAttribute('href')`?
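To illustrate the difference, here's a quick sketch of how that computed resolution behaves, using the WHATWG `URL` constructor to mimic what the browser does (the `computedHref` helper and base URL are made up for illustration):

```javascript
// Mimic how the browser computes the `href` property from the raw
// attribute, resolving against the document's URL. A missing attribute
// yields an empty string; an empty attribute resolves to the page itself.
const documentUrl = 'http://example.com/';

function computedHref(rawHref) {
  if (rawHref === null) return ''; // no href attribute at all
  return new URL(rawHref, documentUrl).href;
}

console.log(computedHref(null));     // ''
console.log(computedHref(''));       // 'http://example.com/'
console.log(computedHref('#thing')); // 'http://example.com/#thing'
console.log(computedHref('#'));      // 'http://example.com/#'
```

This is exactly why the computed property isn't enough for the audit: `href=""` and `href="#thing"` both resolve to valid-looking URLs even though one of them should fail.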
A few things to consider here:

- The `rawHref` property sounds sensible.
- I think the rules make sense, with the note that the bare `<a>` case should be a fail, IMHO.

Even though people may use it legitimately according to the spec, it invites room for error: e.g. some older framework once created `<a router-link="/something">` links, which would pass if the bare `<a>` were a passing rule. There's no way we can exhaustively catch all possible properties that frameworks might come up with, so I think we'd fail here, and users may choose to ignore the guidance on the grounds of "works as intended" for them.
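Putting the proposed rules together, here's a minimal sketch of the check, assuming the gatherer exposes the raw attribute value (all names here are hypothetical, not the shipped audit):

```javascript
// Hypothetical pass/fail check over the raw href attribute value
// (null = attribute absent). This mirrors the strict list above:
// bare <a>, empty href, "#", and javascript:/file: schemes all fail.
function isCrawlableAnchor(rawHref) {
  if (rawHref === null || rawHref === '') return false; // <a>, <a href>, <a href="">
  if (rawHref === '#') return false;                    // hash-only
  const normalized = rawHref.trim().toLowerCase();
  if (normalized.startsWith('javascript:')) return false;
  if (normalized.startsWith('file:')) return false;
  return true;
}

console.log(isCrawlableAnchor('/foo'));               // true
console.log(isCrawlableAnchor('#top'));               // true
console.log(isCrawlableAnchor('javascript:void(0)')); // false
console.log(isCrawlableAnchor('#'));                  // false
```

Note this only looks at the href; the `onclick`-only and `name`/`id`-only cases fall out as failures because their `rawHref` is null.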
@umaar did we have items to follow up on, or did we end up deciding they weren't worth it and this can be closed? (`<span href="">`, etc.)
Think we're good! The `<span href="">` is nothing we need to act on. The only other thing might be some web.dev docs that I was asking about, but I'm going to have a catchup with Martin this week anyway, so will get some confirmation about that then!
I'll need to figure out if we edit this ourselves, if Kayce helps us, or if Lizzi is the one to ask. I'll find out tomorrow, I think =)
@umaar In a 3rd-party pagination component, the developers chose `<a rel="nofollow">` over the `<button>` tag for the pagination buttons. As a result of the Lighthouse update, our SEO score took a hit.
```
Uncrawlable Link
1  <a rel="nofollow">
2  <a rel="nofollow">
...
```
Sure, but shouldn't we be better off avoiding the warning altogether, e.g. by adding it to the exception list?
> On Mon, Jul 20, 2020, Umar Hansa wrote:
> @decimoseptimo oh, could they add a href? Maybe like `page 2`?
Interesting case. On one hand, the warning is accurate because that invariably is an uncrawlable link. Yet the `nofollow` tells us that you don't care about this link wrt crawling. I think filtering nofollow links from the audit makes sense, and I'm sorry I missed that in the original spec for the audit. 🙏
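A sketch of that filtering, assuming each gathered anchor exposes its `rel` attribute value (names are illustrative, not the actual implementation):

```javascript
// Hypothetical filter: skip anchors whose rel attribute contains the
// "nofollow" token, since the author has opted the link out of crawling.
// rel is a space-separated token list, so split before checking.
function shouldAuditAnchor(anchor) {
  const relTokens = (anchor.rel || '').toLowerCase().split(/\s+/);
  return !relTokens.includes('nofollow');
}

console.log(shouldAuditAnchor({rel: 'nofollow'}));          // false
console.log(shouldAuditAnchor({rel: 'noopener nofollow'})); // false
console.log(shouldAuditAnchor({rel: ''}));                  // true
```

Splitting on whitespace matters here: `rel="noopener nofollow"` should also be filtered, so a plain string equality check against `"nofollow"` wouldn't be enough.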
Don't know if this is the right place to ask this. In May this year, we introduced `<a href=""></a>` into some of our pages to overcome a minor problem with JS overlay focus. Our site visits started to drop soon after that; I'm not sure if this change is the cause. Would appreciate it if someone could advise whether we have impacted SEO with this change. The HTML itself is valid.
BTW, Lighthouse returns an SEO score of 100 even with this anchor tag.