Almanac.httparchive.org: Finalize assignments: Chapter 10. SEO

Created on 21 May 2019 · 17 Comments · Source: HTTPArchive/almanac.httparchive.org

Section | Chapter | Authors | Reviewers
-- | -- | -- | --
II. User Experience | 10. SEO | @rachellcostello @ymschaap @AVGP | @clarkeclark @andylimn @voltek62

Due date: To help us stay on schedule, please complete the action items in this issue by June 3.

To do:

  • [x] Assign subject matter experts (coauthors)
  • [x] Assign peer reviewers
  • [x] Finalize metrics

Current list of metrics:

  • Structured data rich results eligibility (ratings, search, etc.)
  • Lang attribute usage and mistakes (lang='en')
  • <link> rel="amphtml" (AMP)
  • <link> hreflang="en-us" (localisation usage)
  • Breakdown of type of structured data served (ld+json, microformatting, schema.org + what @type)?
  • Indexability - looking at meta tags like <meta> noindex, <link> canonicals.
  • <meta> description + <title> (presence & length)
  • Status codes and whether pages are accessible - 200, 3xx, 4xx, 5xx.
  • Content - looking at word count, thin pages, header usage, alt attributes images
  • Linking - extract <a href> count per page (internal + external)
  • Linking - fragment URLs (together with SPAs to navigate content)
  • robots.txt (It is mentioned in Lighthouse, can we parse the content or only confirm its existence? E.g. check if has a sitemap reference - seems it does list the potential issues)
  • If the desktop site is responsive/mobile-ready, or a specific mobile site (redirect, UA)? (Can we find if these are different sites?)
  • Descriptive link text usage (available in Lighthouse data)
  • speed metrics (FCP, server response time)

👉 AI (coauthors): Finalize which metrics you might like to include in an annual "state of SEO" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.

The metrics should paint a holistic, data-driven picture of the SEO landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.

Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.

Additional resources:

All 17 comments

Rick, love to help out as a peer reviewer

Great, thanks for volunteering! Just added you as a reviewer and invited you to the @HTTPArchive/reviewers team.

Hey Rick, I'd be keen to peer review too!

@andylimn great, thanks! Just added you.

@ymschaap could you go to https://github.com/HTTPArchive and accept the invitation to the Authors team? That will enable me to assign this issue to you.

Re: SPA-only sites % (i.e. those which don't support SSR)
I tried multiple ways to expose SPAs without SSR but couldn't find a good query without too many false positives, so I would drop that (it would've been interesting).

Looking at the Lighthouse SEO data, there might be some other interesting metrics.

@rviscomi Where do I make changes to the final list of metrics? The Web Almanac Brainstorm Google doc or propose in this thread?

This issue will be the canonical source of metrics, so feel free to use the doc to iterate then copy them here when you're ready.

Current list of metrics:

  • Structured data rich results eligibility (ratings, search, etc.)
  • Lang attribute usage and mistakes (lang='en')
  • <link> rel="amphtml" (AMP)
  • <link> hreflang="en-us" (localisation usage)
  • Breakdown of type of structured data served (ld+json, microformatting, schema.org + what @type)?
  • Indexability - looking at meta tags like <meta> noindex, <link> canonicals.
  • <meta> description + <title> (presence & length)
  • Status codes and whether pages are accessible - 200, 3xx, 4xx, 5xx.
  • Content - looking at word count, thin pages, header usage, alt attributes images
  • Linking - extract <a href> count per page (internal + external)
  • Linking - fragment URLs (together with SPAs to navigate content)
  • robots.txt (It is mentioned in Lighthouse, can we parse the content or only confirm its existence? E.g. check if has a sitemap reference - seems it does list the potential issues)
  • If the desktop site is responsive/mobile-ready, or a specific mobile site (redirect, UA)? (Can we find if these are different sites?)
  • Descriptive link text usage (available in Lighthouse data)
  • speed metrics (FCP, server response time)

@rachellcostello want to add/change anything missing?

Here's my feedback on the current list of metrics:

In response to your question @ymschaap - pagination won't usually be relevant for homepages, just sections of websites like category pages listing products or blog sections listing articles.

I wouldn't include the meta keywords tag as this isn't used by search engines anymore so it's become kind of obsolete. Page titles and meta descriptions should definitely be included though.
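As a rough illustration of what a "presence & length" metric for page titles and meta descriptions could measure, here is a minimal Python sketch using only the standard library. The names `HeadMetaParser` and `title_and_description_stats` are hypothetical and not part of any HTTP Archive tooling, and the actual Almanac queries may measure this differently.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Captures the first <title> text and the first meta description."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None
        self.description = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title" and self.title is None:
            # Only the first <title> is recorded; later ones are ignored.
            self._in_title = True
            self.title = ""
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            if self.description is None:
                self.description = attr.get("content") or ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def title_and_description_stats(html):
    """Return presence flags and character lengths (hypothetical helper)."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    title = (parser.title or "").strip()
    description = (parser.description or "").strip()
    return {
        "has_title": bool(title),
        "title_length": len(title),
        "has_description": bool(description),
        "description_length": len(description),
    }
```

An empty or whitespace-only tag is treated as missing here, which is an assumption; a real analysis might want to report "present but empty" separately.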

For canonical tags, it would be interesting to see whether they are self-referencing. Is it possible to check whether the URL of the page and the URL in the canonical tag are an exact match?
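One way the exact-match check could be sketched in Python is shown below. This is only an illustration under stated assumptions: `CanonicalFinder` and `is_self_referencing` are hypothetical names, only the first canonical tag is considered, and relative hrefs are resolved against the page URL before comparing. No other normalization (trailing slashes, case, query strings) is applied.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def is_self_referencing(page_url, html):
    """Return True/False for an exact match, or None if no canonical tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return None
    # Resolve a relative href against the page URL, then compare exactly.
    canonical = urljoin(page_url, finder.canonicals[0])
    return canonical == page_url
```

Returning `None` for pages without a canonical tag keeps "no tag" distinct from "tag points elsewhere", which are likely to be reported as separate buckets.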

It might also be useful to add another level of detail to the links information by classifying them by type. E.g. a href links, onclick links, JavaScript links etc. Martin Splitt's slide on problematic links for SEO is a great example of the bad types to watch out for!

[Image: Screenshot 2019-05-28 at 18 15 21]
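To make the "classify links by type" idea concrete, here is a minimal Python sketch. The four buckets (`href`, `javascript_href`, `onclick_only`, `no_href`) and the names `LinkClassifier` and `classify_links` are assumptions made for illustration; they are not taken from the slide or from any existing HTTP Archive query, and only `<a>` elements are inspected.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Counts <a> elements by how they expose (or hide) their target."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = (attr.get("href") or "").strip()
        if href.lower().startswith("javascript:"):
            # e.g. <a href="javascript:goTo('x')">
            self.counts["javascript_href"] += 1
        elif href:
            # Regular link with a URL in href.
            self.counts["href"] += 1
        elif "onclick" in attr:
            # No usable href; navigation depends on a click handler.
            self.counts["onclick_only"] += 1
        else:
            self.counts["no_href"] += 1


def classify_links(html):
    """Return a dict of link-type counts for one page (hypothetical helper)."""
    parser = LinkClassifier()
    parser.feed(html)
    return dict(parser.counts)
```

Click handlers attached from script (rather than via an `onclick` attribute) would not be visible to this kind of static parse, so the `onclick_only` bucket is a lower bound at best.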

I like the idea of having a speed metric. TTI or FCP would be useful from a UX perspective, and something like server response time would be useful from a search engine crawling perspective.

Everything else is looking good!

Great, I updated the 'current list of metrics' with your remarks and will also move it into the 'brainstorm' doc.

If there's still time / space, I'd be happy to review or help in whatever shape or form :)

Also @ymschaap, could we flag fragment URLs ("#") that are used to load different content in SPAs as problematic?

Yes would be great to have you Martin! Adding you as a coauthor, let me know if you'd prefer to review.

Awesome, thanks @rviscomi - I'd love to help author it :)

@AVGP initially SPAs and their implementation were on the list, but I couldn't figure out a reliable way to flag those. But yes, let's now add fragment URLs in there and maybe we can find a query to get the right data out.
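A possible starting point for the fragment URL metric, sketched in Python. It counts links whose href is only a fragment and, as a purely heuristic assumption, treats fragments starting with `#/` or `#!` as route-like (the pattern hash-based SPA routers tend to use). The names `FragmentLinkCounter` and `fragment_link_stats` are hypothetical, and this heuristic will have false positives and negatives.

```python
from html.parser import HTMLParser


class FragmentLinkCounter(HTMLParser):
    """Counts <a href> values that consist only of a fragment."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.fragment_only = 0
        self.route_like = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        if not href:
            return
        self.total += 1
        if href.startswith("#"):
            self.fragment_only += 1
            # Assumption: "#/" and "#!" prefixes suggest client-side routing
            # rather than an in-page anchor such as "#top".
            if href.startswith(("#/", "#!")):
                self.route_like += 1


def fragment_link_stats(html):
    """Return per-page counts of fragment links (hypothetical helper)."""
    counter = FragmentLinkCounter()
    counter.feed(html)
    return {
        "total": counter.total,
        "fragment_only": counter.fragment_only,
        "route_like": counter.route_like,
    }
```

Whether a fragment link actually loads different content cannot be determined from the markup alone, so this only narrows down candidate pages.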

@rachellcostello @ymschaap @AVGP @clarkeclark @andylimn @voltek62 if everyone's happy with the current list of metrics in https://github.com/HTTPArchive/almanac.httparchive.org/issues/12#issue-446806234 could you tick the final TODO checkbox item and close this issue? Thanks!

The current list of metrics is perfect.

Thanks everyone!!
