Almanac.httparchive.org: Triage all proposed metrics (396 of 396 done)

Created on 4 Jun 2019 · 24 Comments · Source: HTTPArchive/almanac.httparchive.org

Assigned: @HTTPArchive/data-analysts team

Due date: No later than July 1

Any metrics that require augmenting the test infrastructure (e.g. custom metrics) must be ready to go when the July crawl starts. This ensures that when the crawl completes at the end of July, we can query the dataset and pass it off to authors for interpretation in August.

As of now there are 350+ metrics spread over 20 chapters.

Part | Chapter | Able To Query | Not Feasible | Grand Total
-- | -- | -- | -- | --
I | 01. JavaScript | 24 | 1 | 25
I | 02. CSS | 39 | 7 | 46
I | 03. Markup | 4 | 1 | 5
I | 04. Media | 20 | 5 | 25
I | 05. Third Parties | 13 | | 13
I | 06. Fonts | 40 | 7 | 47
II | 07. Performance | 24 | | 24
II | 08. Security | 36 | 5 | 41
II | 09. Accessibility | 32 | 6 | 38
II | 10. SEO | 15 | | 15
II | 11. PWA | 6 | | 6
II | 12. Mobile web | 19 | 2 | 21
III | 13. Ecommerce | 10 | 3 | 13
III | 14. CMS | 11 | 1 | 12
IV | 15. Compression | 3 | 1 | 4
IV | 16. Caching | 14 | 1 | 15
IV | 17. CDN | 13 | 3 | 16
IV | 18. Page Weight | 3 | | 3
IV | 19. Resource Hints | 10 | | 10
IV | 20. HTTP/2 | 14 | 3 | 17
| Grand Total | 350 | 46 | 396

I've copied all of the metrics for each chapter to this sheet (named "Metrics Triage"). To edit the sheet, please give me your email address so I can add you to the editors list. What we need to do is go through the list of metrics for each chapter and assign each one a status from the following:

- To Be Reviewed (TBR)
- Need More Info (NMI)
- Not Feasible (NF)
- Able To Query (ATQ)
- Custom Metric Required (CMR)
- Custom Metric Written (CMW)
- Query Written (QW)

The lifecycle is:

All metrics start as TBR
- Move to NMI if the metric is vaguely worded or it is otherwise unclear what is being asked for. Get in touch with the chapter author(s) and straighten out what the expected data should look like.
- Move to NF if the metric cannot be queried using the HTTP Archive dataset or other publicly available datasets on BigQuery (e.g. CrUX). This is the "done" state for metrics which cannot progress any further.
- Move to ATQ if the metric is able to be queried from the dataset based on the latest schema.
- Move to QW if the metric has a corresponding query written. This is the ideal "done" state for all metrics.
- Move to CMR if the metric can only be queried with the addition of a custom metric
- Move to CMW if the metric has had a corresponding custom metric written. Metrics in this state must also have a corresponding query written and moved to QW when complete.
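
As a rough illustration, the lifecycle above can be written down as a small transition table. This is only one reading of the bullets: the `TRANSITIONS` object and the `canMove` and `isDone` helpers are hypothetical names, and the exits from NMI are an assumption (the list only says to clarify with the authors first).

```javascript
// Hypothetical sketch of the triage lifecycle; not part of any HTTP Archive tooling.
// Keys are the status abbreviations used in the list above.
const TRANSITIONS = {
  TBR: ['NMI', 'NF', 'ATQ', 'CMR'], // every metric starts here
  NMI: ['NF', 'ATQ', 'CMR'],        // assumption: re-triaged once the authors clarify
  ATQ: ['QW'],                      // a query still has to be written
  CMR: ['CMW'],                     // the custom metric has to be written first
  CMW: ['QW'],                      // then the corresponding query
  NF: [],                           // "done": cannot progress any further
  QW: [],                           // "done": the ideal end state
};

// True if a metric may move directly from one status to another.
function canMove(from, to) {
  const next = TRANSITIONS[from];
  return Array.isArray(next) && next.includes(to);
}

// True for the two "done" states named in the lifecycle.
function isDone(status) {
  return status === 'NF' || status === 'QW';
}
```

For example, `canMove('CMR', 'QW')` is false under this reading, because a custom metric has to be written before its query can be.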

Custom metrics should only be added as a last resort and must adhere to strict performance requirements. We test on millions of pages so any complex/slow scripts would impede the crawl. Because we anticipate needing many custom metrics, we'll implement everything as individual functions within a single custom metric whose output is a JSON-encoded object with each result as its own sub-property. More on this when we get there.
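
To make the "individual functions within a single custom metric" idea a bit more concrete, here is a minimal sketch. The function names, the metric keys, and the `runAllMetrics` helper are all hypothetical, and the real implementation may be organized differently. Each function takes a `document`-like object so the sketch can also be exercised outside a browser.

```javascript
// Sketch only: names and keys below are made up for illustration.
function countScripts(doc) {
  return doc.querySelectorAll('script').length;
}

function hasViewportMeta(doc) {
  return doc.querySelector('meta[name="viewport"]') !== null;
}

// Each entry becomes one sub-property of the combined result object.
const METRICS = {
  'script-count': countScripts,
  'has-viewport-meta': hasViewportMeta,
};

// Runs every metric function and returns a single JSON-encoded object.
function runAllMetrics(doc) {
  const results = {};
  for (const [name, fn] of Object.entries(METRICS)) {
    try {
      results[name] = fn(doc);
    } catch (e) {
      // A failure in one metric should not discard the others.
      results[name] = null;
    }
  }
  return JSON.stringify(results);
}
```

On a real page this would be called as `runAllMetrics(document)`. Keeping each metric small and wrapping it in its own `try`/`catch` is one way to limit the cost of a single slow or failing script, which matters when the crawl covers millions of pages.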

Add your name in the Analyst column to take responsibility for moving it through the metric lifecycle.

Once we're ready to begin writing queries, we will create a thread on https://discuss.httparchive.org for each chapter, listing all queryable metrics. Hopefully we can crowdsource some of the querying by tapping into the power users on the forum.

analysis

rviscomi

Most helpful comment

Today's the day! I've marked all 5 remaining Need More Info metrics as Not Feasible. We're finally done with the triage! Thanks again to the entire @HTTPArchive/data-analysts team for your hard work going through these ~400 metrics.

I'll be syncing the custom metrics with the HTTP Archive server today so they're included in tomorrow's July crawl.

rviscomi on 30 Jun 2019

🚀2 🎉2

All 24 comments

Copy, will do this.

tjmonsi on 4 Jun 2019

👍1

@HTTPArchive/data-analysts reminder to please go through the Metrics Triage sheet when you have the time.

There was a lot of info in the first post so here's a condensed version:

- Request edit access to the sheet. I don't have everyone's email address; otherwise I'd give access now.
- Go through the Metrics Triage tab and add your GitHub name to the Analyst column for any metrics you'll be responsible for.
- Triage metrics marked To Be Reviewed and change their status depending on their feasibility.

The next step will be to start writing queries and custom metrics using the HTTP Archive forum to discuss solutions.

rviscomi on 6 Jun 2019

I understand we can create custom metrics for the next crawl. Which is really cool. I'm just unsure what this enables. For example for the SEO Chapter we would want to count the number of h1, h2, h3 elements and their string length. How would I go and create a custom metric? Do you have an example of a custom metric (e.g. piece of code)? Are there docs? Who tests and writes the code?

Once I understand the custom metrics capabilities, I could fill out the Metrics Triage sheet.

ymschaap on 14 Jun 2019

👍1

Good question! Custom metrics are JS snippets you can execute on each page. They are run by our legacy crawl system and the code for existing metrics is here: https://github.com/HTTPArchive/legacy.httparchive.org/tree/master/custom_metrics

For example, see the doctype custom metric. To test it, you can run it directly on webpagetest.org under the "Custom" tab:

Note that all WPT custom metrics must have [metricName] at the start of the script. This is excluded in the HTTP Archive code and generated automatically based on the file name.
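
For the heading question asked above, a custom metric might look roughly like the sketch below. The `headingStats` name and the shape of its output are hypothetical, not an existing HTTP Archive metric. It takes a `document`-like object so it can be tried outside a browser.

```javascript
// Hypothetical heading metric: counts h1-h3 elements and sums their text lengths.
// On a real page, `doc` would be the global `document`.
function headingStats(doc) {
  const stats = {};
  for (const tag of ['h1', 'h2', 'h3']) {
    const nodes = Array.from(doc.querySelectorAll(tag));
    const lengths = nodes.map((node) => (node.textContent || '').trim().length);
    stats[tag] = {
      count: nodes.length,
      totalLength: lengths.reduce((sum, n) => sum + n, 0),
    };
  }
  // Custom metrics return a value; a JSON string keeps the structure intact.
  return JSON.stringify(stats);
}

// When pasted into the WebPageTest "Custom" tab, the script body would be
// preceded by the bracketed metric name described above, roughly:
//
//   [headings]
//   ...function definition...
//   return headingStats(document);
```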

You'll see the output in the WPT results:

For complex metrics like almanac.js you will need to inspect the JSON results directly to see the output. The test ID for the results is in the URL. Simply append ?f=json to see the JSON results. For example: http://webpagetest.org/result/190624_6W_f5211bdf38d897fb4cb5a4f0872eb1f6/?f=json

Then you can find the custom metric by going to data.median.firstView.almanac:
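
If you'd rather read that path programmatically, a small helper along these lines could work. The `extractAlmanac` name is hypothetical, and it assumes the value may arrive either as a JSON-encoded string or as an already parsed object; `result` is the parsed body of the `?f=json` response.

```javascript
// Sketch: pull the almanac custom metric out of a parsed WebPageTest JSON result.
function extractAlmanac(result) {
  const firstView =
    result && result.data && result.data.median && result.data.median.firstView;
  if (!firstView || firstView.almanac === undefined) {
    return null; // the metric is missing from this result
  }
  const raw = firstView.almanac;
  if (typeof raw === 'string') {
    // Assumption: the metric is JSON-encoded; fall back to the raw string if not.
    try {
      return JSON.parse(raw);
    } catch (e) {
      return raw;
    }
  }
  return raw;
}
```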

Let me know if you have any other questions!

rviscomi on 14 Jun 2019

Sorry if I missed this somewhere, but do we need to do something extra to get the right permissions to query the sample datasets created in #34 and/or have our test queries not billed to us individually? :)

patrickhulce on 17 Jun 2019

👍1

I've updated the permissions of the sample_data dataset so anyone can query it.

The goal for that dataset is to allow @HTTPArchive/data-analysts to explore the schema and validate their queries. The table sizes should be small enough so any queries fit comfortably within the free monthly quota. When we run the analysis against the full dataset, I hope to have BQ credits for everyone to cover any expenses.

rviscomi on 18 Jun 2019

👍1

@HTTPArchive/data-analysts we're behind on triaging all of the metrics so I think we need to take a different approach. There are 350 metrics and 12 analysts, so that's an average of about 30 metrics per analyst. If we divide and conquer that way, we should be able to meet the July 1 deadline. I'll go through the triage sheet and assign each analyst approximately 30 metrics, grouped by chapter. I'll update this issue with a table of the assignments.

I've updated the sheet with Analyst assignments and updated the summary table with each analyst's total metric status.

@khempenius and @patrickhulce since you're both authors and expressed interest only in taking on analyst roles for your respective chapters, I didn't add you to any new chapters. @fhoffa I coaxed you into this so I didn't give you too many metrics to work on. Let me know if any of you are willing to take on more metrics, it'd be a big help.

@beouss you expressed an interest in joining the team but never accepted your invitation. If you're still interested I'll assign you some metrics.

rviscomi on 19 Jun 2019

I reviewed the CSS + SEO chapters metrics assigned to me.

@rviscomi The CSS chapter relies heavily on regexes + counts, but I don't see how a custom metric could be better (e.g. there is no JS selector to find media queries and parse them by breakpoint, right? See metric 02.15 or 02.2).

@rviscomi How do we provide the Custom Selector script (I have three)?

@rviscomi Several metrics in the Security chapter have the data available in a JSON array in the httparchive.request dataset. Should I mark them 'Able To Query' although I am not the analyst for these?

ymschaap on 21 Jun 2019

Thanks @ymschaap! I'll get back to you about the other questions but for now I'll say that if you know a metric can be queried and you're not assigned to it, assign yourself and update its status field.

rviscomi on 21 Jun 2019

Rick, I went through the accessibility metrics assigned to me. I think with clever regexes, we can get most of the items. There are a few that I need clarification on.

dougsillars on 21 Jun 2019

👍1

@dougsillars ping me with any questions you have :)

obto on 22 Jun 2019

@ymschaap ok, getting back to your questions.

How do we provide the Custom Selector script (I have three)?

I've created custom_metrics/almanac.js in the repo that is responsible for testing each site. I've added instructions to the top of the file to add a new metric. It contains three custom metrics so far, including one of yours as an example of how to include a complex metric.

The CSS chapter relies heavily on regexes + counts, but I don't see how a custom metric could be better (e.g. there is no JS selector to find media queries and parse them by breakpoint, right? See metric 02.15 or 02.2).

Agreed, our best option is regex parsing the raw stylesheet content. It will be tedious but feasible, I think.
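
To give a feel for the regex approach, here is a small sketch that tallies width breakpoints in raw stylesheet text. It's written in JavaScript purely for illustration, and the `countBreakpoints` name and the pattern are assumptions rather than the query we'd end up using. A regex like this is only an approximation of real CSS parsing: it will also match inside comments and strings, for example.

```javascript
// Sketch: count min-width/max-width media features in raw CSS text.
function countBreakpoints(css) {
  // Matches e.g. "(min-width: 768px)" or "(max-width:40em)".
  const feature = /\(\s*(min|max)-width\s*:\s*(\d*\.?\d+)(px|em|rem)\s*\)/gi;
  const counts = {};
  for (const match of css.matchAll(feature)) {
    const key = `${match[1].toLowerCase()}-width:${match[2]}${match[3].toLowerCase()}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Running it over `@media (min-width: 768px) { a {} }` would report one occurrence under the key `min-width:768px`.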

@HTTPArchive/data-analysts

Good news!

I was able to get coupon codes worth 120 TB for all of the analysts, to help offset the expenses of querying such a large dataset.

All metrics due by Friday!

- All metrics must be in a "resolved" state by Friday: Able To Query, Query Written, or Not Feasible.
- For metrics marked Custom Metric Required, see above for instructions on implementing the custom metric. When the PR is merged, switch it to Able To Query.
- For metrics marked Need More Info, follow up with authors directly to get answers to your questions. Any metrics still with this status by Friday will be switched to Not Feasible.

Here's a list of everyone's outstanding metric count:

Analyst | Custom Metric Required | Need More Info | To Be Reviewed
-- | -- | -- | --
@dotjs | | | 10
@dougsillars | 6 | 9 | 1
@jrharalson | | | 19
@paulcalvano | | 1 | 31
@raghuramakrishnan71 | 1 | | 20
@rviscomi | 15 | 27 |
@tjmonsi | | | 19
@ymschaap | | 1 | 1

rviscomi on 24 Jun 2019

@rviscomi how do I redeem the coupon codes? I'm going to create my own project for this so that I don't use up my company's free credit.

tjmonsi on 24 Jun 2019

👍1

@tjmonsi @voltek62 @beouss to get your coupon code, make sure you've joined the #web-almanac Slack channel and DM me to get your code.

For everyone else on the @HTTPArchive/data-analysts team, you should have received your code already. Some analysts, like @dougsillars and @khempenius, have already been given special access for past projects and don't need coupons.

rviscomi on 24 Jun 2019

@rviscomi I might use them later on. I might not be able to finish all of mine, as there are some things I don't understand yet and can't figure out how to check in the dataset.

I'll try to finish what I can on the fonts. One metric overlaps, though: the latency one (I have a comment on mine).

tjmonsi on 27 Jun 2019

⏰ @dougsillars @raghuramakrishnan71 @tjmonsi you three have metrics assigned to you marked as "Custom Metric Required". We need to have those custom metrics in HTTP Archive before the July 1 crawl begins, ideally by today. See https://github.com/HTTPArchive/legacy.httparchive.org/pull/160 for an example PR that adds custom metrics. There are instructions at the top of almanac.js describing the workflow.

Let me know ASAP if you need any help implementing the metrics.

rviscomi on 28 Jun 2019

@rviscomi where should I branch out and merge? in legacy.httparchive.org as well?

tjmonsi on 28 Jun 2019

Yes, thanks!

rviscomi on 28 Jun 2019

ok... working on it now

tjmonsi on 28 Jun 2019

🚀1

@rviscomi question, do you want to add prefetch and preload as well, aside from getting urls with preconnect?

tjmonsi on 28 Jun 2019

That question is best asked of the chapter authors, who requested the metrics. If you can point me to the metric you're referring to I can try to infer what they're looking for.

Edit: oh if you're referring to 06.15 then that metric can be queried using the custom metric @ymschaap just merged (link-nodes). If that's the case you can change that metric status to Able To Query.

rviscomi on 28 Jun 2019

@rviscomi I was just thinking of adding it there before the crawler runs, just in case. I haven't seen it there yet. I am already testing a preconnect metric that returns an array of URLs. Anyway, I just finished and will open a PR.

tjmonsi on 28 Jun 2019

@rviscomi no worries. At least I learned how to create custom metrics hehe. :)

tjmonsi on 28 Jun 2019

😄1 👍1

