Assigned: @HTTPArchive/data-analysts team
Any metrics that require augmenting the test infrastructure (e.g. custom metrics) must be ready to go when the July crawl starts. This ensures that when the crawl completes at the end of July, we can query the dataset and pass the results off to authors for interpretation in August.
As of now there are 350+ metrics spread over 20 chapters.
Part | Chapter | Able To Query | Not Feasible | Grand Total
-- | -- | -- | -- | --
I | 01. JavaScript | 24 | 1 | 25
I | 02. CSS | 39 | 7 | 46
I | 03. Markup | 4 | 1 | 5
I | 04. Media | 20 | 5 | 25
I | 05. Third Parties | 13 | | 13
I | 06. Fonts | 40 | 7 | 47
II | 07. Performance | 24 | | 24
II | 08. Security | 36 | 5 | 41
II | 09. Accessibility | 32 | 6 | 38
II | 10. SEO | 15 | | 15
II | 11. PWA | 6 | | 6
II | 12. Mobile web | 19 | 2 | 21
III | 13. Ecommerce | 10 | 3 | 13
III | 14. CMS | 11 | 1 | 12
IV | 15. Compression | 3 | 1 | 4
IV | 16. Caching | 14 | 1 | 15
IV | 17. CDN | 13 | 3 | 16
IV | 18. Page Weight | 3 | | 3
IV | 19. Resource Hints | 10 | | 10
IV | 20. HTTP/2 | 14 | 3 | 17
| Grand Total | 350 | 46 | 396
I've copied all of the metrics for each chapter to this sheet (named "Metrics Triage"). To edit the sheet please give me your email address to add to the editors list. What we need to do is go through the list of metrics for each chapter and assign a status from one of the following:
The lifecycle is:
Custom metrics should only be added as a last resort and must adhere to strict performance requirements. We test on millions of pages so any complex/slow scripts would impede the crawl. Because we anticipate needing many custom metrics, we'll implement everything as individual functions within a single custom metric whose output is a JSON-encoded object with each result as its own sub-property. More on this when we get there.
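To make the "individual functions within a single custom metric" idea concrete, here is a minimal sketch. The metric names (`elementCount`, `titleLength`) and the `buildAlmanacMetric` wrapper are hypothetical and only illustrate the shape of the output; the real file may be organized differently. The `doc` parameter stands in for the page's `document` so the sketch can also run outside a browser.

```javascript
// Hypothetical sketch: names and metric bodies are illustrative assumptions.
// `doc` stands in for the page's `document`.
function buildAlmanacMetric(doc) {
  // Each metric is its own small function, keyed by the sub-property name.
  const metrics = {
    elementCount: () => doc.querySelectorAll('*').length,
    titleLength: () => (doc.title || '').length,
  };

  const results = {};
  for (const [name, fn] of Object.entries(metrics)) {
    try {
      results[name] = fn();
    } catch (e) {
      // One failing metric should not discard the others.
      results[name] = null;
    }
  }

  // The single custom metric outputs one JSON-encoded object.
  return JSON.stringify(results);
}

// Usage with a minimal stand-in for `document`:
const fakeDoc = {
  title: 'Example',
  querySelectorAll: () => ({ length: 3 }),
};
console.log(buildAlmanacMetric(fakeDoc));
```

Wrapping each function in its own `try`/`catch` is a design choice in this sketch, so that a bug in one metric leaves the other sub-properties intact.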
Add your name in the Analyst column to take responsibility for moving it through the metric lifecycle.
Once we're ready to begin writing queries, we will create a thread on https://discuss.httparchive.org for each chapter, listing all queryable metrics. Hopefully we can crowdsource some of the querying by tapping into the power users on the forum.
Copy, will do this.
@HTTPArchive/data-analysts reminder to please go through the Metrics Triage sheet when you have the time.
There was a lot of info in the first post so here's a condensed version:
The next step will be to start writing queries and custom metrics using the HTTP Archive forum to discuss solutions.
I understand we can create custom metrics for the next crawl. Which is really cool. I'm just unsure what this enables. For example for the SEO Chapter we would want to count the number of h1, h2, h3 elements and their string length. How would I go and create a custom metric? Do you have an example of a custom metric (e.g. piece of code)? Are there docs? Who tests and writes the code?
Once I understand the custom metrics capabilities, I could fill out the Metrics Triage sheet.
Good question! Custom metrics are JS snippets you can execute on each page. They are run by our legacy crawl system and the code for existing metrics is here: https://github.com/HTTPArchive/legacy.httparchive.org/tree/master/custom_metrics
For example, see the doctype custom metric. To test it, you can run it directly on webpagetest.org under the "Custom" tab:

Note that all WPT custom metrics must have [metricName] at the start of the script. This is excluded in the HTTP Archive code and generated automatically based on the file name.
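As a rough illustration for the heading question above, here is a sketch of what such a metric body could look like, plus a helper that adds the `[metricName]` line for pasting into the "Custom" tab. Both `headingStats` and `toWptScript` are hypothetical names, not part of the HTTP Archive code, and the metric name `headings` is an assumption.

```javascript
// Hypothetical metric body: counts h1-h3 elements and records their text lengths.
// `doc` stands in for the page's `document`.
function headingStats(doc) {
  const stats = {};
  for (const tag of ['h1', 'h2', 'h3']) {
    const nodes = Array.from(doc.querySelectorAll(tag));
    stats[tag] = {
      count: nodes.length,
      lengths: nodes.map((n) => (n.textContent || '').trim().length),
    };
  }
  return stats;
}

// Hypothetical helper: prefixes a script body with the [metricName] line
// needed when testing a metric directly on webpagetest.org.
function toWptScript(metricName, body) {
  return `[${metricName}]\n${body}`;
}

const wptScript = toWptScript(
  'headings',
  `${headingStats.toString()}\nreturn JSON.stringify(headingStats(document));`
);
console.log(wptScript);
```

In the HTTP Archive repo the `[metricName]` line would be left out, since it is generated from the file name.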
You'll see the output in the WPT results:

For complex metrics like almanac.js you will need to inspect the JSON results directly to see the output. The test ID for the results is in the URL. Simply append ?f=json to see the JSON results. For example: http://webpagetest.org/result/190624_6W_f5211bdf38d897fb4cb5a4f0872eb1f6/?f=json
Then you can find the custom metric by going to data.median.firstView.almanac:
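If you'd rather pull the value out programmatically, something like the sketch below should work once you have the `?f=json` response as an object. The `sampleResponse` object is a made-up, trimmed-down stand-in, and `getCustomMetric` is a hypothetical helper. I'm assuming the metric value may arrive either as a JSON-encoded string or as an already-parsed object, so the helper handles both.

```javascript
// Made-up, trimmed-down stand-in for a WPT ?f=json response.
const sampleResponse = {
  data: { median: { firstView: { almanac: '{"exampleMetric":3}' } } },
};

// Hypothetical helper: walks data.median.firstView and decodes the metric.
function getCustomMetric(json, name) {
  const firstView =
    json && json.data && json.data.median && json.data.median.firstView;
  if (!firstView || !(name in firstView)) return undefined;

  const raw = firstView[name];
  if (typeof raw !== 'string') return raw; // already an object or number
  try {
    return JSON.parse(raw); // JSON-encoded object
  } catch (e) {
    return raw; // plain string metric
  }
}

console.log(getCustomMetric(sampleResponse, 'almanac'));
```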

Let me know if you have any other questions!
Sorry if I missed this somewhere, but do we need to do something extra to get the right permissions to query the sample datasets created in #34 and/or have our test queries not billed to us individually? :)
I've updated the permissions of the sample_data dataset so anyone can query it.
The goal for that dataset is to allow @HTTPArchive/data-analysts to explore the schema and validate their queries. The table sizes should be small enough so any queries fit comfortably within the free monthly quota. When we run the analysis against the full dataset, I hope to have BQ credits for everyone to cover any expenses.
@HTTPArchive/data-analysts we're behind on triaging all of the metrics, so I think we need to take a different approach. There are 350 metrics and 12 analysts, which averages out to roughly 30 metrics per analyst. If we divide and conquer that way, we should be able to meet the July 1 deadline. I'll go through the triage sheet and assign each analyst approximately 30 metrics, grouped by chapter. I'll update this issue with a table of the assignments.
I've updated the sheet with Analyst assignments and updated the summary table with each analyst's total metric status.
@khempenius and @patrickhulce since you're both authors and expressed interest only in taking on analyst roles for your respective chapters, I didn't add you to any new chapters. @fhoffa I coaxed you into this so I didn't give you too many metrics to work on. Let me know if any of you are willing to take on more metrics, it'd be a big help.
@beouss you expressed an interest in joining the team but never accepted your invitation. If you're still interested I'll assign you some metrics.
I reviewed the CSS + SEO chapters metrics assigned to me.
@rviscomi The CSS chapter relies heavily on regexes + counts, but I don't see how a custom metric could be better (e.g. there is no JS selector to find media queries and parse them by breakpoint, right? See metric 02.15 or 02.2).
@rviscomi How do we provide the Custom Selector script (I have three)?
@rviscomi Several metrics in the Security chapter have the data available in a JSON array in the httparchive.request dataset. Should I mark them 'Able To Query' although I am not the analyst for these?
Thanks @ymschaap! I'll get back to you about the other questions but for now I'll say that if you know a metric can be queried and you're not assigned to it, assign yourself and update its status field.
Rick, I went through the accessibility metrics assigned to me. I think with clever regexes, we can get most of the items. There are a few that I need clarification on.
@dougsillars ping me with any questions you have :)
@ymschaap ok, getting back to your questions.
> How do we provide the Custom Selector script (I have three)?
I've created custom_metrics/almanac.js in the repo that is responsible for testing each site. I've added instructions to the top of the file to add a new metric. It contains three custom metrics so far, including one of yours as an example of how to include a complex metric.
> The CSS chapter relies heavily on regexes + counts, but I don't see how a custom metric could be better (e.g. there is no JS selector to find media queries and parse them by breakpoint, right? See metric 02.15 or 02.2).
Agreed, our best option is regex parsing the raw stylesheet content. It will be tedious but feasible, I think.
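To give a feel for the regex approach, here is a rough sketch that pulls `min-width`/`max-width` breakpoints out of raw stylesheet text. It is written in JavaScript purely for illustration; the actual analysis would run as queries over the stored stylesheet bodies, and `extractBreakpoints` is a hypothetical name. It ignores CSS comments, other media features, and units beyond px/em/rem.

```javascript
// Rough sketch: finds (min-width: N) / (max-width: N) in @media preludes.
// Assumes simple stylesheets; comments and unusual syntax are not handled.
function extractBreakpoints(css) {
  const breakpoints = [];
  // Capture everything between "@media" and the opening brace.
  for (const rule of css.matchAll(/@media([^{]+)\{/g)) {
    const prelude = rule[1];
    const feature = /\(\s*(min|max)-width\s*:\s*([\d.]+)(px|em|rem)\s*\)/g;
    for (const m of prelude.matchAll(feature)) {
      breakpoints.push({ bound: m[1], value: Number(m[2]), unit: m[3] });
    }
  }
  return breakpoints;
}

const sampleCss =
  '@media (min-width: 600px) { a { color: red } } ' +
  '@media screen and (max-width: 40em) and (min-width:20em){ b { margin: 0 } }';
console.log(extractBreakpoints(sampleCss));
```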
@HTTPArchive/data-analysts
I was able to get coupon codes worth 120 TB for all of the analysts, to help offset the expenses of querying such a large dataset.
Here's a list of everyone's outstanding metric count:
Analyst | Custom Metric Required | Need More Info | To Be Reviewed
-- | -- | -- | --
@dotjs | | | 10
@dougsillars | 6 | 9 | 1
@jrharalson | | | 19
@paulcalvano | | 1 | 31
@raghuramakrishnan71 | 1 | | 20
@rviscomi | 15 | 27 |
@tjmonsi | | | 19
@ymschaap | | 1 | 1
@rviscomi how do I redeem the coupon codes? I'm going to create my own project for this so that I don't use up my company's free credit.
@tjmonsi @voltek62 @beouss to get your coupon code, make sure you've joined the #web-almanac Slack channel and DM me to get your code.
For everyone else on the @HTTPArchive/data-analysts team, you should have received your code already. Some analysts, like @dougsillars and @khempenius, have already been given special access for past projects and don't need coupons.
@rviscomi I might use them later on. I might not be able to finish all of mine, as there are some things I don't understand yet and can't figure out how to check in the dataset.
I'll try to finish what I can on the fonts. There is one overlapping metric though, which is the latency (I have a comment on mine).
⏰ @dougsillars @raghuramakrishnan71 @tjmonsi you three have metrics assigned to you marked as "Custom Metric Required". We need to have those custom metrics in HTTP Archive before the July 1 crawl begins, ideally by today. See https://github.com/HTTPArchive/legacy.httparchive.org/pull/160 for an example PR that adds custom metrics. There are instructions at the top of almanac.js describing the workflow.
Let me know ASAP if you need any help implementing the metrics.
@rviscomi where should I branch out and merge? in legacy.httparchive.org as well?
Yes, thanks!
ok... working on it now
@rviscomi question: do you want to add prefetch and preload as well, in addition to getting URLs with preconnect?
That question is best asked of the chapter authors, who requested the metrics. If you can point me to the metric you're referring to I can try to infer what they're looking for.
Edit: oh if you're referring to 06.15 then that metric can be queried using the custom metric @ymschaap just merged (link-nodes). If that's the case you can change that metric status to Able To Query.
@rviscomi I was just thinking of adding it there before the crawler runs, just in case. I didn't see it yet as of now. I am already testing preconnect, which returns an array of URLs. Anyway, I am just about done and will open a PR.
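For reference, a metric along these lines could be sketched as below. This is not the code from the PR or from the link-nodes metric; `linkHrefsByRel` is a hypothetical name, and `doc` stands in for the page's `document`. It also covers prefetch and preload, in case the chapter authors want those.

```javascript
// Hypothetical sketch: collects href values of <link> elements per rel value.
// `doc` stands in for the page's `document`.
function linkHrefsByRel(doc, rels) {
  const result = {};
  for (const rel of rels) {
    result[rel] = Array.from(doc.querySelectorAll(`link[rel~="${rel}"]`))
      .map((link) => link.href)
      .filter(Boolean); // drop links without an href
  }
  return result;
}

// In a custom metric this would be called with the real `document`, e.g.
// JSON.stringify(linkHrefsByRel(document, ['preconnect', 'prefetch', 'preload']))
```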
@rviscomi no worries. At least I learned how to create custom metrics hehe. :)
Today's the day! I've marked all 5 remaining Need More Info metrics as Not Feasible. We're finally done with the triage! Thanks again to the entire @HTTPArchive/data-analysts team for your hard work going through these ~400 metrics.
I'll be syncing the custom metrics with the HTTP Archive server today so they're included in tomorrow's July crawl.