Site-kit-wp: Implement API error stats collection

Created on 9 Sep 2020 · 30Comments · Source: google/site-kit-wp

Feature Description

When the plugin receives an API error response, we should track those errors for users who have opted into sharing usage stats.

Currently we can only see error rates for certain APIs and we can't tell which request endpoints are causing the errors. Collecting errors stats will help us monitor, track and report on representative error rate data.

_Do not alter or remove anything below. The following sections will be managed by moderators only._

Acceptance criteria

Whenever an API error occurs, if the user has opted in to tracking, an Analytics tracking event should be sent:
- Category should be api_error.
- Name should be a string in the form of {method}:{type}/{identifier}/data/{datapoint} where placeholders represent request arguments by the same names.
- Label should be {error.message} (code: {error.code}[, reason: {error.data.reason}])
  
  _reason should only be included if present on the error data_
- Value should be the integer status code (use error.data.status if present, otherwise error.code).
This tracking should be added for both the legacy API layer and the new googlesitekit-api module.
For batch requests in the legacy API layer, errors should be tracked per individual request, not only the entire batch. For example, the API datapoint used for the event name should be the one relevant for the individual request rather than the batch itself.

Implementation Brief

Add a centralized error reporting helper method named trackAPIError in a new util/api.js module. The function accepts an error and extracts the code, message and potentially reason, and sends this data off to the analytics tracking helper trackEvent. Note that trackEvent already checks if the user has opted into stats tracking before sending any data.

async function trackAPIError( { method, datapoint, type, identifier, error } ) {
    await trackEvent( 
        'api_error', 
        `${ method }:${ type }/${ identifier }/data/${ datapoint }`, 
        `${ error.message } (code: ${error.code}${ error.data?.reason ? ', reason: ' + error.data.reason : '' } ])`,
        error.data?.status || error.code
    );
}

Data errors are handled in `dataAPI.handleWPError` and `API.siteKitRequest`.

In assets/js/components/data/index.js: For each call to handleWPError, add method ('POST' for set and combinedGet, 'GET' for get), datapoint, type and identifier ( in :combinedGet and :get; add a catch clause in set: and add there as well)
In dataAPI.handleWPError:
Change the accepted parameters:
handleWPError( method, datapoint, type, identifier, error ) {

Also, add:
trackAPIError( { method, datapoint, type, identifier, error } );

In API.siteKitRequest in assets/js/googlesitekit/api/index.js:

Report errors after the catch phrase: } catch ( error ) {:

trackAPIError( { method, datapoint, type, identifier, error } );

Testing

Add Jest tests:

In assets/js/googlesitekit/api/index.test.js in the tests in API.get and API.set, add test that verifies the dataLayer method has the correct data pushed to it when an error is returned.

Use the same patterns that we already use for testing trackEvent (in assets/js/util/tracking/index.test.js) - using a mock function on the push method of the dataLayer target object.
Add new tests for dataAPI in a new file assets/js/components/data/index.test.js:
- Add tests for get, set and combinedGet (which should test multiple cases: no errors, all errors, one+ errors) similar to described above for API tests.
Add a new test file util/api.test.js
- Add tests for trackAPIEvent following same patterns that we already use for testing trackEvent.

QA Brief

To QA this feature, you need to change the reporting account to an Analytics account you have access to. In addition you will need to reproduce error condition to see them reported. We may want to look at introducing a way to mock API error responses using the tester plugin.

Changelog entry

Add Analytics tracking events for API request errors.

P0 Enhancement

Source

adamsilverstein

All 30 comments

Question about one part of the AC:

... if it is numeric

@felixarntz are you aware of cases where a non numeric error.data.status or error.code would be returned? What if APIs return a string code, eg '403'? Seems like we should try to pass some code regardless.

adamsilverstein on 15 Sep 2020

Thanks @adamsilverstein!

The AC says:

Name should be the API datapoint without the /google-site-kit/v1 prefix.

however, the IB has this as simply datapoint in the utility. I believe the confusion here is a matter of semantics (@felixarntz correct me if I'm wrong), but I think the intended format of this value would be {type}/{identifier}/data/{datapoint}. One thing I think that's missing here, although not relevant for all endpoints is that we should include the method as well as some support multiple (e.g. settings). i.e. {method}:{type}/{identifier}/data/{datapoint}

Add a centralized error reporting helper method named handleAPIError in assets/js/util/errors.js

Nitpicking the name a little bit, how about trackAPIError( { method, type, identifier, datapoint, error } ) ? That's a bit more accurate to what it does I think 🙂
As for the location, I think this might be better suited to go in a new util/api.js module since it seems a bit closer to the APIs than error utils, which are more about handling an error only.

For the legacy API errors are handled in getDataError functions

Why not use dataAPI.handleWPError ? The signature there could be updated to accept the additional pieces that define a request (type, identifier, etc), but that function is already called for both single requests and each response of a batch request – no component changes necessary. The location of where it's called within combinedGet would probably need to move or something else would need to change to provide the request data, but it shouldn't be very complicated. We would also need to update dataAPI.set to call this as well.

Finally, the IB doesn't outline any tests to add, but I think it would be valuable to add tests for each case to make sure it works as expected. That is, adding Jest tests for the utility which use:

dataAPI.get
dataAPI.combinedGet (multiple cases here: no errors, all errors, one+ errors)
dataAPI.set
API.get
API.set

Given these additions, I think it would warrant a larger estimate as well.

aaemnnosttv on 17 Sep 2020

{method}:{type}/{identifier}/data/{datapoint} makes sense for the name if we have all that info. I will adjust the IB.

Nitpicking the name a little bit, how about trackAPIError

Sounds good. I was trying to impart that it will maybe track the API error (depending on the user's opt-in settings), however your naming better matches the other similar helpers. I'll move to a new file as well.

Why not use dataAPI.handleWPError

Ah, good idea - I missed that common handler. I will update the IB and include adding that to the set method as well.

I think it would be valuable to add tests for each case to make sure it works as expected.

Good point. Thanks for the summary of what is needed. I'll add some details and add to the IB.

adamsilverstein on 17 Sep 2020

@aaemnnosttv I don't see where we use a method other than POST, which requests are you referring to that can use more than one method?

adamsilverstein on 17 Sep 2020

@adamsilverstein for example modules/{slug}/data/settings supports GET and POST, another that supports both is AdSense's use-snippet datapoint.

You can review all of the module-specific datapoints and what methods are supported by looking at the various get_datapoint_definitions methods.

Any request made with API.get, dataAPI.get will always be a GET request, and dataAPI.combinedGet, dataAPI.set, and API.set will always be POST.

aaemnnosttv on 18 Sep 2020

@aaemnnosttv thanks for the details, I will review. Nice to see all those definitions in one place!

Any request made with API.get, dataAPI.get will always be a GET request, and dataAPI.combinedGet, dataAPI.set, and API.set will always be POST.

Ah, right! That simplifies what we need to pass along for tracking.

adamsilverstein on 18 Sep 2020

That is, adding Jest tests for the utility

@aaemnnosttv - I could use some guidance on the Jest tests: do we have existing tests for API/dataAPI that I should add these to, or would these be entirely new tests? If so, where do you suggest I add them?

adamsilverstein on 18 Sep 2020

I found tests for API in assets/js/googlesitekit/api/index.test.js, I did not fond existing tests for dataAPI. I sill assume we need to add one?

adamsilverstein on 21 Sep 2020

Yes, correct @adamsilverstein – the newer API has tests but dataAPI does not (but we should add some for this 👍 ).

aaemnnosttv on 21 Sep 2020

For the legacy API errors are handled in getDataError functions

In both the core widthData HOC (assets/js/components/higherorder/withdata.js) as well as overridden in assets/js/modules/analytics/components/dashboard/AdSenseDashboardWidgetTopPagesTableSmall.js and assets/js/modules/analytics/components/dashboard/AdSenseDashboardWidgetTopPagesTableSmall.js.

Correct me if I'm wrong, but I don't think these need to be changed at all (also the two files are the same 😄 ). withData uses combinedGet for all the data requests so any errors there will run through handleWPError. I think getDataError was necessary before due to batch request responses not being "cast" into WP_Errors by the server so there was additional parsing of the response that was necessary to know if something was an error or not because it was in a different Google-response format. All errors will come back in the same WP_Error-like format now; batch or not so it's much more consistent now.

In assets/js/googlesitekit/api/index.test.js in the tests in API.get and API.set, add test that verifies the trackEvent method is called when an error is returned.

Then add tests in get and set that mock errors and check that the spy was called:

Let's not test that trackAPIError was called a certain way because that doesn't really support the acceptance criteria (that the event was tracked) and we essentially end up testing implementation details. We only really care that the events are tracked with the correct parameters (on the dataLayer) so let's not give too much importance about _how_ it gets there. We can use the same patterns that we already use for testing trackEvent by using a mock function on the push method of the dataLayer target object. Since the order of the arguments to this call is significant, it makes more sense to test this than the middleman. This also allows us to test the formatting of the reported error (which is part of the AC 😉 ) as well as makes the implementation more flexible to change without requiring the tests to change with it.

Apart from that, I think we should avoid using too many positional parameters for trackAPIError. Since there's quite a few parts, I think passing them all in an object would be preferable.
ie.

trackAPIError( { method, type, identifier, datapoint, error } )

That may require the calls to be a little more verbose in some cases where the variable names don't match the object keys but that's okay as it will make it much more clear what is going on there.

aaemnnosttv on 21 Sep 2020

Apart from that, I think we should avoid using too many positional parameters for trackAPIError. Since there's quite a few parts, I think passing them all in an object would be preferable.

Makes sense, I was thinking the same thing after adding the additional parameters.

Correct me if I'm wrong, but I don't think these need to be changed at all (also the two files are the same 😄 ).

Ah, good point. Originally I was planning to catch the errors here, but as you point out we are already catching them earlier in combineGet. I'll remove these changes.

We can use the same patterns that we already use for testing trackEvent by using a mock function on the push method of the dataLayer target object.

Great suggestion. I didn't realize we had tests in place already. I'll use that pattern.

adamsilverstein on 21 Sep 2020

We can use the same patterns that we already use for testing trackEvent by using a mock function on the push method of the dataLayer target object.

@aaemnnosttv since we already test that trackEvent pushes data to the dataLayer, and trackAPIError acts a wrapper for trackEvent, should we mock trackEvent to verify it is called correctly? That is what trackAPIError is responsible for, it doesn't handle pushing the data event itself. I can update the AC to clarify that the trackEvent function should be called (rather than stating that the event should be tracked) which would be more accurate.

adamsilverstein on 21 Sep 2020

since we already test that trackEvent pushes data to the dataLayer, and trackAPIError acts a wrapper for trackEvent, should we mock trackEvent to verify it is called correctly?

Good question @adamsilverstein. In my opinion, trackEvent is an implementation detail here that is ultimately independent of the AC which should only be concerned with the important higher-level details. In that regard, I think the AC is well-stated to say that "Whenever an API error occurs, if the user has opted in to tracking, an Analytics tracking event should be sent" 👍

You're correct that we could mock/spyOn trackEvent and assert that it was called with some expected arguments, and by association with passing tests for trackEvent we could _infer_ that the event would be tracked properly, but there are actually a number of bugs that are possible that would pass those tests.

E.g.

import { disableTracking } from './tracking'
export async function trackAPIError( ... ) {
    disableTracking();

    await trackEvent(
        'api_error',
        // correct arguments here
    );
}

Testing that trackEvent is called correctly would pass here, but the error would not be tracked because tracking will be disabled! This is a silly example but goes to show that tests that can be broken so easily don't really provide any confidence and aren't worth much. In some cases, they can provide false confidence that things are working when they aren't.

That is what trackAPIError is responsible for, it doesn't handle pushing the data event itself.

I can see why you would think that, but I hope you see now that I think it's more accurate to say that trackAPIError is responsible for tracking an API error event 😄

My general approach is that if it's just as easy to test the real thing or end result, it's better to avoid using mocks. In other words, I prefer to test things with minimal fake pieces for the sake of testing.

So when we add tests for dataAPI.get, API.set, etc to ensure these track the errors as expected, these can be fairly simple tests, and they may even look very similar to tests for trackEvent, but would be specific to the actual request(s) that were made.

trackAPIEvent should have it's own tests which again may look similar, but describe the behavior of this function in more detail (e.g. how excluded error codes work, how the message and other parts of the event should be formatted, etc).

aaemnnosttv on 22 Sep 2020

My general approach is that if it's just as easy to test the real thing or end result, it's better to avoid using mocks. In other words, I prefer to test things with minimal fake pieces for the sake of testing.

Makes sense for an overall approach.

trackAPIEvent should have it's own tests which again may look similar,

Makes sense, I'll update the IB.

adamsilverstein on 22 Sep 2020

@aaemnnosttv IB updated, thanks for the feedback on the testing approach.

adamsilverstein on 22 Sep 2020

👍1

@adamsilverstein – the IB looks good – just one last question regarding the signature:

Apart from that, I think we should avoid using too many positional parameters for trackAPIError. Since there's quite a few parts, I think passing them all in an object would be preferable.

Makes sense, I was thinking the same thing after adding the additional parameters.

I saw this was still using positional arguments in the IB – are we still in agreement about using an object instead?

aaemnnosttv on 23 Sep 2020

I saw this was still using positional arguments in the IB – are we still in agreement about using an object instead?

Ah, missed that bit, fixing.

adamsilverstein on 23 Sep 2020

👍1

@aaemnnosttv Added positional argument change.

adamsilverstein on 23 Sep 2020

Great – I think we may want to do the same for handleWPError but we can look at that again in the PR.

IB ✅

aaemnnosttv on 23 Sep 2020

Great – I think we may want to do the same for handleWPError but we can look at that again in the PR.

Good point - this would be good for consistency.

adamsilverstein on 23 Sep 2020

👍1

@aaemnnosttv - I have most of this complete in https://github.com/google/site-kit-wp/pull/2072. I had some trouble getting tests for combinedGet working and would appreciate your help when you have a chance.

For these tests, I was unable to get the handler to continue beyond the fetch request. While fetchMock.getOnce( and fetchMock.post work elsewhere, I couldn't get them responding here. Next I tried mocking wordpress api-fetch directly (jest.mock( '@wordpress/api-fetch' );) and returning a resolved Promise. I see it returning in logs, however as far as I can discern, the code never continues to the then or catch blocks.

Appreciate your review of what I have so far. Do you know if wp-scripts can run in a debug mode where it provides more detailed output? I read that setting an environment variable DEBUG=fetch-mock* will increase fetch mock verbosity, but didn't get that working in my testing.

adamsilverstein on 24 Sep 2020

Appreciate your review of what I have so far.

Sure, sounds good!

Do you know if wp-scripts can run in a debug mode where it provides more detailed output?

Not that I know of specifically, although I know you can run Jest with the node debugger. I'm not sure if that would help in your case though.

I read that setting an environment variable DEBUG=fetch-mock* will increase fetch mock verbosity, but didn't get that working in my testing.

Yes, I've used that before and it is very verbose. I just tried it out and it worked so maybe you're running it differently?

DEBUG=fetch-mock* npm run test:js

This should do the trick for you (although I wouldn't recommend running them all with such verbosity 😄 )

I would avoid mocking api-fetch though as there's no reason why fetch-mock shouldn't work batch or not. Let me know if you're still stuck and we can jump on a call or something to figure it out 👍

aaemnnosttv on 24 Sep 2020

Ah, I think i was doing DEBUG=fetch-mock* && npm run test:js or maybe the wp-scripts command directly. Thanks for the tip.

there's no reason why fetch-mock shouldn't work batch or not

Yea, I can give the main fetch-mock approach another try. I know fetchMock sees the request, because I see it recorded (checking calls()) The problem is in this block: https://github.com/google/site-kit-wp/blob/1be22d1f809f40b6c6b4ceab6a54c88025141cfd/assets/js/components/data/index.js#L175-L212 - even trying fetchMock.all() and a simple return the code never hit the then or catch block.

adamsilverstein on 24 Sep 2020

@aaemnnosttv I was able to get some combinedGet tests working. I was missing an await 🤷

adamsilverstein on 25 Sep 2020

@aaemnnosttv thanks for the feedback, I'll address these final points shortly.

adamsilverstein on 19 Oct 2020

Feedback addressed (with one question), ready for another review.

adamsilverstein on 21 Oct 2020

Looks like all that was remaining in #2072 was to add tests and they have been added, so I think this is good to go. But in case you wanna have another look at it @aaemnnosttv, I've assigned this one to you since you originally reviewed and requested changes on the PR. 🙂

tofumatt on 27 Oct 2020

QA ❌

Everything is working well except there is a bug with trackEvent which is passing through the event value incorrectly. We haven't used this parameter until now so we haven't been affected by it until now.

Will submit a PR to fix this momentarily.

aaemnnosttv on 5 Nov 2020

QA ✅

Re-tested after the above PR was merged and found that error events were tracked as expected with both googlesitekit.api (get|set) and legacy dataAPI (get|set) including combinedGet.

Screenshots

API Error no data.reason