This is a more general case of #818: if the result of a query made later in time changes a linked-to object, the original query fails with "Perhaps you want to use the returnPartialData option?". In this test case, after the first query has finished we issue a slightly different second query that changes the id of the linked object.
import * as chai from 'chai';
const { assert } = chai;
import gql from 'graphql-tag';
import { QueryManager } from '../src/core/QueryManager';
import mockNetworkInterface from './mocks/mockNetworkInterface';
import { createApolloStore, ApolloReducerConfig } from '../src/store';
import subscribeAndCount from './util/subscribeAndCount';
const defaultReduxRootSelector = (state: any) => state.apollo;
const friendAgeQuery = gql`
  query($id: ID!) {
    person(id: $id) {
      id
      friend {
        id
        age
      }
    }
  }
`;
const friendAgeData1 = {
  person: {
    id: 1,
    friend: {
      id: 2,
      age: 12,
    },
  },
};
const friendAgeData2 = {
  person: {
    id: 1,
    friend: {
      id: 3,
      age: 13,
    },
  },
};
const friendQuery = gql`
  query($id: ID!) {
    person(id: $id) {
      id
      friend {
        id
        name
      }
    }
  }
`;
const friendData = {
  person: {
    id: 1,
    friend: {
      id: 3,
      name: 'Chris',
    },
  },
};
describe('dataIdFromObject', () => {
  it.only('handles missing shared object fields', (done) => {
    // The second response uses the narrower friendQuery, which swaps the
    // linked friend (id 2 -> id 3) without requesting `age`.
    const mockedResponses = [{
      request: { query: friendAgeQuery, variables: { id: 1 } },
      result: { data: friendAgeData1 },
    }, {
      request: { query: friendQuery, variables: { id: 1 } },
      result: { data: friendData },
    }, {
      request: { query: friendAgeQuery, variables: { id: 1 } },
      result: { data: friendAgeData2 },
    }];
    const reducerConfig: ApolloReducerConfig = {
      mutationBehaviorReducers: {},
      dataIdFromObject: (result: any) => {
        if (result.id) {
          return result.id;
        }
        return null;
      },
    };
    const store = createApolloStore({ config: reducerConfig });
    const queryManager = new QueryManager({
      networkInterface: mockNetworkInterface(...mockedResponses),
      store,
      reduxRootSelector: defaultReduxRootSelector,
      addTypename: false,
      reducerConfig,
    });
    const friendAge = queryManager.watchQuery({
      query: mockedResponses[0].request.query,
      variables: mockedResponses[0].request.variables,
    });
    const friend = queryManager.watchQuery({
      query: mockedResponses[1].request.query,
      variables: mockedResponses[1].request.variables,
    });
    subscribeAndCount(done, friendAge, (count, result) => {
      if (count === 1) {
        // Once the first query has data, fire the second query, which
        // rewrites the person's friend reference to an object that was
        // never written to the cache with an `age` field.
        subscribeAndCount(done, friend, (innerCount, innerResult) => {
          if (innerCount === 1) {
            assert.deepEqual(innerResult.data, friendData);
          }
        });
      }
      if (count === 2) {
        assert.isTrue(result.loading);
      }
      if (count === 3) {
        assert.deepEqual(result.data, friendAgeData2);
        done();
      }
    });
  });
});
I agree that this situation is not ideal, but I think the proper workaround here is to make sure you ask for all the shared fields when making a query that might overwrite another query's result.
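For instance, the second query in the test case above could be widened to also request the friend's age (a sketch of the workaround, not code from the repro):
// Sketch: friendQuery asks for every shared friend field, so replacing
// friend 2 with friend 3 never leaves the cache without `age`.
const friendQueryWithSharedFields = gql`
  query($id: ID!) {
    person(id: $id) {
      id
      friend {
        id
        name
        age
      }
    }
  }
`;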
To avoid the error for now, we could merge the objects. This might make the object inconsistent, but at least the query wouldn't fail.
In a fancier future version we could potentially detect overlapping active queries with non-identical selections and fire off the entire overlapping set of queries (to get consistency without relying on dynamic queries).
That is unfortunate since that requires tight coupling between different parts of the application. One query shouldn't need to know what fields other parts of the application are requesting. We have a highly connected graph of data which would make this workaround very difficult to manage.
Much like @NeoPhi I fear that this is another piece of the design which requires a growing app to tightly couple its query usage across what often may be a large, de-centralized UI. It doesn't seem ideal for queries to be responsible for knowing anything whatsoever about the needs of any other queries in the application (or even their names, another problem I'm having when it comes to the design of updateQueries & refetchQueries). It seems that much of the benefit of GraphQL (your data is your API; UI components can cleanly notate their isolated data needs) fades as these patterns begin to rely on queries being so globally orchestrated.
I agree that tight coupling is bad, but you guys see the problem, right? If we have a normalized cache, then we expect an update to one query to affect another query if they have overlapping data. We can choose several ways of dealing with these overlapping fetches:
1) merge the objects if there's an existing object in the cache (might lead to inconsistency; a sketch of this follows below)
2) overwrite the existing object with the new one (this is what we currently do)
3) dynamically modify the query to ask for all necessary fields in overlapping objects (this makes queries unpredictable)
4) fire off all overlapping queries and merge the fields (this can have a bad impact on performance)
Of all the options above, I think the first one is the best as an initial step. If you use polling, you still get eventual consistency. And if you need better consistency guarantees, then you'll just have to couple your queries for now. It's not like this affects every query in your app, just the overlapping ones.
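To make option (1) concrete, here is a minimal sketch of a field-level merge for normalized entries (a hypothetical helper, not the current store code):
// Hypothetical sketch of option (1): merge incoming fields into any
// existing normalized entry instead of replacing it wholesale.
type CacheEntry = { [fieldName: string]: any };

function mergeCacheEntry(existing: CacheEntry | undefined, incoming: CacheEntry): CacheEntry {
  // Fields missing from the incoming result keep their old values,
  // which is exactly where the possible inconsistency comes from.
  return Object.assign({}, existing, incoming);
}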
Thanks for the follow-up; I see the dilemma but hope that another way may be on the horizon eventually (though approach (1) seems reasonable overall when using polling, as you indicated).
This may be a diversion from the topic at hand, but I wonder if the store's normalized data (per object/id) could possibly maintain a kind of metadata key to track which queries are actively using it? I.e., a reverse mapping from object id back to a list of queries actively mapped to it.
I have a common situation now, where my front-end can very easily listen to a stream of data changes on a socket and be notified that an object with a particular ID needs to be refreshed (often asynchronously in belated response to a mutation request, where these are often side-effects that occur on the server after initial response; I don't want to duplicate that logic on the front-end using what would be a considerably complex mesh of updateQueries or an over-eager set of refetchQueries). I realize I can't use that stream of changed objects to actually invalidate the cache just yet, because that feature is still not here. But I wish that I could, for now, simply check the Apollo store for the object in question (by object id, which is easily knowable), read back a list of any active queries currently accessing that object, and then call a refetch on just those. (Or, perhaps, just call a 'clear' that would not immediately refetch the query, but merely mark that query to go straight to the server for new data the next time it is newly requested in UI).
In any case, if something like a reverse mapping from object id back to query existed, it feels like the option (2) above (overwrite the object) could be followed by an automatic refetch (or, clear the query until next time it is reconciled against cache?) for each query that is currently "attached" to that cached object.
But I may be chasing the wrong ideas on that... in any case, I'm seeing on the horizon that I'll have a lot of distributed queries across a possibly large UI (and the user may not have fired them all off if they haven't visited certain less-used pages) alongside a backend that often has complex side-effects after a mutation, which I don't want to duplicate on the client or handle by having the server keep track of active query names. Maybe there's a strategy here, or maybe it's just a matter of waiting for the invalidation server that is planned.
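A rough sketch of that reverse mapping (hypothetical names; nothing like this exists in the store today):
// Hypothetical: track which active queries read each normalized object,
// so an external "object X changed" signal can refetch just those queries.
const queriesByObjectId = new Map<string, Set<string>>();

function recordRead(objectId: string, queryId: string): void {
  if (!queriesByObjectId.has(objectId)) {
    queriesByObjectId.set(objectId, new Set<string>());
  }
  queriesByObjectId.get(objectId)!.add(queryId);
}

function queriesTouching(objectId: string): string[] {
  return Array.from(queriesByObjectId.get(objectId) || []);
}

// On a socket notification that object `id` changed, refetch (or mark
// stale) only the query ids returned by queriesTouching(id).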
@helfer I wonder if, instead of trying to solve the general case, a good first approach would be to better surface what is going on to the consumer. If I specify returnPartialData and Apollo thinks that the query is cached and would not fetch it again, perhaps a new NetworkStatus of readyPartialData could be added. The receiving component could then make the decision to manually refetch if desired. The general idea being to tell the consumer: I've given you all the data I have and I'm not planning to load any more. I could see this status being helpful when using the noFetch option as well.
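As a sketch only (readyPartialData is not a real NetworkStatus value today, and friendAge refers to the observable from the test case above), the consuming code might look something like this:
// Hypothetical: react to a proposed `readyPartialData` status.
friendAge.subscribe({
  next(result: any) {
    if (result.networkStatus === 'readyPartialData') {
      // Apollo has handed over everything it has and won't fetch more
      // on its own; the consumer can choose to go to the server.
      friendAge.refetch();
    }
  },
});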
To expand on this (per slack @NeoPhi), if you run updateQueries without providing every last possible pathway requested by the original query, an error is thrown: Network error: Can't find field <INSERT MISSING PATHWAY> on result object…
Why would merging objects lead to inconsistency? I feel like this is an edge case that never happens.
I agree. This isn't REST. I am not flushing the entire object downstream with every mutation, so I don't care if some fields happen to be outdated. If a newer query fetches part of an existing object, the properties that weren't fetched could be marked as "stale" so that if a different view requested those fields, Apollo would automatically run the query rather than only returning the cache, to "freshen" the data. I feel like there are a lot of workarounds here. From a systems design standpoint, forcing coupling between queries across various UI components to avoid errors or missing data seems like a non-starter. Stale data is an easier problem to deal with and solve :)
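A rough sketch of what per-field staleness metadata could look like (entirely hypothetical; nothing like this exists in the client today):
// Hypothetical shape: each normalized entry remembers which fields were
// not refreshed by the most recent write, so a later read of a stale
// field can trigger a fetch instead of silently returning old data.
interface FieldMeta {
  stale: boolean;
  lastUpdated: number; // epoch millis of the last write to this field
}

interface TrackedEntry {
  fields: { [name: string]: any };
  meta: { [name: string]: FieldMeta };
}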
Another possibility is that apollo could parse the AST for queries and "rewrite" the queries based on algorithmic intent, which can be inferred. So that the GQL queries on the frontend are declarative to apollo client rather than the graphql server. But that's a very complex solution and I feel that there are simpler ones as mentioned above
You can also use forceFetch on any component that MUST have guaranteed consistency.
@thebigredgeek actually, merging isn't quite as easy as I thought. If you look carefully at @NeoPhi's description you can see that the problem isn't that an existing object is changed in the cache (the friend with id 2); it's that the reference is updated to a new object (the friend with id 3), but that new object wasn't requested with all the fields of the previous object.
Merging doesn't make sense in this case, so in my opinion the best option would be to invalidate the original query and refetch it. It's very tempting to want to use dynamically composed queries in this case, but I think it would be the wrong course of action, unless we could compute those overlapping queries at compile-time (and thus effectively make them static).
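To spell that out with the data from the test case above, the normalized store ends up roughly like this after the second query (illustrative shapes only, not the store's actual internal format):
// Keys come from dataIdFromObject (the object's id in this test).
const storeAfterFriendQuery = {
  1: { id: 1, friend: 3 },      // person 1 now references friend 3
  2: { id: 2, age: 12 },        // old friend entry, no longer referenced
  3: { id: 3, name: 'Chris' },  // new friend entry; `age` was never written
};
// Re-reading friendAgeQuery against this store can't find `age` on
// entry 3, which is what triggers the returnPartialData error.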
Interesting. Couldn't you represent every field as an object with a "last_updated" field for comparison and then map the values into query responses? That way your updates by reference cascade because you aren't dealing with primitives?
Also rather than overwriting references couldn't you simply merge into the existing reference?
@helfer I might be misunderstanding. You're saying we shouldn't merge because someone's app out there might have mutations that change their primary key?
@sikanhe if you read carefully, that is actually the exact issue that @NeoPhi described. Note that the common case is not changing an object's key, but changing the contents of a list and replacing one object with another (for example, when you add milk to your shopping list but remove the bread). For cases where the id stays the same, we already merge the fields.
How would you feel if Apollo Client returned stale data in this case instead of merging or returning partial data? So if you had requested an old object with id 2 and you get a new object in the same position with id 3, but with fewer fields than before, Apollo Client would jump back and read data from the old object with id 2 and any of its new updates.
This means that a query could return data that is partially inconsistent with the rest of your UI, but this may be a better solution than either merging or returning partial data.
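With the test data above, that would mean the original friendAgeQuery keeps reading the old entry rather than erroring (illustrative only):
// What the stale read would return for the original friendAgeQuery...
const staleFriendAgeResult = {
  person: { id: 1, friend: { id: 2, age: 12 } },  // old friend, old fields
};
// ...while the rest of the UI sees the new friend via friendQuery.
const freshFriendResult = {
  person: { id: 1, friend: { id: 3, name: 'Chris' } },
};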
@calebmer That feels like it would be more surprising than the current behavior or some of the other proposed solutions.
@calebmer @NeoPhi I personally feel that having partially stale data is okay... fields that are potentially stale can be marked and when they are requested again by another component they can be fetched again by default or something. Some of this could potentially be handled automatically as well, rather than waiting for a component to request stale data. Seems like there are a lot of ways to handle this haha. I would say that it's probably the most pressing concern that I have right now. The current state feels slightly leaky in terms of abstraction, and that leak sometimes makes me feel that it's harder to use apollo-client than redux + axios + REST API :(
I also want to note that we would probably always warn if we were to use stale data, and if you wanted to avoid stale data in any scenario the solution would be to include the fields from your first query in your second query.
I will reiterate that a solution that requires having to manually modify other queries in the system is unworkable.
Can someone clarify what the issue exactly is? I'm having a hard time understanding from all the comments 😝
If I have a book type that has the following fields: id, name, author. If at some point the author of the book is changed, does that cause the issue discussed here?
Example:
var book = {
  id: 1,
  name: 'My Book',
  author: {
    id: 1,
    name: 'Author #1'
  }
}

// Then the book becomes linked to another author:
book = {
  id: 1,
  name: 'My Book',
  author: {
    id: 2,
    name: 'Author #2'
  }
}
@mdebbar yes, but you'll need another query which asks for different fields than the first one.
@NeoPhi: Since we don't want to dynamically modify queries, the only other options are to a) automatically refetch the query, or b) manually (or automatically during build time) add the necessary fields. I think the choice between a) or b) should be up to the developer.
In a fancy future version, I could imagine some build tooling that detects queries that might lead to this issue and pre-calculates a union query, which Apollo Client would dynamically generate if other queries with overlapping data are already active. The build tooling would help white-list the query on the server. The problem with this approach, however, is that you might get a cartesian explosion.
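With the two queries from the test case, such a pre-calculated union query might look like this (sketch only):
// A pre-computed "union" of friendAgeQuery and friendQuery: it asks for
// every friend field either active query needs, so neither can be left
// with missing data when the linked friend changes.
const friendUnionQuery = gql`
  query($id: ID!) {
    person(id: $id) {
      id
      friend {
        id
        age
        name
      }
    }
  }
`;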
I think this is resolved now. Our chosen approach was to mark queries as stale if they end up in a situation where they no longer have all of the data they need. You can read about it here: https://github.com/apollographql/apollo-client/pull/1306
Woot