Graphql-js: Force inclusion of certain fields

Created on 3 Jul 2016 · 11Comments · Source: graphql/graphql-js

Is there a way to force graphql-js to always include certain fields in it’s output? For example an id field? In order for Relay to always get the id field, you need to add in a preprocessing step so Relay can insert the id field into queries. It seems like a much simpler solution would to be just let GraphQL servers forcefully always include an id field if available. Does graphql-js currently allows this, and may this be allowed in the future?

Source

calebmer

👍4

Most helpful comment

@leebyron, @stubailo, @josephsavona

I think we can all agree a global __id field solves a lot of problems, and I don’t necessarily disagree that a transform to add it would be a simple build component to add and share. I’m not 100% convinced either that it should be force included, but I’m not strongly against it either. So with that out of the way, I’m going to play devil’s advocate and explain why I still think this would be a good idea :blush:

First reason why I think force inclusion of an __id field would be a good idea is that it closes some of the technical limitations of GraphQL when compared to Falcor almost instantly. I’m not sure how much you all have looked at Falcor, but to me on technical merit alone Falcor beats GraphQL. Once you add in stuff like a schema and a friendly query language as points on the GraphQL side, that’s where I think GraphQL makes up for some of its technical deficiencies (note: I love GraphQL! This conversation is all in relative to Falcor, GraphQL is way better than REST). The reason I think Falcor has a better technical implementation is that all data is normalized when you make the request.

Falcor doesn’t need to add an __id field or globally unique identifiers, because _the path_ is the globally unique identifier. Being built the way it is Falcor gets the following _technical_ wins over GraphQL (in my mind):

No duplicate data in the same request, everything is normalized by default. In GraphQL it is definitely possible (and perhaps best practice) to get multiple instances of the same object in your tree. Just look at the introspection query.
Streaming and live updates of data are basically free. The server just pushes a data “envelope” with certain globally unique paths and that data can easily be merged into the client’s store. In GraphQL’s case (as we all know) mutations/streaming/live updates are tough to implement (albeit I don’t work at Facebook, so this information is secondhand. Relay 1 mutations definitely aren’t fun though).
Falcor clients don’t require much to just work. Powerful GraphQL clients (like Relay and Apollo), require a good deal of extra features. Like query transforms to insert __id, or extra code for optimistic mutations and then normalizing the network mutation. Eventually, as GraphQL clients become more powerful, more imperative patterns get adopted in userland (as we have seen with Relay 2 and Apollo mutations). Now duplicate the work it took to write (pretty much 3) powerful GraphQL JavaScript clients across other languages and platforms. GraphQL isn’t a super easy tool to write clients for even if it’s a lot better than REST.

So in my mind, Falcor is better for maintaining server state in complex apps off the shelf, and GraphQL provides a much better experience when you make single requests. After coming to this conclusion the question then becomes, well how do you fix this? Because I’m definitely not using Falcor anytime soon, the tooling just isn’t as good as GraphQL :wink:. _My_ answer would be to implement greedy __id fields.

The __id field by itselfs gives us many of the same benefits as Falcor gets from globally unique paths (except response normalization…). I’m not super familiar with how the GraphQL team is trying to solve the problem of live updates, but one solution that could work is to send isolated “patches” of new data to clients where the patch has an associated __id. The client can then only use that patch if it is _guaranteed_ to know the __id from its first request. A lot of problems in GraphQL like query data normalization, streaming, live updates, mutations, and more can be solved easily if the client is guaranteed to always have a globally unique key for the objects it gets back from the server.

Furthermore, a transform to add __id may be trivial to write (as @stubailo has mentioned), but it may not be so trivial to use. The reason is that it would still require a GraphQL schema file from the user. To use a powerful GraphQL client, there are a lot of barriers to adoption. The first being the obvious need for a GraphQL server, then you need a GraphQL introspection schema file, and then you need to introduce a build step. However, building a GraphQL client that still has the power of Apollo or Relay without requiring the users keep a schema file up to date or introducing a build step can be as simple as forcing the inclusion of an __id field.

Continuing on as devil’s advocate, now I want to ask a question :wink:

I get that overfetching is an anti-pattern that GraphQL is trying to avoid, however how bad is it really to always include a couple of relatively small string field in the payload? Especially when the upsides are fairly large for a better DX. I don’t think it’s that great of a cost especially given that teams working with GraphQL have already decided it’s an ok practice to potentially include more than you need in the following ways:

First (as mentioned before) because queries aren’t normalized you may get the same objects twice (like a user object).
The Relay team has removed query diffing from Relay 2 because the cost of getting extra data sent over the network is the lesser evil when it comes to the other impacts of query diffing.
You can send arbitrary JSON objects as scalar fields without querying the selective fields (see this Twitter thread).

Given its already common practice to give more than the client asked for (whether it be in duplicate data, extra data, or arbitrary data) is there a reason to hold this tenet when there may be benefits in selectively breaking it? (with little drawbacks that I see) Of course I’m operating under the assumption that the real reason we don’t send extra data is to reduce network payload size, I may be wrong so please let me know :+1:

I love GraphQL, I think it has a really promising future, the __id specification solves _a lot_ of my problems, I just think it may be worth seriously considering breaking a core tenet for this special case which may have some serious upsides when it comes to client authors :blush:

I’ve seen where this could be useful. If you read my word wall and decide this is not worth exploring or at least not worth a second thought, I won’t be hurt. Keep up the great work! :+1:

calebmer on 20 Sep 2016

👍3

All 11 comments

I also need this to implement "greedy" queries..

chain-police on 5 Jul 2016

I don't think this is a good idea since it reduces a client's ability to predict the shape of the response and could result in accidental over-fetching for existing clients if new types or fields are added which overlap the forcefully included fields.

Instead I think a cache identifier is more a clear special case and deserves to be treated as such. I think GraphQL is more likely to see specific support for cache identifiers rather than for a query's response to be shaped by anything other than the query itself.

leebyron on 6 Jul 2016

👍3

I would be very excited to see something like __id to go with __typename. It is fine if the __id needs to be specified in the query, but it would be very helpful if it could be queried on every object and just return null or undefined if that object doesn't have a stable cache key available.

stubailo on 7 Jul 2016

👍2

Another relevant issue on the graphql repo: https://github.com/facebook/graphql/issues/188

Globegitter on 6 Sep 2016

We should definitely continue discussing the possibility of an __id field in the spec. However, this current proposal goes against a core tenet of GraphQL that the server returns only the fields the client asked for. Beyond the potential perf benefits of avoiding getting more data than you strictly needed, it's much easier to reason about the system if the only reason a field can be in the response is because there is a corresponding field in the query.

@calebmer It seems like the main motivation for this issue was to avoid the need for preprocessing, and that the main use case could be addressed by the __id proposal. Is there more to discuss here relative to preprocessing, or should we close this issue and focus on __id?

josephsavona on 20 Sep 2016

👍2 👎1

Yeah, it's trivial to write a GraphQL transform that would add __id to every selection set--we'll definitely do that immediately as soon as that makes it into the spec so it'll be easy to get it going.

stubailo on 20 Sep 2016

Ok. I'm going to go ahead and close, @calebmer feel free to reopen or comment if there is more to discuss!

josephsavona on 20 Sep 2016

@leebyron, @stubailo, @josephsavona

No duplicate data in the same request, everything is normalized by default. In GraphQL it is definitely possible (and perhaps best practice) to get multiple instances of the same object in your tree. Just look at the introspection query.
Streaming and live updates of data are basically free. The server just pushes a data “envelope” with certain globally unique paths and that data can easily be merged into the client’s store. In GraphQL’s case (as we all know) mutations/streaming/live updates are tough to implement (albeit I don’t work at Facebook, so this information is secondhand. Relay 1 mutations definitely aren’t fun though).
Falcor clients don’t require much to just work. Powerful GraphQL clients (like Relay and Apollo), require a good deal of extra features. Like query transforms to insert __id, or extra code for optimistic mutations and then normalizing the network mutation. Eventually, as GraphQL clients become more powerful, more imperative patterns get adopted in userland (as we have seen with Relay 2 and Apollo mutations). Now duplicate the work it took to write (pretty much 3) powerful GraphQL JavaScript clients across other languages and platforms. GraphQL isn’t a super easy tool to write clients for even if it’s a lot better than REST.

Continuing on as devil’s advocate, now I want to ask a question :wink:

First (as mentioned before) because queries aren’t normalized you may get the same objects twice (like a user object).
The Relay team has removed query diffing from Relay 2 because the cost of getting extra data sent over the network is the lesser evil when it comes to the other impacts of query diffing.
You can send arbitrary JSON objects as scalar fields without querying the selective fields (see this Twitter thread).

I’ve seen where this could be useful. If you read my word wall and decide this is not worth exploring or at least not worth a second thought, I won’t be hurt. Keep up the great work! :+1:

calebmer on 20 Sep 2016

👍3

@calebmer,

Furthermore, a transform to add __id may be trivial to write (as @stubailo has mentioned), but it may not be so trivial to use. The reason is that it would still require a GraphQL schema file from the user. To use a powerful GraphQL client, there are a lot of barriers to adoption. The first being the obvious need for a GraphQL server, then you need a GraphQL introspection schema file, and then you need to introduce a build step. However, building a GraphQL client that still has the power of Apollo or Relay without requiring the users keep a schema file up to date or introducing a build step can be as simple as forcing the inclusion of an __id field.

This is not true, the way Apollo currently does query transformations is by applying them recursively to a selection set. You don't need the schema to do that. 😄

migueloller on 20 Sep 2016

With regards to Apollo, the reason why it would be beneficial to have __id is that this code:

const CacheableGraphQLObjectTypenames = [/* ... */]; // these types have an `id` field

const client = new ApolloClient({
  dataIdFromObject: ({ __typename, id }) =>
    (CacheableGraphQLObjectTypenames.includes(__typename) ? `${__typename}:${id}` : null),
  networkInterface: createNetworkInterface(process.env.GRAPHQL_API_ENDPOINT),
  queryTransformer: addTypename,
});

would change to this:

const client = new ApolloClient({
  dataIdFromObject: ({ __id }) => __id,
  networkInterface: createNetworkInterface(process.env.GRAPHQL_API_ENDPOINT),
  queryTransformer: addId,
});

migueloller on 20 Sep 2016

Was there ever any progress on adding greedy queries / fields that are always returned?