Relay: Loosening object identification GraphQL spec

Created on 19 Apr 2016  路  20Comments  路  Source: facebook/relay

On a project I'm working on we are automatically generating a GraphQL server from a SQL server and many of our tables have an integer column named id. When serving this table through GraphQL we would like to make the id column queryable, however we also want our rows to implement a Node type in accordance to the Relay GraphQL spec. According to the spec an id field that is globally unique is required for compliance. This puts us in a binding position. Do we not allow columns to be named id? Do we rename id to rowId in GraphQL? These solutions aren't ideal as we also have many more foreign key columns like authorId which reference an id primary key.

The optimal solution would be if Relay allowed the id field to be renamed, either to _id or maybe gid (for global ID). Would it be possible to update the specification and Relay to allow for this? Is there a solution I'm not seeing?

Most helpful comment

Let me first clarify a few things, and then describe the changes we're considering going forward.

First, some clarifications & corrections to comments above:

  • Relay does _not_ require all types to implement Node. You're free to implement this interface or not for each type.
  • If a type has an id field, Relay _assumes_ that is is a globally unique ID and uses it for cache de-duplication. If you attempt to refetch data starting at such a record (e.g. with setVariables), Relay will assume that the type can be refetched via the node root call. This assumption is for legacy reasons; some types may have a globally unique field but not happen to implement Node.
  • Because of the above, the Relay plugin always adds the id field if is defined on a type so that the runtime can use it to de-dupe and refetch as necessary.

In the long-term, we'd like to make a few changes to support a wider variety of schemas. Foremost among these is a proposal from @leebyron to define the behavior of an __id field as part of the GraphQL specification. This would be a field that any type could optionally implement and which behaves _roughly_ as follows: if __id is defined for a type, it must be a non-null globally unique identifier.

This would free up the id field name for type-specific identifiers. Refetching in Relay would work the same as it does today: implement Node (basically a no-op implementation) _and_ define __id on a type to allow refetchability via the __id. Skip implementing Node to make Relay use the __id only for de-duplication.

All 20 comments

I'd be reluctant to incur the internal complexity of making this kind of thing configurable, especially given that a work around is possible (exposing the underlying database's id field using another field name).

But leaking the underlying primary key sounds like a leaky abstraction to me anyway. I'd recommend that you only expose and use "global" IDs in your GraphQL schema (effectively base64encode(type + ':' + databaseID)). You shouldn't need to ever leak foreign keys like authorID in your schema because you'll typically be querying for them compositionally (ie. viewer { books { author } }) or via the node root field (ie. node(id: "someGlobalID")).

The difficulty comes when introducing compound keys. There is no way (in my case) to intelligently compact the fields into a single field name _and_ hiding these fields could also cause problems.

Another layer of complexity is added when these compound primary keys are also foreign keys.

There are some consumers that would greatly benefit from being able to access the data in the same form it is stored and other consumers (_cough_ Relay _cough_) which would benefit from the Node format.

The difficulty comes when introducing compound keys. There is no way (in my case) to intelligently compact the fields into a single field name and hiding these fields could also cause problems.

Another layer of complexity is added when these compound primary keys are also foreign keys.

Can you provide a concrete example of these cases in a bit more detail, to ground the discussion? I want to make sure I fully understand the exact nature of the problem that you're facing.

Take any join table that in a graph database would be an edge for instance. Say I have a table star and it has columns user_id and post_id (where id represents a serial integer id). The star table also has some metadata columns like created_at.

user_id references a user table's id column, post_id references a post table's id column and the post table also has a column author_id which references the user table's id column.

Another constraint (for me) is I can't custom design a GraphQL schema. It has to be generated from the detected relationships among the data.

Can't you have the client only be aware of global ids, and then using fromGlobalId on the server to map that to database objects? That's how we're doing it when the id-relations associations are essential.

https://github.com/facebook/relay/blob/da307264f79877884f67fe1726baa505af0b2204/examples/todo/data/schema.js#L260

Say I have a table star and it has columns user_id and post_id (where id represents a serial integer id). The star table also has some metadata columns like created_at. user_id references a user table's id column, post_id references a post table's id column and the post table also has a column author_id which references the user table's id column.

This would probably look something like:

type Star {
  id # use the helper to generate a global id from type + `id`
  user: User  # resolve() uses star.user_id to lookup the User
  post: Post  # resolve() uses star.post_id to lookup the Post
  created_at: Text
}
type Post {
  id # same here
  author: User
}
type User { ... }

Can you clarify where this would break down for your use case?

@josephsavona it would break down at automatic generation for fields. I agree that schema would be the ideal, however I don't have the luxury to hide the primary key fields (I can't make assumptions about the data the schema is being generated from).

Would a solution like the following work with Relay, an id field with the following signature:

id(global: Bool)

Where global defaults to true? Then we could have a query like:

{
  me {
    id
    rowId: id(global: false)
  }
}

As a side note, I've opened sourced the project I'm working on so we can talk in more specifics if you want. Its here at PostGraphQL.

While making the name of the field (id) configurable might be complicated, choosing a better name for it (relay_id) so that it does not collide with just about every project schema naming convention would not be that complicated.
The problem is that users already using relay would need to rewrite their schema.
@wincent can you point out the places in the code where the "id" is hardcoded?

Any chance that this line is the only place that would need a change?
https://github.com/facebook/relay/blob/master/src/interface/RelayOSSNodeInterface.js#L38

_Before I realised I was on the wrong repo, I filed a longer ticket here: https://github.com/graphql/graphql-relay-js/issues/92_


While making the name of the field (id) configurable might be complicated, choosing a better name for it (relay_id) so that it does not collide with just about every project schema naming convention would not be that complicated.

I agree with @ruslantalpa.

Using such a common and ambiguous field name is bound to lead to clashes for e.g. people that start with just GraphQL and later on want to add Relay.

What I think this thread is missing is that not _all_ Relay clients can easily be updated when changing the semantics of an existing field that they鈥檝e been using. In our case we have an iOS app and it鈥檚 very reasonable to assume that people will not update their app when we want them to.

I really do not like options, so my suggestion would also be to use a Relay specific ID field, if one exists, and otherwise fallback to id. Although, reading through the code, it seems like making the field configurable would be simpler (complexity-wise) than trying to reflect on the schema.

@wincent After code-spelunking a bit more, my thoughts are slightly different. Namely that Relay infers the primary key from the Node interface in the given GraphQL schema by looking for a field with type ID!.

With this traditional interface, Relay would use id as the primary key:

interface Node {
  id: ID!
}

But with this interface it would be relayID:

interface Node {
  relayID: ID!
}

I had wanted to provide a prototype patch, but after code-spelunking a full day I have to admit I鈥檓 slightly out of my depth with the code-base. My thoughts were:

  • Add an optional primary key name parameter to the nodeDefinitions helper.
  • babel-relay-plugin infers the primary key name by checking the Node interface in the given schema for a field that has the ID! type and attaches that field name as the inferredPrimaryKey metadata to the generated fields of types that have the Node interface.
  • Relay uses the inferred primary key, but right now I鈥檓 not even sure what Relay uses that information for and it also doesn鈥檛 yet solve the hardcoded id value that @ruslantalpa linked to above.

I have been making some progress on this. With the following patches I鈥檓 able to define a custom Node ID field:

I still have some failing babel-relay-plugin tests to deal with after which I鈥檒l start PRing.

Can somebody tell me if Relay should _always_ include the id field in queries if that field exists on a type or if that just happened to be the case because Relay assumed that if an id field is present it is there because it implements the Node interface?

Here is where the current code _always_ includes it, but I don鈥檛 see any tests that specifically describe that id should be included _regardless_ of whether or not the type explicitly implements the Node interface.

@alloy relay needs the id on everything, that's how it caches locally.
For a GraphQL schema to be relay compatible, all types need to implement the node interface (at least that's what i got so far)

Let me first clarify a few things, and then describe the changes we're considering going forward.

First, some clarifications & corrections to comments above:

  • Relay does _not_ require all types to implement Node. You're free to implement this interface or not for each type.
  • If a type has an id field, Relay _assumes_ that is is a globally unique ID and uses it for cache de-duplication. If you attempt to refetch data starting at such a record (e.g. with setVariables), Relay will assume that the type can be refetched via the node root call. This assumption is for legacy reasons; some types may have a globally unique field but not happen to implement Node.
  • Because of the above, the Relay plugin always adds the id field if is defined on a type so that the runtime can use it to de-dupe and refetch as necessary.

In the long-term, we'd like to make a few changes to support a wider variety of schemas. Foremost among these is a proposal from @leebyron to define the behavior of an __id field as part of the GraphQL specification. This would be a field that any type could optionally implement and which behaves _roughly_ as follows: if __id is defined for a type, it must be a non-null globally unique identifier.

This would free up the id field name for type-specific identifiers. Refetching in Relay would work the same as it does today: implement Node (basically a no-op implementation) _and_ define __id on a type to allow refetchability via the __id. Skip implementing Node to make Relay use the __id only for de-duplication.

@alloy To be consistent with the current behavior, it's probably best to infer the name of the id field from the schema's Node interface, but keep the rest of the logic for adding this field unchanged. In other words, if the schema's Node interface has an _id field, then always add _id on any type that has that field, even if it doesn't implement Node.

Created a PR https://github.com/facebook/relay/pull/1232

@josephsavona Thanks for the feedback!

I was looking for more information on __id, but I couldn't find any open issues/PRs so I opened this one in case anyone interested in this thread wants to follow/contribute: https://github.com/facebook/graphql/issues/188

I鈥檒l be maintaining a fork of Relay that uses __id instead of id https://github.com/alloy/relay

If/when in the future GraphQL & Relay adopt https://github.com/facebook/graphql/issues/188, I will gladly stop doing so.

Thanks everybody for the discussion so far.

Just to close the loop, there was an RFC PR on the subject of __id in the graphql repo. We weren't able to reach consensus, but determined that we might be able to move forward with a series of smaller RFCs.

I'm going to close this one now because I don't think there is anything actionable that we can do at this point, but we should continue to monitor the situation and open a new issue to re-start the discussion once there's been some more movement on the GraphQL side. Thanks once again for all the input!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

MartinDawson picture MartinDawson  路  3Comments

fedbalves picture fedbalves  路  3Comments

piotrblasiak picture piotrblasiak  路  3Comments

jstejada picture jstejada  路  3Comments

leebyron picture leebyron  路  3Comments