Keystone: Pairs of Relationship fields (ie. "backlinks") store 2 copies of relationship data; copies not kept in sync

Created on 6 Nov 2019  路  9Comments  路  Source: keystonejs/keystone

I have a Keystone, running on Knex/Postgres, using the latest packages (@keystonejs/keystone: 5.1.1, @keystonejs/adapter-knex: 5.1.0, @keystonejs/app-admin-ui: 5.0.2, @keystonejs/app-graphql: 5.0.0, @keystonejs/fields: 5.1.0).

It has two related lists:

keystone.createList('User', {
  fields: {
    name: { type: Text },
    posts: { type: Relationship, ref: 'Post.author', many: true },
    // ...
  },
});

keystone.createList('Post', {
  fields: {
    name: { type: Text },
    author: { type: Relationship, ref: 'User.posts' },
    // ...
  },
});

There's a bunch of data in this system already. For example, the user 1747 is author of post ABCD. GraphQL bears this out:

{
  User(where: { id: "1747" }) {
    id
    name
    posts { id name }
  }
}

Which correctly returns:

{
  "data": {
    "User": {
      "id": "1747",
      "name": "Aida Yasuaki",
      "posts": [
        {
          "id": "ABCD",
          "name": "Soroban For Dummies"
        }
      ]
    }
  }
}

The Admin UI for this post gives me a drop down to select the single author (user). Updating the author to user 1598 performs a query like this:

mutation {
  updatePost(
   id: "ABCD"
   data: {
     author: { connect: { id: "1598" } }
   }
  ) { id __typename }
}

But when we look at the data we see now _both_ users think they own the post:

{
  allUsers(where: { id_in: ["1747", "1598"] }) {
    id
    name
    posts { id name }
  }
}

Returns...

{
  "data": {
    "allUsers": [
      {
        "id": "1747",
        "name": "Aida Yasuaki",
        "posts": [{ "id": "ABCD", "name": "Soroban For Dummies" }]
      },
      {
        "id": "1598",
        "name": "Yoshida Mitsuyoshi",
        "posts": [{ "id": "ABCD", "name": "Soroban For Dummies" }]
      }
    ]
  }
}

So, the bug here is that the Admin UI isn't including disconnectAll: true when setting many: false relationships. The GraphQL mutation above should be something like:

mutation {
  updatePost(
   id: "ABCD"
   data: {
     author: { disconnectAll: true, connect: { id: "1598" } }
   }
  ) { id __typename }
}

That specific issue shouldn't be hard to fix but the _real_ issue here is the concept of backlinks as a whole. We're actually maintaining _two separate_ relationships here: A one-to-many relationship and a separate many-to-many relationship linking the same tables. (Note that, when even when both authors think they "own" the post, the post itself links correctly to just one.)

I'll write this larger issue separately but, sufficed to say, backlinks add complexity (surface area for bugs like this! ^^), slow down data access (reads and writes) and makes maintaining data integrity a lot harder. They're also unnecessary :)

Relationship needs-review

All 9 comments

when even when both authors think they "own" the post, the post itself links correctly to just one.

This is correct and is reflected in how the example lists are setup.

Relationship fields are only one way. They only have enough information to be one way.

To achieve what you're after, you would need to be able to have a User tell a Post to tell all its authors to disconnect themselves from this Post, ie;

mutation {
  # First, tell users they're no longer the author of the post
  # (NOTE: This mutation doesn't exist in KS5)
  updateUsers(
    where: { posts: { some: { id: "ABCD" } } },
    data: { posts: { disconnect: [{ id: "ABCD" }] } }
  ) {
    id
  }

  # Next, tell the Post who the actual author is (could also tell the Author which Post they own)
  updatePost(
    id: "ABCD"
    data: {
      author: { connect: { id: "1598" } }
    }
  ) {
 }
}

This level of cascading changes is not yet build into Keystone. We could conceivably add it, but I'm unsure what the syntax would look like here, and we'd have to think about when we do and do not want it enabled. Perhaps it's another option on Relationship fields: { type: Relationship, cascadeNestedOperations: true }?

They're also unnecessary

I have shipped 2 products now which use Back references heavily and they have been very useful to simplify queries and ensure I can query data from both sides.

slow down data access (reads and writes) and makes maintaining data integrity a lot harder.

Other than the example above, where else are backlinks causing issues? Would love some more concrete examples here.

Relationship fields are only one way. They only have enough information to be one way.

Ahh, I think this gets to the crux of it.

I agree that _Relationship fields_ only have enough information to be one way at the UI and GraphQL layers but, at the _data_ level, "one way" relationships simply don't exist -- anything that links A to B in the DB also links B to A. The problem is, _Keystone's relationship field_ couples the configuration of these 4 things together:

  1. A conceptual relationship between two lists
  2. Fields in the Admin UI
  3. Fields in the GraphQL schema
  4. Structures (fields, columns, tables, etc.) in the DB

Only points 2 and 3 are "one-way"; they model only one side of the relationship and can only be used in one direction.

When we use two _Relationship fields_ pointing at each other (aka. backlinks), we end up with 2 x "two-way" relationships in the DB. This creates real problems. We also end up with 2 x pairs of one-way fields in the Admin UI and GraphQL that sorta-kinda-sometime act like a single relationship (but, as evidenced by this issue, actually aren't).

I agree that, to _manage_ "both sides" of the relationship (from the admin UI or GraphQL), it makes sense to have config on both lists. But we should be clear that these really are two sides _of the same relationship_ and having config in two places doesn't mean we need two representations of the relationship in the DB. Put another way, conceptually and in terms of data, relationships do not _belong_ to a list, they exists _between_ lists.

This level of cascading changes is not yet build into Keystone. We could conceivably add it [...]

I agree but that's not what I'm suggesting -- I'm not saying we should do _more_ work to maintain backlink data, I'm saying we don't need backlink data to begin with.

I have shipped 2 products now which use Back references heavily and they have been very useful to simplify queries and ensure I can query data from both sides.

If you're saying that, having GraphQL fields and admin UI that lets you manage a relationship from both sides is essential, then yes, I agree! What I'm arguing against is underlying representation in the DB and, more broadly, conceptual clarity around what relationships are.

For the users-posts example, the current KS add's a User_posts table. I contend there's no reason for this table to exist -- the two Relationship fields are describing two halves of the same relationship; adding a _second_ representation of a _single_ relationship to the DB has only downsides. There is no Keystone operation or GraphQL query that actually requires this table and any query currently hitting it would be simpler and faster if it did not. I expect this holds true in both Mongo and relational DBs.

In terms of sync'ing issues, I've also seen items where (in the above example), the post item has a valid author but no backlink data (User_posts records) exists. I'd call this a bug but not sure if you would, @jesstelford? If I'm parsing your comments correctly it sounds like this behaviour is intended..?

I also don't have specific Keystone package version for this behaviour or steps to reproduce; have only observed the resultant data.

This is almost a duplicate of #305

After an offline discussion with @jesstelford we've come to the agreement that the behaviour observed in the initial example here is a) as per the initial design and b) not what we want. We are calling this problem the dangling backlink problem. If we have a relationship that is accessible from both sides, both those sides should always be consistent with each other. In the initial example in this issue two different users have a particular post in their posts field, while the post in question only has a single author. This inconsistency is the problem that needs to be addressed to solve this particular issue. More generally, we agreed that the implementation detail of backlinks should also be revisited, for all the reasons outlined above. That larger issue is something we definitely want to address, but we should note that it is a different problem to the dangling symlink problem.

Could this be solved by "pseudo backlinks", meaning one side it is actually stored but the other side is computed using db queries in resolvers.

Could this be solved by "pseudo backlinks", meaning one side it is actually stored but the other side is computed using db queries in resolvers.

As above, it's not possible to store "one side" of a relationship; you either store it or you don't. If the link between two items is stored somewhere there's nothing left to "compute".

We need to get far away from "backlinks" as a concept. They simply don't exist in any meaningful way. We want to be able to configure:

  • Links _between_ lists
  • How these are exposed in GraphQL
  • How these are exposed in the Admin UI

There is no blessed/special/front/back side of a relationship in our conceptual model, regardless of where the config exists in the codebase.

It looks like there hasn't been any activity here in over 6 months. Sorry about that! We've flagged this issue for special attention. It wil be manually reviewed by maintainers, not automatically closed. If you have any additional information please leave us a comment. It really helps! Thank you for you contribution. :)

Fixed in #2000

Was this page helpful?
0 / 5 - 0 ratings

Related issues

gautamsi picture gautamsi  路  14Comments

ricardonogues picture ricardonogues  路  10Comments

ra-external picture ra-external  路  10Comments

jesstelford picture jesstelford  路  19Comments

molomby picture molomby  路  12Comments