Gatsby: [1.0] GraphQL node.content field

Created on 5 May 2017 · 17Comments · Source: gatsbyjs/gatsby

My data source contains objects with a content field. I noticed that when I query that data, it returns a string of the complete object. It appears that when a data object gets converted to a graphql node it gets constructed with a content and contentDigest field.

If this is correct I propose that we mark those meta data fields in a way that they are less likely to clash with data sources, either by using a __ prefix or maybe grouping them together in a sys property or something.

Source

0x80

All 17 comments

Yeah this makes a lot of sense. content is a pretty common name so there'll definitely be conflicts. Also it's a system-level concern — it's highly unlikely you'd want to directly query either of those.

Hmm.. lemme think about this.

KyleAMathews on 5 May 2017

gql really doesn't give a lot o f room for naming stuff too. __ prefixes are soon to be a hard error for non internal fields and like the naming regex is alphanumeric and _

jquense on 5 May 2017

I also ran into other naming clashes later, like type, which is even more of a common name then content. There seem to be quite a lot of these reserved keywords that can mess up your json input. I had a look at the json transformer, and it happily loads the whole data object and then overwrites fields with the graphql schema stuff.

Here obj is the source of data:

return {
      ...obj,
      id: obj.id ? obj.id : `${node.id} [${i}] >>> JSON`,
      contentDigest,
      mediaType: `application/json`,
      parent: node.id,
      // TODO make choosing the "type" a lot smarter. This assumes
      // the parent node is a file.
      // PascalCase
      type: _.upperFirst(_.camelCase(`${node.name} Json`)),
      children: [],
      content: objStr,
    }

So any source which contains contentDigest, mediaType, parent, type, children or content is getting itself into trouble, without warning.

I can see four solutions:

1) Check for these reserved keywords and throw an error if they exist.
2) Put all json data into one standard field, like source or data
3) Prefix all gatsby graphql properties so that the list of reserved keywords is less likely to clash with user data.
4) Put all graphgl properties into one like sys, making that the only reserved keyword.

I think 1 is easiest, but apart from protecting the user it's not very nice to have such a long list of properties you are not allowed to have in your dataset.

Option 2 seems reasonable to me, and only makes queries a little more cumbersome maybe.

Option 3 or 4 I would prefer maybe from a user perspective, but I don't know how feasible this is or if it is even possible with the graphql restrictions. This also would mean a lot of code refactoring.

What do you think?

0x80 on 11 May 2017

Yeah, this definitely needs fixed.

Thanks for the analysis on possible solutions. My reactions are:

too many (normal) reserved words. Would add friction to a lot of normal source/transformer plugins.
This is exactly what Drupal does (which I spent a lot of time with the past and have drawn a number of design ideas from). They use nodes with known top-level fields and then everything else gets put under a key. They use attributes as the name for this. Another possible name is props which is both short and familiar to React.js audience. I do dislike that every graphql query would become longer.
This could be interesting too. Though kinda cumbersome as all internal and plugin code would need to reference the core node fields by the prefix as well.
Also interesting.

A user-centric analysis would ask "How to help people discover data in a Gatsby site and to easily write queries getting the data they want?". This suggests to me that we should put the node-specific fields under a prefix and put custom data at the top-level as that's the data people actually care about. It'll be rare that someone wants to query children or mediaType, or other node-specific fields. The only exception here is id which is decently common to query plus is referenced all the time in code so would be easier at the top level.

So here's a 4-ish proposal for the node/graphql structure:

{
  id: string,
  childTYPE: object, // not in nodes but generated in graphql for each type of child so would be a reserved prefix.
  children: array,
  parent: string
  internal: {
    contentDigest: string,
    mediaType: string,
    type: string,
    content: string,
    // Probably will add other fields over time.
  }
  ...other custom fields for node.
}

parent and children are reasonably common words but they're used quite often in graphql schemas so I think it's reasonable keeping them at the top level.

So this limits us to 4 reserved words for nodes and with internal gives us the ability to add whatever else there in the future we want.

Thoughts?

KyleAMathews on 11 May 2017

👍1

(I prefer internal over sys as saying fields are "system" specific suggests they're somewhat needed by the system or anything outside of the node. I like to think of these nodes as autonomous objects that live independent lives and only agree on common fields as a way to coordinate with other nodes. "sys" feels a little too top-down controlled. I might be reading too much into this :-))

KyleAMathews on 11 May 2017

So to summarize my position. Either we namespace/hide internal fields or custom fields. I suggest we deprioritize internal fields as Gatsby site builders don't really care about them and hiding them reduces field clutter.

KyleAMathews on 11 May 2017

👍1

I like the proposal of internal as a sub object on Nodes 👍 I also like the general distinction you made on which fields go there. If you keep parent child|children and id at the top level that would leave most queries alone without to much of a restriction on field names.

jquense on 11 May 2017

id is required here anyway by he NodeInterface

jquense on 11 May 2017

I like the proposal! It feels like good compromise to keep some at top level and some hidden under internal. It makes sense to approach this from a users perspective.

I got the sys idea from Contentful, since I've been working with their data lately. I like internal too 🙂

Is id really a reserved word though? Seems like with json you take the id from the data if one is available. Then I'd say it's technically not a reserved word. That would leave us with 3 if you don't count the highly unlikely ocurrence of childTYPE in user data.

0x80 on 11 May 2017

👍1

id is reserved in the sense that the NodeInterface requires it, and requires that it globally unique, without intentionally doing so tho you'd probably get conflicts by just using the "local" one

jquense on 11 May 2017

👍1

Well unfortunately you won't get conflicts, you'll just overwrite the previous nodes. Source plugins just have to ensure that they're generating unique ids.

KyleAMathews on 11 May 2017

Going to start working on this today.

Another thought — what do people think about making nodes fully immutable? So remove the actionupdateNode. Transformer plugins won't be able to set fields on nodes but will have to always create new nodes.

KyleAMathews on 12 May 2017

@KyleAMathews Sounds very reasonable, but I don't know enough about its current use. Why did you come up with that API and how have you used it so far? What problem was it solving?

0x80 on 12 May 2017

A quick look at the code tells me that it's mainly used to add new children to a node. How would you plan to do that if the nodes are immutable? Can we do without creating children?

0x80 on 12 May 2017

@0x80 children would have to not be immutable.

Thinking about this more, I'm not sure we could make the node as a whole immutable. But perhaps we could make it so that other plugin couldn't overwrite the original node fields. So instead of a generic updateNode action, we add createNodeField for adding new fields to an existing node and setParent for setting the parent/child relationship for a new node.

This way fields are still immutable but we retain the convenience/ergonomics of being able to directly modify existing nodes e.g. for defining slugs on nodes (which it's pretty common for markdown) or directly connecting nodes with the ___node syntax.

Thoughts?

KyleAMathews on 15 May 2017

@KyleAMathews That sounds very reasonable to me. I think it's always good to make the API more focussed and strict like this if you can.

0x80 on 15 May 2017

This has been fixed since. The content field together with others is moved under internal

0x80 on 23 May 2017

Was this page helpful?

0 / 5 - 0 ratings