My data source contains objects with a content field. I noticed that when I query that data, it returns a string of the complete object. It appears that when a data object gets converted to a graphql node it gets constructed with a content and contentDigest field.
If this is correct I propose that we mark those meta data fields in a way that they are less likely to clash with data sources, either by using a __ prefix or maybe grouping them together in a sys property or something.
Yeah this makes a lot of sense. content is a pretty common name so there'll definitely be conflicts. Also it's a system-level concern — it's highly unlikely you'd want to directly query either of those.
Hmm.. lemme think about this.
gql really doesn't give a lot o f room for naming stuff too. __ prefixes are soon to be a hard error for non internal fields and like the naming regex is alphanumeric and _
I also ran into other naming clashes later, like type, which is even more of a common name then content. There seem to be quite a lot of these reserved keywords that can mess up your json input. I had a look at the json transformer, and it happily loads the whole data object and then overwrites fields with the graphql schema stuff.
Here obj is the source of data:
return {
...obj,
id: obj.id ? obj.id : `${node.id} [${i}] >>> JSON`,
contentDigest,
mediaType: `application/json`,
parent: node.id,
// TODO make choosing the "type" a lot smarter. This assumes
// the parent node is a file.
// PascalCase
type: _.upperFirst(_.camelCase(`${node.name} Json`)),
children: [],
content: objStr,
}
So any source which contains contentDigest, mediaType, parent, type, children or content is getting itself into trouble, without warning.
I can see four solutions:
1) Check for these reserved keywords and throw an error if they exist.
2) Put all json data into one standard field, like source or data
3) Prefix all gatsby graphql properties so that the list of reserved keywords is less likely to clash with user data.
4) Put all graphgl properties into one like sys, making that the only reserved keyword.
I think 1 is easiest, but apart from protecting the user it's not very nice to have such a long list of properties you are not allowed to have in your dataset.
Option 2 seems reasonable to me, and only makes queries a little more cumbersome maybe.
Option 3 or 4 I would prefer maybe from a user perspective, but I don't know how feasible this is or if it is even possible with the graphql restrictions. This also would mean a lot of code refactoring.
What do you think?
Yeah, this definitely needs fixed.
Thanks for the analysis on possible solutions. My reactions are:
attributes as the name for this. Another possible name is props which is both short and familiar to React.js audience. I do dislike that every graphql query would become longer.A user-centric analysis would ask "How to help people discover data in a Gatsby site and to easily write queries getting the data they want?". This suggests to me that we should put the node-specific fields under a prefix and put custom data at the top-level as that's the data people actually care about. It'll be rare that someone wants to query children or mediaType, or other node-specific fields. The only exception here is id which is decently common to query plus is referenced all the time in code so would be easier at the top level.
So here's a 4-ish proposal for the node/graphql structure:
{
id: string,
childTYPE: object, // not in nodes but generated in graphql for each type of child so would be a reserved prefix.
children: array,
parent: string
internal: {
contentDigest: string,
mediaType: string,
type: string,
content: string,
// Probably will add other fields over time.
}
...other custom fields for node.
}
parent and children are reasonably common words but they're used quite often in graphql schemas so I think it's reasonable keeping them at the top level.
So this limits us to 4 reserved words for nodes and with internal gives us the ability to add whatever else there in the future we want.
Thoughts?
(I prefer internal over sys as saying fields are "system" specific suggests they're somewhat needed by the system or anything outside of the node. I like to think of these nodes as autonomous objects that live independent lives and only agree on common fields as a way to coordinate with other nodes. "sys" feels a little too top-down controlled. I might be reading too much into this :-))
So to summarize my position. Either we namespace/hide internal fields or custom fields. I suggest we deprioritize internal fields as Gatsby site builders don't really care about them and hiding them reduces field clutter.
I like the proposal of internal as a sub object on Nodes 👍 I also like the general distinction you made on which fields go there. If you keep parent child|children and id at the top level that would leave most queries alone without to much of a restriction on field names.
id is required here anyway by he NodeInterface
I like the proposal! It feels like good compromise to keep some at top level and some hidden under internal. It makes sense to approach this from a users perspective.
I got the sys idea from Contentful, since I've been working with their data lately. I like internal too 🙂
Is id really a reserved word though? Seems like with json you take the id from the data if one is available. Then I'd say it's technically not a reserved word. That would leave us with 3 if you don't count the highly unlikely ocurrence of childTYPE in user data.
id is reserved in the sense that the NodeInterface requires it, and requires that it globally unique, without intentionally doing so tho you'd probably get conflicts by just using the "local" one
Well unfortunately you won't get conflicts, you'll just overwrite the previous nodes. Source plugins just have to ensure that they're generating unique ids.
Going to start working on this today.
Another thought — what do people think about making nodes fully immutable? So remove the actionupdateNode. Transformer plugins won't be able to set fields on nodes but will have to always create new nodes.
@KyleAMathews Sounds very reasonable, but I don't know enough about its current use. Why did you come up with that API and how have you used it so far? What problem was it solving?
A quick look at the code tells me that it's mainly used to add new children to a node. How would you plan to do that if the nodes are immutable? Can we do without creating children?
@0x80 children would have to not be immutable.
Thinking about this more, I'm not sure we could make the node as a whole immutable. But perhaps we could make it so that other plugin couldn't overwrite the original node fields. So instead of a generic updateNode action, we add createNodeField for adding new fields to an existing node and setParent for setting the parent/child relationship for a new node.
This way fields are still immutable but we retain the convenience/ergonomics of being able to directly modify existing nodes e.g. for defining slugs on nodes (which it's pretty common for markdown) or directly connecting nodes with the ___node syntax.
Thoughts?
@KyleAMathews That sounds very reasonable to me. I think it's always good to make the API more focussed and strict like this if you can.
This has been fixed since. The content field together with others is moved under internal