Hotchocolate: Filtering 2.0

Created on 8 Jun 2020  ·  27Comments  ·  Source: ChilliCream/hotchocolate

Current Situation

We are currently working on the new extension API of filtering. The filter conventions will allow the user to extend the filters to their custom needs. This can either be a customer database provider or an extension to the existing API like spatial filtering.
While working on the extensions for spatial types, several issues came up. We realized that the current implementation of filtering has severe limitations in extensibility and also has an impact on the user experience of a consumer.

Issues 1: Inconsistent API design

The API released with 10.X.X does not contain an object or array filters. The API works great for scalar filters.

A filter is always a bundle between filter kind (aka. what you want to filter) and a filter operation (aka. how to evaluate).
In the 11 previews we introduced object and array filters. Object filters look like this:

Array filters basically look like the object filters if they filter objects. If foo in the example above is a list, the filter looks like this:

So far so good. Everything looks consistent. The issue actually already came with the scalar filter: foo_contains. How would the API look like when we want to filter by the length of the string?

Inconsistency: This does not feel natural. We now have a field that represents two operations (or two properties, depending on how you want to put it)

alternatively we could also do something like this:

Inconsistency: With this approach we need a helper. “is” is not a property of length. We should just write “gt”. But this does not follow the pattern “kind_operation”. _Look at Issue 2_

Foo is a scalar but also an object. Length is a property of the string. So it should really be an object filter. If we treat foo like an object we suddenly have other issues.

Inconsistency: this would be the most natural approach. We want to filter by the operation gt of the property length of the filter foo. This obviously does not work. How would we express equality in this case? foo: “bar” does not longer exist. _Look at Issue 3_

Conclusion:
With the current approach, it is not possible to write consistent APIs. The user has to learn the different syntax for different cases. What seems natural for one, might not feel natural for others. The developer or consumer should only really need to learn one style of filtering.

Issue 2: The need for helper properties

The inconsistency continues with scalar array filters. In the first place, this did not look like a hack. But we more and more realize that this was a mistake. In the example above we had a helper property for the distance filter: where: {foo_distance:{is_gt}}.
If foos is a list of strings the filter would look like this:

Conclusion:
Helper properties seem to occur too often. These helper properties seem only to exist because of bad API design in the first place.

Issue 3: Limitation in extensibility

The current implementation pretty much seals up property filters for an extension. We have already seen this issue with the scalar filters. We do face the same issue with the object and array filters as well. How would the API design look like when we want to implement comparable on our object filters?

Spatial as an example:
The current API is not expressive enough. We have different valid API designs for spatial types but all of them have their very own issues:

The “shapes are scalars” approach:

{
  mapItems(
    where: {
      shape_area: { is_gt: 100 }
      shape_distance: { is_gt: 10, shape: { type: POINT, coordinates: [1, 1] } }
      shape_intersects: { shape: { type: POINT, coordinates: [1, 1] } }
      shape_length: { is_gt: 1000 }
      shape_within: { shape: { type: POINT, coordinates: [1, 1] } }
    }
  ) {
    shape {
      coordinates
    }
  }
}

Problems: Helper properties & shape is definitely more an object rather than a scalar

The “shapes are objects” approach:

{
  mapItems(
    where: {
       shape {
         area_gt: 100,
         distance: { is_gt: 10, shape: { type: POINT, coordinates: [1, 1] } }
         intersects: { shape: { type: POINT, coordinates: [1, 1] } }
         length_gt: 1000,
         within: { shape: { type: POINT, coordinates: [1, 1] } }
    }
  ) {
    shape {
      coordinates
    }
  }
}

Problems: Helper properties & how would you express shape equality?

The “lets mix both to see if the result is better” approach:

{
  mapItems(
    where: {
       shape_area_gt: 100,
       shape_distance: { is_gt: 10, shape: { type: POINT, coordinates: [1, 1] } }
       shape_intersects: { shape: { type: POINT, coordinates: [1, 1] } }
       shape_length_gt: 1000,
       shape_within: { shape: { type: POINT, coordinates: [1, 1] } }
    }
  ) {
    shape {
      coordinates
    }
  }
}

Problems: Helper properties & we got this weird kind_kind_operation / kind_operation_operation issue (e.g. shape_area_gt)


Proposed Solution

All of these issues could be solved with one additional layer of nesting.

This is also the solution that Hasura, PostGraphile, and Gatsby chose (we decided for the wrong one 🙃 )

With this approach object and array, filters can look the same to the end-user. A user does not have to consider anymore if it is kind_operation:1 or kind {helper_operation:1}. There is a nested object for each kind and each operation.

With this approach, there is also no need for helper fields.

With this approach, it is natural to have operation inside operations.

And last but not least, it is extendable.

Limitation: You can not have properties that are named like an operation

Spatial as an example:
The power of this design really becomes clear when you design an API. There are no longer different ways to design an API. There is only one:

{
  mapItems(
    where: {
       shape {
         area: {gt: 100} 
         distance: { gt: 10, shape: { type: POINT, coordinates: [1, 1] } }
         intersects: { shape: { type: POINT, coordinates: [1, 1] }, buffer: 20 }
         length: {gt:1000}
         within: { shape: { type: POINT, coordinates: [1, 1] } }
    }
  ) {
    shape {
      coordinates
    }
  }
}

_you could also let intersects and within directly consume a shape. But then you could not buffer. but this would impact extensibility ✨_


Why we consider this change

We are working with filtering for over a year now. Filtering is one of the big features and we would like to push this feature further. We want to provide a seamless extension API for the user. At the moment different developers are working on extensions for spatial types, elastic search, MongoDB, and neo4j. There should be room for consideration about the initial API design because this is really the last time we can change something.

There are clear benefits to a different design. Apart from the ones listed above, there are also following to consider:

Understanding the Abstraction

One of the biggest benefits is the understanding of the abstraction. Looking at a combination of kind and operation (“kind_operation”, “foo_gt”) is mentally more challenging than having the operation separate from the kind. The additional complexity of “foo_gt” compared to “foo: {gt:12}” might not be clear when writing queries. But it really becomes challenging when writing extensions.

Visitation & Handlers

The visitor will become easier and the handlers make more sense. We already have filter kinds that do not follow the initial pattern. apart from “Array” there is also “ArrayAll”, “ArraySome” and “ArrayNone”. This really is bad and is already a hack that we use in v11.
A operation also needs a way to “enter” and “leave”.


Drawbacks

  • There will be significantly more types. (Field count stays more or less the same though.)
  • This cannot be released under HotChocolate.Types.Filters because this would break everything

How this could be released.

The title of this issue is filtering 2.0. As this would change the structure of _all_ filters (including scalars deployed with 10.X.X), this would be a major breaking change. Furthermore, this would not break the backend, but it would also break _all the consumers_ of filtering.

Just to make one thing clear:

Breaking the current implementation of filtering IS NOT AN OPTION and will not be considered

The only possible way to release a change like this is to copy the old implementation of filtering, deprecate it, and release a new package as an alternative.
This way people are not forced to upgrade to a new API with 11. Unfortunately, there is no “soft” migration path. These two filtering approaches cannot coexist on the same “where” argument.


Possible Next Steps:

  1. We continue with the current implementation and work around all the issues listed above
  2. We deprecate filtering and release a new package. The deprecated package would NOT include array and object filters. These have only been available in 11 previews
  3. We deprecate filtering and release a new package. We release array and object filters with the old API as the last change
🌶 hot chocolate 🎨 refactoring

Most helpful comment

Also, with this change, we should fix our naming conventions. In GraphQL snail case is a good style for enums but not for field names. order_by should now, by default, become orderBy.

All 27 comments

This is a very detailed description, thank you for putting it together.

I agree that v2 filters should not replace v1 in 11 as that would be far too disruptive for existing api clients. It would be great to see a lot of adoption of 11 since it is such a great improvement and doing that would be counter productive.

It will benefit everyone to make the API more consistent and while doing so more simple and extensible even if it takes more time and effort.

I vote for 2 as the next step. This would not cause code changes to filtering while upgrading to 11 and it would allow us to spend time on the required changes for v2.

We could punt object and array filters to the community for v1 filtering? Unless the finish line for those is close enough in the previews, then option 3 would not be out of the question.

I think I would prefer to freeze the V1 filters on the state of version 10.

Meaning we would have the V1 API with the feature set of V10. People have time to migrate or even use them in parallel. If people want more advanced filters, they have to move to version 2.

Why freeze them on the features of version 10?
I want to focus on our new approach and not waste any time fixing new issues that we might have introduced. The filters in version 10 are simple; this ultimately means they are simple to maintain.

Nevertheless, I would leave the code that we have in version 11 for filters in until we have the new API so that we give even people that are using the current preview bits time to migrate to the newer bits.

Good with the rehaul is that we can move all the APIs under one new package HotChocolate.Filters which will have all the code to integrate data sources like selections, filters, and sorting. On top of this base package, we can build extensions like the Mongo native integration or neo4j integration.

Also, with this change, we should fix our naming conventions. In GraphQL snail case is a good style for enums but not for field names. order_by should now, by default, become orderBy.

The only point I have (which might be moot) is how do you deal with OR? As far as I understand from this image:

image

this is foo: contains 'bar' AND .length gt 29, which I think is a fine default, but I also think being able to write OR filters somehow is worth considering (if it hasn't already).

@Alxandr
The OR support would be similar to what we already have in V10
There are two possible solutions I think
Solution 1

where : {
      or: [
         {
            foo: { lt: 2}
         },
         {
            foo: { lt: 3 }
         }]
}

Solution 2

where : {
      foo: { 
        or: [
          {lt: 2 },
          {gt: 3 }
        ]
      }
}

What do you think? I think they could even coexist. though solution 2 may lead to extra complexity
I would probably go for 1

I don't think you actually need or as a "property", given that GraphQL support type unions. You could do something like union Where = Filter | Filter[]

@Alxandr GraphQL does not. The union you posted is invalid. First, GraphQL only supports output unions. Second, you create a union only on a set of object types. So you cannot switch between single item and list of items. The or-field is the only viable option.

Also, even with the upcoming input union, this would not work.

@PascalSenn I would go for the first solution not the second ... the second adds complexity on implementation and user.

@PascalSenn, @michaelstaib, the new filtering approach looks solid and extensible enough. As long as it doesn't break existing filtering I do not foresee any problems with its adoption. Moreover, it seems that the filter generation on the client side will be much easier this way.

Personally, I'm not a big fan of such auto filters, because the idea of exposing dynamic filter engine to world via API is unsafe. But such a mechanism has the right to life for administration purposes (admin panel for example).

As for the proposed solution, it reminds me ElasticSearch filtering language. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html. You can look at the documentation and get ideas from there.

@anadale It would not break 10.4, it might break 11-previews. Adoption would be hard because they API changes when you decide to switch over :)
So yes it does not break the existing one, but to use new features (conventions and maybe list and object filters) , a switch is necessary

I like the proposed approach. However with regards to the OR, what I don't like is that in current implementation both solution 1 and solution 2 (described above) are actually allowed by the syntax. So you get no errors if you do it like in the solution 2 but it will just have no effect. Do you guys plan to change this in scope of the new API design?

@vitmsrk
Do you mean:

where : {
      foo: { 
        or: {lt: 2 or: {gt: 3 }}
      }
}

@vitmsrk
Do you mean:

where : {
      foo: { 
        or: {lt: 2 or: {gt: 3 }}
      }
}

@PascalSenn Yes. Maybe this also refers to AND. So it would be nice (if possible) to restrict such possibilities on syntax level. Although I understand this is easier to say than to do. It looks like in this case the OR/AND fields should be of different type than their parent.

And also, what does this mean:

where: {
  foo: { lt: 2 },
  or: [
    { foo: { lt: 1 } },
    { foo: { gt: 5 } }
  ]
}

[Edit]
Might it make sense to always have where be an array? I mean, the following isn't much worse is it?

where: [{ foo: { lt: 2 } }]

That way you don't need and/or "pseudo"-properties, and there are no ambiguities.

@Alxandr How do you distinguish AND and OR then?

where: [
  // either
  { foo: 5, bar: true },
  // or
  { foo: { gt: 10 } }
]

(in linq)

.Where(x => (x.Foo == 5 && x.Bar == true) || (x.Foo > 10)))

(the 5 case might need to be different, cause it can't be either a number or an object, but you should get the point of or vs and).

@Alxandr things get increasingly complicated with deeper filters
This:

where: {
  foo: { lt: 2 },
  or: [
    { foo: { lt: 1 } },
    { foo: { gt: 5 } }
  ]
}

would be .Where(x => (x.Foo < 2) && (x.Foo < 1 || x.Foo > 5 ))

The good thing about this approach is that it's expressive and the API does not change when we go deep

API does not change when we go deep

Not sure what you mean here.

I would honestly expect the snipped above to result in an error, cause when using an "or" key, I would expect it to be completely exclusive. To me, it seems like a footgun.

Also, the point about making where an array is that you can't do or at any level of nesting. Only at the top. This makes things more verbose, sure, but it should also make things easier I would imagine.

I'm working on a similar filtering/conditional syntax for a thing and been wondering if it's worth doing the actual filter statement structurally rather than just an expression string

JSON has the benefit of being enforced by the parser, but it quickly becomes verbose and unwieldy to write complex statements. Also, do you really want a perhaps quickly evolving expression standard be enforced by the JSON Validator?

Couldn't you do something like JSONPath/XPath expressions, perhaps with extended syntax?

where: "(x [Bar = true and Foo = 5] or x [Foo > 10]) or x [someFunction()]"

Downside: You have to make a separate expression parser for the filtering expressions,

Upside: it's much easier to author. I've seen standards, like XACML, where they go the route of having the expressions as actual XML/JSON tree's... and it's always seems to become a nightmare. Especially since you want paranthesis groups.

If you use something like JSONPath people can reuse their knowledge rather than having to learn something new

"If all you have is a hammer, everything looks like a nail"

:)

@mhomde the issue with that is, that no GraphQL tooling wouldn't help you. The syntax would be parsable by our server and maybe our tooling, but any other tooling would just see it as a string. Also, it is not JSON :) there are distinct differences between JSON and GraphQL.

So, for us, we will use GraphQL and won't introduce any new syntax.

@mhomde the issue with that is, that no GraphQL tooling wouldn't help you. The syntax would be parsable by our server and maybe our tooling, but any other tooling would just see it as a string. Also, it is not JSON :) there are distinct differences between JSON and GraphQL.

So, for us, we will use GraphQL and won't introduce any new syntax.

Ah, that's a fair point :) Guess one could do some kind of expression tool anyway that converted between the two if one really wanted to

I think we can adopt the new approach which Hasura and postrGraphile has. I would go with number 3 as the possible next step

Kudos to you guys for taking the time to consider more advanced filtering scenarios out of the box.
I think the approach of introducing this a separate package would enable people to transition more easily and in an opt in fashion.
Now regarding the syntax, I propose we put this to a vote.
There are many alternatives to consider each with their pros and cons.
If anything we could consider prefixing operations like mongo does so it would be clear to the person writing the query what is what. ex.

{ 
  $and: [
        { $gte: [ count, 10 ] },
        { $lte: [ total, 100 ] }
      ]
 }

@fiakkasa that would not be valid GraphQL. http://spec.graphql.org/draft/#sec-Names
But with the filter-conventions, you can configure your name. You could, for instance, use an underscore as prefix _ which would be Hasura style. We will add two or three preconfigured conventions.

@fiakkasa that would not be valid GraphQL. http://spec.graphql.org/draft/#sec-Names
But with the filter-conventions, you can configure your name. You could, for instance, use an underscore as prefix _ which would be Hasura style. We will add two or three preconfigured conventions.

@michaelstaib prefixing operations with an underscore sounds like a great idea, plus it would make it easier for people coming from other GraphQL frameworks to use HotChocolate.
Now about the multiple preconfigured prefixes, we might be better off doing something similar to angular, where they offer a default set of tags, '{{' '}}', ex. {{ 'Interpolated Content' }}, and then it's up to the developer to configure whatever they want ex. StartTag 'Interpolated Content' EndTag (https://angular.io/api/core/Component#interpolation).

Wdyt?

Using some sort of convention like @fiakkasa is suggesting means we could remove the need to create any preconfigured conventions other than the one we planned to use. If it was as simple as the following adding a _ or $ before the field would be trivial.

ReadOnlySpan<char> FieldPrefix {}
ReadOnlySpan<char> FieldSuffix {}

You could combine the values to create the output field name. You would need a reverse lookup to get the real field name at projection time and you'd have to deal with computed properties or whatever they are called.

That is a neat idea and super flexible.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mortzi picture mortzi  ·  4Comments

benmccallum picture benmccallum  ·  3Comments

IKolosynskyi picture IKolosynskyi  ·  3Comments

jbray1982 picture jbray1982  ·  5Comments

RohrerF picture RohrerF  ·  3Comments