Keystone: Add query limits for array results

Created on 7 Aug 2019  路  14Comments  路  Source: keystonejs/keystone

When doing a query such as:

query {
  allPosts(first: 10){
    id
  }
}

All is ok 馃憤

But what happens when someone does a query like:

query {
  allPosts(first: 10000000){
    id
  }
}

We need to have a way to set limits on the total results, perhaps like:

keystone.createList('Post', {
  fields: {},
  limits: {
    maxResults: 100
  }
});
medium small 1 graphql schema

All 14 comments

These limits should additionally be easy to bypass (e.g, by an admin).

Are we complicating this? If someone requests 10000000 why not let them? If the server hangs...they might try less than 10000000?

@MadeByMike I think maybe the idea is to make it easier to enable limits where you might want them, rather than defacto allow it.
Could this be done as a hook maybe?

I guess if you've got a public API you might need this? Is that the point?

For public api, how do you control them at runtime?
There may be paid users who would have higher limit

why not let them?

To stop people launching DoS attacks against a KS5 instance, and to also catch the case where the developer accidentally does a query such as:

query {
  allPosts {
    id
  }
}

(ie; with no first arg).

These limits should additionally be by passable
how do you control them at runtime?

Good points! We could accept a function like we already do for access control:

keystone.createList('Post', {
  fields: {},
  limits: {
    maxResults: ({ authentication: { item } }) => {
      if (item.isAdmin) {
        return null;
      }
      return 100;
    }
  }
});

Return cursor/pagination object if over limit?

This is really a discussion about limiting query complexity to protect the GraphQL API from dangerously written queries. The common case of this is a public API being targeted my attackers, armed with intentionally malicious queries but there's always the risk of legitimate, well intentioned users (or systems) accidentally DoS'ing your API. Rate limits are important regardless of an API's audience.

Unfortunately, the power of GraphQL makes rate limits difficult. The load required to serve individual requests can vary by many orders of magnitude, defeating traditional HTTP based rate limiting. We need a way to restrict the complexity of queries within a single request. Our very own @mxstbr's has a great intro to this problem with some potential solutions.

For Keystone, we probably want to limit this at _the number of items being returned_. Considering the cost of individual fields is probably excessive, out of the box. The basis of this will need to be some kind of limit for multi-node fields (that is, any field that returns an array of items, including the top-level all${listKey} fields).

If we have a limit, either specified by the caller or defaulted by the app, for each multi-node field, Keystone can can calculate an _upper bound_ on the number of items that could be retrieved, _before the query is run_ and reject the query if it exceeds some limit.

This could, for example, take the form of two new bits of config. Something like...

  • defaultMultiNodeLimit - the default limit to use for multi-node fields where no limit (first value) is specified by the caller
  • maxNodesPerRequest - The limit above which request is rejected with a message instructing the developer to add limits (or tighten them if they've already been specified)

These could be configured on the keystone instance or per list. We could even make maxNodesPerRequest dynamic based on session data or something, if that was deemed important.

Let's say we ship Keystone with defaults of 100 and 15000 for these options. Those values would let this query run as is:

# Could return up to 10,100 items; runs fine
{
   allStores { # implicit limit of 100
      id name
      employees { # implicit limit of 100
         id name startDate
      }
   }
}

The multi-node fields both get an implicit limits of 100, so the upper limit of nodes returned is 10100, under the default max of 15,000. Adding another multi-node field without further restricting the query would cause it to fail:

# Could return up to 20,100 items; returns an (helpful!) error
{
   allStores { # implicit limit of 100
      id name
      employees { # implicit limit of 100
         id name startDate
      }
      products { # implicit limit of 100
         id label code price
      }
   }
}

This query, as written, could potentially return 20,100 nodes (100 + ((100 + 100) * 100)) so would be rejected with a message explaining how the limits are implemented.

The developer could then either increase the limits configured for Keystone or, preferably, pay some more attention to what they're doing and update the query to be more specific:

Hmmm.. well there are only 6 stores in the DB at the moment and this interface I'm building will only really fit 24 on the screen, max. Most stores have ~10 employees but, if they ever _did_ have 100, would we want to list them all? Probably not. Maybe we should only show the 10 most experienced employees here...

The re-written query is within the limits and actually _better_; we've fixed some bugs that would have broken the interface later, as the dataset grew:

# returns a max of 2,664 nodes
{
   allStores (first: 24) {
      id name
      employees (first: 10, orderBy: "startDate DESC") {
         id name startDate
      }
      products { # implicit limit of 100
         id label code price
      }
   }
}

As in this example, encouraging API users to add limits to their queries is usually a good thing. It forces you to think about what you're actually trying to do and how your app will behave as data grows. I'd conjecture that there's _never_ a time you're talking to an API and both these conditions are true:

  • You genuinely don't know how many items exist
  • You want _all of them_

Usually, you either have an idea of the scale and can add a limit (ie. "oh there's only a few of these, I'll use a limit of 50 to be sure I get them all") or you're dealing with a potentially large dataset, in which case you'll always want to paginate (ie. "there could be _millions_ of these, I'll write my code to nibble 1000 at a time").


Important points:

  • Errors thrown due to these limits would almost always surface _in development_, when the queries themselves are being written. They should actually reduce bugs in the wild.
  • Keystone itself should (probably?) never be hitting the DB with completely unbounded queries, regardless of what is configured.
  • This isn't a complete solution. As discussed above, we haven't accounted for load differences between fields (which can be significant). We also haven't actually applied any rate limiting per se. What we've done is push the rate limiting problem back to the HTTP layer where there are powerful existing tools.

I'm implementing part of this. It looks like this:

keystone.createList("BigThing", {
  fields: {
    content: { type: Text }
  },
  queryLimits: {
    maximumResultCount: 10,  // Adds a first: 10 (or lower) to every query
  }
});

We had a bug that would sometimes cause a where restriction to be null, which caused KS to try to load all items into memory and crash. maximumResultCount is per list, so limitations on nesting and total count of queries is also needed to guarantee crashes like that can't happen, but it fixes the lowest-hanging fruit.

The API is designed to easily support other limits in future (like whitelisting the columns that where conditions can use), but I'm sticking to the one feature for now.

Allowing a default list maximumResultCount to be set in KS itself sounds like a good idea (and an API for that would be needed for restricting nesting, etc). But I'm leaving that out of scope for now, too.

https://github.com/keystonejs/keystone-5/pull/1571

Still not implemented:

  • Support for limits based on functions
  • Request-wide limits such as nesting depth, number of queries and total size
  • Whitelisting columns for where clauses
  • Keystone-wide limit defaults for lists

@sarneaud Awesome work on this so far! @jesstelford What do you think about the remaining tasks? Do we intend to implement all of these and should we create a new issue?

I'm just about to submit a PR for the second item on my list (https://github.com/keystonejs/keystone-5/issues/1469#issuecomment-527313366).

Update: most of the features in the report have been added. The two that are still missing are

  • Function-based limits
  • Whitelisting of fields used in where clauses, etc.

The query limits haven't been integrated with the (relatively) new multi-schema system. Maybe we don't need the function-based limits if we have schema-aware limits.

I'm not doing any work on this at the moment.

I'm be happy to close this issue as resolved and open a new one if someone wants these features

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bpavot picture bpavot  路  11Comments

justinmoon picture justinmoon  路  13Comments

thekevinbrown picture thekevinbrown  路  31Comments

gautamsi picture gautamsi  路  14Comments

molomby picture molomby  路  12Comments