Graphene: Protection against malicious queries

Created on 21 Feb 2019 · 22Comments · Source: graphql-python/graphene

Hey. I was looking for a lot of information on how to protect against malicious requests, and as a result I found several common approaches:

query cost or resource limitations (https://developer.github.com/v4/guides/resource-limitations/)
limiting query depth (https://sangria-graphql.org/learn/#limiting-query-depth)
query whitelisting
hide introspection for production mode (this is not common solution, but sounds good)

All this approaches we can implement by meddleware and custom backend. But it will be cool if this solution is out of the box.

Also, graphene don't have information about security into the documentation.

ps: I can try to help, but if u have no reason why this is a bad idea.

question 🙋 help wanted

Source

Arfey

👍19

Most helpful comment

I use middleware for that right now

class HideIntrospectMiddleware:
    """
    This middleware should use for production mode. This class hide the
    introspection.
    """
    def resolve(self, next, root, info, **args):
        if info.field_name == '__schema':
            return None
        return next(root, info, **args)

Arfey on 17 Apr 2019

👍9

All 22 comments

I have recently been looking into exactly these same things. It would be great to have something built in available, and if not, at least some examples of how someone might best implement these with graphene.

jmichalicek on 7 Mar 2019

@Arfey this is an area where there aren't any official answers in Graphene at the moment but I'll give you my opinion on the approaches that you've listed:

1 and 2: query cost or resource limitations + limiting query depth

In another issue I've written up some sample code (not thoroughly tested) that will implement a basic max query depth check: https://github.com/graphql-python/graphene/issues/772#issuecomment-400094405 Something similar could be used for query cost calculations but it's unclear to mean how you would determine the cost of particular fields.

I think there is opportunity for experimentation in user space for this.

3. query whitelisting

The backend functionality is also the place to implement any kind of persisted/query whitelisting. I actually don't think this feature is within the scope of the Graphene project though because it's going to be tightly coupled to the rest of your application. You would need to define where the persisted queries are stored, how the front end part of your app adds new queries, how your server interprets static queries etc. Again this is a place where experimentation can happen in userland and Graphene already exposes the right hooks through backends to implement it.

4. hide introspection for production mode

This sounds like a reasonable suggestion and I think it could be implemented. Any thoughts @graphql-python/core @graphql-python/governors ?

jkimbo on 16 Mar 2019

👍4

@jkimbo @Arfey Solution 4 sounds good. Easy to implement and will address this vulnerability in a timely manner. I don't see a reason why we would need introspection in production. @Arfey mind submitting a PR with this approach?

mvanlonden on 19 Mar 2019

yes, i will make pull request some time later

Arfey on 19 Mar 2019

Just my completely unasked for $.02:

About 3, I think it would help if the types that are automatically generated (from SQLAlchemy or Django ORM models) had their fields explicitly included instead of explicitly excluded. Much like how Django's own ModeField works.

Instead of

class MyType(SQLAlchemyObjectType):
    class Meta:
        model = MyModel
        exclude_fields = ('scary_field',)

this:

class MyType(SQLAlchemyObjectType):
    class Meta:
        model = MyModel
        include_fields = ('safe_field',)

while making the absence of both include_fields or exclude_fields an error.

etandel on 22 Mar 2019

👍7

Hiding introspection won't help much if the attacker has access to a copy of a client application, as she would be able to guess enough of the schema to write custom queries - just one cycle in the graph is enough for an attack.
But it would be useful for hiding upcoming features from competitors :-)

ktosiek on 17 Apr 2019

I use middleware for that right now

class HideIntrospectMiddleware:
    """
    This middleware should use for production mode. This class hide the
    introspection.
    """
    def resolve(self, next, root, info, **args):
        if info.field_name == '__schema':
            return None
        return next(root, info, **args)

Arfey on 17 Apr 2019

👍9

@Arfey this is an area where there aren't any official answers in Graphene at the moment but I'll give you my opinion on the approaches that you've listed:

1 and 2: query cost or resource limitations + limiting query depth

In another issue I've written up some sample code (not thoroughly tested) that will implement a basic max query depth check: #772 (comment) Something similar could be used for query cost calculations but it's unclear to mean how you would determine the cost of particular fields.

I think there is opportunity for experimentation in user space for this.

3. query whitelisting

The backend functionality is also the place to implement any kind of persisted/query whitelisting. I actually don't think this feature is within the scope of the Graphene project though because it's going to be tightly coupled to the rest of your application. You would need to define where the persisted queries are stored, how the front end part of your app adds new queries, how your server interprets static queries etc. Again this is a place where experimentation can happen in userland and Graphene already exposes the right hooks through backends to implement it.

4. hide introspection for production mode

This sounds like a reasonable suggestion and I think it could be implemented. Any thoughts @graphql-python/core @graphql-python/governors ?

Hiya, thanks for the sample code but how can I plug this into graphene please? Is it a setting I can add, something like:

GRAPHENE = {
    "SCHEMA": "path.to.my.schema",
    "BACKENDS": ["path.to.my.backends.DepthAnalysisBackend"],
}

I've also tried adding it to my GraphQL view, which gives me an error when I run a query through graphiql (using the core backend)

graphql_views.GraphQLView.as_view(graphiql=True, backend=GraphQLCoreBackend)

the error is:

document_from_string() missing 1 required positional argument: 'document_string'

Update - sorted it out, needed to provide an instance not the class. I'll leave this here in case it helps someone.

so to add the backend, I've done:
`from .graphql import backends .... urlpatterns = [ path( "v1/graphql", graphql_views.GraphQLView.as_view( graphiql=True, backend=backends.DepthAnalysisBackend() ) ) ]

sandwichsudo on 4 Jun 2019

👍3

My $.02 for how I handle this:

I have different types in the Schema with different related-node-information based on where the type is being used. This artificially sets a query depth which is dependent on the query.

E.g. If authors have one more books:

Author has field books of type AuthorsBook
AuthorsBook _doesn't_ have field author, as that would create a cycle

This works for small to medium sized applications and doesn't take much thought/code to get working.

thejcannon on 5 Jun 2019

Thanks @thejcannon - agreed, it makes a lot of sense just design the api not to have cycles in it. Still nice to know no one is able to run queries that join many distinct tables on my server.

@jkimbo - can I check something on the code sample you provided please:

...
# We are only interested in queries
            if definition.operation != 'query':
                continue
...

This suggests we're only interested in queries, but since a mutation could return an object (like an authentication mutation might return a user object) does it not make sense to run this on both? Wondering if there is some subtlety I'm missing.
Thanks

sandwichsudo on 6 Jun 2019

@sandwichsudo that is a good point, I can't remember why I added that line but I think you could probably just omit it and it would still work.

jkimbo on 6 Jun 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 5 Aug 2019

I am using this validation rule to disable schema introspection:

class NoIntrospection(ValidationRule):
    def enter_Field(self, node, key, parent, path, ancestors):
        field_name = node.name.value
        if field_name == "__schema" or field_name == "__type":
            self.context.report_error(
                GraphQLError(u"GraphQL introspection is not allowed", [node])
            )

sboisson on 16 Aug 2019

@sboisson I found this snippet:

export const introspectionTypes = Object.freeze([
  __Schema,
  __Directive,
  __DirectiveLocation,
  __Type,
  __Field,
  __InputValue,
  __EnumValue,
  __TypeKind,
]);

Seems like there is more introspection keywords - I think you should be checking for those too.
I think it might be easier to just check for startsWith("__").

On a more general note: More security tools for this would be nice.
Either included or as attachable middleware.

Adding to the initial list:

Input Validation (e.g. is the string I'm getting actually a string? Also, sanitise it.).

flowirtz on 6 Sep 2019

@FWirtz Nice find!
But checking just for startsWith("__") might be too much.
It seems that__typename is used a lot, by Apollo client for example.

sboisson on 10 Sep 2019

👍2

@sboisson yeah I found that catch afterwards as well haha.
I think it would be best to rather check the query name (i.e. root node) whether it starts with __ instead of the fields. Afaik __typename is always just a nested field. Not entirely sure how to do this though.

flowirtz on 13 Sep 2019

Any update on this matter? I found this repo of someone trying to work on this, for now it implements the least interesting part of it (limiting query depth). I'm really new to this so I don't know if this could help.

By the way, is it possible to add a middleware directly to a schema instead of having it plugged to schema.execute, I ask this because in some cases (for example when using Graphene with Starlette), we don't have access to schema.execute as it is performed internally. I opened an issue on that question on Starlette side, not sure what's the best solution here.

cglacet on 3 Dec 2019

Can anyone enlighten me why schema introspection is a flaw?

I use graphene-django, and I only expose non-private fields using include_fields. For us, our React frontend dynamically generates Forms using schema introspection.

melvinkcx on 23 Dec 2019

@melvinkcx it's not strictly a flaw, it's a risk:

for security, by exposing fields not used by your client applications, fields that might be less tested or even might only be exposed by accident;
for business, by exposing new fields (often with some documentation) before the feature that uses them is released (or even finished).

ktosiek on 23 Dec 2019

👍1

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 22 Mar 2020

Not stale