Hello, I was expecting MongoEngine to help developers to prevent injection vulnerabilities.
class User(Document):
name = StringField()
# assume this is controller for HTTP request
def get_users_by_name(request):
# assume that request.json is {'name': {'$gt': ''}}
q = User.objects(name=request.json['name'])
print(q._query) # this would change original query structure to {'name': {'$gt': ''}}
return list(q)
Expecting MongoEngine to not allow such behavior, because ORM's for SQL usually prevent most injection vectors. Let me know your opinion on it.
You should use (de)serialization library like marshmallow to communicate with the outside world.
Have a look at marshmallow-mongoengine which allow to generate a mashmallow schema from a mongoengine document.
@touilleMan thank for suggestion. Marshmallow seems to be really cool library, though I can make connection to query injection.
Apologies for not a clear question. What I was trying to show, is that it is possible change query structure. User.name is a string, therefore User.objects(name='any_name')._query should take only string values and query would be {'name': 'any_name'}.
However it is possible to pass dict as name: User.objects(name={'$gt': ''})._query which is implicit and completely changes query structure {'name': {'$gt': ''}}, it no longer searches for exact name matches.
Thank you @touilleMan, marshmallow data validation might prevent some injection vectors (including the one in example).
I'm weary of the extra maintenance cost of adding validation to the query params, and so I wouldn't want to e.g. restrict querying a StringField by a str value only... Restricting a dict value is also not that straightforward, because it has to be allowed at least for the DictField, MapField, EmbeddedDocumentField, and probably a bunch of other field types.
Given that any user-generated data should be validated anyway and there are good libraries that deal with that specifically, we don't need to add this extra layer to MongoEngine.
Thank you for your answer. Do agree that maintenance cost would increase in MongoEngine, but it would decrease cost of maintenance in all applications that rely on MongoEngine.
As for input validation, it might not prevent injection - because sometimes you need to accept control characters/structures because business requires it. Nevertheless input data must never change query structure. I'm not to familiar with mongodb, but if it's possible to use value escaping, that might sensible solution.
Few resources for ideas:
@aisbaa
Can you provide me with a concrete example of an injection? The http protocol only passes string values. Therefore if a query was written:
nameToFind = http.form['name-search-field']
User.objects(name=nameToFind)
(where http.form is dictionary containing the result of a web form)
If nameToFind contained "John Doe" it would find or not find a user. An injection attack implies that there is a way to use escape characters. Simply placing "{'$gt': ''}" into the web form (and thus nameToFind) does not change the nature of the query. Can you come up with a string that does?
Or, is this strictly when receiving a JSON result (from a REST API for example) and converting that string result into a python object without filtering.
(Just curious: in the example you gave, which library is request coming from?)
Concrete example - would a unit test case do or a web application with vulnerability would be more appropriate?
That is true, HTTP protocol only passes textual data, therefore without data serialization this vector would not be possible. It is not possible to change query structure in your example.
Yes, it is strictly when data is deserialized from string (YAML, JSON, ...) to python object without proper validation. It would be nice to limit inputs to HTTP forms, yet query parameters could come from various sources: database, message queue, other API's, web crawlers, files...
Original example is partly based on flask (request class).
I agree generally with the principle that having a library enforce quality checks lowers maintenance costs of the projects that depend on the library.
In the particular case of enforcing User.objects(name=condition) to ensure that condition is only a string is a problem since mongoengine specifically allows more advanced queries using dictionaries and lists. It is a core function.
Do you have any specific ideas in mind to help prevent injection attacks in this scenario? (Outside of using User.objects(name=str(condition)).) Perhaps something like a "limited strict query" parameter or option? Or something that more tightly integrates with marshmallow for queries?
I still think the scenario this argument is based on is a bit artificial. We're talking about directly passing deserialized yet completely unsanitized user data as query args. That's not ok to begin with. It wouldn't fly if you were saving that data, it probably shouldn't fly if you're using it for queries either.
That said, let's unpack it a bit more... Maybe there's something we can do to help developers in such cases without hindering anybody else. For example:
In [3]: class Doc(Document):
...: text = StringField()
...:
In [4]: Doc.objects.create(text='whatever')
Out[4]: <Doc: Doc object>
In [5]: Doc.objects(text__gte='')
Out[5]: [<Doc: Doc object>]
In [6]: Doc.objects(text={'$gte': ''})
Out[6]: [<Doc: Doc object>]
In [7]: Doc.objects(text=re.compile('^w'))
Out[7]: [<Doc: Doc object>]
As you can see, a query arg for a StringField can come in many types. The first two queries are equivalent, but I think we can agree that the former is preferred in MongoEngine. Is supporting dict query args ever useful for a StringField?
There's plenty of other fields that support {'$gte': ''} as a query arg, for example URLField, EmailField, DateTimeField. At the same time, there are some that already don't support this approach: IntField (raises TypeError: int() argument must be a string or a number, not 'dict'), LongField, DecimalField. This seems like an accident more than desired behavior and we should probably make it consistent either by making it stricter for strings etc. (which is what @aisbaa is suggesting) or making it looser for ints etc. Note I haven't covered any complex fields (ListField, DictField, etc.) here because in their cases I think it's a given that any query arg should be fair game.
One could argue that making it looser gives devs flexibility. Even if we don't directly support some MongoDB operators (and there probably are some in this list that don't work in MongoEngine: https://docs.mongodb.com/manual/reference/operator/query/), people can still take it upon themselves to include it via a dict. Is this good enough of an argument though?
@wojcikstefan thank you for examples. Sanitization is not listed as valid injection prevention in OWASP injection prevention cheat sheet. The reason being that data might be damaged by sanitization.
One of approaches is to use safe API, that clearly separates data from operations. Then it is usually possible to encode data. All of querying methods listed in your example can be made safe, except Doc.objects(text={'$gte': ''}).
What do you think of explicitly marking unsafe API Doc.objects(text__raw={'$gte': ''}).
@JohnAD thank you for pointing out issue with condition. Lets look into this approach User.objects(<condition>), where condition is (name | name__op) = data. In this case it is possible to require data type to match field type defined in Model.
Ah, that's a very good point @aisbaa.
To clearly separate data from operations, we'd have to always rely on the MongoEngine __operators. That removes some flexibility and puts more stress on us as maintainers to keep up with latest MongoDB operators, but maybe it's worth it?
Exercising this idea, on top of relying solely on operators, we'd have to validate that the query data is not a dict. There should, however, be some exceptions to this rule. This is just a dump of my quick thoughts without actually looking into it in depth:
dict that can contain MongoDB operators.BaseField subclass and determine if we can simply disallow those types from being used as query data.DictField should most definitely support dicts. However, in the query dict, we can disallow keys that start with$or contain.` (these chars are not allowed as dict keys to begin with, see https://docs.mongodb.com/manual/reference/limits/#Restrictions-on-Field-Names).What do you think @aisbaa and @JohnAD?
ListField - is it possible to pass operations via list into mongo db?
There's a few array-specific operators in MongoDB: https://docs.mongodb.com/manual/reference/operator/query/#array
@wojcikstefan looks like a plan! Is it possible to escape $ so that it would be interpreted as normal character and not as operator?
As a general idea, I like adding the restriction. Clearly version A:
Doc.objects(somelist__size=3)
is superior to B:
Doc.objects(somelist={'$size': 3})
since A is much safer. And the programmer can always fall back to a __raw__ query under unusual scenarios.
However, what I don't have a feel for is how much B is used in the real world. If it is very common, then removing support for it will fundamentally break things. If so, then we could:
__eq operator. Change docs.How prevalent is passing dictionaries of conditions? @aisbaa and @wojcikstefan
I like approach number 3. It would allow to implementing static code analysis to detect all unsafe calls. Is also backwards compatible with existing code.
I like the 3rd proposal with a slight spin: My guesstimate is that > 90% of key=val queries in the wild are actually used as key__eq=val and I think that's the most intuitive & clean approach which is worth preserving. However, we'd still generate a warning if a dict with a $operator is passed as val. That can be done immediately in the upcoming bugfix version (v0.11.1), because it won't break anything.
Then, in v0.12.0 we can straight up reject such queries (I still want to wait with v1.0.0 for some bigger performance improvements which I consider crucial for this package to be used in production as-is).
Most helpful comment
I agree generally with the principle that having a library enforce quality checks lowers maintenance costs of the projects that depend on the library.
In the particular case of enforcing
User.objects(name=condition)to ensure thatconditionis only a string is a problem since mongoengine specifically allows more advanced queries using dictionaries and lists. It is a core function.Do you have any specific ideas in mind to help prevent injection attacks in this scenario? (Outside of using
User.objects(name=str(condition)).) Perhaps something like a "limited strict query" parameter or option? Or something that more tightly integrates with marshmallow for queries?