Flask: `request.json` deprecation discussion

Created on 6 Apr 2015  路  43Comments  路  Source: pallets/flask

Most helpful comment

It seems like you'd want to remove JSON handling altogether, not just deprecate .json, since both the property and the method drain input and are open to malicious input by default. Since you say handling this is out of scope though, there's two options: remove it completely, or leave it as is and document the use and security implications. Only deprecating the property doesn't seem effective, just inconsistent.

All 43 comments

cc @adambyrtek @ThiefMaster

I'd keep request.json as a property. Not having to use json_data = request.get_json() in every view function that expects JSON is a plus.

However, maybe it's worth changing its behavior to raise BadRequest when using it with a non-json payload?

Keep in mind that get_data etc still have an internal cache you can use (and AFAIK it's enabled by default).

However, maybe it's worth changing its behavior to raise BadRequest when using it with a non-json payload?

It already does that.

Nope. Currently it only fails if the headers indicate JSON but the payload isn't JSON.

    if not (force or self.is_json):
        return None

https://github.com/mitsuhiko/flask/blob/master/flask/wrappers.py#L142

I see, I thought about non-JSON data under a JSON mimetype.

I'll try to summarise the arguments for keeping the json property:

  • Keeps backward compatibility. Otherwise everybody would have to replace json with get_json() in existing code for no obvious benefit, or simply ignore warnings.
  • More concise and readable for the common use cases (compare request.json['foo'] with request.get_json()['foo']).
  • Consistent with request.form and request.data (both have properties in addition to getters).
  • The default behaviour for both json and get_json() is to cache the data, so there is no real reason to encourage calling it only once.
  • Advanced users who need more control can still use get_data() and get_json() directly.

There could be a way to make the errors more helpful when mixing both json and data in a single view function (which is the only rationale for this deprecation that I've heard so far).

@untitaker Would be good to have @mitsuhiko take a look at this one as well. Deprecating APIs just before 1.0 should be carefully considered.

Could we have a discussion about this issue? I should have some time for contributing if there's a consensus on this.

+1 to keep request.json property.

I am not a user of this myself, instead assigning request.get_json() to a local variable. However, it does seem very likely to force many app developers to do lot of refactoring work for a benefit very few will see or appreciate.

EDIT: After reading the below comments, my curiosity got me looking into this a bunch more, and now it makes sense to me why some folks think it should get deprecated.

I'm really not very happy with request.json as a property. It's causes errors and bugs, and it's too easy to trigger by accident. However I'm happy keeping it as a forever deprecation with a very noisy warning that and a flag to completely disable it.

It's causes errors and bugs, and it's too easy to trigger by accident.

Afraid I'm not aware of these bad side effects... I'm sure they exist, but I haven't stumbled across them in my own projects.

Mind clarifying or linking to a few examples?

Accessing .json drains the input stream which is problematic if people want to use specific processing for that data. While the same is true for .files and .form they have their own internal support systems for hooking and are so common that it does not really matter. .json however is highly problematic because untrusted incoming JSON data is questionable due to its nested nature.

Honestly, larger projects should never really accept JSON data verbatim like this but with a validating parser which however is largely out of the scope for Flask. At Fireteam we used to parse JSON data with a stream processor and for that it's very important that accidentally touching a property does not entirely bypass this.

That makes sense, thanks!

@mitsuhiko Could you give an example of nested JSON that would be questionable? It's not obvious how using unvalidated JSON is any more dangerous than using unvalidated form data.

If a project needs to do something different, they can already use a different deserializer, a different request object, or just make sure not to access the property, along with other properties with this behavior. I think it is preferable to keep the convenience and consistency of .json and call out the fact that some properties drain the input.

Basic example: json.loads('[' * 800 + ']' * 800)

It seems like you'd want to remove JSON handling altogether, not just deprecate .json, since both the property and the method drain input and are open to malicious input by default. Since you say handling this is out of scope though, there's two options: remove it completely, or leave it as is and document the use and security implications. Only deprecating the property doesn't seem effective, just inconsistent.

If someone post '[' * 8000 + ']' * 8000, what can we do?

if use `request.json`:
    it will cost some CPU time, and raise a RuntimeError
    the client will receive a 500 response
if use `request.get_json()`:
    the result is the same as use request.json
    what can prevent me from touching the bad json data?

So, Easier to Ask Forgiveness than Permission

try:
    data = request.json
except:
    abort(400, 'bad json data')

Or, let flask take care of the bad json data, catch RuntimeError and abort(400), and add wraning on documents.

first of all, bare except is bad. Then, the whole point of not using a "normal" json parser would be that you do not end up using a crapton of memory in the first place. Sending many requests with malicious json payload may easily overload your server, even if you abort when parsing eventually fails

Is there any security json parser?

ijson seems like an option

So use a security json parser is the key to solve the problem, yes or no?

As far as I understand, reading from the response body after accessing request.json is not a common pitfall, but if it does happen, it's ugly. So Flask needs to prevent this situation in some way, however, many people argue it's not worth the API change. How about preventing direct access to the stream (or to formdata) after JSON has been accessed? E.g.:

request.data  # works

_ = request2.json
request2.data  # raises exception

With this, the pitfall is avoided. The API is still "ugly" (as argued by @mitsuhiko) because in principle properties are not supposed to change their value/behavior like that.

I do not understand the memory exhaustion concerns at all.

I think there is a very simple solution for this: while the deprecation is still ongoing nothing changes anyways other than a warning. Afterwards people could inject a custom subclass that reinstates it if they want it.

@mitsuhiko That's just assuming people actually care about the aesthetic problem of properties exhausting the stream. Apparently that's not the case.

Sometimes it's an issue if there are tools going over dir(obj) and then calling getattr() on all attributes. I haven't noticed anyone having issues with this on the request object though.

It's imposible to avoid bad json data unless use a stream parser and tell the struct we needed to the parser.
For ijson, it yield (keys,event,value), once we get the keys we need, it stop.The keys is like user.name if we need the name of user object.
And I am writing https://github.com/guyskk/validater which can validate json data by schema, the schema is like {"user":{"name":"str&required"}}. validater is based on ijson, it will stop once the json struct diff with schema, but this feature is not released, you should install it from master source code.
Because the parser need to know the struct of json, it can't compat with standard json lib.

I don't think being able to use the stdlib parser is really a requirement - the simplicity of just getting a dict/list/whatever on the other hand does make a useful default (if you want to use e.g. ijson you'd simply not use request.get_json / request.json) and shouldn't be lost IMO. Especially for small applications the DoS risk of using a normal JSON parser is probably acceptable.

BTW: If I were you I'd use JSON-Schema for that validator (with o, not e ;)) instead of something self-made (str&required looks really awkward/awful in my opinion).

@ThiefMaster Without

My reason of use self-made syntax is to keep the same struct for schema and json data锛宨t is easier to write and read.

@ThiefMaster sorry, without a stream parser, it'd imposible to avoid the risk. if the risk is acceptable, its no need to change request.json.

I'll be working on this for the PyCon sprints. I'm going to verify that response.json gives a deprecation warning.

Yes it does have a Deprecation Warning: https://github.com/pallets/flask/blob/master/flask/wrappers.py#L106 And it has only been around since: 0.11. See changeset 61097c47 from Oct 15, 2014.

That's not the point of this issue @dorianpula, that code was already there before this issue got opened IIRC.

Since some people seem to disagree with this change, we opened this issue for discussion.

Sure, I was talking with @davidism and he said that he is with @mitsuhiko to remove the property. The question was whether it can be done for 1.0. And I was verifying when the deprecation was added. But since the deprecation warning was just added, it doesn't seem we can just remove it right now, unless there will be a release or two before 1.0.

To clarify, I'd still rather see json left in, since form and files modify request.data as well and we're not changing those. Since @mitsuhiko seemed set on it, I wanted to verify that we had deprecated it during the current release.

A slightly off-topic comment.

There's always a trade-off between making code more explicit and making it shorter. I thought it would make sense that flask would remain very explicit (since it's known for that, and there's plenty of frameworks on other end of the spectrum). So I was a bit surprised that the very implicit and often confusing behavior of properties that exhaust the stream on access (.form, .files) is not being slowly deprecated. Especially since I don't think it's hard for anyone to add those properties, with the desired behavior, in a Request subclass.

That said, after I read the entire thread, the decision to deprecate .json but not other properties is understandable.

@pkch: In almost all cases people don't care and don't have to care/know about the WSGI input stream. So taking away request.form from them would be counterproductive. Accessing POST data is among the most common things you do in a webapp. The average new flask user should not have to subclass Flask/Request for that.

Unless you deliberately do .get_json(cache=False) (which .json doesn't), .data still contains the raw data, and .form still contains form data if there wasn't actually JSON. I've tried all sorts of combinations of order of access, type of data, parameters, and nothing about the current behavior seems unexpected to me. I'm going with "not deprecated" unless I see a good concrete example of this breaking.

Issues discussed:

  • .json reads from the stream on access, but so does .form
  • .data should also be deprecated, but the .disable_data_descriptor flag exists to control it
  • small JSON payload can cause memory issues, unlike .form
  • multipart form reads into temp file when memory limit is exceeded
  • async?
  • HTTP 100 response?
  • adding a max_json_depth config, but this would require different parsing than what Flask currently supports

All the points raised seem tangential to .json existing. That is, they affect .get_json() just as much. Anecdotally, I can't remember coming across any Stack Overflow question about .json draining input when accessed, and as I pointed out above all the common properties keep working because they're cached.

At my day job, we use Flask for many microservices, but not many of our devs know Flask internals. I see in their code that they intuitively reach for request.json all the time, and they would be very confused/frustrated if it was deprecated/removed. Given @davidism 's points above, I don't see anything compelling to merit the headache and usability decrease of deprecating it.

Related to max_json_depth, @mitsuhiko mentioned that @getsentry loads JSON in a way that mitigates '[' * 800 + ']' * 800, but I couldn't find it, they're just using simplejson.decode.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

xliiv picture xliiv  路  3Comments

greyli picture greyli  路  3Comments

chuanconggao picture chuanconggao  路  4Comments

ghost picture ghost  路  3Comments

dreampuf picture dreampuf  路  3Comments