Flask-socketio: JSON Data is serialized incorrectly.

Created on 14 Jan 2016 · 15 Comments · Source: miguelgrinberg/Flask-SocketIO

Here's a fun one.

Basically, the Python JSON library generates invalid JSON _by default_, by serializing NaN, infinity and -infinity as the bare tokens NaN, Infinity and -Infinity. JSON does not allow any of these symbols (see here).

The Socket.IO JSON deserialization appears to silently ignore any messages that fail to deserialize, such as messages containing any of the above (see the bug report for that).

Basically, every call to json.dumps(x) in the codebase needs to be replaced with json.dumps(x, allow_nan=False).

invalid

All 15 comments

You can pass in an alternative json module in your Socket.IO server options.

https://flask-socketio.readthedocs.org/en/latest/#flask_socketio.SocketIO

json – An alternative json module to use for encoding and decoding packets. Custom json modules must have dumps and loads functions that are compatible with the standard library versions.
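A minimal sketch of what such an alternative module could look like, assuming the only goal is to make serialization fail loudly on NaN and infinity. The `StrictJSON` name is hypothetical; the one requirement taken from the documentation quoted above is that the object exposes `dumps` and `loads` compatible with the standard library versions.

```python
import json


class StrictJSON:
    """Hypothetical stand-in for the json module that refuses NaN/Infinity."""

    @staticmethod
    def dumps(*args, **kwargs):
        # Default to strict output; an explicit allow_nan from the caller still wins.
        kwargs.setdefault("allow_nan", False)
        return json.dumps(*args, **kwargs)

    @staticmethod
    def loads(*args, **kwargs):
        # Decoding is passed through to the standard library unchanged.
        return json.loads(*args, **kwargs)


# Assumed usage, based on the `json` option described above:
#   socketio = SocketIO(app, json=StrictJSON)

print(StrictJSON.dumps({"a": 1.5}))
try:
    StrictJSON.dumps({"a": float("nan")})
except ValueError as exc:
    print("rejected:", exc)
```

With something like this in place, a payload containing NaN should raise a ValueError on the server at emit time rather than reaching the client.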

I know that. The point is that _by default_, it's broken.

I shouldn't have to write a wrapper around the built-in json just to make it work with the remote client it's designed to interoperate with.

Yes, you should because Mongo uses BSON and extended JSON, not JSON.

Sorry, I confused this with a different issue... But still... Overriding the default JSON encoder, by default, probably isn't considered best practice. Explicit is always better than implicit, IMO.

I'm not saying override the JSON encoder by default. I'm saying just pass allow_nan=False by default.

As it is, it fails entirely to work with socket.io in any case where you have a nan or infinity anywhere in your message data structure. The resulting JSON data just gets swallowed silently by socket.io.

@fake-name if you pass allow_nan=False the problem isn't solved, you will just get a ValueError exception, so basically you are moving the error from one place to another. The real problem is that your application is trying to send a JSON payload with values that aren't supported by JSON.

A better solution would be for you to walk your data and replace any unsupported values with None, but this is application specific. It can be made into a custom JSON exporter.
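As a rough illustration of that suggestion, the sketch below walks a nested structure and swaps non-finite floats for None before serializing. The function name `replace_non_finite` is made up for this example, and it only handles dicts, lists and tuples; real application data may need more cases.

```python
import json
import math


def replace_non_finite(value):
    """Return a copy of `value` with NaN and +/-Infinity floats replaced by None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is what json.dumps would emit anyway.
        return [replace_non_finite(item) for item in value]
    return value


payload = {"reading": float("nan"), "history": [1.0, float("inf"), 2.5]}
clean = replace_non_finite(payload)

# allow_nan=False acts as a safety net in case something was missed.
print(json.dumps(clean, allow_nan=False))
# prints {"reading": null, "history": [1.0, null, 2.5]}
```

Whether None (JSON null) is the right substitute depends on what the client expects, which is why this stays application specific.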

Alternatively, you can send the data as binary, then decode it with your custom code in the client.

@fake-name That's not what I get here:

>>> import json
>>> a = {'a': float('nan')}
>>> a
{'a': nan}
>>> json.dumps(a)
'{"a": NaN}'
>>> json.dumps(a, allow_nan=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/miguelgrinberg/.pyenv/versions/3.4.3/lib/python3.4/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/Users/miguelgrinberg/.pyenv/versions/3.4.3/lib/python3.4/json/encoder.py", line 192, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/miguelgrinberg/.pyenv/versions/3.4.3/lib/python3.4/json/encoder.py", line 250, in iterencode
    return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant

I'm not sure why I thought it was coercing values. I made a bunch of changes at once, and the problem went away. I _thought_ it was because I patched the library, but the data source I'm working with isn't deterministic, so it's possible it just didn't emit NaNs during my testing after I made the changes (hurrah for custom hardware!).

I later switched to using msgpack for data serialization anyways, for throughput reasons.


Anyways, my point was, as it is, the invalid data gets swallowed (silently!) by the socket.io js library (unless you explicitly attach an error handler).

It'd be much nicer to have it fail noisily somewhere the error is actually easy to track down than to have silent JSON deserialization issues in the client library.

Personally, I'm of the opinion that the default value of allow_nan in the python library is a bug in and of itself (It's called "json", but it doesn't produce json. Wat?), but there's too much code out there to change it at this point.

The use of NaN and infinity in JSON is common, even though they are illegal, because JavaScript does support those constants.

I can see it from both sides, but regardless of my position, you just said it yourself, if anything, this is a bug in the standard library, not this project.

NaN and infinity are _explicitly_ never used in JSON. The specification disallows them:

Numeric values that cannot be represented as sequences of digits (such as Infinity and NaN) are not permitted.

They're broadly used in things that kind of look like JSON, but if they're present, it's not JSON anymore.

Anyways, my point here is that from the point of view of interacting with socket.io (which is the explicit purpose of this library, I believe), allowing them to be serialized is something of a disservice, as it leads to message loss that is somewhat annoying to debug.

That's what I meant. They are usually included in JSON payloads, even though the spec says those are illegal.

I feel you are asking this package to compensate for what can be considered a deficiency in the json library. Following that argument, any packages that use json should be asked to add this option. I agree that sometimes it is better to fail loudly than silently, but as a general rule, I don't think it is a good idea for this project to change the behavior of a dependency.

Obviously whoever designed the json package thought there would be a good use case to have allow_nan=True, so I don't think I should override that decision. In fact, there could be other Socket.IO clients that accept NaN without problem; certainly the Python client will.
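To make the difference between lenient and strict consumers concrete, here is a small standard-library sketch. The `reject_constant` hook is a hypothetical helper that only approximates a strict parser such as a browser's JSON.parse; it is not how any Socket.IO client is actually implemented.

```python
import json


def reject_constant(name):
    # json.loads calls this hook for the literals NaN, Infinity and -Infinity.
    raise ValueError(f"{name} is not valid JSON")


text = json.dumps({"a": float("nan")})  # default allow_nan=True
print(text)

lenient = json.loads(text)  # the standard-library default accepts NaN
print(lenient)

try:
    json.loads(text, parse_constant=reject_constant)
except ValueError as exc:
    print("strict parse failed:", exc)
```

The same text round-trips through Python's default decoder but is rejected as soon as the parser is told to treat those constants as errors.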

> I feel you are asking this package to compensate for what can be considered a deficiency in the json library.

I can't disagree with framing it that way, as in my opinion the default behaviour of the json library _is_ broken. Unfortunately, the python devs aren't interested in fixing the underlying issue, so it's basically beholden to either the libraries or end-programmers to work around the bug.

From what I can tell, it isn't so much that the person who wrote the library decided it was a good idea, but rather that it behaved that way originally because the person didn't completely follow the JSON spec, and at this point too much junk depends on the incorrect behaviour to fix it.

Wow!
As a newcomer to Python, I cannot believe that anyone would even bother acquiescing to this bad choice by the Flask devs!
As I understand it, Flask is used as a server to serve to clients, maybe /even/ remote clients who might operate on, y'know, global standards, rather than bending their code to fit someone's unilateral decision that the JSON spec is not wide enough.

It's not possible to abstain on the question of sticking to a standard or not. Flask-SocketIO is actively supporting this choice, which breaks frontends. (As an example, Firefox treats NaN in JSON as a syntax error, so if it is "usually included in JSON payloads" then whoever is consuming those payloads has not managed to touch Firefox in the development process, which makes me suspect "usually" or even "common" are overstatements.)

Flask devs should be pushed into compatibility with the JSON standard by default. The alternative is to try to push the entire rest of the web into allowing a de facto extension to the JSON standard, which I somehow don't think will be successful. (And it would just create headaches for the world's devs in the future if it were.) I cannot believe that developers of such a major library can be so irresponsible. If this were JavaScript, Flask devs would be (rightly) flamed and a new library would already be taking Flask's market share.

As for Flask-SocketIO, I guess you can ignore the problem if you want, but the more libraries acquiesce to this nonsense, the more people are angered and take an entrenched position before the war actually starts. Nip it in the bud, I beg you!

@morkeltry Your rant doesn't belong here. It literally has nothing to do with flask-socketio. If you don't like the default json module you can change it as I mentioned above. It would be weird, unexpected, and unpythonic to override the defaults just for this library.

You might have luck here: https://github.com/pallets/flask/pull/2832

Though the decision there, too, is left to the end user to implement a correct solution.
