Sanic: Update default json library

Created on 21 Jan 2019  路  25Comments  路  Source: sanic-org/sanic

I propose replacing the dependency on ujson with orjson.

sanic using orjson does 2.9x the requests per second on an example benchmark compared to when using ujson.

It also does not have the correctness issues ujson has. Its README has details: https://github.com/ijl/orjson.

I think the implementation details of json() leak by handling types differently (e.g., ujson serializing datetimes to epoch timestamps, supporting subclasses) and exposing kwargs, so switching would be a breaking change.

orjson:

Running 30s test @ http://127.0.0.1:8000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.59us   12.41us 525.00us   90.59%
    Req/Sec     8.13k   260.69     8.55k    76.74%
  486869 requests in 30.10s, 24.23GB read
Requests/sec:  16175.08
Transfer/sec:    824.38MB

ujson:

Running 30s test @ http://127.0.0.1:8000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   356.78us   40.74us   6.17ms   99.09%
    Req/Sec     2.81k    94.51     3.02k    73.54%
  168044 requests in 30.10s, 8.37GB read
Requests/sec:   5582.88
Transfer/sec:    284.58MB

The output is a 56KiB JSON document of a GitHub activity feed. This was measured using:

PYTHONPATH=$PWD gunicorn \
    --preload --reuse-port --log-level error --bind localhost:8000 \
    --workers 2 --worker-class sanic.worker.GunicornWorker \
    wsgi:app
import os
from json import loads

from sanic import Sanic
from sanic.response import json
from sanic.views import HTTPMethodView

filename = os.path.join(os.path.dirname(__file__), 'github.json')

with open(filename, 'r') as fileh:
    DATA = loads(fileh.read())

app = Sanic(__name__)
app.config.ACCESS_LOG = False

class View(HTTPMethodView):

    def get(self, request):
        return json(DATA)

app.add_route(View.as_view(), '/')

The minimal patch to benchmark:

diff --git a/sanic/response.py b/sanic/response.py
index 7b245a8..536b27c 100644
--- a/sanic/response.py
+++ b/sanic/response.py
@@ -11,7 +11,7 @@ from sanic.helpers import STATUS_CODES, has_message_body, remove_entity_headers


 try:
-    from ujson import dumps as json_dumps
+    from orjson import dumps as json_dumps
 except BaseException:
     from json import dumps

@@ -216,8 +216,11 @@ def json(
     :param headers: Custom Headers.
     :param kwargs: Remaining arguments that are passed to the json encoder.
     """
+    body_bytes = dumps(body, **kwargs)
+    if not isinstance(body_bytes, bytes):
+        body_bytes = body_bytes.encode('utf-8')
     return HTTPResponse(
-        dumps(body, **kwargs),
+        body_bytes=body_bytes,
         headers=headers,
         status=status,
         content_type=content_type,

With ujson fully replaced, the test suite is fine. I'll open a pull request if it's ok to go ahead.

documentation help wanted idea discussion on hold

Most helpful comment

I must admit, I am not familiar with orjson, but the project does look promising.

However, it is still a young project (not that ujson or sanic for that matter are that old either), and I am not sure if it is battle tested out in the wild or not.

I think a better approach is just allowing the developer to pass in their own dumps method. Indeed, the developer already has that option:

import my_favorite_json
from sanic.response import json

async def handler(request):
    return json(..., dumps=my_favorite_json.dumps)

The developer is free to use an alternative. And, indeed, perhaps we can add some documentation on this.

At this time I would be more in favor of a PR that fixes the documentation that would explain how to do this rather than universally making the change for the entire project (especially since there would be a breaking change as pointed out).

All 25 comments

I must admit, I am not familiar with orjson, but the project does look promising.

However, it is still a young project (not that ujson or sanic for that matter are that old either), and I am not sure if it is battle tested out in the wild or not.

I think a better approach is just allowing the developer to pass in their own dumps method. Indeed, the developer already has that option:

import my_favorite_json
from sanic.response import json

async def handler(request):
    return json(..., dumps=my_favorite_json.dumps)

The developer is free to use an alternative. And, indeed, perhaps we can add some documentation on this.

At this time I would be more in favor of a PR that fixes the documentation that would explain how to do this rather than universally making the change for the entire project (especially since there would be a breaking change as pointed out).

Yeah, i agree with @ahopkins . Also looks like orjson only has manylinux wheel published https://pypi.org/simple/orjson/. there is no source dist, and no wheel for macos currently.

Can I take this one? :)

@szepnapot Please do. :beers:

(So I know we are on the same page, you are talking about the documentation part, right?)

Ok. I think it would be more appropriate to change the default to rapidjson or worse to change it to json and make the user choose another library for performance reasons. ujson has been unmaintained for years, that'll only get worse, and neither rapidjson nor json have its issues. rapidjson is close in performance for strings and dicts type web app payloads.

I think a documentation change on using a different JSON library isn't likely to have much effect though in that people also don't really change defaults and it doesn't help the library's goal of being fast out of the box.

orjson can distribute macOS, Windows, and source distributions, but it hasn't been done.

@ahopkins Yes :)
Can you give me some ideas what other examples can we show on the doc?
Or it's enough to show how to use a handler via this json example?

@szepnapot Since you are working on it, maybe also mention how to pass a loads when access JSON data from request:

import my_favorite_json
from sanic.response import json

async def handler(request):
    json_data = request.load_json(loads=my_favorite_json.loads)
    ...
    return json(..., dumps=my_favorite_json.dumps)

@yunstanford orjson now publishes Linux, macOS, and Windows wheels. Can we either go forward with this or split the documentation to another ticket so I can close?

@ijl i've tried once, but looks like some unit tests are failing. but didn't dig deep.

Any progress on this?

hmm for small JSONs ujson faster.

ultrajson 2.0.0 is released, supports Python 3.8 https://github.com/ultrajson/ultrajson/releases/tag/2.0.0

New release of the ujson contains breaking change: they do not serialize iterables anymore. So, for example, set can't be serialized anymore. Probably you should switch from the ujson finally

I'll reopen for anyone that wants to add discussion. Now that it is being supported again, I am less inclined to switch but only en to convincing arguments. Thanks for sharing @VMAtm , I'll take a read thru the release notes.

Looks like new ultrajson now degradate feature set to orjson.

What do you mean?

@ahopkins

Remove serialization of date/datetime objects (50181f0) @Jahaja
Remove double_precision encoding option and precise_float decoding option (eb7d894) @Jahaja
Remove generic serialization of objects/iterables (53f85b1) @Jahaja
Remove support for __json__ method on str (5f98f01) @Jahaja
Remove blist tests (3a6ba52) @Jahaja

for me, the most important feature was datetime conversion. Now ultrajson have the same feature set as orjson but slower.

Personally on a couple of performance-critical applications in my organisation I have replaced ujson with orjson, specifically for the built-in iso-format on datetime objects (and orjson is super fast).

@ashleysommer I see, for me, was the most important deserialization what not supported by orjson.

Yeah. That was one of the biggest reasons I was hesitant to make the switch because of the changes that would cause for Sanic users.

If it is a problem we will have to endure either way, I am back to the beginning on this one.

Anyone willing to put together another round of performance testing?

I been in the json lib hellhole for a while now, and im a vivid Sanic user and what I would love to get access to instead of choosing one in particular is to do something like this:

https://github.com/mattgiles/mujson

Have a "proxy" so you can load the compatible ones and it deside which to use based on dev prefs and so on. This would make Sanic a bit more "configurable" to the specific issue you are having :)

I need to speed up my json handling since I am running our entire backend microservice installation on Sanic with quite a bit of throughput, being able to use whats strong for the usecase is a strong feat and gives a very flexible solution to the json serialization and deserialization for Sanic. (In my eyes there are no unicorn in the json lib family)

I actually saw a 30% decrease in speed when switching from usjon to orjson. :thinking:

@ahopkins do you have a test case for reproducing?

Here is a version of what I use. I change this often, but it has more or less the information used.

As for Sanic, I tested it against this branch.

In general, benchmark may vary a lot depends on test cases(use cases), e.g. small objects, complicated nested objects, datetime, etc, those libs are likely having some local optimizations internally.

Absolutely. Benchmarks are (at best) a vague indication of trends. However, what else is there to decide if this is worthwhile? We probably need a test against a large variety of objects.

Was this page helpful?
0 / 5 - 0 ratings