Bjoern: Poor performance on Linux

Created on 24 Jan 2020 · 14 Comments · Source: jonashaag/bjoern

Hello,
I'm assembling some performance results and found that Bjoern is much faster on macOS than on Linux.

I get one eighth the performance on Linux benchmarking with wrk. Is this by design? Is macOS the preferred platform or is something wrong?

All 14 comments

Can you please share your exact benchmark setup on both machines?

I just took a stock 2019 MacBook and ran wrk from Homebrew with default settings against python3 from Homebrew; performance was roughly in the same category as Japronto.

Then I ran it on Clear Linux with their python3 and my compiled wrk (which I know to perform very well) with again default settings and yeah performance was one eighth (10k vs 84k).

It's usually the other way around; Linux should outperform macOS in networking.

I know that Linux machine can do 125k with that python3.8 and other modules, so 10k is way off.

Thanks! Can you please share the exact Python application you were serving, the command line parameters, and the wrk configuration?

Can you also provide the max number of fds as reported by ulimit in the shell you run it from?

It's no different from the simplest hello world app. No configuration other than the default: wrk http://localhost:8080 with no extra parameters of any kind, just the plain, minimal defaults.

I have like a million fds because I run larger tests from time to time, but wrk only uses 10 for this
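For anyone reproducing this, the fd limit can also be read from inside the Python process that will run the server, which avoids guessing which shell's ulimit applies. This is a minimal sketch using only the standard library `resource` module; the helper name `fd_limits` is made up for illustration and is not part of bjoern.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process.

    The soft value is what `ulimit -n` reports in a typical shell; an
    unlimited value is reported as resource.RLIM_INFINITY.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"open files: soft={soft} hard={hard}")
```

Running it with the same interpreter and from the same shell as the server shows the limit the server actually inherits.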

Hm, just ran some benchmarks, it's looking ok, completely non-optimized Linux server w/ 2 vCPUs:

import bjoern

def app(env,sr):
    sr('200 ok', [('content-length','0')])
    return ''

bjoern.run(app, 'localhost', 8888, reuse_port=1)
$ venv/wrk/wrk -t12 -c400 -d10s http://127.0.0.1:8888/index.html
Running 10s test @ http://127.0.0.1:8888/index.html
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.76ms    5.45ms 139.52ms   84.60%
    Req/Sec     3.93k     1.08k   19.63k    80.95%
  470512 requests in 10.09s, 27.82MB read
Requests/sec:  46628.11
Transfer/sec:      2.76MB
$ venv/wrk/wrk -t4 -c20 -d10s http://127.0.0.1:8888/index.html
Running 10s test @ http://127.0.0.1:8888/index.html
  4 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   557.60us    1.69ms  49.67ms   98.13%
    Req/Sec    12.09k     4.20k   24.55k    76.25%
  481385 requests in 10.03s, 28.46MB read
Requests/sec:  48016.14
Transfer/sec:      2.84MB

Same numbers with multiple workers (either fork or receive steering).
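As a rough illustration of the second multi-worker variant: assuming "receive steering" here means kernel-side distribution of connections via SO_REUSEPORT (which the reuse_port=1 argument above suggests, but the comment does not spell out), each worker binds its own listening socket to the same port. The sketch below uses only the standard library `socket` module and does not involve bjoern; `reuseport_listener` is a hypothetical helper name.

```python
import socket


def reuseport_listener(host, port):
    """Create a TCP listening socket with SO_REUSEPORT set.

    Several sockets created this way (by the same user) can bind to the
    same address and port; the kernel then spreads incoming connections
    across them.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


if __name__ == "__main__":
    # Port 0 lets the kernel pick a free port for the first listener.
    first = reuseport_listener("127.0.0.1", 0)
    port = first.getsockname()[1]
    # A second listener on the same port succeeds because both set SO_REUSEPORT.
    second = reuseport_listener("127.0.0.1", port)
    print("two listeners bound to port", port)
    first.close()
    second.close()
```

In a real multi-worker setup each listener would live in its own process; here both are created in one process only to show that the second bind is accepted.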

I'd be very curious to reproduce this in your environment!

I heard Clear Linux is used by Intel specifically for development and might not be the most representative.

That makes no sense. Clear Linux is a performance dist, and my very first comment mentioned that I get very high numbers with other server software.

The point I'm making, and the bug I'm reporting, is that Bjoern has very competitive performance on macOS compared to other server software, but on Linux the gap between that very same server software and Bjoern is heavily to Bjoern's disadvantage.

That is, I'm not reporting Bjoern is slow, it isn't, but on Linux it is strangely non-competitive as if something was wrong.

Getting one eighth the performance on Linux, compared to other server software, is very poor.

That makes no sense. Clear Linux is a performance dist, and my very first comment mentioned that I get very high numbers with other server software.

Unfortunately it might be due to a number of issues, plus Clear Linux is an exotic distro, and Jonas was not able to reproduce your findings on other distros. It would be helpful if you could provide him with a minimal reproducible example, which I think Jonas requested in his previous comment.

For example, you could use https://hub.docker.com/_/clearlinux and try to replicate your setup in a container.

I think Jonas did in fact replicate it. 48k for two CPUs - numbers mean nothing without info on what exactly those CPUs are, but it sounds way low.

No, I can't reproduce it and am looking for an example.

Alright, the difference is largest with default wrk settings:

./wrk http://localhost:3000
Running 10s test @ http://localhost:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.33ms   13.38ms  42.21ms   78.97%
    Req/Sec     6.02k     1.68k    9.64k    70.00%
  120611 requests in 10.07s, 10.81MB read
Requests/sec:  11982.66
Transfer/sec:      1.07MB

With 400 connections like you used the numbers are different:

./wrk -c400 http://localhost:3000
Running 10s test @ http://localhost:3000
  2 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    13.05ms   14.71ms  45.77ms   78.16%
    Req/Sec    29.59k     4.64k   40.56k    70.00%
  588631 requests in 10.02s, 52.80MB read
Requests/sec:  58760.47
Transfer/sec:      5.27MB

But either way it doesn't come close to other Python servers with similar goals on Linux. Here is one of them running with default wrk settings:

./wrk http://localhost:3000
Running 10s test @ http://localhost:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    67.36us   41.30us   4.23ms   90.27%
    Req/Sec    74.29k     8.69k   98.37k    65.35%
  1493810 requests in 10.10s, 102.57MB read
Requests/sec: 147908.51
Transfer/sec:     10.16MB

Here the difference is more than 10x, going down to 2.5x with 400 connections. So the case I'm primarily reporting is the one where you run default wrk settings; that's where the difference is unreasonably large.
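The two ratios can be checked directly from the Requests/sec lines in the wrk outputs above. This is just the arithmetic; the variable names are illustrative, and the 2.5x figure compares the other server at default settings against Bjoern at 400 connections, since no 400-connection run of the other server is shown.

```python
# Requests/sec values copied from the three wrk runs above.
bjoern_default = 11982.66   # Bjoern, default wrk settings (10 connections)
bjoern_c400 = 58760.47      # Bjoern, -c400
other_default = 147908.51   # the other Python server, default wrk settings


def ratio(faster, slower):
    """How many times more requests per second `faster` handled than `slower`."""
    return faster / slower


if __name__ == "__main__":
    print(f"default settings: {ratio(other_default, bjoern_default):.1f}x")  # 12.3x
    print(f"vs. Bjoern -c400: {ratio(other_default, bjoern_c400):.1f}x")     # 2.5x
```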

Here's the very basic example test:

import bjoern

# This simple test does not include routing
def application(env, start_response):
    start_response('200 OK', [])
    return b"Hello Python!"

# I get poor performance on Linux, much better on macOS?
# 10k ish on Linux, 84k on macOS, let's compare on macOS to make it even!
bjoern.run(application, "localhost", 3000)

All of these numbers are from my 10+ year old laptop, with the same Python version.

Thanks, trying now in a Clear Linux Docker container. Can you please compare performance in your setup to Clear Linux running inside Docker, and to Ubuntu running inside Docker?

Also, can you share which Python server is used in the last example, and the exact application code?

It's just a prototype I have, but you can see similar differences if you compare Japronto vs. Bjoern on macOS and on Linux. You don't need Clear Linux: there's nothing kernel-specific, and Docker doesn't even give you the actual Clear Linux kernel, it gives you your own kernel. This applies to any kernel.

In any case, I wasn't really looking for a long discussion; I was just reporting my initial findings while doing benchmarks.

It's an apples to oranges comparison. Japronto isn't a WSGI server. I'll benchmark just for curiosity anyways :)

Btw I never got around to setting up Clear Linux since packages took forever to download and there's no packaged libev. So that took too much time for my taste.

I'll be closing this ticket but feel free to continue the discussion!
