Hello,
I'm assembling some performance results and found that Bjoern is much faster on macOS than on Linux.
I get one eighth the performance on Linux benchmarking with wrk. Is this by design? Is macOS the preferred platform or is something wrong?
Can you please share your exact benchmark setup on both machines?
I just took any default MacBook from 2019 and ran wrk from Homebrew with default settings against python3 from Homebrew; performance was in the category of Japronto roughly.
Then I ran it on Clear Linux with their python3 and my compiled wrk (which I know to perform very well) with again default settings and yeah performance was one eighth (10k vs 84k).
It's usually the other way around; Linux should outperform macOS in networking.
I know that Linux machine to do 125k in that python3.8 with other modules, so 10k is way off
Thanks! Can you please share the exact Python application you were serving and command line parameters, and also the wrk configuration.
Can you also provide the max number of fds as reported by ulimit by your executing shell
No different than the most simple hello world app, no configuration other than default wrk http://localhost:8080 no extra parameters in any way, just straight out default and minimal.
I have like a million fds because I run larger tests from time to time, but wrk only uses 10 for this
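For anyone reproducing this, the file descriptor limit asked about above can also be read from inside the Python process that runs the server. This is a minimal sketch using the standard library `resource` module (Unix only); the helper name `fd_limits` is made up for illustration and is not part of Bjoern.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
# `ulimit -n` in the launching shell reports the soft limit by default.
print(f"soft fd limit: {soft}, hard fd limit: {hard}")
```

The soft limit is the one that matters for a benchmark run, since it is what the server process actually hits first.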
Hm, just ran some benchmarks, it's looking ok, completely non-optimized Linux server w/ 2 vCPUs:
import bjoern
def app(env, sr):
    sr('200 ok', [('content-length', '0')])
    return ''
bjoern.run(app, 'localhost', 8888, reuse_port=1)
$ venv/wrk/wrk -t12 -c400 -d10s http://127.0.0.1:8888/index.html
Running 10s test @ http://127.0.0.1:8888/index.html
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.76ms 5.45ms 139.52ms 84.60%
Req/Sec 3.93k 1.08k 19.63k 80.95%
470512 requests in 10.09s, 27.82MB read
Requests/sec: 46628.11
Transfer/sec: 2.76MB
$ venv/wrk/wrk -t4 -c20 -d10s http://127.0.0.1:8888/index.html
Running 10s test @ http://127.0.0.1:8888/index.html
4 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 557.60us 1.69ms 49.67ms 98.13%
Req/Sec 12.09k 4.20k 24.55k 76.25%
481385 requests in 10.03s, 28.46MB read
Requests/sec: 48016.14
Transfer/sec: 2.84MB
Same numbers with multiple workers (either fork or receive steering).
I'd be very curious to reproduce this in your environment!
I heard Clear Linux is used by Intel specifically for development and might not be the most representative.
That makes no sense. Clear Linux is a performance dist, and my very first comment mentioned that I get very high numbers with other server software.
The point I'm making, and the bug I'm reporting, is this: Bjoern is very competitive with other server software on macOS, but on Linux the gap between that very same software and Bjoern is heavily to Bjoern's disadvantage.
That is, I'm not reporting that Bjoern is slow, it isn't, but on Linux it is strangely non-competitive, as if something were wrong.
Getting one eighth the performance of other server software on Linux is very poor.
Unfortunately it might be due to a number of issues, plus Clear Linux being an exotic distro, plus Jonas was not able to reproduce your findings on other distros. It would be helpful if you could provide him with a minimal reproducible example, which I think Jonas requested in his previous comment.
For example, you could use https://hub.docker.com/_/clearlinux and try to replicate your setup in a container.
I think Jonas did in fact replicate it. 48k for two CPUs - numbers mean nothing without info on what exactly those CPUs are, but it sounds way low.
No, I can't reproduce it and am looking for an example.
Alright the difference is the most with default wrk settings:
./wrk http://localhost:3000
Running 10s test @ http://localhost:3000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.33ms 13.38ms 42.21ms 78.97%
Req/Sec 6.02k 1.68k 9.64k 70.00%
120611 requests in 10.07s, 10.81MB read
Requests/sec: 11982.66
Transfer/sec: 1.07MB
With 400 connections like you used the numbers are different:
./wrk -c400 http://localhost:3000
Running 10s test @ http://localhost:3000
2 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.05ms 14.71ms 45.77ms 78.16%
Req/Sec 29.59k 4.64k 40.56k 70.00%
588631 requests in 10.02s, 52.80MB read
Requests/sec: 58760.47
Transfer/sec: 5.27MB
But either way it doesn't come close to other python servers with similar goals, on Linux. Here running with default wrk settings:
./wrk http://localhost:3000
Running 10s test @ http://localhost:3000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 67.36us 41.30us 4.23ms 90.27%
Req/Sec 74.29k 8.69k 98.37k 65.35%
1493810 requests in 10.10s, 102.57MB read
Requests/sec: 147908.51
Transfer/sec: 10.16MB
Where the diff is more than 10x, going down to 2.5x with 400 connections. So the case I'm primarily reporting is the one where you run default wrk settings - that's where the diff is unreasonably large.
Here's the very basic example test:
import bjoern
# This simple test does not include routing
def application(env, start_response):
    start_response('200 OK', [])
    return b"Hello Python!"
# I get poor performance on Linux, much better on macOS?
# 10k ish on Linux, 84k on macOS, let's compare on macOS to make it even!
bjoern.run(application, "localhost", 3000)
All of these numbers are running on my 10+ year old laptop, with same Python version.
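The "more than 10x" and "2.5x" figures quoted above can be checked directly from the `Requests/sec` lines of the three wrk runs. This is a minimal sketch; the helper names `requests_per_sec` and `slowdown` are made up for illustration. As in the comment above, the 2.5x figure compares Bjoern at 400 connections against the other server at the default 10 connections.

```python
import re


def requests_per_sec(wrk_output):
    """Extract the Requests/sec figure from wrk's text output."""
    match = re.search(r"Requests/sec:\s*([\d.]+)", wrk_output)
    if match is None:
        raise ValueError("no Requests/sec line found")
    return float(match.group(1))


def slowdown(reference_rps, measured_rps):
    """How many times slower the measured server is than the reference."""
    return reference_rps / measured_rps


# Figures copied from the wrk runs posted in this thread.
bjoern_default = requests_per_sec("Requests/sec: 11982.66")   # Bjoern, wrk defaults
bjoern_c400 = requests_per_sec("Requests/sec: 58760.47")      # Bjoern, wrk -c400
other_default = requests_per_sec("Requests/sec: 147908.51")   # other server, wrk defaults

print(round(slowdown(other_default, bjoern_default), 1))  # 12.3
print(round(slowdown(other_default, bjoern_c400), 1))     # 2.5
```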
Thanks, trying now in a Clear Linux Docker container. Can you please compare performance in your setup to Clear Linux running inside Docker, and to Ubuntu running inside Docker?
Also can you share what's the Python server in the last example, and the exact application code?
It's just a prototype I have, but you can see similar differences if you consider Japronto vs. Bjoern on macOS vs. on Linux. You don't need Clear Linux, there's nothing kernel-specific and Docker doesn't even give you the actual Clear Linux kernel, it gives you your own kernel. This applies to any kernel.
In any case, I wasn't really looking for a long discussion I was really just reporting my initial findings while doing benchmarks.
It's an apples to oranges comparison. Japronto isn't a WSGI server. I'll benchmark just for curiosity anyways :)
Btw I never got around to setting up Clear Linux since packages took forever to download and there's no packaged libev. So that took too much time for my gusto.
I'll be closing this ticket but feel free to continue the discussion!