Here is the design doc: https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW4cSv1wy5hC0ApeGMV9s/pub Here is the implementation: https://golang.org/cl/12544043/ I've benchmarked the change on a real app where chans were abandoned in favor of other sync primitives for performance reasons: https://groups.google.com/d/msg/golang-dev/0IElw_BbTrk/cGHMdNoHGQEJ The change provides a 23% end-to-end speedup, making chans significantly faster than the other synchronization primitives.
What is the update on this?
Unfortunately the CL seems to have gone stale, @dvyukov, due to changes to the runtime's code.
I have implemented an extremely simple lock-free channel here; if there's interest, I'm willing to implement it in the runtime and submit a CL (and re-license it under the Go license, of course).
➜ go test -bench=. -benchmem -cpu 1,4,8,32 -benchtime 3s
# ch := NewSize(runtime.NumCPU())
BenchmarkLFChan 30000000 168 ns/op 40 B/op 4 allocs/op
BenchmarkLFChan-4 30000000 175 ns/op 45 B/op 4 allocs/op
BenchmarkLFChan-8 20000000 205 ns/op 45 B/op 4 allocs/op
BenchmarkLFChan-32 20000000 201 ns/op 45 B/op 4 allocs/op
# ch := make(chan interface{}, runtime.NumCPU())
BenchmarkChan 50000000 115 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 20000000 261 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 20000000 331 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 10000000 532 ns/op 8 B/op 1 allocs/op
PASS
ok github.com/OneOfOne/lfchan 51.663s
@dvyukov @ianlancetaylor @rsc
@OneOfOne
1. The tricky part will be supporting blocking on an empty/full chan and on select statements.
2. Your chan does not appear to be FIFO, which is a requirement.
3. You scan the whole array to find an item, which can be very expensive for large chans.
@dvyukov I'll work on fixing 2 and 3 and ping you back, and if I get your approval I'll try to figure out 1.
Your benchmarks look good but any change to channels has to work efficiently with select statements.
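For reference, here is a minimal, self-contained illustration (not taken from the thread) of the kind of statement any replacement channel implementation has to handle: a select can wait on sends and receives across several channels at once, commits to exactly one ready case, and falls through to default when nothing is ready.

```go
package main

import "fmt"

func main() {
	in := make(chan int, 1)
	out := make(chan int, 1)
	in <- 42

	// A select must be able to wait on receive and send cases across
	// multiple channels at the same time and commit to exactly one
	// ready case; with a default clause it must not block at all.
	select {
	case v := <-in:
		fmt.Println("received", v)
	case out <- 1:
		fmt.Println("sent")
	default:
		fmt.Println("no channel ready")
	}
}
```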
@dvyukov I updated the package and fixed all your notes.
@ianlancetaylor I added Select support to the package.
➜ go test -bench=. -benchmem -cpu 1,4,8,32 -benchtime 3s
# ch := NewSize(100)
BenchmarkLFChan 20000000 292 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 20000000 202 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 30000000 161 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 20000000 215 ns/op 8 B/op 1 allocs/op
# ch := make(chan interface{}, 100)
BenchmarkChan 10000000 371 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 10000000 378 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 10000000 506 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 10000000 513 ns/op 8 B/op 1 allocs/op
I'll benchmark up to 80 cores later today and post results.
On an 80-core Intel host:
$ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48,64,72,80 -benchtime 10s
PASS
BenchmarkLFChan 30000000 506 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 20000000 1107 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 10000000 1611 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 10000000 1710 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 10000000 2165 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 10000000 2192 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 10000000 2288 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-64 10000000 2354 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-72 5000000 2454 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-80 5000000 2554 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 768 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1188 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 20000000 3215 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 5000000 3657 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 10000000 2734 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 5000000 2387 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 5000000 2448 ns/op 8 B/op 1 allocs/op
BenchmarkChan-64 5000000 2452 ns/op 8 B/op 1 allocs/op
BenchmarkChan-72 5000000 2552 ns/op 8 B/op 1 allocs/op
BenchmarkChan-80 5000000 2659 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 436.003s
On a 48-core AMD host:
$ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48 -benchtime 10s
PASS
BenchmarkLFChan 20000000 734 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 10000000 1271 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 20000000 1140 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 20000000 1097 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 10000000 1257 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 10000000 1564 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 10000000 2172 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 937 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1147 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 10000000 1721 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 10000000 2372 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 10000000 2349 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 5000000 2295 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 5000000 2753 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 276.712s
Also, a plot: https://cloud.githubusercontent.com/assets/176295/14071848/8ec0d1ee-f487-11e5-9895-9f9fdc8e089b.png
@jonhoo thanks for running the benchmark. It is rather weird how different the numbers are; I wonder if it has something to do with having multiple CPU sockets.
@OneOfOne These machines are also somewhat older, so each core is slower than what you'd find in your laptop nowadays.
Any idea why adding HW threads doesn't result in increased parallelism? Could you describe the HW? How many HW threads per core?
@RLH: see the linked pages for hardware descriptions. I'm not pinning threads, but I never run with -cpu higher than the number of real cores.
@OneOfOne There are several problems with this implementation:
@dvyukov -count to trigger it.
@dvyukov 3. fixed with https://github.com/OneOfOne/lfchan/commit/bdddd904676fc8368064cc2eb21efaa4384cd2db
Have you tried to write a SPIN model for your implementation? The runtime used to have a SPIN model for the scheduler, but it got out of date pretty quickly. If we add more lock-free stuff to the runtime, I'd suggest we start with a formal model and strive to keep it always in sync with the source code.
@minux I'm not sure what a SPIN model is.
@bradfitz thanks.
Yeah that's way above my head.
@OneOfOne ping me if you want me to run another benchmark after making changes.
@jonhoo Please do; I believe I fixed all the issues that I could fix without using some runtime magic.
Seems to be doing worse for many cores now:
80core-intel $ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48,64,72,80 -benchtime 10s
PASS
BenchmarkLFChan 30000000 624 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 10000000 1266 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 10000000 2400 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 5000000 3423 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 5000000 3536 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 5000000 3305 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 5000000 3674 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-64 3000000 4078 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-72 3000000 4473 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-80 3000000 4539 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 742 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1161 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 5000000 2412 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 10000000 1791 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 10000000 1966 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 10000000 2386 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 10000000 2338 ns/op 8 B/op 1 allocs/op
BenchmarkChan-64 10000000 2281 ns/op 8 B/op 1 allocs/op
BenchmarkChan-72 10000000 2298 ns/op 8 B/op 1 allocs/op
BenchmarkChan-80 10000000 2512 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 526.199s
48core-amd $ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48 -benchtime 10s
PASS
BenchmarkLFChan 20000000 863 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 10000000 1345 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 10000000 1257 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 10000000 1373 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 10000000 2270 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 5000000 3315 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 3000000 5026 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 969 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1364 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 10000000 1956 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 5000000 2792 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 5000000 2718 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 5000000 2659 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 5000000 3101 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 358.916s
That's really weird; on my system it is completely different :-/
Thank you nonetheless for taking the time to test it. It might be my "fake" atomic value / spinlock.
➜ go test -bench=. -benchmem -run NONE -cpu 1,4,8 -benchtime 10s
BenchmarkLFChan 50000000 325 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 50000000 270 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 50000000 297 ns/op 8 B/op 1 allocs/op
BenchmarkChan 50000000 362 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 50000000 383 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 30000000 500 ns/op 8 B/op 1 allocs/op
Well, as I mentioned previously, the cores are probably slower than what's in your laptop. The trend is what's important, and it's clearly not great. Looking at cpu utilization during the 80-core benchmark, all cores are active. If it's at all helpful, some profiling results:
(pprof) top20
27010ms of 29760ms total (90.76%)
Dropped 100 nodes (cum <= 148.80ms)
Showing top 20 nodes out of 57 (cum >= 9970ms)
flat flat% sum% cum cum%
12060ms 40.52% 40.52% 12060ms 40.52% runtime.futex
3020ms 10.15% 50.67% 3020ms 10.15% runtime/internal/atomic.Xchg
1620ms 5.44% 56.12% 1620ms 5.44% sync/atomic.AddUint32
1540ms 5.17% 61.29% 5940ms 19.96% github.com/OneOfOne/lfchan.Chan.Send
1370ms 4.60% 65.89% 8060ms 27.08% runtime.lock
1130ms 3.80% 69.69% 1130ms 3.80% runtime/internal/atomic.Cas
1030ms 3.46% 73.15% 1030ms 3.46% sync/atomic.CompareAndSwapUint32
1020ms 3.43% 76.58% 1020ms 3.43% runtime.globrunqget
590ms 1.98% 78.56% 590ms 1.98% runtime/internal/atomic.Xadd
590ms 1.98% 80.54% 590ms 1.98% sync/atomic.LoadUint32
450ms 1.51% 82.06% 450ms 1.51% runtime.procyield
410ms 1.38% 83.43% 4400ms 14.78% github.com/OneOfOne/lfchan.Chan.Recv
330ms 1.11% 84.54% 7870ms 26.44% runtime.schedule
300ms 1.01% 85.55% 300ms 1.01% runtime.memmove
300ms 1.01% 86.56% 300ms 1.01% runtime.osyield
300ms 1.01% 87.57% 300ms 1.01% sync/atomic.AddUint64
290ms 0.97% 88.54% 450ms 1.51% runtime.mallocgc
260ms 0.87% 89.42% 670ms 2.25% runtime.execute
200ms 0.67% 90.09% 200ms 0.67% runtime.gogo
200ms 0.67% 90.76% 9970ms 33.50% runtime.unlock
Comparatively, here's the profile from BenchmarkChan:
(pprof) top20
22140ms of 25220ms total (87.79%)
Dropped 107 nodes (cum <= 126.10ms)
Showing top 20 nodes out of 69 (cum >= 330ms)
flat flat% sum% cum cum%
12490ms 49.52% 49.52% 12490ms 49.52% runtime.futex
1500ms 5.95% 55.47% 1500ms 5.95% runtime/internal/atomic.Xchg
1290ms 5.11% 60.59% 1290ms 5.11% runtime.memmove
900ms 3.57% 64.16% 7210ms 28.59% runtime.lock
610ms 2.42% 66.57% 610ms 2.42% runtime/internal/atomic.Xadd
600ms 2.38% 68.95% 5890ms 23.35% runtime.chansend
530ms 2.10% 71.05% 550ms 2.18% testing.(*PB).Next
490ms 1.94% 73.00% 490ms 1.94% runtime/internal/atomic.Cas
480ms 1.90% 74.90% 480ms 1.90% runtime.procyield
440ms 1.74% 76.65% 440ms 1.74% runtime.osyield
410ms 1.63% 78.27% 410ms 1.63% runtime.gopark
360ms 1.43% 79.70% 1790ms 7.10% runtime.schedule
340ms 1.35% 81.05% 340ms 1.35% sync/atomic.AddUint64
330ms 1.31% 82.36% 460ms 1.82% runtime.execute
310ms 1.23% 83.58% 720ms 2.85% runtime.goparkunlock
240ms 0.95% 84.54% 440ms 1.74% runtime.mallocgc
240ms 0.95% 85.49% 240ms 0.95% runtime/internal/atomic.Load
200ms 0.79% 86.28% 200ms 0.79% runtime.(*waitq).dequeue
200ms 0.79% 87.07% 200ms 0.79% runtime/internal/atomic.Cas64
180ms 0.71% 87.79% 330ms 1.31% runtime.casgstatus
Jon, can you post the SVG or PDF of that profile, please?
BenchmarkLFChan: [profile graph]
BenchmarkChan: [profile graph]
@OneOfOne why is your code calling runtime.GOMAXPROCS constantly? Your code should assume that value won't change.
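A minimal sketch of the suggested fix, assuming a constructor like lfchan's NewSize (the field name and the elided queue state are illustrative, not the package's actual layout): read GOMAXPROCS once when the channel is created instead of on every send/receive.

```go
package lfchan // illustrative sketch, not the actual package layout

import "runtime"

// Chan is a stand-in for the real type; the point is that the current
// GOMAXPROCS value is captured once, at construction time, rather than
// being queried on every Send/Recv.
type Chan struct {
	procs int // cached runtime.GOMAXPROCS(0)
	// ... queue state elided ...
}

// NewSize creates a channel and caches the scheduler's proc count.
func NewSize(size int) *Chan {
	return &Chan{procs: runtime.GOMAXPROCS(0)}
}
```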
I believe the benchmarks in https://github.com/OneOfOne/lfchan/blob/master/chan_test.go are not benchmarking what you want. Operations like wg.Add, wg.Done and creating many goroutines probably limit scalability and skew the results.
I think a better approach would be to use 1 reader and 1 writer goroutine and let RunParallel handle the concurrency level for you. Also avoid using atomic operations and wait groups.
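A minimal sketch of a benchmark in that spirit (one possible interpretation of the suggestion; the names and the interface{} element type are assumptions, not the repo's actual tests): each goroutine spawned by RunParallel does a send followed by a receive on the same buffered channel, so the harness controls the concurrency level and no WaitGroups or extra atomics sit inside the measured loop.

```go
package lfchan_test // illustrative benchmark sketch

import "testing"

// BenchmarkChanSendRecv pairs every send with a receive inside the
// RunParallel body, so throughput scales with -cpu without any
// goroutine bookkeeping of our own.
func BenchmarkChanSendRecv(b *testing.B) {
	ch := make(chan interface{}, 100)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ch <- 1
			<-ch
		}
	})
}
```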
Also I want to point out that the following result:
BenchmarkLFChan 50000000 325 ns/op
BenchmarkLFChan-4 50000000 270 ns/op
shows that the benchmarked operation became 270*4/325 ≈ 3.3 times slower: at -cpu 4 each operation consumes roughly 4*270 = 1080 ns of aggregate CPU time, versus 325 ns on a single CPU.
For a lock-free implementation, this code sure does spend a long time in runtime.lock.
@jonhoo I fixed that; would you be so kind as to rerun the benchmarks? I simplified them as per @kostya-sh's suggestion and removed the GOMAXPROCS call.
@dvyukov pointed me in the right direction with his runtime.lock remark.
➜ go test -bench=. -benchmem -run NONE -cpu 1,4,8 -benchtime 10s
BenchmarkLFChan 100000000 142 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 100000000 174 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 100000000 132 ns/op 8 B/op 1 allocs/op
BenchmarkChan 100000000 102 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 50000000 257 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 50000000 337 ns/op 8 B/op 1 allocs/op
PASS
ok github.com/OneOfOne/lfchan 86.061s
This bug was originally about lock-free channels in the runtime.
All this recent discussion is about an external package. Can we move discussion to elsewhere on Github (e.g. https://github.com/OneOfOne/lfchan/issues) so I can remain subscribed to this bug for its original purpose?
Created an issue over at OneOfOne/lfchan#3. Will continue posting benchmark results there.
@OneOfOne The queue is still not FIFO after 2f9d3a217eadbc557e758649fa153dd59ff14c11, and len can still return -1. You can block and unblock goroutines without runtime support: sync.Mutex+Cond can block/unblock goroutines. You can stub out the runtime scheduler by implementing 2 functions on top of Mutex+Cond: park(g *G) and unpark(g *G).
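A minimal sketch of that idea, assuming a user-space G stand-in rather than the runtime's real g struct (all names here are illustrative): a mutex, a condition variable, and a flag per G are enough to make an unpark that arrives before park not lose the wakeup.

```go
package main

import "sync"

// G is a user-space stand-in for a goroutine descriptor; the runtime's real
// g struct is different, this only models the park/unpark contract.
type G struct {
	mu    sync.Mutex
	cond  *sync.Cond
	woken bool
}

func newG() *G {
	g := &G{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// park blocks the caller until unpark(g) has been called; if unpark already
// ran, park returns immediately and consumes the pending wakeup.
func park(g *G) {
	g.mu.Lock()
	for !g.woken {
		g.cond.Wait()
	}
	g.woken = false
	g.mu.Unlock()
}

// unpark wakes a goroutine blocked in park(g), or lets the next park return
// immediately if no goroutine is parked yet.
func unpark(g *G) {
	g.mu.Lock()
	g.woken = true
	g.cond.Signal()
	g.mu.Unlock()
}

func main() {
	g := newG()
	done := make(chan struct{})
	go func() {
		park(g) // blocks until unpark(g) below
		close(done)
	}()
	unpark(g)
	<-done
}
```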
@jonhoo how did you generate the plot?
@gorakhargosh: https://github.com/OneOfOne/lfchan/issues/3#issuecomment-203239057
@dvyukov ping?
After the recent fairness changes in channels, my algorithm does not apply per se. My algorithm does not have the fairness properties that the current code provides.
@dvyukov any plans to revisit this in future ?
No particular plans.
@dvyukov
What are the fairness properties this implementation is missing?
Other than that, is there anything else wrong with it?
@stjepang I've lost all context already. What implementation do you mean? Why do you think there are fairness problems?
@dvyukov In order to improve the performance of Go's channels, you designed almost lock-free channels and wrote this implementation. Now it's apparently abandoned, as you commented:
After the recent fairness changes in channels, my algorithm does not apply per se. My algorithm does not have the fairness properties that the current code provides.
I was wondering what the fairness changes are that make your algorithm inapplicable. Why exactly didn't your algorithm make it as the official Go channel implementation? What are the problems with it that still need to be solved?
I was wondering what the fairness changes are that make your algorithm inapplicable.
I mean this change:
https://go-review.googlesource.com/c/16740
Though it does not mention fairness, nor does it add tests. So we can probably break the additional fairness guarantees that it provides.
Why exactly didn't your algorithm make it as the official Go channel implementation? What are the problems with it that still need to be solved?
It was constantly deprioritized in favor of other runtime changes, so it constantly got outdated and I wasn't able to keep up with updating it. As far as I remember it all worked. Now the major problem is rebasing it onto the current runtime code. We also need to decide what to do with respect to the fairness change 16740.
@dvyukov, I might be wrong, but I noticed you have not been as active as before. Is this a result of your busy schedule, or did you lose the zeal after multiple rejections of some of your proposals, or are you working on a secret project we can't know about yet? :)
@olekukonko working on a non-secret project you can know about: https://github.com/google/syzkaller
@dvyukov, the project looks interesting. I will be experimenting with it, and thanks for the prompt response. Regards.
Are there any plans to revisit this?