Here is the design doc: https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW4cSv1wy5hC0ApeGMV9s/pub Here is the implementation: https://golang.org/cl/12544043/ I've benchmarked the change on a real app where chans were abandoned in favor of other sync primitives for performance reasons: https://groups.google.com/d/msg/golang-dev/0IElw_BbTrk/cGHMdNoHGQEJ The change provides a 23% end-to-end speedup, making chans significantly faster than the other synchronization primitives.
What is the update on this?
Unfortunately the CL seems to have gone stale, @dvyukov, due to changes to the runtime's code.
I have implemented an extremely simple lock-free channel here; if there's interest, I'm willing to implement it in the runtime and submit a CL (and re-license it under the Go license, of course).
➜ go test -bench=. -benchmem -cpu 1,4,8,32 -benchtime 3s
# ch := NewSize(runtime.NumCPU())
BenchmarkLFChan 30000000 168 ns/op 40 B/op 4 allocs/op
BenchmarkLFChan-4 30000000 175 ns/op 45 B/op 4 allocs/op
BenchmarkLFChan-8 20000000 205 ns/op 45 B/op 4 allocs/op
BenchmarkLFChan-32 20000000 201 ns/op 45 B/op 4 allocs/op
# ch := make(chan interface{}, runtime.NumCPU())
BenchmarkChan 50000000 115 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 20000000 261 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 20000000 331 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 10000000 532 ns/op 8 B/op 1 allocs/op
PASS
ok github.com/OneOfOne/lfchan 51.663s
@dvyukov @ianlancetaylor @rsc
@OneOfOne
1. The tricky part will be supporting blocking on an empty/full chan and on select statements.
2. Your chan does not appear to be FIFO, which is a requirement.
3. You scan the whole array to find an item, which can be very expensive for large chans.
@dvyukov I'll work on fixing 2 and 3 and ping you back, and if I get your approval I'll try to figure out 1.
Your benchmarks look good but any change to channels has to work efficiently with select statements.
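For reference, here is a minimal, self-contained illustration (not taken from the thread) of the kind of statement any replacement channel implementation has to handle: a select can wait on sends and receives across several channels at once, commits to exactly one ready case, and falls through to default when nothing is ready.

```go
package main

import "fmt"

func main() {
	in := make(chan int, 1)
	out := make(chan int, 1)
	in <- 42

	// A select must be able to wait on receive and send cases across
	// multiple channels at the same time and commit to exactly one
	// ready case; with a default clause it must not block at all.
	select {
	case v := <-in:
		fmt.Println("received", v)
	case out <- 1:
		fmt.Println("sent")
	default:
		fmt.Println("no channel ready")
	}
}
```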
@dvyukov I updated the package and fixed all your notes.
@ianlancetaylor I added Select support to the package.
➜ go test -bench=. -benchmem -cpu 1,4,8,32 -benchtime 3s
# ch := NewSize(100)
BenchmarkLFChan 20000000 292 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 20000000 202 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 30000000 161 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 20000000 215 ns/op 8 B/op 1 allocs/op
# ch := make(chan interface{}, 100)
BenchmarkChan 10000000 371 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 10000000 378 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 10000000 506 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 10000000 513 ns/op 8 B/op 1 allocs/op
I'll benchmark up to 80 cores later today and post results.
On an 80-core Intel host:
$ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48,64,72,80 -benchtime 10s
PASS
BenchmarkLFChan 30000000 506 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 20000000 1107 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 10000000 1611 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 10000000 1710 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 10000000 2165 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 10000000 2192 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 10000000 2288 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-64 10000000 2354 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-72 5000000 2454 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-80 5000000 2554 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 768 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1188 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 20000000 3215 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 5000000 3657 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 10000000 2734 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 5000000 2387 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 5000000 2448 ns/op 8 B/op 1 allocs/op
BenchmarkChan-64 5000000 2452 ns/op 8 B/op 1 allocs/op
BenchmarkChan-72 5000000 2552 ns/op 8 B/op 1 allocs/op
BenchmarkChan-80 5000000 2659 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 436.003s
On a 48-core AMD host:
$ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48 -benchtime 10s
PASS
BenchmarkLFChan 20000000 734 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 10000000 1271 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 20000000 1140 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 20000000 1097 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 10000000 1257 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 10000000 1564 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 10000000 2172 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 937 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1147 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 10000000 1721 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 10000000 2372 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 10000000 2349 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 5000000 2295 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 5000000 2753 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 276.712s
Also, a plot: https://cloud.githubusercontent.com/assets/176295/14071848/8ec0d1ee-f487-11e5-9895-9f9fdc8e089b.png
@jonhoo thanks for running the benchmark. It is rather weird how different the numbers are; I wonder if it has something to do with having multiple CPU sockets.
@OneOfOne These machines are also somewhat older, so each core is slower than what you'd find in your laptop nowadays.
Any idea why adding HW threads doesn't result in increased parallelism? Could you describe the HW? How many HW threads per core?
@RLH: see the linked pages for hardware descriptions. I'm not pinning threads, but I never run with -cpu higher than the number of real cores.
@OneOfOne There are several problems with this implementation:
@dvyukov -count to trigger it.
@dvyukov 3. fixed with https://github.com/OneOfOne/lfchan/commit/bdddd904676fc8368064cc2eb21efaa4384cd2db
Have you tried to write a SPIN model for your implementation? The runtime used to have a SPIN model for the scheduler, but it got out of date pretty quickly. If we add more lock-free stuff to the runtime, I'd suggest we start with a formal model and strive to keep it always in sync with the source code.
@minux I'm not sure what a SPIN model is.
@bradfitz thanks.
Yeah that's way above my head.
@OneOfOne ping me if you want me to run another benchmark after making changes.
@jonhoo Please do; I believe I fixed all the issues that I could fix without using some runtime magic.
Seems to be doing worse for many cores now:
80core-intel $ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48,64,72,80 -benchtime 10s
PASS
BenchmarkLFChan 30000000 624 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 10000000 1266 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 10000000 2400 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 5000000 3423 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 5000000 3536 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 5000000 3305 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 5000000 3674 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-64 3000000 4078 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-72 3000000 4473 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-80 3000000 4539 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 742 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1161 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 5000000 2412 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 10000000 1791 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 10000000 1966 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 10000000 2386 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 10000000 2338 ns/op 8 B/op 1 allocs/op
BenchmarkChan-64 10000000 2281 ns/op 8 B/op 1 allocs/op
BenchmarkChan-72 10000000 2298 ns/op 8 B/op 1 allocs/op
BenchmarkChan-80 10000000 2512 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 526.199s
48core-amd $ go test -bench=. -benchmem -cpu 1,2,4,8,16,32,48 -benchtime 10s
PASS
BenchmarkLFChan 20000000 863 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-2 10000000 1345 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 10000000 1257 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 10000000 1373 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-16 10000000 2270 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-32 5000000 3315 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-48 3000000 5026 ns/op 8 B/op 1 allocs/op
BenchmarkChan 20000000 969 ns/op 8 B/op 1 allocs/op
BenchmarkChan-2 20000000 1364 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 10000000 1956 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 5000000 2792 ns/op 8 B/op 1 allocs/op
BenchmarkChan-16 5000000 2718 ns/op 8 B/op 1 allocs/op
BenchmarkChan-32 5000000 2659 ns/op 8 B/op 1 allocs/op
BenchmarkChan-48 5000000 3101 ns/op 8 B/op 1 allocs/op
ok github.com/OneOfOne/lfchan 358.916s
That's really weird; on my system it is completely different :-/
Thank you nonetheless for taking the time to test it. It might be my "fake" atomic value / spinlock.
➜ go test -bench=. -benchmem -run NONE -cpu 1,4,8 -benchtime 10s
BenchmarkLFChan 50000000 325 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 50000000 270 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 50000000 297 ns/op 8 B/op 1 allocs/op
BenchmarkChan 50000000 362 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 50000000 383 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 30000000 500 ns/op 8 B/op 1 allocs/op
Well, as I mentioned previously, the cores are probably slower than what's in your laptop. The trend is what's important, and it's clearly not great. Looking at cpu utilization during the 80-core benchmark, all cores are active. If it's at all helpful, some profiling results:
(pprof) top20
27010ms of 29760ms total (90.76%)
Dropped 100 nodes (cum <= 148.80ms)
Showing top 20 nodes out of 57 (cum >= 9970ms)
flat flat% sum% cum cum%
12060ms 40.52% 40.52% 12060ms 40.52% runtime.futex
3020ms 10.15% 50.67% 3020ms 10.15% runtime/internal/atomic.Xchg
1620ms 5.44% 56.12% 1620ms 5.44% sync/atomic.AddUint32
1540ms 5.17% 61.29% 5940ms 19.96% github.com/OneOfOne/lfchan.Chan.Send
1370ms 4.60% 65.89% 8060ms 27.08% runtime.lock
1130ms 3.80% 69.69% 1130ms 3.80% runtime/internal/atomic.Cas
1030ms 3.46% 73.15% 1030ms 3.46% sync/atomic.CompareAndSwapUint32
1020ms 3.43% 76.58% 1020ms 3.43% runtime.globrunqget
590ms 1.98% 78.56% 590ms 1.98% runtime/internal/atomic.Xadd
590ms 1.98% 80.54% 590ms 1.98% sync/atomic.LoadUint32
450ms 1.51% 82.06% 450ms 1.51% runtime.procyield
410ms 1.38% 83.43% 4400ms 14.78% github.com/OneOfOne/lfchan.Chan.Recv
330ms 1.11% 84.54% 7870ms 26.44% runtime.schedule
300ms 1.01% 85.55% 300ms 1.01% runtime.memmove
300ms 1.01% 86.56% 300ms 1.01% runtime.osyield
300ms 1.01% 87.57% 300ms 1.01% sync/atomic.AddUint64
290ms 0.97% 88.54% 450ms 1.51% runtime.mallocgc
260ms 0.87% 89.42% 670ms 2.25% runtime.execute
200ms 0.67% 90.09% 200ms 0.67% runtime.gogo
200ms 0.67% 90.76% 9970ms 33.50% runtime.unlock
Comparatively, here's the profile from BenchmarkChan:
(pprof) top20
22140ms of 25220ms total (87.79%)
Dropped 107 nodes (cum <= 126.10ms)
Showing top 20 nodes out of 69 (cum >= 330ms)
flat flat% sum% cum cum%
12490ms 49.52% 49.52% 12490ms 49.52% runtime.futex
1500ms 5.95% 55.47% 1500ms 5.95% runtime/internal/atomic.Xchg
1290ms 5.11% 60.59% 1290ms 5.11% runtime.memmove
900ms 3.57% 64.16% 7210ms 28.59% runtime.lock
610ms 2.42% 66.57% 610ms 2.42% runtime/internal/atomic.Xadd
600ms 2.38% 68.95% 5890ms 23.35% runtime.chansend
530ms 2.10% 71.05% 550ms 2.18% testing.(*PB).Next
490ms 1.94% 73.00% 490ms 1.94% runtime/internal/atomic.Cas
480ms 1.90% 74.90% 480ms 1.90% runtime.procyield
440ms 1.74% 76.65% 440ms 1.74% runtime.osyield
410ms 1.63% 78.27% 410ms 1.63% runtime.gopark
360ms 1.43% 79.70% 1790ms 7.10% runtime.schedule
340ms 1.35% 81.05% 340ms 1.35% sync/atomic.AddUint64
330ms 1.31% 82.36% 460ms 1.82% runtime.execute
310ms 1.23% 83.58% 720ms 2.85% runtime.goparkunlock
240ms 0.95% 84.54% 440ms 1.74% runtime.mallocgc
240ms 0.95% 85.49% 240ms 0.95% runtime/internal/atomic.Load
200ms 0.79% 86.28% 200ms 0.79% runtime.(*waitq).dequeue
200ms 0.79% 87.07% 200ms 0.79% runtime/internal/atomic.Cas64
180ms 0.71% 87.79% 330ms 1.31% runtime.casgstatus
Jon, can you post the SVG or PDF of that profile, please?
BenchmarkLFChan: [profile graph]
BenchmarkChan: [profile graph]
@OneOfOne why is your code calling runtime.GOMAXPROCS constantly? Your code should assume that value won't change.
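A minimal sketch of the suggested fix, assuming a constructor like lfchan's NewSize (the field name and the elided queue state are illustrative, not the package's actual layout): read GOMAXPROCS once when the channel is created instead of on every send/receive.

```go
package lfchan // illustrative sketch, not the actual package layout

import "runtime"

// Chan is a stand-in for the real type; the point is that the current
// GOMAXPROCS value is captured once, at construction time, rather than
// being queried on every Send/Recv.
type Chan struct {
	procs int // cached runtime.GOMAXPROCS(0)
	// ... queue state elided ...
}

// NewSize creates a channel and caches the scheduler's proc count.
func NewSize(size int) *Chan {
	return &Chan{procs: runtime.GOMAXPROCS(0)}
}
```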
I believe the benchmarks in https://github.com/OneOfOne/lfchan/blob/master/chan_test.go are not benchmarking what you want. Operations like wg.Add, wg.Done and creating many goroutines probably limit scalability and skew the results.
I think a better approach would be to use 1 reader and 1 writer goroutine and let RunParallel handle the concurrency level for you. Also avoid using atomic operations and wait groups.
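A minimal sketch of a benchmark in that spirit (one possible interpretation of the suggestion; the names and the interface{} element type are assumptions, not the repo's actual tests): each goroutine spawned by RunParallel does a send followed by a receive on the same buffered channel, so the harness controls the concurrency level and no WaitGroups or extra atomics sit inside the measured loop.

```go
package lfchan_test // illustrative benchmark sketch

import "testing"

// BenchmarkChanSendRecv pairs every send with a receive inside the
// RunParallel body, so throughput scales with -cpu without any
// goroutine bookkeeping of our own.
func BenchmarkChanSendRecv(b *testing.B) {
	ch := make(chan interface{}, 100)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ch <- 1
			<-ch
		}
	})
}
```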
Also I want to point out that the following result:
BenchmarkLFChan 50000000 325 ns/op
BenchmarkLFChan-4 50000000 270 ns/op
shows that the benchmarked operation became 270*4/325 ≈ 3.3 times slower: at -cpu 4 each operation consumes roughly 4*270 = 1080 ns of aggregate CPU time, versus 325 ns on a single CPU.
For a lock-free implementation, this code sure does spend a long time in runtime.lock.
@jonhoo I fixed that; would you be so kind as to rerun the benchmarks? I simplified them as per @kostya-sh's suggestion and removed the GOMAXPROCS call.
@dvyukov pointed me in the right direction with his runtime.lock remark.
➜ go test -bench=. -benchmem -run NONE -cpu 1,4,8 -benchtime 10s
BenchmarkLFChan 100000000 142 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-4 100000000 174 ns/op 8 B/op 1 allocs/op
BenchmarkLFChan-8 100000000 132 ns/op 8 B/op 1 allocs/op
BenchmarkChan 100000000 102 ns/op 8 B/op 1 allocs/op
BenchmarkChan-4 50000000 257 ns/op 8 B/op 1 allocs/op
BenchmarkChan-8 50000000 337 ns/op 8 B/op 1 allocs/op
PASS
ok github.com/OneOfOne/lfchan 86.061s
This bug was originally about lock-free channels in the runtime.
All this recent discussion is about an external package. Can we move discussion to elsewhere on Github (e.g. https://github.com/OneOfOne/lfchan/issues) so I can remain subscribed to this bug for its original purpose?
Created an issue over at OneOfOne/lfchan#3. Will continue posting benchmark results there.
@OneOfOne The queue is still not FIFO after 2f9d3a217eadbc557e758649fa153dd59ff14c11, and len can still return -1. You can block and unblock goroutines without runtime support: sync.Mutex+Cond can block/unblock goroutines. You can stub out the runtime scheduler by implementing 2 functions on top of Mutex+Cond: park(g *G) and unpark(g *G).
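A minimal sketch of that idea, assuming a user-space G stand-in rather than the runtime's real g struct (all names here are illustrative): a mutex, a condition variable, and a flag per G are enough to make an unpark that arrives before park not lose the wakeup.

```go
package main

import "sync"

// G is a user-space stand-in for a goroutine descriptor; the runtime's real
// g struct is different, this only models the park/unpark contract.
type G struct {
	mu    sync.Mutex
	cond  *sync.Cond
	woken bool
}

func newG() *G {
	g := &G{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// park blocks the caller until unpark(g) has been called; if unpark already
// ran, park returns immediately and consumes the pending wakeup.
func park(g *G) {
	g.mu.Lock()
	for !g.woken {
		g.cond.Wait()
	}
	g.woken = false
	g.mu.Unlock()
}

// unpark wakes a goroutine blocked in park(g), or lets the next park return
// immediately if no goroutine is parked yet.
func unpark(g *G) {
	g.mu.Lock()
	g.woken = true
	g.cond.Signal()
	g.mu.Unlock()
}

func main() {
	g := newG()
	done := make(chan struct{})
	go func() {
		park(g) // blocks until unpark(g) below
		close(done)
	}()
	unpark(g)
	<-done
}
```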
@jonhoo how did you generate the plot?
@gorakhargosh: https://github.com/OneOfOne/lfchan/issues/3#issuecomment-203239057
@dvyukov ping?
After the recent fairness changes in channels, my algorithm does not apply per se. My algorithm does not have the fairness properties that the current code provides.
@dvyukov any plans to revisit this in future ?
No particular plans.
@dvyukov
What are the fairness properties this implementation is missing?
Other than that, is there anything else wrong with it?
@stjepang I've lost all context already. What implementation do you mean? Why do you think there are fairness problems?
@dvyukov In order to improve the performance of Go's channels, you designed almost lock-free channels and wrote this implementation. Now it's apparently abandoned, as you commented:
After the recent fairness changes in channels, my algorithm does not apply per se. My algorithm does not have the fairness properties that the current code provides.
I was wondering what the fairness changes are that make your algorithm inapplicable. Why exactly didn't your algorithm make it as the official Go channel implementation? What are the problems with it that still need to be solved?
I was wondering what the fairness changes are that make your algorithm inapplicable.
I mean this change:
https://go-review.googlesource.com/c/16740
Though it does not mention fairness, nor does it add tests. So we can probably break the additional fairness guarantees that it provides.
Why exactly didn't your algorithm make it as the official Go channel implementation? What are the problems with it that still need to be solved?
It was constantly deprioritized in favor of other runtime changes, so it constantly got outdated and I wasn't able to keep up with updating it. As far as I remember it all worked. Now the major problem is rebasing it onto the current runtime code. We also need to decide what to do with respect to the fairness change 16740.
@dvyukov, I might be wrong, but I noticed you have not been as active as before. Is this a result of your busy schedule, or did you lose the zeal after multiple rejections of some of your proposals, or are you working on a secret project we can't know about yet? :)
@olekukonko working on a non-secret project you can know about: https://github.com/google/syzkaller
@dvyukov, the project looks interesting. I will be experimenting with it, and thanks for the prompt response. Regards.
Are there any plans to revisit this?