3c8ed8c85f653b476541fddf6fbdad29c92e1815 is the first bad commit
commit 3c8ed8c85f653b476541fddf6fbdad29c92e1815
Author: @chfast
Date: Wed Aug 2 21:56:32 2017
CLMiner: Disable kickoff / pause
It causes the following two errors (randomly) when running ethminer -G -M
Trial 4... 6640981
Trial 5... 6291456
✘ 10:12:04|cl-0 OpenCL Error: clSetKernelArg -48
min/mean/max: 3495253/5941930/6990506 H/s
inner mean: 6640981 H/s
or
Trial 5... 6634346
*** Error in `/workspace/ethminer/dbuild/ethminer/ethminer': corrupted double-linked list ***
======= Backtrace: =========
#0 0x00007f1e3d9cc498 in raise () from /lib64/libc.so.6
#1 0x00007f1e3d9cd8ea in abort () from /lib64/libc.so.6
#2 0x00007f1e3da09241 in __libc_message () from /lib64/libc.so.6
#3 0x00007f1e3da0eac6 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f1e3da0fadf in _int_free () from /lib64/libc.so.6
#5 0x00007f1e30d0a73c in r600_buffer_transfer_unmap (ctx=<optimized out>, transfer=<optimized out>)
at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/auxiliary/util/u_inlines.h:144
#6 0x00007f1e3eac55dd in clover::mapping::~mapping (this=<optimized out>, __in_chrg=<optimized out>)
at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/core/resource.cpp:219
#7 0x00007f1e3eaa89f1 in _Function_handler<void(clover::event&), (anonymous namespace)::soft_copy_op(clover::command_queue&, T, const vector_t&, const vector_t&, S, const vector_t&, const vector_t&, const vector_t&) [with T = void*; S = clover::buffer*; (anonymous namespace)::vector_t = std::array<long unsigned int, 3ul>]::<lambda(clover::event&)> >::_M_invoke(const _Any_data &, event &) (__functor=..., __args#0=...)
at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/api/transfer.cpp:247
#8 0x00007f1e3eab38f6 in clover::event::trigger (this=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/functional:2267
#9 0x00007f1e3eab49d9 in clover::hard_event::hard_event(clover::command_queue&, unsigned int, clover::ref_vector<clover::event> const&, std::function<void (clover::event&)>) (this=<optimized out>, q=..., command=<optimized out>, deps=..., action=...)
at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/core/event.cpp:126
#10 0x00007f1e3eab158e in clover::intrusive_ref<clover::hard_event> clover::create<clover::hard_event, clover::command_queue&, int, clover::ref_vector<clover::event>&, std::function<void (clover::event&)> >(clover::command_queue&, int&&, clover::ref_vector<clover::event>&, std::function<void (clover::event&)>&&) ()
from /usr/lib64/libOpenCL.so.1
#11 0x00007f1e3eaaa56f in clEnqueueReadBuffer (d_q=<optimized out>, d_mem=<optimized out>, blocking=<optimized out>, offset=<optimized out>,
size=<optimized out>, ptr=<optimized out>,
num_deps=<error reading variable: Could not find the frame base for "clEnqueueReadBuffer(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, cl_event const*, cl_event*)".>,
d_deps=<error reading variable: Could not find the frame base for "clEnqueueReadBuffer(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, cl_event const*, cl_event*)".>,
rd_ev=<error reading variable: Could not find the frame base for "clEnqueueReadBuffer(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, cl_event const*, cl_event*)".>) at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/api/transfer.cpp:296
#12 0x00000000005991de in cl::CommandQueue::enqueueReadBuffer (this=0x8b4d78, buffer=..., blocking=1, offset=0, size=8, ptr=0x7f1e285cc5b0, events=0x0, event=0x0) at /workspace/ethminer/libethash-cl/CL/cl2.hpp:7040
tmp = 0x7f1e285cc510
err = 0
#13 0x0000000000594c91 in dev::eth::CLMiner::workLoop (this=0x8b4bd0) at /workspace/ethminer/libethash-cl/CLMiner.cpp:199
w = {boundary = {m_data = {_M_elems = "\000\000\000\000\000\000\000\002", '\000' <repeats 23 times>}}, header = {m_data = {_M_elems = "P\310V\256C3M\252\306e\266 \334~C\235c4\265\222\313`\022R2\301\322\v\237!\264", <incomplete sequence \350>}}, seed = {m_data = {_M_elems = '\000' <repeats 31 times>}},
startNonce = 0, exSizeBits = -1}
results = {0, 4282384247}
nonce = 0
exit = false
c_zero = 0
startNonce = 16160213467296077422
__PRETTY_FUNCTION__ = "virtual void dev::eth::CLMiner::workLoop()"
#14 0x00000000005662ae in dev::Worker::<lambda()>::operator()(void) const (__closure=0x8b4378) at /workspace/ethminer/libdevcore/Worker.cpp:55
ex = dev::WorkerState::Starting
ok = true
this = 0x8b4bd0
#15 0x0000000000567828 in std::_Bind_simple<dev::Worker::startWorking()::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x8b4378)
at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/functional:1531
#16 0x000000000056777e in std::_Bind_simple<dev::Worker::startWorking()::<lambda()>()>::operator()(void) (this=0x8b4378)
at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/functional:1520
#17 0x000000000056770e in std::thread::_Impl<std::_Bind_simple<dev::Worker::startWorking()::<lambda()>()> >::_M_run(void) (this=0x8b4360)
at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/thread:115
#18 0x00007f1e3e3276d2 in std::execute_native_thread_routine_compat (__p=0x8b4360)
at /var/tmp/portage/sys-devel/gcc-6.4.0/work/gcc-6.4.0/libstdc++-v3/src/c++11/thread.cc:110
#19 0x00007f1e3e8553a4 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f1e3da831ad in clone () from /lib64/libc.so.6
In order to bisect the issue, I had to use the new cl2.hpp, but that should not interfere with the result.
Of course reverting this commit (without reverting the whole series) is not easily possible.
I have the same issue under macOS 10.12.6.
@chfast Could you have a look at it? Or at least give me some direction on where to look? Debugging this is hard without understanding the reasoning behind the change.
ℹ 00:36:20| Found suitable OpenCL device [ Intel(R) HD Graphics 630 ] with 1610612736 bytes of GPU memory
Benchmarking on platform: CL
Preparing DAG for block #0
cl 00:36:20|cl-0 No work. Pause for 3 s.
Warming up...
cl 00:36:23|cl-0 New work: header #50c856ae… target 0000000000000002000000000000000000000000000000000000000000000000
cl 00:36:23|cl-0 New seed #00000000…
cl 00:36:24|cl-0 Platform: Apple
cl 00:36:24|cl-0 Device: AMD Radeon Pro 560 Compute Engine / OpenCL 1.2
cl 00:36:24|cl-0 Build info: <program source>:329:9: warning: comparison of integers of different signs: 'int' and 'const uint' (aka 'const unsigned int')
if (i == thread_id)
~ ^ ~~~~~~~~~
<program source>:364:9: warning: comparison of integers of different signs: 'int' and 'const uint' (aka 'const unsigned int')
if (i == thread_id)
~ ^ ~~~~~~~~~
cl 00:36:24|cl-0 Creating light cache buffer, size 16776896
cl 00:36:24|cl-0 Creating DAG buffer, size 1073739904
cl 00:36:24|cl-0 Loading kernels
cl 00:36:24|cl-0 Writing light cache buffer
cl 00:36:24|cl-0 Creating buffer for header.
cl 00:36:24|cl-0 Creating mining buffer
cl 00:36:24|cl-0 Generating DAG
cl 00:36:25|cl-0 DAG 0 %
cl 00:36:25|cl-0 DAG 6 %
cl 00:36:25|cl-0 DAG 12 %
cl 00:36:26|cl-0 DAG 18 %
cl 00:36:26|cl-0 DAG 25 %
cl 00:36:27|cl-0 DAG 31 %
cl 00:36:27|cl-0 DAG 37 %
cl 00:36:27|cl-0 DAG 43 %
cl 00:36:28|cl-0 DAG 50 %
cl 00:36:28|cl-0 DAG 56 %
cl 00:36:29|cl-0 DAG 62 %
cl 00:36:29|cl-0 DAG 68 %
cl 00:36:30|cl-0 DAG 75 %
cl 00:36:30|cl-0 DAG 81 %
cl 00:36:30|cl-0 DAG 87 %
cl 00:36:31|cl-0 DAG 93 %
cl 00:36:31|cl-0 Switch time 10420 ms / 7420705 us
Trial 1... 1648940
Trial 2... 1783470
Trial 3... 1783649
Trial 4... 1782935
Trial 5... 1782757
✘ 00:36:51|cl-0 OpenCL Error: clSetKernelArg -48
min/mean/max: 1648940/1756350/1783649 H/s
inner mean: 1783054 H/s
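For readers checking the summary lines: the figures in the log above are consistent with the mean being taken over all five trials and the "inner mean" over the trials left after dropping the lowest and highest one. A minimal sketch of that arithmetic follows. `BenchStats` and `summarize` are hypothetical names used for illustration only, integer division is an assumption that happens to match the log, and this is not taken from the ethminer source.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical container for the values printed in the benchmark summary.
struct BenchStats
{
    uint64_t min;
    uint64_t mean;
    uint64_t max;
    uint64_t innerMean;
};

// Assumes at least three trials, so that dropping the lowest and the
// highest rate still leaves something to average.
inline BenchStats summarize(std::vector<uint64_t> rates)
{
    std::sort(rates.begin(), rates.end());
    const uint64_t n = static_cast<uint64_t>(rates.size());
    const uint64_t sum = std::accumulate(rates.begin(), rates.end(), uint64_t{0});
    // Inner mean: average of everything except the single min and max.
    const uint64_t innerSum = sum - rates.front() - rates.back();
    return BenchStats{rates.front(), sum / n, rates.back(), innerSum / (n - 2)};
}
```

Feeding in the five trial rates from the log above (1648940, 1783470, 1783649, 1782935, 1782757) gives 1648940/1756350/1783649 and an inner mean of 1783054, matching the printed summary.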
I get a similar error when running ethminer -G -M
cl-0 OpenCL Error: clSetKernelArg -49
Also, if I try to connect to a mining farm, I only see:
cl 15:00:18|cl-0 No work. Pause for 3 s.
This is the only output. It never appears to get any work.
Ubuntu 16.04.03 AMD RX 570 GPU 8GB
@AntoniosHadji clSetKernelArg -49 is a different error; it means CL_INVALID_ARG_INDEX. You might want to open a new bug here.
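To make the numeric codes in these logs easier to read, here is a small sketch that maps a few OpenCL status codes to their symbolic names. The numeric values are the ones defined in the standard CL/cl.h header. `clErrorName` and `describeClError` are hypothetical helpers written for this illustration, not functions from ethminer or from the OpenCL API, and only a subset of codes is covered.

```cpp
#include <string>

// Symbolic names for a subset of OpenCL status codes (values as in CL/cl.h).
// Anything not listed here falls through to "UNKNOWN".
inline const char* clErrorName(int code)
{
    switch (code)
    {
    case 0:   return "CL_SUCCESS";
    case -44: return "CL_INVALID_PROGRAM";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -47: return "CL_INVALID_KERNEL_DEFINITION";
    case -48: return "CL_INVALID_KERNEL";
    case -49: return "CL_INVALID_ARG_INDEX";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    default:  return "UNKNOWN";
    }
}

// Hypothetical formatter: "<call>: <name> (<code>)".
inline std::string describeClError(const std::string& call, int code)
{
    return call + ": " + clErrorName(code) + " (" + std::to_string(code) + ")";
}
```

With this, `describeClError("clSetKernelArg", -48)` yields `clSetKernelArg: CL_INVALID_KERNEL (-48)`, while -49 resolves to `CL_INVALID_ARG_INDEX`, which is why the two reports are treated as separate bugs.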
@chfast Can you please have a look at this bug? It should be considered a major blocker for a 0.12.0 release...
@EoD I created new bug #346 for clSetKernelArg -49
I'm getting the -48 bug when compiling from source for Mac OS 10.13 (High Sierra):
...
cl 18:12:17|cl-1 DAG 93 %
cl 18:12:17|cl-1 Switch time 7575 ms / 4572163 us
Trial 1... 24656329
Trial 2... 25702963
Trial 3... 25605738
Trial 4... 25695251
Trial 5... 25690112
✘ 18:12:39|cl-1 OpenCL Error: clSetKernelArg -48
✘ 18:12:39|cl-0 OpenCL Error: clSetKernelArg -48
min/mean/max: 24656329/2547
Should I open a bug on this as well?
@jeffrey-l-turner It seems you are hitting the same bug as we did originally.
yep -- not sure if I should refile a GitHub issue... It may be just a High Sierra anomaly?
I think I am not following you. Why do you want to file another bug for the same issue? Do you have a reason to believe that it is a different bug?
nope -- I wasn't sure if it was a "High Sierra" bug only... you answered the question. Thanks
Got the same error on an RX-480, had to use 0.11 for it to work
@chfast It might be that I mixed up two completely different bugs here. I can definitely confirm that in current master the ✘ 10:12:04|cl-0 OpenCL Error: clSetKernelArg -48 is not fixed. I am unable to reproduce the corrupted double-linked list for the moment.
Can you test recent RC builds?
Copying from #509:
I can reproduce the 2nd error here even with birdayz's changes:
On commit 99d8674e900266dd59283b9c0ebf5082372b6fbf:
1049625
Trial 2... 9444267
Trial 3... 10644680
Trial 4... 11122691
Trial 5... 12381864
✘ 00:41:01|cl-0 OpenCL Error: clSetKernelArg: CL_INVALID_KERNEL (-48)
I am seeing this as well in 0.13.0rc7. If there is anything I can do to help test, please let me know.
@bsletten A known workaround is to install the proprietary AMD OpenCL libraries (not the drivers!) and use them together with Clover. You will get even better performance than with Windows ;)
See here for more explanation on how it can be used.
Or download it from AMD.
But for the actual error CL_INVALID_KERNEL / -48 we might need to play a bit with the OpenCL kernel and see if some of the changes "fix" the issue here.
Not what I am going for, but thanks. I've done some initial spelunking in the code. I am pegged at the moment but in the coming weeks I'll do some more investigation. Happy to help test anything whenever. Thanks.
I can confirm that OpenCL Error: clSetKernelArg -48 is happening for me. But because it affects only benchmarking, I will not consider it [showstopper] any more.