Ethminer: Error: clSetKernelArg CL_INVALID_KERNEL / corrupted double-linked list:

Created on 12 Aug 2017 · 22 comments · Source: ethereum-mining/ethminer

3c8ed8c85f653b476541fddf6fbdad29c92e1815 is the first bad commit
commit 3c8ed8c85f653b476541fddf6fbdad29c92e1815
Author: @chfast
Date: Wed Aug 2 21:56:32 2017

CLMiner: Disable kickoff / pause

It causes one of the following two errors, seemingly at random, when running ethminer -G -M:

Trial 4... 6640981
Trial 5... 6291456
  ✘  10:12:04|cl-0      OpenCL Error: clSetKernelArg -48
min/mean/max: 3495253/5941930/6990506 H/s
inner mean: 6640981 H/s

or

Trial 5... 6634346
*** Error in `/workspace/ethminer/dbuild/ethminer/ethminer': corrupted double-linked list ***
======= Backtrace: =========
#0  0x00007f1e3d9cc498 in raise () from /lib64/libc.so.6
#1  0x00007f1e3d9cd8ea in abort () from /lib64/libc.so.6
#2  0x00007f1e3da09241 in __libc_message () from /lib64/libc.so.6
#3  0x00007f1e3da0eac6 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f1e3da0fadf in _int_free () from /lib64/libc.so.6
#5  0x00007f1e30d0a73c in r600_buffer_transfer_unmap (ctx=<optimized out>, transfer=<optimized out>)
    at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/auxiliary/util/u_inlines.h:144
#6  0x00007f1e3eac55dd in clover::mapping::~mapping (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/core/resource.cpp:219
#7  0x00007f1e3eaa89f1 in _Function_handler<void(clover::event&), (anonymous namespace)::soft_copy_op(clover::command_queue&, T, const vector_t&, const vector_t&, S, const vector_t&, const vector_t&, const vector_t&) [with T = void*; S = clover::buffer*; (anonymous namespace)::vector_t = std::array<long unsigned int, 3ul>]::<lambda(clover::event&)> >::_M_invoke(const _Any_data &, event &) (__functor=..., __args#0=...)
    at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/api/transfer.cpp:247
#8  0x00007f1e3eab38f6 in clover::event::trigger (this=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/functional:2267
#9  0x00007f1e3eab49d9 in clover::hard_event::hard_event(clover::command_queue&, unsigned int, clover::ref_vector<clover::event> const&, std::function<void (clover::event&)>) (this=<optimized out>, q=..., command=<optimized out>, deps=..., action=...)
    at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/core/event.cpp:126
#10 0x00007f1e3eab158e in clover::intrusive_ref<clover::hard_event> clover::create<clover::hard_event, clover::command_queue&, int, clover::ref_vector<clover::event>&, std::function<void (clover::event&)> >(clover::command_queue&, int&&, clover::ref_vector<clover::event>&, std::function<void (clover::event&)>&&) ()
   from /usr/lib64/libOpenCL.so.1
#11 0x00007f1e3eaaa56f in clEnqueueReadBuffer (d_q=<optimized out>, d_mem=<optimized out>, blocking=<optimized out>, offset=<optimized out>,
    size=<optimized out>, ptr=<optimized out>,
    num_deps=<error reading variable: Could not find the frame base for "clEnqueueReadBuffer(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, cl_event const*, cl_event*)".>,
    d_deps=<error reading variable: Could not find the frame base for "clEnqueueReadBuffer(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, cl_event const*, cl_event*)".>,
    rd_ev=<error reading variable: Could not find the frame base for "clEnqueueReadBuffer(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, cl_event const*, cl_event*)".>) at /usr/src/debug/media-libs/mesa-9999/mesa-9999/src/gallium/state_trackers/clover/api/transfer.cpp:296
#12 0x00000000005991de in cl::CommandQueue::enqueueReadBuffer (this=0x8b4d78, buffer=..., blocking=1, offset=0, size=8, ptr=0x7f1e285cc5b0, events=0x0, event=0x0) at /workspace/ethminer/libethash-cl/CL/cl2.hpp:7040
        tmp = 0x7f1e285cc510
        err = 0
#13 0x0000000000594c91 in dev::eth::CLMiner::workLoop (this=0x8b4bd0) at /workspace/ethminer/libethash-cl/CLMiner.cpp:199
        w = {boundary = {m_data = {_M_elems = "\000\000\000\000\000\000\000\002", '\000' <repeats 23 times>}}, header = {m_data = {_M_elems = "P\310V\256C3M\252\306e\266 \334~C\235c4\265\222\313`\022R2\301\322\v\237!\264", <incomplete sequence \350>}}, seed = {m_data = {_M_elems = '\000' <repeats 31 times>}}, 
          startNonce = 0, exSizeBits = -1}
        results = {0, 4282384247}
        nonce = 0
        exit = false
        c_zero = 0
        startNonce = 16160213467296077422
        __PRETTY_FUNCTION__ = "virtual void dev::eth::CLMiner::workLoop()"
#14 0x00000000005662ae in dev::Worker::<lambda()>::operator()(void) const (__closure=0x8b4378) at /workspace/ethminer/libdevcore/Worker.cpp:55
        ex = dev::WorkerState::Starting
        ok = true
        this = 0x8b4bd0
#15 0x0000000000567828 in std::_Bind_simple<dev::Worker::startWorking()::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x8b4378)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/functional:1531
#16 0x000000000056777e in std::_Bind_simple<dev::Worker::startWorking()::<lambda()>()>::operator()(void) (this=0x8b4378)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/functional:1520
#17 0x000000000056770e in std::thread::_Impl<std::_Bind_simple<dev::Worker::startWorking()::<lambda()>()> >::_M_run(void) (this=0x8b4360)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/g++-v5/thread:115
#18 0x00007f1e3e3276d2 in std::execute_native_thread_routine_compat (__p=0x8b4360)
    at /var/tmp/portage/sys-devel/gcc-6.4.0/work/gcc-6.4.0/libstdc++-v3/src/c++11/thread.cc:110
#19 0x00007f1e3e8553a4 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f1e3da831ad in clone () from /lib64/libc.so.6

In order to bisect the issue, I had to use the new cl2.hpp, but that should not interfere with the result.

Of course, reverting this commit alone (without reverting the whole series) is not easily possible.

Labels: bug, showstopper

All 22 comments

I have the same issue under macOS 10.12.6.

@chfast Could you have a look at it? Or at least give me some pointers on where to look? Debugging this is hard without understanding the reasoning behind the change.

  ℹ  00:36:20|          Found suitable OpenCL device [ Intel(R) HD Graphics 630 ] with 1610612736  bytes of GPU memory
Benchmarking on platform: CL
Preparing DAG for block #0
 cl  00:36:20|cl-0      No work. Pause for 3 s.
Warming up...
 cl  00:36:23|cl-0      New work: header #50c856ae… target 0000000000000002000000000000000000000000000000000000000000000000
 cl  00:36:23|cl-0      New seed #00000000…
 cl  00:36:24|cl-0      Platform: Apple
 cl  00:36:24|cl-0      Device:   AMD Radeon Pro 560 Compute Engine  / OpenCL 1.2 
 cl  00:36:24|cl-0      Build info: <program source>:329:9: warning: comparison of integers of different signs: 'int' and 'const uint' (aka 'const unsigned int')
                if (i == thread_id)
      ~ ^  ~~~~~~~~~
<program source>:364:9: warning: comparison of integers of different signs: 'int' and 'const uint' (aka 'const unsigned int')
                if (i == thread_id)
      ~ ^  ~~~~~~~~~

 cl  00:36:24|cl-0      Creating light cache buffer, size 16776896
 cl  00:36:24|cl-0      Creating DAG buffer, size 1073739904
 cl  00:36:24|cl-0      Loading kernels
 cl  00:36:24|cl-0      Writing light cache buffer
 cl  00:36:24|cl-0      Creating buffer for header.
 cl  00:36:24|cl-0      Creating mining buffer
 cl  00:36:24|cl-0      Generating DAG
 cl  00:36:25|cl-0      DAG 0 %
 cl  00:36:25|cl-0      DAG 6 %
 cl  00:36:25|cl-0      DAG 12 %
 cl  00:36:26|cl-0      DAG 18 %
 cl  00:36:26|cl-0      DAG 25 %
 cl  00:36:27|cl-0      DAG 31 %
 cl  00:36:27|cl-0      DAG 37 %
 cl  00:36:27|cl-0      DAG 43 %
 cl  00:36:28|cl-0      DAG 50 %
 cl  00:36:28|cl-0      DAG 56 %
 cl  00:36:29|cl-0      DAG 62 %
 cl  00:36:29|cl-0      DAG 68 %
 cl  00:36:30|cl-0      DAG 75 %
 cl  00:36:30|cl-0      DAG 81 %
 cl  00:36:30|cl-0      DAG 87 %
 cl  00:36:31|cl-0      DAG 93 %
 cl  00:36:31|cl-0      Switch time 10420 ms / 7420705 us
Trial 1... 1648940
Trial 2... 1783470
Trial 3... 1783649
Trial 4... 1782935
Trial 5... 1782757
  ✘  00:36:51|cl-0      OpenCL Error: clSetKernelArg -48
min/mean/max: 1648940/1756350/1783649 H/s
inner mean: 1783054 H/s

I get a similar error when running ethminer -G -M
cl-0 OpenCL Error: clSetKernelArg -49

Also, if I try to connect to a mining farm, I only see:
cl 15:00:18|cl-0 No work. Pause for 3 s.
This is the only output. It never appears to get any work.

Ubuntu 16.04.3, AMD RX 570 GPU (8 GB)

@AntoniosHadji clSetKernelArg -49 is a different error; it means CL_INVALID_ARG_INDEX. You might want to open a new bug for it here.

@chfast Can you please have a look at this bug? It should be considered a major blocker for a 0.12.0 release...

@EoD I created a new bug, #346, for clSetKernelArg -49.

I'm getting the -48 bug when compiling from source for macOS 10.13 (High Sierra):
...
 cl 18:12:17|cl-1 DAG 93 %
 cl 18:12:17|cl-1 Switch time 7575 ms / 4572163 us
Trial 1... 24656329
Trial 2... 25702963
Trial 3... 25605738
Trial 4... 25695251
Trial 5... 25690112
  ✘ 18:12:39|cl-1 OpenCL Error: clSetKernelArg -48
  ✘ 18:12:39|cl-0 OpenCL Error: clSetKernelArg -48
min/mean/max: 24656329/2547

Should I open a bug on this as well?

@jeffrey-l-turner It seems you are hitting the same bug we originally reported.

Yep. I'm not sure whether I should file a separate GitHub issue. Could it just be a High Sierra anomaly?

I don't think I'm following you. Why would you file another bug for the same issue? Do you have a reason to believe that it is a different bug?

Nope, I wasn't sure whether it was a High Sierra-only bug. You answered the question, thanks.

I got the same error on an RX 480 and had to use 0.11 for it to work.

@chfast It may be that I mixed up two completely different bugs here. I can definitely confirm that the "✘ 10:12:04|cl-0 OpenCL Error: clSetKernelArg -48" error is not fixed in current master. At the moment, I am unable to reproduce the corrupted double-linked list.

Can you test recent RC builds?

Copying from #509:

I can reproduce the second error here, even with birdayz's changes:

On commit 99d8674e900266dd59283b9c0ebf5082372b6fbf:

1049625
Trial 2... 9444267
Trial 3... 10644680
Trial 4... 11122691
Trial 5... 12381864
  ✘  00:41:01|cl-0      OpenCL Error: clSetKernelArg: CL_INVALID_KERNEL (-48)

I am seeing this as well in 0.13.0rc7. If there is anything I can do to help test, please let me know.

@bsletten A known workaround is to install the proprietary AMD OpenCL libraries (not the drivers!) and use them together with Clover. You will get even better performance than with Windows ;)
See here for more explanation of how they can be used,
or download them from AMD.

But for the actual error CL_INVALID_KERNEL / -48 we might need to play a bit with the OpenCL kernel and see if some of the changes "fix" the issue here.

Not what I am going for, but thanks. I've done some initial spelunking in the code. I am pegged at the moment but in the coming weeks I'll do some more investigation. Happy to help test anything whenever. Thanks.

I can confirm that "OpenCL Error: clSetKernelArg -48" still happens for me. But because it only affects benchmarking, I no longer consider it a [showstopper].
