Ethminer: Fiji, big performance regression in 0.16

Created on 11 Oct 2018 · 46 Comments · Source: ethereum-mining/ethminer

Describe the bug
Performance regression
ethminer 0.15 - 28MH/s
ethminer 0.16 - 5MH/s

To Reproduce
Compare 0.15 and 0.16 on Fiji hardware.

Expected behaviour
Performance should be on par or better when updating ethminer.

Desktop (please complete the following information):

Operating System: Debian Sid
Hardware: Sapphire Nitro R9 Fury
Ethminer Version: 0.15 / 0.16 / 0.16.1
Driver: ROCm 1.9 (same issue was observed using legacy driver aka. Orca as well)

Additional context

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2679.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx803
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 
  Driver Version                                  2679.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         Fiji [Radeon R9 FURY / NANO Series]
  Device Topology (AMD)                           PCI-E, 09:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               56
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1050MHz
  Graphics IP (AMD)                               8.3
  Device Partition                                (core)
    Max number of sub-devices                     56
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Global free memory (AMD)                        4192256 (3.998GiB)
  Global memory channels (AMD)                    16
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           3650722201 (3.4GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             29440
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        3650722201 (3.4GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Number of P2P devices (AMD)                     0
  P2P devices (AMD)                               
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        0ns (Thu Jan  1 01:00:00 1970)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  No
    Number of async queues (AMD)                  8
    Max real-time compute queues (AMD)            8
    Max real-time compute units (AMD)             56
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx803
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx803
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx803

Source

Brisse89

All 46 comments

Is it possible that, with ROCm, the Fiji processors now have an issue with atomics? Did you check the kernel logs for errors? Was ROCm 1.9.x already installed when you ran 0.15, or did you upgrade to it at the same time as you moved to 0.16?

https://rocm.github.io/hardware.html

lesjokolat on 14 Oct 2018

No related errors in kern.log

ROCm 1.9 was already installed and has been working well with ethminer 0.15

Brisse89 on 14 Oct 2018

Same here.

ernstp on 18 Oct 2018

Same here, ROCm 1.9.1, Vega 64, hashrate < 4 MH/s.

uentity on 28 Oct 2018

Have you tried the --cl-only option?

chfast on 27 Nov 2018

Tried it just now, but doesn't make a difference. Still terrible performance on 0.16.1 but 0.15 works fine.

cl 18:50:58 cl-0     Platform: AMD Accelerated Parallel Processing
cl 18:50:58 cl-0     Device:   gfx803 / OpenCL 1.2 
 i 18:50:58 cl-0     Adjusting CL work multiplier for 56 CUs.Adjusted work multiplier: 101 945
cl 18:51:00 cl-0     OpenCL kernel
cl 18:51:00 cl-0     Creating light cache buffer, size: 44,250 MB
cl 18:51:00 cl-0     Creating DAG buffer, size: 2,766 GB, free: 1,191 GB
cl 18:51:00 cl-0     Loading kernels
cl 18:51:00 cl-0     Writing light cache buffer
cl 18:51:00 cl-0     Creating buffer for header.
cl 18:51:00 cl-0     Creating mining buffer
 m 18:51:03 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:08 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 i 18:51:11 cl-0     2,766 GB of DAG data generated in 10 513 ms.
 m 18:51:13 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:18 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:23 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:28 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:33 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:38 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:43 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:48 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:00
 i 18:51:51 stratum  Job: #9bbe974e… eu1.ethermine.org [172.65.207.106:5555]
 i 18:51:51 stratum  Job: #6a0af543… eu1.ethermine.org [172.65.207.106:5555]
 m 18:51:53 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:01
 i 18:51:55 stratum  Job: #efa6f38c… eu1.ethermine.org [172.65.207.106:5555]
 m 18:51:58 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:03 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:08 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:13 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:18 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:01
^C m 18:52:23 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:01
 i 18:52:23 ethminer Shutting down...
 i 18:52:23 ethminer Shutting down miners...
 i 18:52:23 main     Disconnected from eu1.ethermine.org [172.65.207.106:5555]
 i 18:52:28 ethminer Terminated!

Brisse89 on 27 Nov 2018

Any ideas @jean-m-cyr @ddobreff ?

chfast on 27 Nov 2018

I have no idea what fiji is???

jean-m-cyr on 27 Nov 2018

https://videocardz.net/gpu/amd-fiji/

hackmod on 27 Nov 2018

I have to plug in my old Fury and find out.

ddobreff on 27 Nov 2018

I did a git bisect hoping it might be helpful to you developers. I'm not 100% confident in the results because there were two revisions which would not build which I just assumed to be bad, but maybe this could be of a little help anyway.

85e433401b08e51111c367f44b241cbd61ae8489 is the first bad commit
commit 85e433401b08e51111c367f44b241cbd61ae8489
Author: AndreaLanfranchi <[email protected]>
Date:   Sat Jun 16 13:06:56 2018 +0200

    Amend MSVC warning for unreferenced variable

:040000 040000 72a293a9ce1e8f511eeee6f868beea8ae2c0b0ce 8dba80085970044eba9f01a9199587f1401162f4 M  libapicore

Brisse89 on 28 Nov 2018

API has nothing to do with hashing speed.
You need to investigate the changes in libethash-cl.

AndreaLanfranchi on 28 Nov 2018

@Brisse89 please retest with latest 0.17 and report.
0.16 is quite old now.

AndreaLanfranchi on 28 Nov 2018

Performance is still crippled in 0.17.0-rc.0

Brisse89 on 28 Nov 2018

git bisect is a good approach to this (I wanted to suggest it), but commit https://github.com/ethereum-mining/ethminer/commit/85e433401b08e51111c367f44b241cbd61ae8489 is definitely not the problem. Can you point to other candidates?

chfast on 28 Nov 2018

I did another bisection with a more narrow target based on my previous findings, and this time there were no build errors so I didn't have to assume anything.

8ad03b0a301062b0e3d163b3b387c48c89df0f52 is the first bad commit
commit 8ad03b0a301062b0e3d163b3b387c48c89df0f52
Author: Jean Cyr <[email protected]>
Date:   Fri Jul 20 18:42:27 2018 -0400

    Binary kernels revisited

    - add gooburs addaptation of zawawa binary kernel source
    - add pre-compiled binary kernels
    - Load binary kernels from INSTALLDIR/kernels
    - Copy binary kernels to INSTALLDIR/kernels
    - delete redundant cl_finish call, blocking read syncs the loop.
    - minor DAG gen optimization

    x

:100644 100644 d35b26859d927c3499c82538addf0fb8ccadd3cf cdbc063bce2fc0aa0cf2df8030043f150e6c51a6 M  CMakeLists.txt
:040000 040000 50e21d916d552af521d49cd82b3dc86cf482622e 9b7ffa209e59c412a13878e3e83ab341ef40b245 M  ethminer
:000000 040000 0000000000000000000000000000000000000000 0d6b4721525ed3fb4ea1ebb6a542e9f06f84c6de A  kernels
:040000 040000 fc6b6a8620aa5a723d3afc8487729772760e81fc 4a11f6b6ba702a01f0a787a90db838e8651f31aa M  libethash-cl

With this version I see the following errors when running ethminer

 X 12:49:36 cl-0     OpenCL init failed: clSetKernelArg: CL_INVALID_ARG_SIZE (-51)
 X 12:49:36 cl-0     OpenCL Error: clEnqueueWriteBuffer: CL_INVALID_MEM_OBJECT (-38)

Brisse89 on 28 Nov 2018

https://github.com/ethereum-mining/ethminer/commit/8ad03b0a301062b0e3d163b3b387c48c89df0f52#commitcomment-29789865

chfast on 28 Nov 2018

I'm not sure, but these are some candidates between the branch points v0.15.0 and v0.16.0 (not verified or tested at all):

branch point - 2ce3f1a1b, 11d7e3c4c087 - Bump version: 0.15.0rc2 → 0.15.0
branch point - 07feecad0, 8d9674b68, fffc1bb1 - Bump version: 0.16.0rc1 → 0.16.0 (Mon Sep 17 12:54:30 2018 +0200)
commit b6284f58576a1234367c5f75a4689edbfa01309b
~~~
Author: Jean Cyr jean.m.cyr@gmail.com
Date: Tue Jul 31 12:43:31 2018 -0400

Improve opencl hash rate and reduce job switch time to 1 ms.
- All credit for this improvement goes to @sukharev. This is
  simply a complete implementation including binary kernels.
- Runs with very high global work multiplier to improve hash rate.
- Reduces job switch time to ~1 ms. Spend more time searching for
  solutions rather than stales.
- Extend hash smoothing interval for more representative hash rate
~~~
commit 4b63c874750e42b79b976bb319c09c74477fc869
~~~
Author: Jean Cyr jean.m.cyr@gmail.com
Date: Tue Jul 24 12:52:40 2018 -0400

Further minor CL optimizations
- Remove constant iteration parameter
- Update CL kernels to do single iteration (save instructions).
~~~
commit 2319a6280fa8aec7861752491d92f6dd840fca17
~~~
Author: Jean Cyr jean.m.cyr@gmail.com
Date: Tue Jul 24 00:00:54 2018 -0400

Implement noeval support in opencl kernel and AMD binary kernels
~~~
commit 6377efefa506578cab1f7cfaf46ed5f3890ef176
~~~
Author: Jean Cyr jean.m.cyr@gmail.com
Date: Mon Jul 23 15:20:14 2018 -0400
Make changes suggested in review.
- Also rename --cl-nobinary parameter to --cl-only.
- opencl (ethash.cl) noeval support functional.
  ~~~
commit 8918770138793b50c8e93adb3c085e3080fce37c
~~~
Author: Jean Cyr jean.m.cyr@gmail.com
Date: Sun Aug 26 13:58:25 2018 -0400

enable hash rate averaging for AMD
- Move averaging code to common function used by both CL and CUDA
- Make averaging based on GPU type
- CUDA doesn't need averaging, never do averaging for CUDA
Exponential averaging is governed by a constant called alpha.
Setting alpha to 1.0 results in no averaging.
~~~
commit d94896ec5d103d2e52a21499ae32eddec3c3aad6
~~~
Author: Jean Cyr jean.m.cyr@gmail.com
Date: Sat Aug 25 23:47:44 2018 -0400

Smooth AMD hash rate.
...
Reference:

https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average

~~~
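The two averaging commits quoted above describe exponential smoothing controlled by a constant alpha, where alpha = 1.0 means no averaging. A minimal Python sketch of that idea follows. The function name and the sample readings are illustrative assumptions and are not taken from ethminer's source.

```python
def exponential_average(samples, alpha):
    """Return the exponentially smoothed series for `samples`.

    alpha = 1.0 reproduces the input (no averaging); smaller values
    give older samples more weight.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    average = None
    for sample in samples:
        if average is None:
            # Seed the average with the first sample.
            average = sample
        else:
            # Blend the new sample into the running average.
            average = alpha * sample + (1.0 - alpha) * average
        smoothed.append(average)
    return smoothed


# Hypothetical hash-rate readings in MH/s.
readings = [28.0, 30.0, 26.0, 29.0]
print(exponential_average(readings, 1.0))  # same values as the input
print(exponential_average(readings, 0.5))
```

A smaller alpha gives a steadier displayed hash rate at the cost of reacting more slowly to real changes, which is worth keeping in mind when reading the first minute of any log in this thread.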

hackmod on 28 Nov 2018

Sorry, I'm a bit new to 'git bisect', but I just learned I could use the 'skip' command whenever I encounter an unrelated error. With this newfound knowledge, I've been able to pinpoint the culprit to one of the following commits.

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
ae9585e7fdb1fe9c477f5ac8e87022a05194d610
8cc147b02b9939c03416806353f11ad645ca40be
a720541b930073a58a397185adc8c55f23536e56
37db3f8f936956180335277e8bbed44d1bcf902a
f8516eb25087b1c83410217e06c26883fdd138d3
232b216de810b572981a740330f01ed8c5cc8375
2ce3f1a1b1131277f8d6ba8864048f4b7144f8c7
b5385037fecb26d79b5fd43f0c5973f72eb52969

ae9585e7fdb1fe9c477f5ac8e87022a05194d610
8cc147b02b9939c03416806353f11ad645ca40be
a720541b930073a58a397185adc8c55f23536e56 <- Unlikely since this is related to CUDA
37db3f8f936956180335277e8bbed44d1bcf902a <- Unlikely since this is related to CUDA
f8516eb25087b1c83410217e06c26883fdd138d3
232b216de810b572981a740330f01ed8c5cc8375
2ce3f1a1b1131277f8d6ba8864048f4b7144f8c7 <- Unlikely
b5385037fecb26d79b5fd43f0c5973f72eb52969 <- Unlikely

Brisse89 on 28 Nov 2018
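Why a bisection with skipped commits ends in a list of candidates rather than a single commit can be sketched in a few lines of Python. This is a simplified model that assumes a linear history; the commit names and verdicts are hypothetical, and real `git bisect` works on the full commit graph.

```python
def first_bad_candidates(history):
    """history: list of (commit, verdict) ordered oldest to newest.

    verdict is "good", "bad" or "skip" (the commit could not be tested).
    Returns the commits that could be the first bad one.
    """
    verdicts = [verdict for _, verdict in history]
    first_bad = verdicts.index("bad")
    # Newest commit known to be good before the first known-bad one.
    last_good = max(i for i in range(first_bad) if verdicts[i] == "good")
    return [commit for commit, _ in history[last_good + 1:first_bad + 1]]


# Hypothetical five-commit history; c2 and c3 fail to build.
history = [("c1", "good"), ("c2", "skip"), ("c3", "skip"),
           ("c4", "bad"), ("c5", "bad")]
print(first_bad_candidates(history))

# Marking an unbuildable commit as bad instead of skipping it makes
# the search blame that commit, whether or not it is the real cause.
forced = [("c1", "good"), ("c2", "bad"), ("c3", "skip"),
          ("c4", "bad"), ("c5", "bad")]
print(first_bad_candidates(forced))
```

The second call illustrates the risk mentioned earlier in the thread: assuming that revisions which do not build are bad can point the bisection at an unrelated commit.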

This seems to be related to the introduction of binary kernels for AMD. Since there is no binary kernel for fiji, it would run the opencl kernel. I've no idea why the zawawa opencl kernel would be so much slower on fiji??? Might have something to do with the early abort logic.

jean-m-cyr on 28 Nov 2018

@Brisse89 I have a simple fix: just remove "volatile" from the kernel file ethminer/libethash-cl/kernels/cl/ethash.cl. For some reason the updated kernel introduced in v0.16 declares a bunch of variables as volatile even though they are private. The output is also volatile, but it's updated using atomic_inc, which should ensure coherency among the threads. From my testing, this only happens on the ROCm stack.

x3ccd4828 on 1 Dec 2018

👍2
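For anyone who wants to try the change described above without editing the file by hand, here is a rough Python sketch that strips the `volatile` qualifier from OpenCL C source text. The kernel fragment is a made-up stand-in, not a copy of ethash.cl, and the helper name is hypothetical.

```python
import re


def strip_volatile(source):
    """Remove the `volatile` qualifier from OpenCL C source text."""
    # \b keeps identifiers that merely contain the word intact.
    return re.sub(r"\bvolatile\s+", "", source)


# Hypothetical kernel fragment for illustration only.
kernel = """
__kernel void search(__global volatile uint* restrict output)
{
    volatile uint counter = 0;
    uint slot = atomic_inc(&output[0]);
}
"""
print(strip_volatile(kernel))
```

A text substitution like this is only suitable for a quick experiment. Review the resulting diff before building, and compare hash rates on your own hardware, since the results reported in this thread differ between GPUs.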

~~Volatiles removed by pr #1737~~

Tried this with mixed results on 480 depending on BIOS version.

improves result with stock BIOS.
degrades result with mining BIOS.

Given that I can't explain these results, I've decided not to proceed.

jean-m-cyr on 1 Dec 2018

👍2

@x3ccd4828 @jean-m-cyr Thanks, this does restore and maybe even improve performance on the Fiji. I'm now seeing ~31MH/s @ 200W, which is more than I've ever seen before. Ethminer 0.15 on ROCm yielded ~28MH/s @ 190W, or ~29MH/s @ 190W using AMD's legacy (aka Orca) OpenCL driver. Sorry to hear about regressions for other GPUs. Obviously I understand why it can't be merged.

Brisse89 on 1 Dec 2018
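The three measurements in the comment above can also be compared per watt. The short Python sketch below uses only the approximate figures quoted there; the labels are shorthand introduced here.

```python
# (hash rate in MH/s, power in W) as reported in the comment above.
results = {
    "0.16 without volatile, ROCm": (31.0, 200.0),
    "0.15, ROCm": (28.0, 190.0),
    "0.15, legacy (Orca) driver": (29.0, 190.0),
}


def efficiency(mhs, watts):
    """Hash rate per watt, in MH/s per W."""
    return mhs / watts


for label, (mhs, watts) in results.items():
    print(f"{label}: {efficiency(mhs, watts):.3f} MH/s per W")
```

Because the inputs are rounded estimates, small differences in the result should not be read as significant.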

@jean-m-cyr what driver are you using for the 480? I have tested a Rx580 (mining bios) with amdgpu-pro 18.40 as well as rocm 1.9.

x3ccd4828 on 1 Dec 2018

It's been a while since I installed amd-gpu-pro. Don't remember which version... Any way I can tell?

jean-m-cyr on 1 Dec 2018

On Ubuntu you can just check the installed package version: sudo apt list "amdgpu*"

x3ccd4828 on 1 Dec 2018

jcyr@miner1:~/ethminer/build$ sudo apt list "amdgpu*"
Listing... Done
amdgpu-pro/unknown 17.40-492261 i386
amdgpu-pro-core/unknown,now 17.40-492261 all [installed,automatic]
amdgpu-pro-dkms/unknown,now 17.40-492261 all [installed]
amdgpu-pro-lib32/unknown,now 17.40-492261 amd64 [installed]

jean-m-cyr on 1 Dec 2018

With the current 480 mining BIOS running opencl (--cl-nobin) I get about 28MH/s; with the same opencl (with volatiles removed) I get 25MH/s. Strange!!! I can't think of why that would be.

Binary kernels run at 29.5MH/s on same config.

jean-m-cyr on 1 Dec 2018

I would recommend updating the driver to either 18.40 or the rocm stack. I think 17.40 was the old beta blockchain driver. I haven't tested the old 17.40 driver.

x3ccd4828 on 1 Dec 2018

It won’t compile on 18.10+. Also, rocm requires PCI atomics (3.0) for Polaris.

ddobreff on 1 Dec 2018

@ddobreff Do you mean ROCm is not compiling? No need to build from source. AMD's pre compiled release for Xenial works fine for me on Debian Sid and most likely does on Ubuntu 18.10 as well. All you need is kernel > 4.17 which has the necessary kernel components up-streamed and Ubuntu 18.10 ships with 4.18 so that should be no problem. Install rocm-opencl instead of rocm-dkms since the latter is not needed on Linux > 4.17.

Brisse89 on 1 Dec 2018

ROCm OpenCL for non-Vega GPUs requires PCIe atomics on a PCIe 3.0 compliant slot.
The OpenCL legacy and PAL compilers are broken in 18.10+ versions: they produce invalid asm, so the kernel does not compile properly and does not work.

ddobreff on 1 Dec 2018

👍1

It should be possible to conditionally compile without volatile for older GPUs, right?

chfast on 1 Dec 2018

volatile should not be required for ANY gpu! Pls. don't take my odd results as a reason not to make this change. It would not affect 480/580 GPUs who would naturally be using binary kernels anyway...

jean-m-cyr on 1 Dec 2018

cl 16:53:45 cl-0     Using PciId : 01:00.0 Ellesmere OpenCL 1.2 AMD-APP (2482.3) Memory : 3.99 GB

stock ethash.cl

 m 16:54:45 ethminer Speed 28.93 Mh/s gpu0 28.93 Time: 00:01

ethash.cl with volatile removed

 m 16:58:01 ethminer Speed 25.02 Mh/s gpu0 25.02 Time: 00:01

???

jean-m-cyr on 1 Dec 2018

Weird, but overall the situation is still better because this regression is much less severe than the one affecting the Fiji and Vega, and like you said, Polaris would be using the binary kernels anyway so in reality they would not encounter the regression.

Brisse89 on 2 Dec 2018

@ddobreff I believe you mentioned it makes no diff on Vega?

jean-m-cyr on 2 Dec 2018

Yes, there was absolutely no difference on Vega with the standard amdgpu-pro opencl (18.10 compiler).
EDIT: removing volatile should be OK.

ddobreff on 2 Dec 2018

There was a comment above suggesting that Vega on ROCm was affected. No confirmation on whether removing volatile fixed it in that case though.

@uentity Would be great if you could test the Vega and report your hashrate.

Brisse89 on 2 Dec 2018

I have a similar issue. When running the benchmark I get something like this

ethminer/ethminer -M 2 -G


ethminer 0.18.0-alpha.1-18+commit.8294506b.dirty
Build: linux/release/gnu

 m 11:22:26 ethminer Benchmarking on platform: CL Preparing DAG for block #2
cl 11:22:26 cl-0     Using PciId : 45:00.0 gfx900 OpenCL 1.2  Memory : 15.98 GB
 i 11:22:26 cl-0     Adjusting CL work multiplier for 64 CUs.Adjusted work multiplier: 116509
 m 11:22:26 ethminer Warming up...
cl 11:22:26 cl-0     Generating DAG + Light : 1.02 GB
cl 11:22:26 cl-0     OpenCL kernel
cl 11:22:26 cl-0     Loading binary kernel /home/user_name/mine/ethminer/build/ethminer/kernels/ethash_gfx900_lws192.bin
 X 11:22:26 cl-0     Failed to load binary kernel: /home/user_name/mine/ethminer/build/ethminer/kernels/ethash_gfx900_lws192.bin
 X 11:22:26 cl-0     Falling back to OpenCL kernel...
cl 11:22:26 cl-0     Creating light cache buffer, size: 16.00 MB
cl 11:22:26 cl-0     Creating DAG buffer, size: 1024.00 MB, free: 14.97 GB
cl 11:22:27 cl-0     Loading kernels
cl 11:22:27 cl-0     Writing light cache buffer
cl 11:22:27 cl-0     Creating buffer for header.
cl 11:22:27 cl-0     Creating mining buffer
cl 11:22:28 cl-0     1024.00 MB of DAG data generated in 2033 ms.
 m 11:22:41 ethminer Trial 1... 
 m 11:22:44 ethminer Hashes per second 3611658
 m 11:22:44 ethminer Trial 2... 
 m 11:22:47 ethminer Hashes per second 3598562
 m 11:22:47 ethminer Trial 3... 
 m 11:22:50 ethminer Hashes per second 3598562
 m 11:22:50 ethminer Trial 4... 
 m 11:22:53 ethminer Hashes per second 3614558
 m 11:22:53 ethminer Trial 5... 
 m 11:22:56 ethminer Hashes per second 3618185
 m 11:22:56 ethminer min/mean/max: 3598562/3608305/3618185 H/s
 m 11:22:56 ethminer inner mean: 3608259 H/s
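The summary lines of the benchmark can be reproduced from the five trial values. The Python sketch below assumes that the inner mean drops the lowest and highest trial and that integer division is used; both assumptions are consistent with the figures in the log but are not confirmed from ethminer's source.

```python
# Hashes per second from the five trials in the log above.
trials = [3611658, 3598562, 3598562, 3614558, 3618185]


def summarize(rates):
    """Return (min, mean, max, inner_mean) using integer division.

    inner_mean drops the lowest and highest trial before averaging.
    """
    ordered = sorted(rates)
    inner = ordered[1:-1]
    return (ordered[0], sum(rates) // len(rates),
            ordered[-1], sum(inner) // len(inner))


low, mean, high, inner_mean = summarize(trials)
print(f"min/mean/max: {low}/{mean}/{high} H/s")  # 3598562/3608305/3618185
print(f"inner mean: {inner_mean} H/s")           # 3608259
print(f"{mean / 1e6:.2f} MH/s")
```

Converted to MH/s, the mean is about 3.6, in line with the sub-4 MH/s figure reported for a Vega 64 on ROCm earlier in the thread.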

In my case I have a Vega FE that is listed as gfx900, which doesn't match the available binary kernels here: https://github.com/ethereum-mining/ethminer/tree/master/libethash-cl/kernels/bin. I tried to just rename them to gfx900, but that causes an ELF loading error instead, so I guess that won't work.

Using 0.15 seems to work as expected (getting ~35 MH/s), as others have reported.
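The fallback visible in the log above (binary kernel not loaded, OpenCL kernel used instead) can be modelled roughly as follows. This Python sketch only covers the case where no matching file exists; the file name pattern is copied from the log line, while the function name, the demo file, and the device names used in the demo are assumptions for illustration.

```python
from pathlib import Path
import tempfile


def choose_kernel(kernel_dir, device_name, local_work_size):
    """Return ("binary", path) if a matching file exists, else ("opencl", None)."""
    # Name pattern as seen in the log: ethash_<device>_lws<size>.bin
    name = f"ethash_{device_name.lower()}_lws{local_work_size}.bin"
    candidate = Path(kernel_dir) / name
    if candidate.is_file():
        return "binary", candidate
    return "opencl", None


with tempfile.TemporaryDirectory() as kernel_dir:
    # Hypothetical binary for a device that has a precompiled kernel.
    (Path(kernel_dir) / "ethash_ellesmere_lws192.bin").write_bytes(b"")
    print(choose_kernel(kernel_dir, "Ellesmere", 192)[0])
    print(choose_kernel(kernel_dir, "gfx803", 192)[0])
```

Under this model, any device reported under a name without a matching binary (such as gfx803 or gfx900 on ROCm) ends up on the OpenCL kernel, which is the code path affected by the regression discussed in this thread.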