Ethminer: Fiji, big performance regression in 0.16

Created on 11 Oct 2018 · 46 comments · Source: ethereum-mining/ethminer

Describe the bug
Performance regression
ethminer 0.15 - 28MH/s
ethminer 0.16 - 5MH/s

To Reproduce
Compare 0.15 and 0.16 on Fiji hardware.

Expected behaviour
Performance should be on par or better when updating ethminer.

Desktop (please complete the following information):

  • Operating System: Debian Sid
  • Hardware: Sapphire Nitro R9 Fury
  • Ethminer Version: 0.15 / 0.16 / 0.16.1
  • Driver: ROCm 1.9 (the same issue was also observed with the legacy driver, a.k.a. Orca)

Additional context

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2679.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx803
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 
  Driver Version                                  2679.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         Fiji [Radeon R9 FURY / NANO Series]
  Device Topology (AMD)                           PCI-E, 09:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               56
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1050MHz
  Graphics IP (AMD)                               8.3
  Device Partition                                (core)
    Max number of sub-devices                     56
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Global free memory (AMD)                        4192256 (3.998GiB)
  Global memory channels (AMD)                    16
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           3650722201 (3.4GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             29440
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        3650722201 (3.4GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Number of P2P devices (AMD)                     0
  P2P devices (AMD)                               
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        0ns (Thu Jan  1 01:00:00 1970)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  No
    Number of async queues (AMD)                  8
    Max real-time compute queues (AMD)            8
    Max real-time compute units (AMD)             56
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx803
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx803
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx803


All 46 comments

Is it possible that, with ROCm, the Fiji GPUs now have an issue with atomics? Did you check the kernel logs for errors? Was ROCm 1.9.x already installed when you ran 0.15, or did you upgrade it at the same time you moved to 0.16?

https://rocm.github.io/hardware.html

No related errors in kern.log

ROCm 1.9 was already installed and has been working well with ethminer 0.15

Same here.

Same here, ROCm 1.9.1, Vega 64, hashrate < 4 MH/s.

Have you tried the --cl-only option?

I tried it just now, but it doesn't make a difference. Performance is still terrible on 0.16.1, while 0.15 works fine.

cl 18:50:58 cl-0     Platform: AMD Accelerated Parallel Processing
cl 18:50:58 cl-0     Device:   gfx803 / OpenCL 1.2 
 i 18:50:58 cl-0     Adjusting CL work multiplier for 56 CUs.Adjusted work multiplier: 101945
cl 18:51:00 cl-0     OpenCL kernel
cl 18:51:00 cl-0     Creating light cache buffer, size: 44,250 MB
cl 18:51:00 cl-0     Creating DAG buffer, size: 2,766 GB, free: 1,191 GB
cl 18:51:00 cl-0     Loading kernels
cl 18:51:00 cl-0     Writing light cache buffer
cl 18:51:00 cl-0     Creating buffer for header.
cl 18:51:00 cl-0     Creating mining buffer
 m 18:51:03 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:08 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 i 18:51:11 cl-0     2,766 GB of DAG data generated in 10513 ms.
 m 18:51:13 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:18 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:23 ethminer Speed 0,00 Mh/s gpu0 0,00 [A0] Time: 00:00
 m 18:51:28 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:33 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:38 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:43 ethminer Speed 1,97 Mh/s gpu0 1,97 [A0] Time: 00:00
 m 18:51:48 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:00
 i 18:51:51 stratum  Job: #9bbe974e… eu1.ethermine.org [172.65.207.106:5555]
 i 18:51:51 stratum  Job: #6a0af543… eu1.ethermine.org [172.65.207.106:5555]
 m 18:51:53 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:01
 i 18:51:55 stratum  Job: #efa6f38c… eu1.ethermine.org [172.65.207.106:5555]
 m 18:51:58 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:03 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:08 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:13 ethminer Speed 4,08 Mh/s gpu0 4,08 [A0] Time: 00:01
 m 18:52:18 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:01
^C m 18:52:23 ethminer Speed 4,09 Mh/s gpu0 4,09 [A0] Time: 00:01
 i 18:52:23 ethminer Shutting down...
 i 18:52:23 ethminer Shutting down miners...
 i 18:52:23 main     Disconnected from eu1.ethermine.org [172.65.207.106:5555]
 i 18:52:28 ethminer Terminated!

Any ideas @jean-m-cyr @ddobreff ?

I have no idea what fiji is???

I'll have to plug in my old Fury and find out.

I did a git bisect, hoping it might be helpful to you developers. I'm not 100% confident in the results, because there were two revisions that would not build, which I simply assumed to be bad, but maybe this will help a little anyway.

85e433401b08e51111c367f44b241cbd61ae8489 is the first bad commit
commit 85e433401b08e51111c367f44b241cbd61ae8489
Author: AndreaLanfranchi
Date:   Sat Jun 16 13:06:56 2018 +0200

    Amend MSVC warning for unreferenced variable

:040000 040000 72a293a9ce1e8f511eeee6f868beea8ae2c0b0ce 8dba80085970044eba9f01a9199587f1401162f4 M  libapicore

The API has nothing to do with hashing speed.
You should investigate the changes in libethash-cl.

@Brisse89 please retest with the latest 0.17 and report back.
0.16 is quite old now.

Performance is still crippled in 0.17.0-rc.0

git bisect is a good approach here (I was going to suggest it), but commit https://github.com/ethereum-mining/ethminer/commit/85e433401b08e51111c367f44b241cbd61ae8489 is definitely not the problem. Can you point to other candidates?

I did another bisection over a narrower range based on my previous findings, and this time there were no build errors, so I didn't have to assume anything.

8ad03b0a301062b0e3d163b3b387c48c89df0f52 is the first bad commit
commit 8ad03b0a301062b0e3d163b3b387c48c89df0f52
Author: Jean Cyr
Date:   Fri Jul 20 18:42:27 2018 -0400

    Binary kernels revisited

    - add gooburs addaptation of zawawa binary kernel source
    - add pre-compiled binary kernels
    - Load binary kernels from INSTALLDIR/kernels
    - Copy binary kernels to INSTALLDIR/kernels
    - delete redundant cl_finish call, blocking read syncs the loop.
    - minor DAG gen optimization

    x

:100644 100644 d35b26859d927c3499c82538addf0fb8ccadd3cf cdbc063bce2fc0aa0cf2df8030043f150e6c51a6 M  CMakeLists.txt
:040000 040000 50e21d916d552af521d49cd82b3dc86cf482622e 9b7ffa209e59c412a13878e3e83ab341ef40b245 M  ethminer
:000000 040000 0000000000000000000000000000000000000000 0d6b4721525ed3fb4ea1ebb6a542e9f06f84c6de A  kernels
:040000 040000 fc6b6a8620aa5a723d3afc8487729772760e81fc 4a11f6b6ba702a01f0a787a90db838e8651f31aa M  libethash-cl

With this version, I see the following errors when running ethminer:

 X 12:49:36 cl-0     OpenCL init failed: clSetKernelArg: CL_INVALID_ARG_SIZE (-51)
 X 12:49:36 cl-0     OpenCL Error: clEnqueueWriteBuffer: CL_INVALID_MEM_OBJECT (-38)
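The numbers in parentheses are standard OpenCL status codes. As a hedged illustration, a small C lookup that maps the two codes seen here (plus a few related ones) to their names could look like the sketch below. The table is deliberately partial, and `cl_status_name` / `cl_status_code` are hypothetical helpers, not part of ethminer or the OpenCL API; real code would take the constants from `CL/cl.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Partial table of OpenCL status codes. The values match the constants
 * defined in CL/cl.h; the table and helper names are hypothetical. */
struct cl_status_entry {
    int code;
    const char *name;
};

static const struct cl_status_entry cl_status_table[] = {
    {   0, "CL_SUCCESS" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    { -11, "CL_BUILD_PROGRAM_FAILURE" },
    { -38, "CL_INVALID_MEM_OBJECT" },
    { -42, "CL_INVALID_BINARY" },
    { -51, "CL_INVALID_ARG_SIZE" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" },
};

#define CL_STATUS_COUNT (sizeof cl_status_table / sizeof cl_status_table[0])

/* Map a numeric status code to its name, or "UNKNOWN" if not listed. */
const char *cl_status_name(int code)
{
    for (size_t i = 0; i < CL_STATUS_COUNT; i++)
        if (cl_status_table[i].code == code)
            return cl_status_table[i].name;
    return "UNKNOWN";
}

/* Reverse lookup from name to code. Returns 1 (a positive value, while
 * OpenCL error codes are zero or negative) if the name is not listed. */
int cl_status_code(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < CL_STATUS_COUNT; i++)
        if (strcmp(cl_status_table[i].name, name) == 0)
            return cl_status_table[i].code;
    return 1;
}
```

Applied to the log above, -51 from clSetKernelArg decodes to CL_INVALID_ARG_SIZE and -38 from clEnqueueWriteBuffer decodes to CL_INVALID_MEM_OBJECT.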

I'm not sure, but these are some candidates between the v0.15.0 and v0.16.0 branch points (neither verified nor tested):

  • branch point - 2ce3f1a1b, 11d7e3c4c087 - Bump version: 0.15.0rc2 → 0.15.0
  • branch point - 07feecad0, 8d9674b68, fffc1bb1 - Bump version: 0.16.0rc1 → 0.16.0 (Mon Sep 17 12:54:30 2018 +0200)

  • commit b6284f58576a1234367c5f75a4689edbfa01309b
    ~~~
    Author: Jean Cyr jean.m.cyr@gmail.com
    Date: Tue Jul 31 12:43:31 2018 -0400

    Improve opencl hash rate and reduce job switch time to 1 ms.

    - All credit for this improvement goes to @sukharev. This is
      simply a complete implementation including binary kernels.
    - Runs with very high global work multiplier to improve hash rate.
    - Reduces job switch time to ~1 ms. Spend more time searching for
      solutions rather than stales.
    - Extend hash smoothing interval for more representative hash rate
    ~~~
  • commit 4b63c874750e42b79b976bb319c09c74477fc869
    ~~~
    Author: Jean Cyr jean.m.cyr@gmail.com
    Date: Tue Jul 24 12:52:40 2018 -0400

    Further minor CL optimizations

    - Remove constant iteration parameter
    - Update CL kernels to do single iteration (save instructions).
    ~~~
  • commit 2319a6280fa8aec7861752491d92f6dd840fca17
    ~~~
    Author: Jean Cyr jean.m.cyr@gmail.com
    Date: Tue Jul 24 00:00:54 2018 -0400

    Implement noeval support in opencl kernel and AMD binary kernels
    ~~~
  • commit 6377efefa506578cab1f7cfaf46ed5f3890ef176
    ~~~
    Author: Jean Cyr jean.m.cyr@gmail.com
    Date: Mon Jul 23 15:20:14 2018 -0400

    Make changes suggested in review.

    - Also rename --cl-nobinary parameter to --cl-only.
    - opencl (ethash.cl) noeval support functional.
    ~~~
  • commit 8918770138793b50c8e93adb3c085e3080fce37c
    ~~~
    Author: Jean Cyr jean.m.cyr@gmail.com
    Date: Sun Aug 26 13:58:25 2018 -0400

    enable hash rate averaging for AMD

    - Move averaging code to common function used by both CL and CUDA
    - Make averaging based on GPU type
    - CUDA doesn't need averaging, never do averaging for CUDA

    Exponential averaging is governed by a constant called alpha.
    Setting alpha to 1.0 results in no averaging.
    ~~~
  • commit d94896ec5d103d2e52a21499ae32eddec3c3aad6
    ~~~
    Author: Jean Cyr jean.m.cyr@gmail.com
    Date: Sat Aug 25 23:47:44 2018 -0400

    Smooth AMD hash rate.
    ...
    Reference:

    https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
    ~~~
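The exponential averaging described in commit 8918770 can be sketched as follows. This is a generic exponential moving average written for illustration, not ethminer's code; the struct and function names are hypothetical. Each update computes avg = alpha * sample + (1 - alpha) * avg, so alpha = 1.0 makes every new sample replace the average outright, which matches the commit message's note that this setting disables averaging.

```c
#include <assert.h>

/* Hypothetical hash-rate smoother using an exponential moving average:
 *   avg = alpha * sample + (1 - alpha) * avg
 * alpha must be in (0, 1]; alpha == 1.0 disables smoothing. */
struct hashrate_ema {
    double alpha;
    double value;
    int initialized;
};

void ema_init(struct hashrate_ema *ema, double alpha)
{
    assert(alpha > 0.0 && alpha <= 1.0);
    ema->alpha = alpha;
    ema->value = 0.0;
    ema->initialized = 0;
}

/* Feed one raw hash-rate sample (e.g. in MH/s); returns the smoothed value. */
double ema_update(struct hashrate_ema *ema, double sample)
{
    if (!ema->initialized) {
        /* Seed with the first sample so the average does not start at 0. */
        ema->value = sample;
        ema->initialized = 1;
    } else {
        ema->value = ema->alpha * sample + (1.0 - ema->alpha) * ema->value;
    }
    return ema->value;
}
```

With alpha = 0.5, feeding 10 and then 30 yields 20; with alpha = 1.0 the same inputs yield 30, i.e. no smoothing at all.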


Sorry, I'm a bit new to 'git bisect', but I just learned I could use the 'skip' command whenever I encounter an unrelated error. With this newfound knowledge, I've been able to narrow the culprit down to one of the following commits.

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
ae9585e7fdb1fe9c477f5ac8e87022a05194d610
8cc147b02b9939c03416806353f11ad645ca40be
a720541b930073a58a397185adc8c55f23536e56
37db3f8f936956180335277e8bbed44d1bcf902a
f8516eb25087b1c83410217e06c26883fdd138d3
232b216de810b572981a740330f01ed8c5cc8375
2ce3f1a1b1131277f8d6ba8864048f4b7144f8c7
b5385037fecb26d79b5fd43f0c5973f72eb52969

  • ae9585e7fdb1fe9c477f5ac8e87022a05194d610
  • 8cc147b02b9939c03416806353f11ad645ca40be
  • a720541b930073a58a397185adc8c55f23536e56 <- Unlikely since this is related to CUDA
  • 37db3f8f936956180335277e8bbed44d1bcf902a <- Unlikely since this is related to CUDA
  • f8516eb25087b1c83410217e06c26883fdd138d3
  • 232b216de810b572981a740330f01ed8c5cc8375
  • 2ce3f1a1b1131277f8d6ba8864048f4b7144f8c7 <- Unlikely
  • b5385037fecb26d79b5fd43f0c5973f72eb52969 <- Unlikely
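The result above follows from how bisection works: `git bisect` binary-searches for the first commit where the test flips from good to bad, and commits marked `skip` cannot be classified, so when the flip falls among skipped commits, git can only report a range of candidates. The C sketch below models that on an array of pre-recorded outcomes. It is a simplified illustration, not git's implementation: it assumes outcomes are monotonic (all good, then all bad, once skipped commits are ignored) and probes the testable commit nearest the midpoint, whereas git chooses its test points differently.

```c
#include <assert.h>

enum outcome { GOOD, BAD, SKIP };

/* Simplified bisect model (hypothetical, not git's code).
 * commits[0] is known good, commits[n-1] is known bad.
 * On return, commits *first..*last could be the first bad one;
 * a one-element range means the culprit was pinned down. */
void bisect(const enum outcome *commits, int n, int *first, int *last)
{
    assert(n >= 2 && commits[0] == GOOD && commits[n - 1] == BAD);
    int good = 0, bad = n - 1; /* invariant: good < culprit <= bad */

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        int probe = -1;
        /* If mid is untestable, look outward for the nearest testable commit. */
        for (int d = 0; probe < 0 && (mid - d > good || mid + d < bad); d++) {
            if (mid - d > good && commits[mid - d] != SKIP)
                probe = mid - d;
            else if (mid + d < bad && commits[mid + d] != SKIP)
                probe = mid + d;
        }
        if (probe < 0)
            break; /* only skipped commits remain between good and bad */
        if (commits[probe] == GOOD)
            good = probe;
        else
            bad = probe;
    }
    *first = good + 1;
    *last = bad;
}
```

For example, with outcomes GOOD, GOOD, SKIP, SKIP, BAD, BAD the sketch stops with candidates 2 through 4, the same kind of "only skipped commits left" result shown above.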

This seems to be related to the introduction of binary kernels for AMD. Since there is no binary kernel for Fiji, it runs the OpenCL kernel instead. I have no idea why the zawawa OpenCL kernel would be so much slower on Fiji. It might have something to do with the early-abort logic.

@Brisse89 I have a simple fix: just remove "volatile" from the kernel file ethminer/libethash-cl/kernels/cl/ethash.cl. For some reason, the updated kernel introduced in v0.16 declares a bunch of variables as volatile even though they are private. The output is also volatile, but it's updated using atomic_inc, which should ensure coherency among the threads. From my testing, this only happens on the ROCm stack.

Volatiles removed by pr #1737

I tried this, with mixed results on a 480 depending on the BIOS version:

  • improves result with stock BIOS.
  • degrades result with mining BIOS.

Given that I can't explain these results, I've decided not to proceed.

@x3ccd4828 @jean-m-cyr Thanks, this does restore and maybe even improve performance on the Fiji. I'm now seeing ~31MH/s @ 200W, which is more than I've ever seen before. Ethminer 0.15 on ROCm yielded ~28MH/s @ 190W, or ~29MH/s @ 190W using AMD's legacy (a.k.a. Orca) OpenCL driver. Sorry to hear about the regressions on other GPUs. Obviously, I understand why it can't be merged.
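Because these figures pair hash rate with different power draws, comparing them as energy efficiency can help: MH/s divided by W gives MH per joule. A trivial C helper for that arithmetic (the name is hypothetical), reporting kH/J:

```c
#include <assert.h>

/* Energy efficiency in kH/J: (MH/s) / W = MH/J, times 1000 = kH/J. */
double efficiency_khj(double mhs, double watts)
{
    assert(watts > 0.0);
    return mhs / watts * 1000.0;
}
```

With the approximate numbers from this comment, 31 MH/s at 200 W is about 155 kH/J, versus roughly 147 kH/J for 28 MH/s at 190 W and about 153 kH/J for 29 MH/s at 190 W, so by these rough figures the fixed kernel is slightly more efficient as well as faster.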

@jean-m-cyr what driver are you using for the 480? I have tested an RX 580 (mining BIOS) with amdgpu-pro 18.40 as well as ROCm 1.9.

It's been a while since I installed amdgpu-pro, and I don't remember which version. Is there any way I can tell?

On Ubuntu, you can just check the installed package version: sudo apt list "amdgpu*"

jcyr@miner1:~/ethminer/build$ sudo apt list "amdgpu*"
Listing... Done
amdgpu-pro/unknown 17.40-492261 i386
amdgpu-pro-core/unknown,now 17.40-492261 all [installed,automatic]
amdgpu-pro-dkms/unknown,now 17.40-492261 all [installed]
amdgpu-pro-lib32/unknown,now 17.40-492261 amd64 [installed]

With the current 480 mining BIOS running OpenCL (--cl-nobin), I get about 28MH/s; with the same OpenCL kernel with volatiles removed, I get 25MH/s. Strange! I can't think why that would be.

Binary kernels run at 29.5MH/s on the same config.

I would recommend updating the driver to either 18.40 or the ROCm stack. I think 17.40 was the old beta blockchain driver. I haven't tested the old 17.40 driver.

It won't compile on 18.10+. Also, ROCm requires PCIe atomics (PCIe 3.0) for Polaris.

@ddobreff Do you mean ROCm doesn't compile? There's no need to build it from source. AMD's pre-compiled release for Xenial works fine for me on Debian Sid and most likely does on Ubuntu 18.10 as well. All you need is kernel > 4.17, which has the necessary kernel components upstreamed, and Ubuntu 18.10 ships with 4.18, so that should be no problem. Install rocm-opencl instead of rocm-dkms, since the latter is not needed on Linux > 4.17.

ROCm OpenCL for non-Vega GPUs requires PCIe atomics on a PCIe 3.0-compliant slot.
The legacy OpenCL and PAL compilers are broken in 18.10 and later: they produce invalid asm, so the kernel does not compile properly and does not work.

It should be possible to conditionally compile without volatile for older GPUs, right?

volatile should not be required for ANY GPU! Please don't take my odd results as a reason not to make this change. It would not affect 480/580 GPUs, which would naturally be using binary kernels anyway.

cl 16:53:45 cl-0     Using PciId : 01:00.0 Ellesmere OpenCL 1.2 AMD-APP (2482.3) Memory : 3.99 GB

stock ethash.cl

 m 16:54:45 ethminer Speed 28.93 Mh/s gpu0 28.93 Time: 00:01

ethash.cl with volatile removed

 m 16:58:01 ethminer Speed 25.02 Mh/s gpu0 25.02 Time: 00:01

???

Weird, but overall the situation is still better because this regression is much less severe than the one affecting the Fiji and Vega, and like you said, Polaris would be using the binary kernels anyway so in reality they would not encounter the regression.
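The claim that the Polaris regression is much less severe can be quantified as a relative change. A trivial C helper for that arithmetic (hypothetical name):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the Polaris numbers above (28.93 to 25.02 MH/s) this gives about -13.5%, versus about -82% for the Fiji numbers in the original report (28 to 5 MH/s).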

@ddobreff I believe you mentioned it makes no diff on Vega?

Yes, there was absolutely no difference on Vega with the standard amdgpu-pro OpenCL (18.10 compiler).
EDIT: removing volatile should be OK.

There was a comment above suggesting that Vega on ROCm was affected. No confirmation on whether removing volatile fixed it in that case though.

@uentity Would be great if you could test the Vega and report your hashrate.

I have a similar issue. When running the benchmark, I get something like this:

ethminer/ethminer -M 2 -G


ethminer 0.18.0-alpha.1-18+commit.8294506b.dirty
Build: linux/release/gnu

 m 11:22:26 ethminer Benchmarking on platform: CL Preparing DAG for block #2
cl 11:22:26 cl-0     Using PciId : 45:00.0 gfx900 OpenCL 1.2  Memory : 15.98 GB
 i 11:22:26 cl-0     Adjusting CL work multiplier for 64 CUs.Adjusted work multiplier: 116509
 m 11:22:26 ethminer Warming up...
cl 11:22:26 cl-0     Generating DAG + Light : 1.02 GB
cl 11:22:26 cl-0     OpenCL kernel
cl 11:22:26 cl-0     Loading binary kernel /home/user_name/mine/ethminer/build/ethminer/kernels/ethash_gfx900_lws192.bin
 X 11:22:26 cl-0     Failed to load binary kernel: /home/user_name/mine/ethminer/build/ethminer/kernels/ethash_gfx900_lws192.bin
 X 11:22:26 cl-0     Falling back to OpenCL kernel...
cl 11:22:26 cl-0     Creating light cache buffer, size: 16.00 MB
cl 11:22:26 cl-0     Creating DAG buffer, size: 1024.00 MB, free: 14.97 GB
cl 11:22:27 cl-0     Loading kernels
cl 11:22:27 cl-0     Writing light cache buffer
cl 11:22:27 cl-0     Creating buffer for header.
cl 11:22:27 cl-0     Creating mining buffer
cl 11:22:28 cl-0     1024.00 MB of DAG data generated in 2033 ms.
 m 11:22:41 ethminer Trial 1... 
 m 11:22:44 ethminer Hashes per second 3611658
 m 11:22:44 ethminer Trial 2... 
 m 11:22:47 ethminer Hashes per second 3598562
 m 11:22:47 ethminer Trial 3... 
 m 11:22:50 ethminer Hashes per second 3598562
 m 11:22:50 ethminer Trial 4... 
 m 11:22:53 ethminer Hashes per second 3614558
 m 11:22:53 ethminer Trial 5... 
 m 11:22:56 ethminer Hashes per second 3618185
 m 11:22:56 ethminer min/mean/max: 3598562/3608305/3618185 H/s
 m 11:22:56 ethminer inner mean: 3608259 H/s
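The `inner mean` line appears to be the mean with the single lowest and single highest trial dropped: removing one 3598562 and the 3618185 trial and averaging the remaining three gives 3608259 H/s, matching the log, while the plain mean of all five is 3608305. That interpretation is inferred from these numbers, not taken from ethminer's source. A C sketch of the summary under that assumption (struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bench_summary {
    uint64_t min, max, mean, inner_mean;
};

/* Summarize benchmark trials in hashes per second. The inner mean drops one
 * lowest and one highest sample, so at least 3 samples are required.
 * Integer division truncates toward zero. */
struct bench_summary summarize(const uint64_t *hps, size_t n)
{
    assert(n >= 3);
    struct bench_summary s = { hps[0], hps[0], 0, 0 };
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (hps[i] < s.min) s.min = hps[i];
        if (hps[i] > s.max) s.max = hps[i];
        sum += hps[i];
    }
    s.mean = sum / n;
    s.inner_mean = (sum - s.min - s.max) / (n - 2);
    return s;
}
```

Fed the five trial values from the log, this reproduces min/mean/max of 3598562/3608305/3618185 and an inner mean of 3608259.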

In my case, I have a Vega FE that reports itself as gfx900, which doesn't match any of the available binary kernels (https://github.com/ethereum-mining/ethminer/tree/master/libethash-cl/kernels/bin). I tried simply renaming them to gfx900, but that causes an ELF loading error instead, so I guess that won't work.

Using 0.15 seems to work as expected (~35 MH/s), as others have reported.
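The log above shows the file name the loader tries, `<kernels dir>/ethash_<device>_lws<local work size>.bin`, followed by a fallback to the OpenCL kernel when loading fails. A hedged C sketch of that lookup-and-fallback step follows. It mirrors only the file-name pattern visible in the log, and the function names are hypothetical. It also shows why renaming does not help: a renamed file passes a simple "can the file be opened" check, but the binary inside was built for another GPU, which is consistent with the ELF loading error reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Build the path of a precompiled kernel following the pattern seen in the
 * log: <dir>/ethash_<device>_lws<local work size>.bin
 * Returns the path length, or -1 if buf is too small. */
int binary_kernel_path(char *buf, size_t len, const char *dir,
                       const char *device, unsigned lws)
{
    assert(buf != NULL && dir != NULL && device != NULL);
    int n = snprintf(buf, len, "%s/ethash_%s_lws%u.bin", dir, device, lws);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Hypothetical selection step: prefer the binary kernel if its file can be
 * opened, otherwise fall back to building the OpenCL source. A readable
 * file can still fail later, when the driver loads the binary itself. */
bool should_use_binary_kernel(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;
    fclose(f);
    return true;
}
```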

Can we finish this?
Could it look like this:

#if __vega__
volatile
#endif
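An alternative to a device macro in the kernel source is to let the host decide per device and pass a definition through the options string of clBuildProgram, which accepts `-D name` preprocessor definitions. A hedged sketch of that idea follows. The macro names `ETHASH_USE_VOLATILE` and `ETHASH_VOLATILE`, the device list, and the helper names are all hypothetical, and the thread itself leaned toward removing volatile everywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Kernel side (OpenCL C), shown as a comment:
 *
 *   #ifdef ETHASH_USE_VOLATILE
 *   #define ETHASH_VOLATILE volatile
 *   #else
 *   #define ETHASH_VOLATILE
 *   #endif
 *
 * Variables would then be declared as `ETHASH_VOLATILE uint x;`.
 */

/* Hypothetical list of device names that keep `volatile`; following the
 * suggestion above, only the Vega name (gfx900) is listed. */
static const char *const keep_volatile_devices[] = { "gfx900" };

static bool wants_volatile(const char *device_name)
{
    size_t n = sizeof keep_volatile_devices / sizeof keep_volatile_devices[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(keep_volatile_devices[i], device_name) == 0)
            return true;
    return false;
}

/* Options string to pass to clBuildProgram for the given device. */
const char *ethash_build_options(const char *device_name)
{
    assert(device_name != NULL);
    return wants_volatile(device_name) ? "-DETHASH_USE_VOLATILE" : "";
}
```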

I would suggest simply removing volatile. The performance degradation on Polaris should not really matter, since it uses the binary kernels anyway.

@Brisse89 please rerun your tests with the additional CLI argument --cl-local-work 128 and report back.

@AndreaLanfranchi Seems to be working fine. 31MH/s.

I'm backporting the fix to 0.16 and 0.17
