During another run, I got this. The miner crashes each time.

CUDA error in func 'search' at line 365 : an illegal memory access was encountered.
CUDA error in func 'search' at line 365 : an illegal memory access was encountered.
CUDA error in func 'search' at line 365 : an illegal memory access was encountered.
✘ 01:14:46|cudaminer1 Error CUDA mining: an illegal memory access was encountered
✘ 01:14:46|cudaminer2 Error CUDA mining: an illegal memory access was encountered
✘ 01:14:46|cudaminer4 Error CUDA mining: an illegal memory access was encountered
CUDA error in func 'search' at line 365 : an illegal memory access was encountered.
✘ 01:14:46|cudaminer3 Error CUDA mining: an illegal memory access was encountered
CUDA error in func 'search' at line 365 : an illegal memory access was encountered.
✘ 01:14:46|cudaminer0 Error CUDA mining: an illegal memory access was encountered

rizwansarwar on 28 Jun 2017

It may be related to overclocking, contrary to my prior observation in #80. I ran it for about 90 minutes at stock GPU settings with no errors. I then changed to +165 core and +2000 mem using the NVIDIA X Server Settings GUI. It ran stably for about 2 minutes and then failed with this error.

I dropped the mem to +1500 and started again. It ran for about 30 minutes with no problems.

Increased mem to +1900 on only one card and the error occurred again. It was reported on both GPUs simultaneously as usual, despite only changing the rate on one of them.

I am able to restart ethminer over and over at these high mem transfer rates and it fails in a short period each time.

I don't have any experience with C, or any low level hardware programming. So I'm not going to even attempt to understand what the code does.

I hope reporting how to reproduce the problem helps someone find a solution, or at least better error catching for this reproducible problem.

Ideally, the miner would catch the error and restart, while incrementing a counter showing the number of restarts due to errors. There is a point where higher transfer rates reduce performance due to failures, but that point is hard to find when crashes can only be detected by standing by and watching the scrolling terminal.
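Until something like that exists in the miner itself, the same idea can be approximated from the outside. Below is a minimal sketch in Python, assuming the miner process actually exits when the CUDA error occurs (if it keeps running at 0 MH/s instead, the output has to be watched rather than the exit code). The `supervise` function and its parameters are made up for illustration and are not part of ethminer.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=None, delay_s=5.0,
              run=subprocess.call, sleep=time.sleep):
    """Run `command` repeatedly, restarting it whenever it exits.

    Returns the number of restarts performed. With max_restarts=None the
    loop never ends; with a number it stops after that many restarts.
    `run` and `sleep` are injectable so the loop can be tested without
    starting a real miner.
    """
    restarts = 0
    while True:
        exit_code = run(command)  # blocks until the process exits
        print(f"miner exited with code {exit_code}; "
              f"restarts so far: {restarts}", file=sys.stderr)
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        sleep(delay_s)  # short pause before starting the miner again


# Example call (placeholder arguments, replace with your real command line):
#   supervise(["ethminer", "-U"])
```

The restart counter printed to stderr is the number the comment above asks for: if it climbs quickly after raising the memory offset, the overclock is costing more than it gains.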

shanemgrey on 29 Jun 2017

👍3

@shanemgrey thanks for posting this, I agree, I suspect it is overclocking causing the problem. I am trying to downclock slowly to see the breaking point.

Have to agree, Claymore handles crashes very well. That is very handy, especially if you can't monitor the miner all the time. Some sort of mining restart option would be a very handy feature for this miner.

rizwansarwar on 29 Jun 2017

Just to check whether this is OS related: my miner is on Windows 7 Ultimate 64 bit and it experiences the exact same behavior (crashes, depending on the overclock level). On my Windows 10 machine I have a single card, which has not crashed at all (running 21 hours now). Are you getting the same results, or is it not the OS?

YInsomniac on 29 Jun 2017

@Skromniac Still not sure. I can replicate crashes on all OSes (Windows/Linux) when clocks are too high. So far what I have observed is that the crashes become less frequent (from every 2 minutes to every 30 minutes) when I reduce the clock speed. I am going to keep trying that until I see no crashes for days. I am still not convinced it is totally down to clocking. OC makes the problem worse, but I think it is not the root cause.

rizwansarwar on 29 Jun 2017

More info, each time there is a crash, there is a kernel driver error.

Jun 30 06:08:58 ubuntu kernel: [77905.021944] NVRM: Xid (PCI:0000:02:00): 31, Ch 0000001b, engmask 00000101, intr 10000000

Xid 31: according to NVIDIA's Xid documentation, this error is typically generated by a driver or application fault.

So it is probably not a hardware problem, which is good. I have tried different versions of the driver and I get the same errors. I think we need someone who knows how the miner works to take a look at this and maybe help us.

For reference: I am running Ubuntu 16.04, Driver 64 bit, 381.22 and Cuda 8.0
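To correlate miner crashes with these kernel messages, the Xid lines can be pulled out of the kernel log automatically. A minimal sketch in Python, assuming the log lines look like the one quoted above; the `parse_xid` and `count_xids` helpers are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0000:02:00): 31, Ch 0000001b, engmask 00000101, intr 10000000
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+)")


def parse_xid(line):
    """Return (pci_address, xid_code) for an NVRM Xid line, else None."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return match.group("pci"), int(match.group("xid"))


def count_xids(lines):
    """Count how often each (pci_address, xid_code) pair occurs."""
    return Counter(hit for hit in map(parse_xid, lines) if hit is not None)
```

Feeding it the output of `dmesg` shows at a glance whether the same PCI device reports the fault every time or whether it moves between cards.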

rizwansarwar on 30 Jun 2017

Been doing some digging. I have been gradually upgrading from driver version 367.27 to 381.22. The crashes are consistent; you get them regardless. It is really annoying because there is no watchdog feature in the miner to restart it on failure, and you can't babysit it 24/7 or auto-restart it.

Some more information: depending on your driver version, you get a different crash error. With driver 381.22 I get the illegal memory access error, but with 375.66 I get unspecified launch failure. All of it relates to some sort of search code in the ethash library for this miner.

@davilizh @chfast @Genoil guys your comments please. Really struggling to find the root cause here.

rizwansarwar on 30 Jun 2017

A discussion possibly related to the memory access errors, https://stackoverflow.com/questions/25702573/simple-cuda-test-always-fails-with-an-illegal-memory-access-was-encountered-er.

Mentions the following:

If you ran your code with cuda-memcheck, you would get another indication of the illegal memory access in the kernel code.

Discussion of CUDA parameter constraints, https://stackoverflow.com/questions/8302506/parameters-to-cuda-kernels.

ericalandouglas on 30 Jun 2017

@rizwansarwar
Sorry to reply late.
Reading through all your comments, the issue seems to be that overclocking makes the GPU fetch wrong data or instructions. To be honest, I have no experience in overclocking GPU/memory. My rough thoughts are:

1. Will the fault occur again if we only overclock the memory clock? Since Ethereum mining is memory bound, I think it is more important to overclock the memory clock.
2. Can we malloc all the data structures of ethereum in host memory, and place only the DAG buffer in video memory?
3. Can we add a watchdog to the code to restart it when an error occurs?
4. Can we use cuda-gdb or cuda-memcheck to find out which instruction/data is wrong, so that we can add guards around them?

davilizh on 3 Jul 2017

👍1

I hope this helps for reproducibility - I restarted my rig on Friday and haven't logged in via remote desktop since then. The rig mines normally without a hitch. Somebody mentioned that the issue occurs often when you log in to check, i.e. when the main video card tries to render something else (apart from the mining).

YInsomniac on 3 Jul 2017

@Skromniac
Thanks, good news to know.
If so, we can use a small overclock on the main card and a larger overclock on the others. We could even leave the main card at stock clocks.

davilizh on 3 Jul 2017

@davilizh
Thanks for getting back. Please see my comments below.

1. The crash occurs with only the memory overclocked. It gets worse as the overclock gets closer to the limit, but it happens regardless. I have verified this by gradually reducing the memory clock speed. It gets better as you get close to stock clocks, but you still get crashes (sometimes 12 hours apart).
2. Probably a good idea. I am not an expert in CUDA programming, but would that carry any performance penalty?
3. Absolutely a must-have in my opinion. The entire miner code should be a thread that gets started by a watchdog thread, which should try to recover the miner when possible.
4. Sorry, my wizardry powers end here. You are the gurus; I am just a convert trying to help and report :)

@Skromniac I will try this today, I will try to leave the display card out of the list of devices to mine. Hopefully that should prove if that is the problem.

rizwansarwar on 3 Jul 2017

👍1

@rizwansarwar
Thank you for your reply.

For #2, there should be some penalty, but as long as the code is carefully tuned, the penalty should be small. I do not have time to implement this idea at the moment, though.

Hope Skromniac's approach can solve this issue.

davilizh on 3 Jul 2017

Here is my experience so far in case it helps.

I have 2 rigs one with 1070's only and one with 50/50 1070's and 1060's. The rig with the 1060's is using --cuda-parallel-hash 4 and the 1070 rig is not using that flag at all. Both are running Ubuntu 16.04.2 with Nvidia driver version: 378.13

Regarding @Skromniac 's comment, I have no monitors connected to my rigs, I only use SSH and the crash occurs while I am asleep as well. For me this error doesn't seem to correlate with using monitor/remote desktop, however it could be an additional trigger perhaps.

Coming from Claymore's I had to drop my memory clocks (I don't OC core) just to get it somewhat stable. With the lower clocks the best I've had so far is around 24 hours without the error. I haven't dropped lower as if I do I will switch back to Claymore's as it will provide a better hashrate.

I have had a similar experience to @rizwansarwar: the crashes become less frequent as clocks are lowered, but they never fully disappear.

braaad on 5 Jul 2017

Can you guys update your driver to 384 and have a try?
I have run the code on my GTX1060 for hours with driver 384 and stock clock, but cannot reproduce the issue.

davilizh on 5 Jul 2017

@braaad If you do not set cuda-parallel-hash in your command, then you are using the default value cuda-parallel-hash=4.

davilizh on 5 Jul 2017

@davilizh I will update driver and give it a shot, I can reliably cause the error if I increase my clocks so I should have an answer soon.

braaad on 5 Jul 2017

@davilizh I have installed 381.22 (the latest Linux version), but I was able to quickly get the error again by bumping my clocks up by 50 MHz. I have dropped my current clocks down quite a bit more now (more than I already had) to see what effect that has on stability.

braaad on 5 Jul 2017

@braaad, could you try 384.47 ?

azazhu on 5 Jul 2017

@azazhu my bad, I double checked versions after reading your comment and realised that 384.47 was a beta driver which is why I didn't see it earlier. Grabbing it now.

braaad on 5 Jul 2017

@davilizh small update: I have upgraded to driver 384.47. This version of the driver is generally more stable than all the previous versions. The 6th card in my rig has started to work now, which it never did with any of the previous driver versions. According to the Nvidia changelog for the driver, they seem to have fixed a bug related to this.

I have been playing around with settings, so far what I have observed is below.

If the memory clock of GPU with primary display is not overclocked, I don't get crash on 384.47.
If the memory clock of GPU with primary display is at same clock as all other cards (overclocked), then I get crashes within minutes.

So what I have been doing is keeping the clock of the GPU with the display slightly lower (-100 to -150) than all the other cards. This keeps the system stable and running on 384.47. I will report back soon if I observe crashes.

rizwansarwar on 5 Jul 2017

@rizwansarwar Thank you for your sharing.

davilizh on 6 Jul 2017

@davilizh so far 12+ hours without error on one rig - this is overclocked, not stock. Still a bit too soon to be 100% certain, but looks good so far.

Also, one thing to note, like @rizwansarwar, gpu0 has to have a lower clock than the others, I thought this was just a bad card but maybe its due to being gpu0.

I will hopefully get time today to update the second one.

braaad on 6 Jul 2017

@braaad Good news to know. Thank you.

davilizh on 6 Jul 2017

@rizwansarwar Hi, I don't OC my rig, but CUDA error in func 'search' at line 365 : unspecified launch failure. still shows up each time. My rig's driver is currently 378.78. Could it be a driver problem?

ken8203 on 6 Jul 2017

@ken8203 as @davilizh pointed out earlier, there is a beta driver 384.47 that you can try. I am running it and my miner is stable now. The clocks are not at max (maybe 10-15% less), but I have not seen a crash in 24 hours. Still monitoring, but I believe the issue was mainly with the driver.

@davilizh I think we should monitor this a bit and then close this, as it seems to me the issue is with the driver. I would however like to see an auto-restart feature in the miner for recoverable failures; that would be very neat and handy to have.

rizwansarwar on 7 Jul 2017

> If you do not set cuda-parallel-hash in your command, then you are using the default value cuda-parallel-hash=4.

Mind telling us, or pointing to an explanation of, what exactly this flag does? I'm a bit puzzled by what I tried.

Actually I think it should be included in readme.md since the default is _automatically applied_ without setting the flag.

> Also, one thing to note, like @rizwansarwar, gpu0 has to have a lower clock than the others, I thought this was just a bad card but maybe its due to being gpu0.

__Note__: gpu0 here is the _first_ NVidia GPU, the one connected to the _main_ PCIe slot (x16).
It doesn't have to be connected to a display; it still has a lower mem clock limit _compared_ to the other cards. It will crash ethminer (same error) when pushed past a certain speed.
Win 10 with beta 384.47 as suggested.

oleng on 7 Jul 2017

@oleng

The --cuda-parallel-hash flag changes how the miner processes the hashes.

This is very simplified but part of the cuda kernel's work is the search part of the mining process. It runs the same operation in parallel across many cores in the gpu. When @davilizh improved the kernel he added the --cuda-parallel-hash flag to allow changing the number of threads which it processes simultaneously.

It needs some value to be automatically applied without setting the flag because otherwise the miner would not work!

In theory as many threads as possible would be best, but there's going to be an optimum imposed by the hardware. By default the miner uses 4 because that was the best value which @davilizh arrived at through testing and this has been confirmed by most users who have experimented with it.

I don't think there is any need to promote tweaking advanced settings in the read me because for most people changing them will probably reduce performance. The same applies for the --cuda-block-size --cuda-grid-size and --cuda-streams flags. These are set to sensible defaults and I have only reduced my hashes by changing them.

jimmykl on 7 Jul 2017

You can see the actual code change here https://github.com/ethereum-mining/ethminer/commit/73fc65daf97840f61fdcd292ac42ccb54c7f1553#diff-2b564dc4ef09c49a24fc0105fa8cfe98L45

Instead of a single ethash_search function there are 8 and the code executes as many as are set by the flag.

jimmykl on 7 Jul 2017

@jimmykl thank you for the explanation, i feel like that's aligned to what I suspected.

> I don't think there is any need to promote tweaking advanced settings in the read me because for most people changing them will probably reduce performance. The same applies for the --cuda-block-size --cuda-grid-size and --cuda-streams flags. These are set to sensible defaults and I have only reduced my hashes by changing them.

Actually I managed to increase my hashrates by using those flags. A one-size-fits-all t-shirt works for everyone, but one sized to your proportions fits better. In the same way, customizing the flags to fit your hardware works better. And I feel this is especially true when overclocking multi-GPU mining rigs, which is what ~80-90%(?) of miners do. There are even differences in the number of CUDA cores within the same model line.

Think of it as warning them instead of trying to decide what's good for them.

At least include the explanation in --help

oleng on 7 Jul 2017

Oh and also increasing the core clock without any --cuda-parallel-hash set also crashes ethminer.
I did it in addition to a stable OC'd memory clock.

oleng on 7 Jul 2017

I have to eat my words, crash happened after 29 hours. Situation is better but it looks like we are still hitting the bug. I would say we need to find a way to replicate and fix it.

@davilizh are you able to replicate this in your environment? Maybe with overclocking you can replicate this quicker?

rizwansarwar on 7 Jul 2017

👍1

@rizwansarwar I can replicate in my environment with OC.
As you said in another thread (https://github.com/ethereum-mining/ethminer/issues/94#issuecomment-313800302), this is probably due to a driver issue.
Probably the best way for us is to: in case of an exception like "invalid instruction", catch it, log it, and try to restart the CUDA mining (from chfast's comment there). But I do not know how to do this.

davilizh on 8 Jul 2017

I can reproduce this too on SLI of EVGA GTX 1070, I think we should handle this in the code.

This happens often also with a mild overclock.

Update: This happens also with no overclock, downclock on the core and power target to 65%.

freiro on 8 Jul 2017

For those who still get the error, try changing PhysX in the NVIDIA control panel to CPU instead of one of the GPUs. This happened to work for me.

Edit: forget it, it failed after several times

ghost on 9 Jul 2017

Does anyone have a fallback miner that they're using in the mean time while this one is being fixed?

feracon on 11 Jul 2017

@feracon I'm using Claymore dual miner in the meantime. Latest release (9.7) came with NVIDIA optimizations if you're using those GPUs

saidmasoud on 11 Jul 2017

@saidmasoud Thanks for the suggestion! I'll check it out!

feracon on 11 Jul 2017

I get this crash as well, and it seems to only occur with higher memory transfer offsets (usually around +1350 or +1400 for me). Curiously, of my 4 rigs it's happening mostly on the rig with EVGA GTX 1070s.

The traditional memory overclock symptom I'd encounter would be one card failing, which makes sense in the overclock context. Yet, in this thread's scenario all cards (in my case 6) simultaneously crash. So, I do agree that this is overclock exacerbated, but I also think there's something in software that's weird and worth investigating.

For those on Linux who want to keep using ethminer but don't trust the process due to this: simply write a script that watches the power draw reported by nvidia-smi. When it drops below 70 W (that's the threshold I use), you know the ethminer process has failed and you can simply kill/restart it. Works like a charm for me. Here's the relevant grep/sed/cut pipeline:

/usr/bin/nvidia-smi -q -d POWER | grep "Power Draw" | sed 's/[^0-9,.]*//g' | cut -d . -f 1
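The same check can be sketched in Python for anyone who prefers that to shell. This is only an illustration: it assumes the `nvidia-smi -q -d POWER` text contains one line of the form `Power Draw : <number> W` per GPU (the format the pipeline above relies on), and the function names are made up. Treating the miner as dead only when every card is below the threshold is a design choice based on the reports here that all cards stop at once.

```python
import re

# One "Power Draw : 85.32 W" line per GPU is assumed; "N/A" values are skipped.
POWER_RE = re.compile(r"Power Draw\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*W")


def power_draws(smi_output):
    """Extract per-GPU power draw in watts from `nvidia-smi -q -d POWER` text."""
    return [float(watts) for watts in POWER_RE.findall(smi_output)]


def miner_looks_dead(draws, threshold_w=70.0):
    """True when at least one GPU reported and all are below the threshold."""
    return bool(draws) and all(watts < threshold_w for watts in draws)
```

The text to parse can be obtained with `subprocess.run(["nvidia-smi", "-q", "-d", "POWER"], capture_output=True, text=True).stdout`. If a driver version prints several "Power Draw" variants per GPU, the pattern would need tightening.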

michael-pesce on 11 Jul 2017

👍3

@rizwansarwar Would you consider editing the title of this issue to include error in func 'ethash_cuda_miner::search' at line 365 or similar? It's because lots of people are creating duplicates of this issue and referencing that error and it may help them see it has already been reported. Thanks!

jimmykl on 11 Jul 2017

I found #94 and #80 before i arrived here. Assuming this is the primary thread for this issue.

feracon on 11 Jul 2017

Yes, please add your report here. Those dupes should be closed.

jimmykl on 11 Jul 2017

I made a PHP script to kill ethminer if it stops hashing (for Linux):

#!/usr/bin/php
<?php
$start=time();
putenv("PATH=/bin:/usr/bin:/usr/local/bin");
while($line=fgets(STDIN)){
    if(time()-$start<=30){ echo "[*] $line"; continue; } // ignore first 30s
    if(strpos($line," 0.00MH/s")!==false){
        echo "crash detected. line=$line killing ethminer\n";
        passthru("echo \"".trim(shell_exec('date'))." crash detected. killing ethminer\" >> ~/ethminer.log");
        passthru("killall -9 ethminer");
    } else echo $line;
}

run ethminer in a loop and pipe it like this:

while [ 1 ]; do ethminer ... 2>&1 | mine-monitor; done

I agree the issue is exacerbated when using the video output and/or doing other things while mining. I'm on Ubuntu 16.04 with 6 gtx 1060s (3 different brands) formerly overclocked to 200/1200, now a little lower, @85W with a G3900 Celeron. I installed CUDA via the official .deb/repo at https://developer.nvidia.com/cuda-downloads which replaced nvidia drivers with 375.x.

The next things I might try are mining without X running or without a monitor plugged in using virtual monitors.

dhjw on 11 Jul 2017

@dhjw

I can't take credit for this, am quoting from something someone wrote to me. But I can't remember who wrote it. :( Anyway, this should solve your problem:

you don't need a monitor connected to make X work. At the time of installation save the EDID of the monitor using nvidia-settings and then use the edid.bin file in your xorg.conf to fake X that there is a monitor connected. I have this working on my rig and X has no issues. You can add edid by using nvidia-xconfig --custom-edid=. This will generate your xconfig using fake edid, X should start fine after that.

I am using these, which I probably wouldn't need with the above in place: https://www.amazon.com/gp/product/B00JKFTYA8. But they can also work.

michael-pesce on 11 Jul 2017

Just got what looks to me like the same error on Claymore, except Claymore recovered.

X is clearly something I'm completely unaware of. I'm having a hard time tracking it down because it's labeled as a single character. I can find many threads about "Do I really need X" etc, but cannot find the actual name of this program or its home. Thanks in advance!

feracon on 11 Jul 2017

@feracon Which version of Claymore are you using? I assume he added the CUDA optimisations from ethminer to 9.7, but I got this error in 9.6 too when I overclocked too much.

jimmykl on 11 Jul 2017

@jimmykl I'm using the new 9.7 with zero OC, completely stock.

I'm having a feeling it may be that the suggested batch file line I got from my pool FAQ is missing arguments the new version is expecting, maybe for the new optimization. Reading now. But at least my rig is mining!

EDIT: Claymore's Dual Ethereum AMD+NVIDIA GPU Miner v9.7 (Windows/Linux)

feracon on 11 Jul 2017

@feracon Re: Windows monitoring I use http://www.tightvnc.com and have never had any issues. If you need remote monitoring you can either setup port forwarding for VNC on your router or run a VPN server (probably best for security)

jimmykl on 11 Jul 2017

@feracon X is https://en.wikipedia.org/wiki/X_Window_System, part of the GUI on Linux systems. If you don't run it you just get a terminal with no graphics.

dhjw on 11 Jul 2017

Re: Claymore 9.7 error: it's possible then that he directly copied some code from this fork and has introduced the same bug to his miner… Of course, if he does fix it, he probably won't commit the fix back here :-/

jimmykl on 11 Jul 2017

@jimmykl Ahh, maybe. Regardless, for anyone having this issue: Claymore may have the same bug, but it can recover automatically, keeping you in business. Thanks for the heads up on VNC, I'll check it out. I reckon it's likely much lighter than TeamViewer.

@dhjw Thank you!

feracon on 11 Jul 2017

@feracon The new optimization flag is --cuda-parallel-hash, and if it isn't set it uses 4 by default, which is optimal for most cards.

jimmykl on 11 Jul 2017

@jimmykl Ahh, I see valid settings are 1, 2, 4, and 8, but people are reporting that 8 sucks for some 1070s. Going to experiment and overclock again. Thanks for the tip.

feracon on 11 Jul 2017

These were my results:

GTX 1070
Ubuntu 16.04
Memory offset +1500
Power ceiling 115w
./ethminer -U -M --cuda-parallel-hash X:
31.10 at 1
32.36 at 2
28.87 at 3
32.42 at 4
25.86 at 5
21.60 at 6
18.59 at 7
32.22 at 8

michael-pesce on 11 Jul 2017

Hello to all here - I am a new member and happy to contribute some information to all.

I have tested several of the 11.0 versions with my RIG1:

6 x GTX 1060 (ASUS Turbo) 6GB (OC Mem 10 GHz).
Windows 10: up to date
NVidia Driver: up to date

And I can confirm:

All versions of 11.0 have similar problems. Sometimes the error message is different, but in general they all have the same issue. It looks like the changes in the area of the CUDA search are buggy.

I am a software developer myself and have also worked with CUDA, but unfortunately I do not have MS DevStudio 12, so I cannot contribute a bugfix. I tried to migrate the project to MS DevStudio 2017, but that failed for a lot of reasons.

I am now testing the old version ethminer-0.9.41-genoil-1.1.7 to see whether it has similar problems with overclocked cards, and I will report.

report:

The version ethminer-0.9.41-genoil-1.1.7 also reports a CUDA search error:
CUDA error in func 'ethash_cuda_miner::search' at line 346 : unspecified launch failure.
X 01:25:13|cudaminer1 Error CUDA mining: unspecified launch failure

My hypothesis now is: overclocking the GPU memory changes the program flow timings, and I believe the software has a general synchronisation issue in that area.

UPDATE 1:

I did not run any further tests, especially not with lower overclocks, because it makes no sense.

Instead I wrote a couple of scripts which monitor the output of ethminer and, if they find the word "Error", restart the whole rig. The restart takes 3 minutes and after that it is working at full overclocked speed again. The error occurs relatively rarely (2 times a day on my rig).

I stand by my opinion: it is not related to the overclocking, it is related to the internal software design of ethminer for CUDA, because the error always occurs at one defined position in the code. A different overclocking speed only changes the timing and synchronization behavior of the ethminer CUDA code and nothing else. I assume the designer forgot a sync object at a specific position in the code, and this code runs safely at specific speeds by accident.

Unfortunately I have no time to review the entire code. Please take my opinion and my test with this very old software version as a hint for searching and fixing in the right place. And please do not rely too much on the "overclocked = bad" opinion.

UPDATE 2:

I may have figured something out: if you are using the ASUS GPU Tweak II utility, you should close it after you have applied the tweak. Since I started doing this at rig startup, with a script that runs 2 minutes after the GPU Tweak utility has started, ethminer no longer reports any errors. Maybe the tweak utility accesses the graphics cards in parallel from time to time and that causes the error? Or: I observed that after a while the tweak utility occupies one complete CPU core doing something I do not know, and I have only two cores in my rig. Maybe ethminer always needs a lot of CPU headroom to function correctly. That may also point to a software synchronization issue in ethminer.

Maybe you can check if your CPU is very busy from time to time and in these times the error occurs or maybe you check if your tweak utility is running while mining.

UPDATE 3:

I played around with some priority settings for ethminer.exe and noticed: if I give it a high priority, the CUDA errors come very soon. This supports my hypothesis that ethminer.exe has a synchronization problem in general. Maybe somebody used boost messages and assumed they are thread safe, but they are not. One has to take care of every shared memory region or handle when programming with threads. I would start with an analysis of the multithreading design of the software and check whether everything around shared memory is designed properly.

This is the end of my article for that topic :-)

My best regards, Matthias

MatthiasThoemel on 11 Jul 2017

👍1

Same issues here: several 1060 models, and the crash comes after 10 to 20 minutes. The funny part is that I have 3 rigs with 8 cards each and cloned drives; 1 runs without issues and the other 2 crash.

kiwina on 11 Jul 2017

I can confirm this happens in Ubuntu 17.10, cuda 8 with default drivers (375.66 I believe) running a 1060 and a 1050Ti, both OC +1600.
Both cards crash at the same time, and ethminer stops but it is trivial to stop and start again, so I think a watchdog is the best solution (other than directly finding and fixing the issue).
Previously claymore 9.5 seemed to be running fine for 24+hr, but it is possible it was failing and recovering silently.

Edit: I meant Ubuntu 17.04

pabloi on 11 Jul 2017

Concur with @pabloi, at least one GPU occasionally crashes while running Claymore 9.5 and 9.7 with fairly high overclocks, but the watchdog restarts the miner automatically and doesn't give any details as to why it crashed.

saidmasoud on 11 Jul 2017

http://cryptomining-blog.com/8852-new-optimized-ethminer-for-nvidia-geforce-gtx-1060-gpus/
This version is rock stable for me with the same overclock that crashes the latest versions.
Crash on latest versions: https://scr.hu/GWd9B6
Windows 7 x64 gtx 1070 + 1060

spyrek10 on 11 Jul 2017

@MatthiasThoemel could you post your Windows 10 script for the automatic reboot on error?

thghdbs on 11 Jul 2017

👍1

@dhjw Thanks for the script. I am using it to auto-restart ethminer if it fails. It hasn't failed so far though! (I am trying less power constraints to see if that affects failure time).

pabloi on 11 Jul 2017

@dhjw I am catching false positives with your script. Not sure why, but sometimes after getting new work I get a report of 0.00MH/s without any error and, if left to itself, the miner would continue. However, your script kills and restarts it. Since this happens about once an hour or so, I modified the script to look for the "CUDA error" string instead of "0.00MH/s", which will hopefully catch only true errors while still dealing with this issue.

ℹ 18:33:06|stratum Received new job #0b7eeb3f
ℹ 18:33:06|cudaminer0 set work; seed: #9e972470, target: #00000000dbe6
ℹ 18:33:06|cudaminer1 set work; seed: #9e972470, target: #00000000dbe6
m 18:33:06|ethminer Mining on PoWhash #0b7eeb3f : 0.00MH/s [A4+0:R0+0:F0]
m 18:33:10|ethminer Mining on PoWhash #0b7eeb3f : 39.06MH/s [A4+0:R0+0:F0]
m 18:33:14|ethminer Mining on PoWhash #0b7eeb3f : 39.32MH/s [A4+0:R0+0:F0]
m 18:33:18|ethminer Mining on PoWhash #0b7eeb3f : 39.58MH/s [A4+0:R0+0:F0]
Edit: added sample output from miner.
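For illustration, the changed check boils down to matching the error strings rather than the hashrate. A minimal sketch in Python (not the modified PHP script itself; the marker list and function names are assumptions based on the error lines quoted in this thread):

```python
# Substrings taken from the error output quoted earlier in this thread.
ERROR_MARKERS = ("CUDA error", "Error CUDA mining")


def is_crash_line(line):
    """True only for lines that contain one of the CUDA error markers."""
    return any(marker in line for marker in ERROR_MARKERS)


def crash_lines(lines):
    """Return the subset of miner output lines that indicate a crash."""
    return [line for line in lines if is_crash_line(line)]
```

A momentary 0.00MH/s right after new work arrives is ignored by this check, so only real CUDA failures would trigger a kill/restart.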

pabloi on 12 Jul 2017

Of my 4 mining rigs only one is crashing consistently. Below are the crash logs from that rig today, 8 total (so far). If order matters, 7 of the 8 crashes started with cudaminer3. This is interesting, because it tells me that this is definitely overclock related. In the past I've seen a single card crash and eventually require an ethminer restart, whereas this error crashes all cards at once. But the root cause still seems to be one bad card, if this ordering is actually telling.

At the end of the day I would prioritize work to have ethminer restart itself (though my pulse script works great) instead of trying to figure out why overclocking is doing this.

I have reduced the overclock on gpu3 and will let you know what happens.

miner.201707110950:  ✘  09:49:39|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer1  Error CUDA mining: an illegal memory access was encountered

miner.201707111324:  ✘  13:23:55|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer5  Error CUDA mining: an illegal memory access was encountered

miner.201707111336:  ✘  13:36:11|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer1  Error CUDA mining: an illegal memory access was encountered

miner.201707111448:  ✘  14:48:25|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer1  Error CUDA mining: an illegal memory access was encountered

miner.201707111704:  ✘  17:03:37|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer5  Error CUDA mining: an illegal memory access was encountered

miner.201707111814:  ✘  18:13:30|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer0  Error CUDA mining: an illegal memory access was encountered

miner.201707111818:  ✘  18:17:44|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer4  Error CUDA mining: an illegal memory access was encountered

miner.201707111919:  ✘  19:19:04|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer0  Error CUDA mining: an illegal memory access was encountered
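
The "which miner reports first" observation can be checked mechanically instead of by eye. The following is a minimal Python sketch, not part of ethminer; `first_reporters` is a made-up name, and it assumes that all error lines belonging to one crash share the same timestamp, as they do in the logs above.

```python
import re
from collections import Counter

# Matches the timestamp and miner name in an ethminer CUDA error line.
ERROR_RE = re.compile(r"(\d{2}:\d{2}:\d{2})\|(cudaminer\d+)\s+Error CUDA mining")


def first_reporters(lines):
    """Count, per crash, which miner logged the error first.

    Assumption: one crash == one timestamp (all cards fail in the same second).
    """
    seen_stamps = set()
    firsts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue
        stamp, miner = match.groups()
        if stamp not in seen_stamps:
            seen_stamps.add(stamp)
            firsts[miner] += 1
    return firsts


# Two crashes, shortened from the logs above.
sample = [
    "miner.201707110950:  ✘  09:49:39|cudaminer3  Error CUDA mining: an illegal memory access was encountered",
    "miner.201707110950:  ✘  09:49:39|cudaminer5  Error CUDA mining: an illegal memory access was encountered",
    "miner.201707111324:  ✘  13:23:55|cudaminer2  Error CUDA mining: an illegal memory access was encountered",
    "miner.201707111324:  ✘  13:23:55|cudaminer3  Error CUDA mining: an illegal memory access was encountered",
]
print(first_reporters(sample))
```

Feeding it the full log files would give the per-miner totals; whether the first line written really identifies the faulty card remains an open question.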

michael-pesce on 12 Jul 2017

As I mentioned in the other issue (#94), at first it was working for about 48 hours, then I had these errors 2 times, each after ~1.5 hours.
Then, just out of curiosity, I lowered the OC on the mem clock from +700 to +650 MHz (core clock is +0, power target is 90%). These settings are applied to all cards. I turned the mining back on, and it has been working since (9th of July).
Maybe it means something, maybe it doesn't, because I saw comments about crashing on stock clocks.
Maybe it will crash again today, but it's interesting that it happened 2 times in 3 hours and has then run for more than 4 days without any issues.

ghost on 14 Jul 2017

I second @aiden1408. Yesterday I increased the OC from 1600 to 1700 (mem) and managed to get 4 errors in 5 minutes. Previously it would crash 2-3 times a day.

pabloi on 14 Jul 2017

Does anyone know what caused the problem yet? OC _should not_ be a problem, this is miner software after all.

oleng on 15 Jul 2017

@saidmasoud

> (...) watchdog restarts the miner automatically (...)

Could you please provide more information on how you implemented the watchdog in this case?

piotr-dobrogost on 15 Jul 2017

@piotr-dobrogost I didn't implement it myself, it comes as part of the Claymore mining software and is enabled by default. I'm currently using Claymore until a fix for this issue is implemented

saidmasoud on 15 Jul 2017

I have the same problem. I have 3 rigs using PNY/EVGA GTX 1060s. I have 9 PNY GTX 1060 XLR8 cards; 6 of them are running fine, but three of them don't accept the same overclock, and even when I lower their OC they crash either right away or after a while. When I run the GPUs with no OC they show "an illegal memory access was encountered", so I have to go with -400 core to run them, but they are still crashing!!!

aityou on 16 Jul 2017

Windows temporary fix:
https://github.com/derubm/Ethminer_Watchdog

derubm on 17 Jul 2017

EDIT: Meanwhile I strongly support Orkblutt's solution: https://github.com/orkblutt/MinerLamp
It needs fewer system resources, runs stably, and looks great.

Powershell Solution for CUDA Crashes

So this is a PowerShell solution that has been running for some days now without issues. You can tune your cards without having to care about ethminer running into the bug discussed here. No need to install extra software or 3rd-party tools...

Feel free to improve. Due to testing the script I had some downtimes with my rig, so donations are very welcome :-) [0x76DC203d1cd70262459cEf56AdE865613c4b9693]

Instructions:

=> Generate a run.bat, but use a PowerShell call to tee out a log file; tee generates a log file that is further processed by PowerShell.
Save the text into a run.bat in the same directory as ethminer. Execute the ps1 file, and hopefully enjoy.

````
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_SINGLE_ALLOC_PERCENT 100
powershell "./ethminer.exe --cuda-parallel-hash 4 --farm-recheck 150 -U -S eth-eu1.nanopool.org:9999 -FS eth-eu2.nanopool.org:9999 -O 0xYOURADRESS 2>&1|tee log.txt"
exit
`````

=> This is the main PowerShell script (don't forget to enable PowerShell script execution in Windows). To reduce memory issues, the script opens and closes jobs after a while (but mining goes on). Insert the text into a *.ps1 file and save it in the ethminer directory.

````
function JobOpen{
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,13

gci log.txt | % { $sb = [scriptblock]::create("get-content -wait $_") ; start-job -Name LOGSEARCH -ScriptBlock $sb }

$null = $(get-job|receive-job)
# sleep 1

}

function JobClose{
Stop-Job -Name LOGSEARCH
get-job | Remove-Job
[System.GC]::Collect()
#sleep 1
}

function EthRestart{

#cls
#Write-Host "#######################################################################################################"
#$Host.UI.WriteLine($(get-job | receive-job))
 stop-process -Name ethminer
 sleep 2
 RemoveLog
 sleep 2
 Start-Process .\run.bat
 sleep 2

}

function RemoveLog {
$strFileName=".\log.txt"
If (Test-Path $strFileName){
Remove-Item $strFileName -Force
}Else{
# // File does not exist
}
}

function statOutput{

$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,0
        Write-Host "Start:    $orgstartdate"
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 50,0
        Write-Host "Nowdate:   $nowdate"
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,1
        write-host "Restart:  $ethstartdate"
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 50,1
        write-host "#Restarts: $i"
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,2
        write-host "Jobstart: $jobstartdate"

}

$i=0
$s=0
$orgstartdate= get-date
$ethstartdate=get-date
$jobstartdate= Get-Date
$nowdate = Get-Date

$d=Get-Date

RemoveLog
sleep 2
Start-Process .\run.bat
sleep 7

$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,10
gci log.txt | % { $sb = [scriptblock]::create("get-content -wait $_") ; start-job -Name LOGSEARCH -ScriptBlock $sb }

sleep 1

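while(1) {
statOutput
# wait 15 seconds after each (re)start before checking the log output
if(($nowdate - $ethstartdate).totalseconds -ge 15) {
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,20
# show the most recent log line without consuming the job output
$Host.UI.WriteLine($(get-job | receive-job -Keep |select -last 1))
# consume the job output and look for the CUDA error in the last 50 lines
$m = $(get-job | receive-job| select -last 50 |Select-String "Error CUDA mining" )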

if($m -ne $null) { 
    $i++
    JobClose
    ethrestart
    $ethstartdate= Get-Date
            JobOpen
    $jobstartdate=$nowdate        

       }

}
$null=$(get-job | receive-job)

sleep -m 50
$nowdate= Get-Date
$s++

if($s -ge 6000) {
# about every 5 minutes (6000 loops x 50 ms sleep): force a garbage collection
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,5
$nowdate= Get-Date
Write-Host "GARBAGE COLLECT START $nowdate "
[System.GC]::Collect()
sleep 1
$nowdate= Get-Date
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,5
Write-Host "GARBAGE COLLECT Ended $nowdate "
$s=0
}

if(($nowdate - $jobstartdate).totalseconds -ge 60000) {
JobClose
JobOpen
$jobstartdate=$nowdate
}

# restart ethminer every 2 hours (7200 seconds) even if no error was seen
if(($nowdate - $ethstartdate).totalseconds -ge 7200) {
$i++
JobClose
ethrestart
$ethstartdate= Get-Date
JobOpen
$jobstartdate=$nowdate

}
}
exit

````

Malapha on 18 Jul 2017

I have the same problem with overclocked GTX 1070s. I set +100 GPU and +1300 memory. After this, both Claymore miner and ethminer report failures.
When I set 900-1000 for memory, crashes happen every 10-15 minutes, which is acceptable, but I don't want this ;/

I have Ubuntu 17.04, NVIDIA Driver: 375.66 and CUDA from apt repo.
Currently I have +100 GPU and +1000 Memory and I have 185 MH/s in 6 cards.

I have 6 x Asus ROG STRIX GTX 1070 O8G-GAMING

sblmasta on 18 Jul 2017

I've also made an easy-to-use GUI (in Qt) that handles errors.
https://github.com/orkblutt/MinerLamp

orkblutt on 18 Jul 2017

My rig is not running Ubuntu, but I had the same issue. The problem first occurred after I added a 5th card (EVGA GTX 1060 6GB) to an already running 4 x EVGA GTX 1060 6GB machine. After some tests I noticed that the fifth GPU had Micron GDDR5 memory and was using a different VBIOS version compared to the other 4 GPUs. Today I flashed the BIOS of that GPU and upgraded it to the same version as the others. The GPU still can't handle the overclock that I'm using on the ones with Samsung GDDR5, but it is stable at 50% of their overclock values. For example, the GTX 1060s with Samsung memory run at +625 memory and the one with Micron at +300, both at 80% power. So far, 3 hours with no problem.
Before the BIOS flash, it was crashing the whole miner even at stock or with a slight overclock.
I will keep tracking and updating here.
I hope my findings help you further.

VickoValch on 18 Jul 2017

Latest -dev version (ethminer-0.12.0.dev1) seems to help!

Update: Unfortunately it still happens.

freiro on 22 Jul 2017

It seems to me to be clearly related to overclocking too much. Reduce overclocking on the GPU that crashes first and you can keep the rest higher. I have one card out of 6 that is more sensitive and heats up way faster than all the others, even from the same manufacturer. I wonder if a script could be devised to automatically find the setting that doesn't crash on each GPU.

Edit: I gave up tweaking the "card that crashes first" as I think it's inaccurate. I do kill ethminer and restart it when that happens, but now I only decrease overclock when a card goes offline.
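
The "find the setting that doesn't crash" idea could be sketched as a simple step-down search. This is only an illustration in Python under loud assumptions: `find_stable_offset` and `fake_is_stable` are made-up names, and a real stability check would have to apply the offset, mine for a long while and watch for the CUDA error, which is slow and not shown here. The +2000 and +1500 figures are borrowed from an earlier report in this thread.

```python
def find_stable_offset(start, step, floor, is_stable):
    """Step the offset down from `start` until `is_stable(offset)` passes.

    Returns the first stable offset, or None if nothing at or above
    `floor` turns out to be stable.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    offset = start
    while offset >= floor:
        if is_stable(offset):
            return offset
        offset -= step
    return None


# Stand-in for a real stability test (hypothetical): pretend this card
# tolerates memory offsets up to +1500.
def fake_is_stable(offset):
    return offset <= 1500


print(find_stable_offset(start=2000, step=100, floor=0, is_stable=fake_is_stable))
```

Because a crash can take hours to show up, each probe would need a long soak time, so a search like this would run per card over days rather than minutes.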

dhjw on 26 Jul 2017

@dhjw so why does the older version work for me without any crash with exactly the same overclocking?

spyrek10 on 26 Jul 2017

Not sure @spyrek10 but I upgraded and it seems to be the same to me.

dhjw on 26 Jul 2017

In my case the problem was caused (or at least I hope so, 48 hours without a problem) by a SATA->MOLEX adapter on a powered USB riser. It was getting very HOT (about 70C), and e.g. EWBF miner exited a few seconds after start. Replacing the SATA->MOLEX adapter and powering directly from MOLEX on the PSU solved my problems (on Windows and Linux).

dafyk on 3 Aug 2017

@dafyk I had the same problem with high temperature on the cable of a MOLEX-SATA power adapter. I replaced the wire with SATA power only and it works well.

sblmasta on 3 Aug 2017

@orkblutt - I really like your solution, but for some reason MinerLamp crashes on my system. The program itself works, but soon after I start mining, Windows says the program has stopped working. On further attempts ethminer won't start at all, in or out of your program. I have to reboot.

If the devs are reading this, I hope a watchdog feature is high on the priority list. I've so far refused to use Claymore as I don't like what he stands for. Not so much the fee, but to me there's no doubt he's ripped Genoil's CUDA optimizations. That's wrong. Then there's the impact his dev fee server switching has on pool servers, but that's a debate for somewhere else.

Ethminer is the best ETH miner, and asks for nothing more than a donation (which I gladly give). For ETH mining only, Claymore has no benefit other than a watchdog. I hope ethminer gets one so I never even have to consider paying him a cent. Thanks for all your hard work.

Just a final note. I've definitely been able to increase the memory overclock by a decent amount (+100, 4x1060's) with far fewer crashes using the latest ethminer 0.12 dev releases. I'm fairly comfortable leaving miner restarts an hour apart. Crashing occurs just once or twice a day. There's no way I could do that before with these clocks. Maybe these new cards are just being nicer to me now (highly unlikely), or the devs have been looking into the issue. I hope so. A watchdog would still give that needed peace of mind.

Caerus7 on 11 Aug 2017

If MinerLamp won't work:
Malapha's PowerShell solution, some posts above, or
https://github.com/derubm/Ethminer_Watchdog

might help until there's a built-in watchdog.

derubm on 12 Aug 2017

Thanks @derubm. Minerlamp seems to be working fine now. I haven't gone back to 0.11 to see if there was some issue with that version on my system. Just taking the win :). Nice work @orkblutt. Thank you.

Caerus7 on 12 Aug 2017

Hi Guys!

Just started mining, after @michael-pesce 's answer I did a 'simple' watchdog with a bash script and I run everything with supervisord (I use that because I remembered that Docker containers were using it in the early days)

It's available here:

https://github.com/joantune/ethminerWatchdog

It's a Linux watchdog for Nvidia cards, but it might be adapted to other cards

I'm running it in a screen session, so far so good. Do read the README.

joantune on 8 Oct 2017

Hi, I'm also in the same boat as everyone else.

I guess I'll be trying out MinerLamp (on Windows).

For Linux I'll probably try ethminerWatchdog by joantune, the solution seems neat if supervisord is good.
But I wanted to ask, has anyone tried this python based monitor https://github.com/philon123/MinerMon ?

Also to add to the issue discussion, can it be an issue related to the CUDA release used somehow?
Wondering after finding this issue #53 which the reporter closed on his own.

Jacxz on 22 Nov 2017

On Windows I paid the relatively small amount for Awesome Miner and have been pleased.

On Ubuntu I still use my own scripting and it hasn't let me down. Happy to share more details if folks are interested.

michael-pesce on 23 Nov 2017

As this one is not closed yet, and as many miners have already mentioned:
On Nvidia cards, the illegal memory access error happens when a card runs at maximum memory overclock in power state P2. When the miner switches to P0 state for whatever reason, the memory gets an additional 200 MHz and can (or will) become unstable, which causes this error.

Solution without a watchdog: set your mining rig to P0 state (on Windows: Nvidia Inspector, old version, section 5, set "Force P2 State" to "Off"; on Linux you should already be able to do that with nvidia-smi).

Explanation:
When you run your miner in P0 state, the memory clock overshoot no longer appears on maxed-out GDDR5 (the limit depends on the memory brand). Example with Samsung memory: +710 in P0 state and +910 in P2 state give the same memory speed, 4714 MHz on Windows (Linux displays twice that value). So in P2 state you would be running +910; then P0 state kicks in and you have not +910 but +1110, which causes the crash.
If you run your card in P0 state from the start, it cannot run higher than intended (+710 in my case, for example), so no more crashes will appear.
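
The arithmetic behind this can be written out as a small Python sketch. The three input figures come from the example above; the two base clocks are only derived from them (they are not measured), and the function names are made up for illustration.

```python
# Figures from the Samsung GDDR5 example (Windows readout).
STABLE_MEM_CLOCK_MHZ = 4714   # highest stable memory clock
P0_OFFSET_MHZ = 710           # offset that reaches it in P0 state
P2_OFFSET_MHZ = 910           # offset that reaches it in P2 state

# Base clocks implied by those figures (derived, not measured).
P0_BASE_MHZ = STABLE_MEM_CLOCK_MHZ - P0_OFFSET_MHZ
P2_BASE_MHZ = STABLE_MEM_CLOCK_MHZ - P2_OFFSET_MHZ


def effective_clock(base_mhz, offset_mhz):
    """Memory clock that results from applying an offset to a base clock."""
    return base_mhz + offset_mhz


def overshoot_after_p0_switch(offset_tuned_in_p2):
    """How far above the stable clock the card lands if an offset that was
    tuned in P2 is still applied when the card switches to P0."""
    return effective_clock(P0_BASE_MHZ, offset_tuned_in_p2) - STABLE_MEM_CLOCK_MHZ


print(P0_BASE_MHZ - P2_BASE_MHZ)                  # gap between the two states
print(effective_clock(P2_BASE_MHZ, P2_OFFSET_MHZ))  # clock while in P2
print(overshoot_after_p0_switch(P2_OFFSET_MHZ))     # overshoot after the switch
```

In P2-relative terms, that overshoot is the same thing as running +1110 instead of +910.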

Sample Nvidia inspector with version number and section that needs to be changed:

Note: after a driver update you need to set P0 state again!
Also note that you have to set 200 MHz less overclock, as P0 state already adds those 200!
Maybe things like this can be included, in correct English, in the README.

derubm on 23 Nov 2017

@derubm thanks for the clear input!
I've not had any issues with illegal memory access since switching to P0 state with NVIDIA Profile Inspector 2.1.3.10 (Force P2 State -> Off). That is on Windows 10 with a single GTX 1070.
For some reason it has been stable with P2 for a long time on my other machine Windows 10 with four GTX1060. But I guess I'll switch to P0 there as well just to be safe.

With Linux I've not been able to switch to P0, right now the cards go to P0 state when they are on idle, but as I start ethminer they go to P2.
@michael-pesce I'm very interested in any suggestions on good solutions, feel free to share your scripting knowledge :)

Jacxz on 25 Nov 2017

On Linux it stays at P2 but you can still overclock the cards as high as they'll go. It depends on each card but I get between 22.52 and 25.10 on GTX 1060s. Basically I set the card a little high then observe how much hashrate comes out of it to determine the memory type (~22-23 micron, ~25 samsung) then decrease to where it's stable and doesn't get knocked offline.

[rig1] ethminer Speed 144.06 Mh/s gpu/0 23.00 gpu/1 24.94 gpu/2 22.52 gpu/3 24.86 gpu/4 25.10 gpu/5 23.65
[rig2] ethminer Speed 163.83 Mh/s gpu/0 23.40 gpu/1 22.76 gpu/2 23.48 gpu/3 23.40 gpu/4 25.02 gpu/5 22.92 gpu/6 22.84

I do my config by device UUID so things don't get mixed up. Here's my mine-setup script and settings file. Send me ETH at 0x5f8f7166c9920ea2d786e0810defdc611544fbfe :)
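
The hashrate-to-memory-type rule of thumb above can be applied to a status line automatically. This is a rough Python sketch, not the mine-setup script; `guess_memory` is a made-up name and the 24.0 MH/s cut-off is an assumption placed between the "~22-23" and "~25" figures.

```python
import re

# Matches "gpu/<index> <hashrate>" pairs in an ethminer status line.
GPU_RE = re.compile(r"gpu/(\d+)\s+(\d+(?:\.\d+)?)")


def guess_memory(status_line, samsung_min_mhs=24.0):
    """Map each GPU index to a guessed memory vendor based on its hashrate.

    The cut-off is an assumption; it is a heuristic for GTX 1060s only.
    """
    guesses = {}
    for index, rate in GPU_RE.findall(status_line):
        vendor = "samsung" if float(rate) >= samsung_min_mhs else "micron"
        guesses[int(index)] = vendor
    return guesses


rig1 = ("[rig1] ethminer Speed 144.06 Mh/s gpu/0 23.00 gpu/1 24.94 "
        "gpu/2 22.52 gpu/3 24.86 gpu/4 25.10 gpu/5 23.65")
print(guess_memory(rig1))
```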

dhjw on 25 Nov 2017

anyone know how to get P0 State working in Linux on GTX 1070s? most/all info out there doesn't work, so any link would be greatly appreciated.

moodonis on 29 Nov 2017

In my experience, it's normal to stay at P2 in Linux. It doesn't affect how much you can overclock or the speed you get.

dhjw on 29 Nov 2017

I have it too in Ubuntu 16.04. The problem with this error (illegal memory access encountered) is that it enters an infinite loop and needs to be terminated manually. Once restarted, miner runs fine for another, say, 30 minutes.

Why not make a counter for this message occurrence and, say, after 50 consecutive messages just restart the miner or exit so that we could restart it with a shell script?
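
The counter idea could look roughly like this. It is a minimal Python sketch of the logic only, not ethminer code; `should_restart` is a made-up name and the threshold of 50 is the figure suggested above.

```python
ERROR_TEXT = "illegal memory access"


def should_restart(lines, threshold=50):
    """Return True once `threshold` error lines arrive back to back.

    Any line without the error text resets the counter, so only
    consecutive errors count.
    """
    consecutive = 0
    for line in lines:
        if ERROR_TEXT in line:
            consecutive += 1
            if consecutive >= threshold:
                return True
        else:
            consecutive = 0
    return False


error_line = "CUDA error in func 'search' at line 365 : an illegal memory access was encountered."
print(should_restart([error_line] * 50))
print(should_restart([error_line] * 49 + ["Speed 144.06 Mh/s"] + [error_line] * 10))
```

The caller would then exit or restart the miner when the function returns True.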

Angel996 on 6 Dec 2017

To restart ethminer automatically, start it like this:

while [ 1 ]; do ethminer --farm-recheck 200 -U -F http://127.0.0.1:8080/hostname 2>&1 | mine-monitor; done

Here's my mine-monitor script. It requires PHP and a working email system like postfix configured with gmail.

If you're still getting these errors it means one of your cards is overclocked too high. When the card eventually gets knocked offline reduce the overclock a little and reboot. Eventually you should not get any more errors.
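
For anyone without PHP, the same restart-on-error idea can be sketched in Python. This is not the mine-monitor script, just a minimal illustration: `run_until_error` and `supervise` are made-up names, the restart limit exists only so the example terminates, and the "miner" here is a one-line fake that prints the error and exits.

```python
import subprocess
import sys

ERROR_TEXT = "Error CUDA mining"


def run_until_error(cmd, error_text=ERROR_TEXT):
    """Run `cmd` and watch its combined stdout/stderr.

    Returns True as soon as a line contains `error_text` (the process is
    then killed), or False if the process ends without printing it.
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    ) as proc:
        try:
            for line in proc.stdout:
                if error_text in line:
                    return True
            return False
        finally:
            # Make sure a still-running miner is stopped before we return.
            if proc.poll() is None:
                proc.kill()


def supervise(cmd, max_restarts):
    """Restart `cmd` after each detected error, up to `max_restarts` times.

    Returns the number of restarts that were triggered by the error text.
    """
    restarts = 0
    while run_until_error(cmd):
        if restarts == max_restarts:
            break
        restarts += 1
    return restarts


# Fake miner for demonstration: prints the error once and exits.
fake_miner = [
    sys.executable,
    "-c",
    "print('cudaminer0  Error CUDA mining: an illegal memory access was encountered')",
]
print(supervise(fake_miner, max_restarts=3))
```

With a real rig, `cmd` would be the ethminer command line, and the returned count is the restart counter people have been asking for.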

dhjw on 13 Dec 2017

This is my experience with 7 x ASUS GeForce DUAL-GTX1060--O6G(edit: currently 9) on Win10 rig on ASRock H110 Pro BTC+ with ethminer 0.12 (and Claymore as a short test)
First I tested with only 2 cards but it is consistent with 7 (soon I'll add at least 2 more possibly up to 5).
I have tested with a single ethminer process for all GPUs and a separate one for each GPU, as well as combinations like 1+6 etc. The best result was with separate processes for each GPU: in case of a failure only one card drops out. When a GPU starts failing, it usually fails again within a minute, so there is no point restarting the process (I didn't check whether a reboot would give a better result; I have just started testing, so I haven't come that far. I still need to tweak a few things with a delayed start of Asus GPU Tweak II and then shutting it down; read more about the reason below).

There is always one card (usually the same) where ethminer is failing with the error message
"CUDA error in func 'ethash_cuda_miner::search' at line 346 : unspecified launch failure"
With 2 GPU's it was on card0 with 7 it is now (usually) on card1.

Monitor is now connected to the built in Intel GPU so theoretically RDP should not influence the outcome though I have to investigate this some more (I used RDP before when I was checking/testing things so I'm not really sure whether it was influencing the outcome).
I believe I'm pretty conservative with OC and I lowered memory speed by 200 (to 9.300) compared with the recommended for the optimal hash speed/power consumption (65% - 65 degrees) just to be on the safe side (reported 22,9MH/s/card). The cards are the "OC" model so I can't lower GPU speed below the "min" value given in the GPU Tweak interface (1.607).
FYI, I'm using Asus GPU Tweak II - pain in the ass due to some glitches like resetting my settings every time something goes wrong with GPU and GPU Tweak is running in background which means I'll run it once in the beginning and close it afterwards to prevent this happening (edit: Adding new card resets the values to default ones so it is necessary to setup values every time GPU configuration is changed + when something breaks like OS getting frozen).

If one card fails all others are stable nevertheless (at least for 8 hours, my longest test so far).
Trying to use Claymore on the failing card forces the error to migrate to the card2 and hash rate for Claymore is around 19MH/s. In other words the alternative 6 ethminers + 1 Claymore wouldn't work either.

I'll post some updates after I've tested few more things like what will happen without using RDP nor Teamviewer that I used on the other system to reboot where I have 1 x AMD Vega 64 + 1 x Asus GTX 1060 6G (not OC) and where 1060 usually drops out once every 24-48 hours so I used Teamviewer to access the computer from abroad. I'm not sure whether Teamviewer itself could be source of any problem (I'm running it on my 7xGPU rig too).

After last reboot I didn't use RDP and so far it was running without any problem for 45 minutes which is promising.

I have also been running one "rig" with 2 x ASUS GeForce DUAL-GTX1060--O6G on a MacPro (2011) with Ubuntu 16.04 + ethminer, rock stable (is it correct English? ;) ) for weeks, though without being able to tweak memory/GPU speed (only power target), so it averages 35.4 MH/s. My plan is to eventually move those 2 GPUs to the ASRock rig.
If I figure out how to tweak memory/GPU speed on Linux, my plan/hope is to kick out Windows, so any advice is welcome. I've googled a few methods that I couldn't get working (honestly I haven't put much time into it so far; I had some other things to do).

Edit 1. 2 hours later: no RDP => no error (seems like).

I've just plugged in 8th GPU and I'll come back with the update. Unfortunately I have no more PCIe power connectors available and it seems like the secondary PSU is trying to be smart and won't supply any current for GPU/SATA without the threshold load on the ATA power connector ... or my brand new PSU is not working (not probable but I'm not really sure yet).

So far there is strong indication that the quoted error is (directly) related to RDP messing up with/for ethminer.

Edit 2.
After plugging in 8th GPU the system became unstable again (without connecting RDP) so few tweaks later (memory speed down to 9.100) + couple of reboots it became stable again (for one hour).
Then I found the work-around for connecting 9th GPU scrambling from all the cables I had around: type4 to 4 x AMP MATE-N-LOK + Molex to PCIe power / Molex to sata power. At a same time I ordered 20 PCIe power splitter cables from AliExpress $1.29/piece so in 3-4 weeks I'll be able to build another 12-13 x GPU rig with one PSU for each (1.200W).

Anyhow back to the rig: First Windows got stuck because I started first miner too soon (before GPU Tweak has been able to shut down completely - impatience I know :) ). Reset button and after logging in and starting miners the 9th started complaining about "out of memory" error. Few tweaks later with paging values ended at 35.000/45.000 MB - min/max and I could start even 9th miner.
20 minutes later still no error with reported hashrate 22,6MH/s in average.
If it stays like this I would be more than satisfied :)

Edit 3. 50 minutes later - still no error
Question: Any suggestions about choosing between GTX1060 "normal" or "OC"? I ordered 10 OCs because the "normal" ones were out of stock, with an estimated delivery time of several weeks. The price was a few bucks cheaper too, though I would've stuck with "normal" if they had been available at the time. Now I'm not sure any more which is preferable for ETH mining.

Edit 4. 15,5 hours later: no error and still counting. Current reported hash rate: 22,4-22,5, in average 20,3 - 24,2 MH/s (average of 9: 22,4MH/s)
I even lowered the memory speed for the non-OC card and its ethminer process hasn't crashed yet either (a small change in hashrate as a result, though it may have been higher on average since; currently 23,8MH/s).
So the case is closed for my part.

Edit 5.
ethminer on card9 produced an error after 23:27 first time and second time after roughly 22 hours.
After second time I decided to use RDP to restart the miner and see whether it would cause an earlier error (to compare with the situation without running RDP). I'll come back to you with an update.
Update edit 5. Same card dropped out next time after 60:56h(RDP used 2-3 times).

Edit 6. 7 days later and still running ...

neskoc on 17 Dec 2017

Here are my 5 cents too. I started GPU mining a week ago. For now I have 14 x 1070 Ti (+/- OC) in 2 farms, mining ETH with an automatic restart of ethminer if it stops on errors. These two scripts are not the best solution, they were written from scratch, but they work fine. They are written only for Nvidia, but I think they could be rewritten for ATI too ))
All of this was tested on Ubuntu 16.04

OK, let's go!

!!! Nvidia Coolbits must be enabled if you want the OC settings to work. Mine is set to 13, tested on the 381 and 387 drivers. An emulated monitor is needed for each card. Below is my nvidia-xconfig conf for 7 GPUs. You can find an edid.bin via Google; I made mine from an AOC 23 monitor.

# nvidia-xconfig: X configuration file generated by nvidia-xconfig

# nvidia-xconfig: version 387.34 (buildmeister@swio-display-x64-rhel04-15) Tue Nov 21 03:31:45 PST 2017

Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0"
Screen 1 "Screen1" RightOf "Screen0"
Screen 2 "Screen2" RightOf "Screen1"
Screen 3 "Screen3" RightOf "Screen2"
Screen 4 "Screen4" RightOf "Screen3"
Screen 5 "Screen5" RightOf "Screen4"
Screen 6 "Screen6" RightOf "Screen5"
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
# generated from default
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/psaux"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
# generated from default
Identifier "Keyboard0"
Driver "kbd"
EndSection

Section "Monitor"
Identifier "Monitor0"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Monitor"
Identifier "Monitor1"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Monitor"
Identifier "Monitor2"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Monitor"
Identifier "Monitor3"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Monitor"
Identifier "Monitor4"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Monitor"
Identifier "Monitor5"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Monitor"
Identifier "Monitor6"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070 Ti"
BusID "PCI:1:0:0"
EndSection

Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070"
BusID "PCI:2:0:0"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
Identifier "Device2"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070"
BusID "PCI:3:0:0"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
Identifier "Device3"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070 Ti"
BusID "PCI:5:0:0"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
Identifier "Device4"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070 Ti"
BusID "PCI:6:0:0"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
Identifier "Device5"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070 Ti"
BusID "PCI:7:0:0"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
Identifier "Device6"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1070"
BusID "PCI:8:0:0"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "Screen1"
Device "Device1"
Monitor "Monitor1"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "Screen2"
Device "Device2"
Monitor "Monitor2"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "Screen3"
Device "Device3"
Monitor "Monitor3"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "Screen4"
Device "Device4"
Monitor "Monitor4"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "Screen5"
Device "Device5"
Monitor "Monitor5"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "Screen6"
Device "Device6"
Monitor "Monitor6"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "13"
SubSection "Display"
Depth 24
EndSubSection
EndSection

This script runs the miner in a loop and applies the OC settings for each GPU.
The settings are applied only once, at start, if they are enabled.
Just edit it for your needs and run it. The main part comes after the settings.

#!/bin/sh

nvidia-settings -a GPUFanControlState=0

nvidia-settings -a GPUGraphicsClockOffset[3]=-100

nvidia-settings -a GPUMemoryTransferRateOffset[3]=1200

nvidia-smi -pm 1

nvidia-smi -pl 155

nvidia-settings -a [gpu:0]/GPUGraphicsClockOffset[3]=-150

nvidia-settings -a [gpu:0]/GPUMemoryTransferRateOffset[3]=1200

nvidia-settings -a [gpu:0]/GPUFanControlState=1

nvidia-settings -a [fan:0]/GPUTargetFanSpeed=80

nvidia-settings -a [gpu:1]/GPUGraphicsClockOffset[3]=-150

nvidia-settings -a [gpu:1]/GPUMemoryTransferRateOffset[3]=1450

nvidia-settings -a [gpu:1]/GPUFanControlState=1

nvidia-settings -a [fan:1]/GPUTargetFanSpeed=80

nvidia-settings -a [gpu:2]/GPUGraphicsClockOffset[3]=-150

nvidia-settings -a [gpu:2]/GPUMemoryTransferRateOffset[3]=1150

nvidia-settings -a [gpu:2]/GPUFanControlState=1

nvidia-settings -a [fan:2]/GPUTargetFanSpeed=80

nvidia-settings -a [gpu:3]/GPUGraphicsClockOffset[3]=-100

nvidia-settings -a [gpu:3]/GPUMemoryTransferRateOffset[3]=1050

nvidia-settings -a [gpu:3]/GPUFanControlState=1

nvidia-settings -a [fan:3]/GPUTargetFanSpeed=80

nvidia-settings -a [gpu:4]/GPUGraphicsClockOffset[3]=-150

nvidia-settings -a [gpu:4]/GPUMemoryTransferRateOffset[3]=1050

nvidia-settings -a [gpu:4]/GPUFanControlState=1

nvidia-settings -a [fan:4]/GPUTargetFanSpeed=80

nvidia-settings -a [gpu:5]/GPUGraphicsClockOffset[3]=-100

nvidia-settings -a [gpu:5]/GPUMemoryTransferRateOffset[3]=800

nvidia-settings -a [gpu:5]/GPUFanControlState=1

nvidia-settings -a [fan:5]/GPUTargetFanSpeed=80

nvidia-settings -a [gpu:6]/GPUGraphicsClockOffset[3]=-100

nvidia-settings -a [gpu:6]/GPUMemoryTransferRateOffset[3]=900

nvidia-settings -a [gpu:6]/GPUFanControlState=1

nvidia-settings -a [fan:6]/GPUTargetFanSpeed=80

# Restart the miner whenever it exits, even after kill -9 ethminer.
# To stop the loop, press CTRL+C.
while true
do
/home/m1/Miner/ethminer -U -S eth-eu2.nanopool.org:9999 -O 0xb4983146f0047d87c63b5fdb3ef9e2bee4557ea3.M1/[email protected]
done
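
The repeated per-GPU lines in the script above could be generated from one table. This is only a sketch under assumptions: `OC_TABLE`, `FAN_SPEED`, `DRY_RUN`, and `oc_commands` are hypothetical names, the offsets are copied from the script, and it assumes fan N belongs to GPU N as the script does. The global `nvidia-smi -pm 1` and `nvidia-smi -pl 155` calls are left out.

```shell
#!/bin/sh
# Sketch: build the per-GPU nvidia-settings commands from one table
# instead of repeating four lines per card.
# OC_TABLE, FAN_SPEED, DRY_RUN and oc_commands are hypothetical names.

# One entry per GPU: "index:core_offset:mem_offset" (values from the script above).
OC_TABLE="0:-150:1200 1:-150:1450 2:-150:1150 3:-100:1050 4:-150:1050 5:-100:800 6:-100:900"
FAN_SPEED=80

# Print the commands for every entry in the table, one per line.
oc_commands() {
    for entry in $OC_TABLE; do
        idx=${entry%%:*}     # text before the first colon
        rest=${entry#*:}     # text after the first colon
        core=${rest%%:*}
        mem=${rest#*:}
        echo "nvidia-settings -a '[gpu:$idx]/GPUGraphicsClockOffset[3]=$core'"
        echo "nvidia-settings -a '[gpu:$idx]/GPUMemoryTransferRateOffset[3]=$mem'"
        echo "nvidia-settings -a '[gpu:$idx]/GPUFanControlState=1'"
        echo "nvidia-settings -a '[fan:$idx]/GPUTargetFanSpeed=$FAN_SPEED'"
    done
}

# Dry run by default: only print. Set DRY_RUN=0 to execute the commands.
if [ "${DRY_RUN:-1}" = "1" ]; then
    oc_commands
else
    oc_commands | sh
fi
```

Printing first makes it easy to check the offsets for each card before anything is applied. The single quotes keep the shell from treating the square brackets as a glob pattern.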

That was not so hard; the main part is up and running!
While the miner script is working, we run a second one:
a script for monitoring.

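#!/bin/sh

# -i 5: index of the GPU to monitor

gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)

while true # outer loop runs forever
do
# While the load is above 50%, just report it every 10 seconds.
while [ "$gpu" -gt 50 ]
do
gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
echo "GPU load $gpu"
echo "All good $(date) GPU load $gpu No errors"
sleep 10
done
if [ "$gpu" -lt 40 ]
then
# Load dropped below 40%: kill the miner so the miner loop restarts it.
killall -9 ethminer
echo "Restart Miner GPU load $gpu $(date) error"
echo "Restart Miner $(date) error" >> /home/m1/Miner/ethminer.log
sleep 60
gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
else
# Load is between 40% and 50%: wait and read again instead of spinning.
sleep 10
gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
fi
done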
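
The restart counter asked for earlier in the thread could be added to a watchdog like the one above. The sketch below is an assumption-laden simplification, not the author's script: `decide`, `LOW`, and `restarts` are hypothetical names, it uses a single threshold instead of the 50/40 pair, and it replays sample readings rather than calling `nvidia-smi` so it runs without a GPU.

```shell
#!/bin/sh
# Sketch of the watchdog decision as a function, with a restart counter.
# decide, LOW and restarts are hypothetical names for illustration.

LOW=40        # below this load the miner is assumed to have stalled
restarts=0    # number of restarts triggered so far

# Print "restart" when the load is below LOW, "ok" otherwise.
# An empty or non-numeric reading (e.g. an error message) also means "restart".
decide() {
    case "$1" in
        ''|*[!0-9]*) echo restart ;;
        *) if [ "$1" -lt "$LOW" ]; then echo restart; else echo ok; fi ;;
    esac
}

# Replay a few sample readings instead of querying a real GPU.
for load in 98 97 12 99 ERR; do
    if [ "$(decide "$load")" = "restart" ]; then
        restarts=$((restarts + 1))
        echo "load=$load -> restart #$restarts"
    else
        echo "load=$load -> ok"
    fi
done
echo "total restarts: $restarts"
```

In a real loop the sample list would be replaced by a reading from `nvidia-smi`, and the "restart" branch would kill the miner and append the counter to the log.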

That's it. I finished it yesterday. I think it could be shorter, but there is nothing to install or compile. I tested my GPUs all night with different OC and power settings; this makes testing clocks very fast. Add tail -f /var/log/kern.log | grep -i nvrm to see which GPU caused an error without a long farm stop.
If this helps you, I like good coffee )) b4983146f0047d87c63b5fdb3ef9e2bee4557ea3
Hosted

H05ted on 21 Dec 2017

Hi hosted!
I have a small rig with a similar xconfig (that trick to emulate a monitor took a while to learn), but maybe my coolbits value is different, and the OC values as well. I have to try some of your OC values. On the GTX 1060, for instance, I remember I couldn't control the fan. Either I had the wrong coolbits value or it just doesn't work with 1060s, but there's nothing like trying.

Again, yours is a very complete solution, so thanks for posting it here. Like I said, I have a tiny rig with one 1060, from which I squeeze at most 23.6 MH/s. How many MH/s do you get from one board with those OC configs?

joantune on 21 Dec 2017

Hi joantune
For fan control, run nvidia-settings -a GPUFanControlState=1
and then nvidia-settings -a [fan:0]/GPUTargetFanSpeed=80 to set the fan speed of GPU0.
Try coolbits 38; I was looking for a way to make it work too.
My cards are not as good as I want, but I get ~31.7 MH/s and ~500 Sol each. Testing is not finished yet.

H05ted on 21 Dec 2017

I've just finished setting up a second rig (the new Asus mining motherboard, max 19 GPUs) with 3 Asus GTX 1060 OC and 1 non-OC (old batch).
H05ted inspired me to dive into Linux again, and it is up and running (I will need some adjustments so the miner starts automatically after reboot, but everything else is working great).
I'll post some scripts later, but I would like to make a suggestion about H05ted's xorg.conf.
I was struggling for hours to change nvidia settings for card>0 and I've just come across this command: sudo nvidia-xconfig -a --cool-bits=13 --allow-empty-initial-configuration
It makes a perfect xorg.conf without any need for editing afterwards, and the bonus is that nvidia-settings now works for all cards.
I'm not sure whether this is related to installing (apt install) xserver-xorg-dev prior to reboot (nvidia-xconfig complained about a missing xorg-server, so I installed it). Anyhow, it is working now.
As I wrote, I'll post some updates in the future.
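
Since several comments compare Coolbits values, it may help to see that the value is a bitmask. The sketch below decodes it; `decode_coolbits` is a hypothetical helper, and the bit meanings are summarized from the NVIDIA driver README as I understand it, so check them against the README for your driver version.

```shell
#!/bin/sh
# Sketch: decode a Coolbits value into the documented bits it sets.
# decode_coolbits is a hypothetical helper; bit meanings are summarized
# from the NVIDIA driver README and may differ between driver versions.

decode_coolbits() {
    value=$1
    if [ $((value & 1)) -ne 0 ]; then echo "1: clock controls on older (pre-Fermi) GPUs"; fi
    if [ $((value & 2)) -ne 0 ]; then echo "2: try SLI with mismatched video memory sizes"; fi
    if [ $((value & 4)) -ne 0 ]; then echo "4: manual fan speed control"; fi
    if [ $((value & 8)) -ne 0 ]; then echo "8: clock offsets on the PowerMizer page"; fi
    if [ $((value & 16)) -ne 0 ]; then echo "16: overvoltage via nvidia-settings"; fi
}

decode_coolbits 13
```

For 13 this lists bits 1, 4, and 8, which is why that value allows both fan control and clock offsets.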

neskoc on 31 Dec 2017

Ethereum Miner Monitor released - v1.0.2 - FREE!

This is a Python application for monitoring Linux-based Ethereum miners and keeping the miner alive 24/7. If you have a Linux-based mining rig but no monitoring system, you can use this standalone script to keep your miner running without manual checks.

The application continuously checks that the 'ethminer' process is running and tracks the current average GPU utilization.

The script can restart ethminer or reboot the system.

The script doesn't need any extra Python packages or modules, just pure python3. You can use virtualenv too.

The current version was tested on Ubuntu 16.04.3 LTS (xenial) with GeForce GTX 1070 Ti and AMD Radeon R9 290X cards.

Added AMD Utilization query support!

Download: https://github.com/xstead/ethereum-miner-monitor

xstead on 1 Feb 2018

xstead, who's using ethminer these days? Are you from 2015, dude? )))

Angel996 on 1 Feb 2018

@Angel996 someone who knows what they are doing...

xstead on 1 Feb 2018

Don't feed the troll.

evilny0 on 1 Feb 2018

@evilny0 ;)

xstead on 1 Feb 2018

evilny0, what do you mean, exactly? ethminer is the slowest ethash miner nowadays, what's the point of using it?

Angel996 on 1 Feb 2018

@Angel996 if you're so convinced about this, simply do not use it, and do not denigrate the hard work that is ongoing to improve it.

AndreaLanfranchi on 1 Feb 2018

AndreaLanfranchi, are you convinced otherwise? Are you saying ethminer is faster and/or more power efficient than Claymore's?

Angel996 on 1 Feb 2018

Simple answer yes.

AndreaLanfranchi on 1 Feb 2018

OK, so I believed you, spent some time, and downloaded the latest ethminer release.

Tried it on an Ubuntu 16.04 LTS rig with 5x Palit 1060 StormX 3GB Samsung. Core -200, mem +1200.

And here is the result:

Claymore's: ~23.5 MH/s per card, ~116 MH/s per rig. Consumption ~90 W/card.
Ethminer: hardly 19.5 MH/s per card, ~96 MH/s per rig. Consumption ~90 W/card.

VERDICT: It's a good thing to be "convinced" about something, but reality is a bit different.

Angel996 on 1 Feb 2018

@Angel996 We're an open source project. We're not getting paid to do any of this. I think it is really cool that so many developers contributed to this project and did this in their spare time. Without developers like this, there wouldn't be any cryptocurrencies, because no one would (and should) trust closed source code!
In my own test, I find about a ~6 percent difference to Claymore's, which is a price I am willing to pay for knowing what code is executed on my machine.

MariusVanDerWijden on 1 Feb 2018

Not to feed the troll but here it goes:
@Angel996 What's the real hashrate at the pool? Claymore's shows approx. 10% more than the actual hashrate at the pool. Ethminer shows exactly the same at the pool as in the miner.
My 6x 580 8GB rig showed ~180 in Claymore's with ~168 at the pool. With a correctly set up ethminer I get ~175 MH/s both at the pool and in the miner. I love ethminer, keep up the great work!

tonyaik on 1 Feb 2018

MariusVanDerWijden -- benefits of open source software? Sure. But when it comes to mining, it's all about making money. That kind of renders slower miners totally useless. And please don't tell me you mine for educational purposes.

tonyaik, don't look at hashrate reported, did you count the actual number of shares submitted? I got two identical rigs, I can run a test, say, for an hour and see.
What exactly is "correct setup ethminer"? I looked thru --help output, I don't see a lot of options there. OC? I use same settings as with Claymore's.

Angel996 on 1 Feb 2018

@Angel996 Yes, I looked at the actual number of shares. Better hashrate at the pool and no dev-fee.

--cl-local-work and global-work is what I mean. For me, the defaults weren't optimal.

tonyaik on 1 Feb 2018

cl options are for AMD cards, which are not in question (although I have those too). As for the dev fee, here:

https://github.com/JuicyPasta/Claymore-No-Fee-Proxy

Since the stratum protocol is merely a plain-text TCP session, it works like a charm if the miner does not use SSL.

Angel996 on 1 Feb 2018

Saw the exact same difference with my 7x 1060 rig.

Using a no-fee-proxy is a shitty thing to do. If you don't want to play by the developer's rules, don't use it, imho. Anyhow, I'm out of this thread.

tonyaik on 1 Feb 2018

VERDICT: It's a good thing to be "convinced" about something, but reality is a bit different.

Reality is subjective.
I had the same tests over and over again.

To compare with yours, I have 6x EVGA 1060 3Gb on Micron (not Samsung), and I can squeeze roughly 18.67 MH/s from each one using -200/+850 at 72.50 W with Claymore's (pushing harder makes the whole system unstable and/or unresponsive), while with ethminer I get 18.52 MH/s using -200/+750 at 72.50 W (the system is quite stable, running in batches of 12 hours each). And yes, I am limiting power as much as I can, as I do my maths.

The drop in reported hashrate is nothing compared with the 1% fee.
In addition, I must say Claymore's uses, on average, 26% more CPU, which makes my rigs with not very powerful Celerons quite laggy. With ethminer my machines work smoothly.
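
The fee argument can be checked with a little arithmetic. This is a sketch using only the figures quoted in this comment (18.67 and 18.52 MH/s at 72.50 W, 1% dev fee); `net_rate` and `per_watt` are hypothetical helper names, and `awk` is used because plain sh has no floating-point math.

```shell
#!/bin/sh
# Sketch: compare two miners by hashrate net of dev fee and by MH/s per watt.
# net_rate and per_watt are hypothetical helper names.

# net_rate HASHRATE FEE_PERCENT -> hashrate left after the dev fee
net_rate() {
    awk -v h="$1" -v f="$2" 'BEGIN { printf "%.4f\n", h * (1 - f / 100) }'
}

# per_watt HASHRATE WATTS -> MH/s per watt
per_watt() {
    awk -v h="$1" -v w="$2" 'BEGIN { printf "%.4f\n", h / w }'
}

claymore_net=$(net_rate 18.67 1)   # figures quoted in the comment above
ethminer_net=$(net_rate 18.52 0)   # no dev fee

echo "Claymore net: $claymore_net MH/s, $(per_watt "$claymore_net" 72.50) MH/s per W"
echo "ethminer net: $ethminer_net MH/s, $(per_watt "$ethminer_net" 72.50) MH/s per W"
```

With these figures, taking 1% off 18.67 MH/s leaves about 18.48 MH/s, slightly below ethminer's 18.52 MH/s at the same power draw.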

Plus, my measurements on ethermine.org show that the reported hashrate closely overlaps the effective and average hashrates. Those overlapping lines were never seen with Claymore's (disregard the last 3 hours, when I had connectivity problems). I always suspected that Claymore's reports a higher hashrate than the effective one, but we cannot read the code.

[image: immagine1]

Last but not least: ethminer is free to use without your "cheats" (which may well be worked around by Claymore's in the near future). And it is open source: being able to read the code and be sure that nothing unwanted or unexpected happens behind the curtains is much appreciated.

VERDICT: expressing absolute verdicts is never a good idea.

And I stop here.

AndreaLanfranchi on 1 Feb 2018

tonyak>> Claymores shows aprox 10% more than actual hashrate at the pool.

That's absolutely NOT true! The reported hashrate is sent to the pool for people to actually be able to make a comparison of real hashrate vs reported. Here, I've run a 4x 1060 1x 1050ti rig for 20 hours in a row. Results:

Average Hashrate for last 6 hours: 105.3 Mh/s
Last Reported Hashrate: 106.2 Mh/s

The pool is eth.nanopool.org. The slight difference is actually the 1% devfee of Claymore's. This is also a good way to check whether a pool is stealing shares. Your experience is probably due to share theft, not to Claymore's misreporting the hashrate.
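
The reported-versus-pool comparison is a simple percentage. The sketch below applies it to the two pairs of figures given in this thread (106.2 vs 105.3 MH/s here, and ~180 vs ~168 MH/s for the 6x 580 rig); `gap_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: how far the pool-side hashrate falls short of the miner-reported one.
# gap_percent is a hypothetical helper name.

# gap_percent REPORTED EFFECTIVE -> shortfall of the effective rate, in percent
gap_percent() {
    awk -v r="$1" -v e="$2" 'BEGIN { printf "%.2f\n", (r - e) / r * 100 }'
}

echo "4x 1060 + 1050 Ti rig, reported vs 6 h average: $(gap_percent 106.2 105.3)%"
echo "6x 580 rig, miner vs pool:                      $(gap_percent 180 168)%"
```

The first pair gives a gap of about 0.85%, in line with a 1% dev fee; the second gives about 6.67%.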

Angel996 on 2 Feb 2018

_AndreaLanfranchi>> To compare with you I have a 6x EVGA 1060 4Gb on Micron (not Samsung) and I can squeeze from each one roughly 18.67 Mh/s. using -200/+850 at 72.50 Watts using Claymore (pushing harder makes the whole system unstable and/or unresponsive)_

I can express verdicts because I have built many rigs. I do it for myself and also for money for other people.

Micron memory is pretty fast too; it should give you about 21 MH/s (at least!). If your system gets unstable or unresponsive with further OC, it's not a GPU problem, it's your power supply or wiring. OC over the limit should just throw a "GPU is off the bus" error; the rig should not hang or become unresponsive. By lowering clocks and playing with the power limit, you just get system stability as a tradeoff instead of providing steady power to your equipment.

If you use SATA -> 6PIN power converters, ditch them. Solder quality thick wires directly to your PSU and you'll be surprised how much better your rig performs. I've been through that.

Alternatively, you might want to try removing 5 of the GPUs and running your system on 1 GPU. See if you can OC it better. My bet is you can (as the GPU gets more power). Also, the fact that you mention the system becoming unresponsive suggests your CPU/motherboard/memory is underpowered because of too many GPUs in your rig, or, again, bad wiring.

Angel996 on 2 Feb 2018

UPDATE:

I tried it on a rig with 2x 1060 Hynix and 2x GTX 970 Hynix.

Claymore's:
GPU0 18.473 Mh/s, GPU1 18.465 Mh/s, GPU2 10.292 Mh/s, GPU3 10.536 Mh/s

Ethminer:
gpu/0 18.60 gpu/1 18.60 gpu/2 10.49 gpu/3 10.49

It's a tad faster even. )) That's very interesting. Also, might be a clue for ethminer developers. How does hashrate relate to memory speed?

P.S. The GTX 970 used to be much faster (up to 21 MH/s), but these cards got drastically slower after a certain ethash epoch :((

Angel996 on 2 Feb 2018

4x1080ti evga
Windows
16gb ram

This error appeared twice today. Both times I was logging in via VNC. It appears that is what triggered it.
70% power
149 overclock
499 memory overclock

miningpronto on 2 Apr 2018

I hit the same problem when I try to use the GPU in Keras.

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: an illegal memory access was encountered

rosefun on 12 May 2020
