Ethminer: Command to quit miner on error

Created on 30 Jun 2017  ·  15 Comments  ·  Source: ethereum-mining/ethminer

If a mining card is overclocked to the limit, the miner just stops with some "cuda error". It would be nice to have a command-line option that quits the miner on error, so we can use a restart script.

enhancement up-for-grabs

All 15 comments

A much safer solution would be to have your restart script monitor GPU usage and restart the miner when the usage of any GPU drops below 50% for several seconds. This is easily done on NVIDIA.

@DLS-bau Do you have a working example of such a script? Maybe you could share it with the rest of us, thanks.

I think there should be an exit command regardless of whether there is an error. Press Q, for example, and it shuts down properly instead of you having to kill the process.
Killing the process actually screws up my system more than a crash does.

Yeah, a command like -k or so that quits the miner on any error, so a simple batch loop will restart it. I use that for most miners.

reb0rn21, can you share a sample .bat file for a batch restart loop?

:loop
ethminer.exe
goto loop

Just disable Windows error reporting; that's what I do.

See here: [Issue 72]. There are some solutions available using batch, PowerShell, or PHP.

My very basic (but effective) solution is to monitor the miner with a bash watchdog script.
You have to redirect ethminer output (stdout & stderr) to a log file and then run this script.

#!/bin/bash
#
# minerwd.sh
# Author: Andrea Lanfranchi
#
# Monitors the ethminer output log in search of errors.
# If any are found in the last 10 rows, the mining rig is restarted.
#
# Prerequisites
# apt-get install inotify-tools
#

while inotifywait -e modify ~/miner.log > /dev/null 2>&1 ; do

  # Look at the last 10 rows of the log file in search of errors
  # Feel free to extend the grep pattern or add more conditions
  if tail -n10 ~/miner.log | grep -io "cuda error\|error cuda"; then

    # Send mail
    echo "Miner requires restart due to error" | mail -s "Miner WatchDog Restart" prospector@localhost

    # Restart mining rig
    sudo /sbin/shutdown -f -r +2

    # Abandon WatchDog
    exit

  fi
done
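For the redirect step mentioned above, here is a minimal sketch of one way to do it. run_logged is a hypothetical helper name, and the ethminer options are left out because they depend on your setup:

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND with stdout and stderr both appended to LOGFILE and
# returns the exit status of COMMAND. (Hypothetical helper.)
run_logged() {
  logfile=$1
  shift
  "$@" >> "$logfile" 2>&1
}

# Example (add your usual ethminer options):
# run_logged "$HOME/miner.log" ethminer
```

Appending rather than truncating keeps earlier output around, and each write still triggers the modify event that the watchdog waits on.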

Here's something I am using for my nvidia cards.
Feel free to modify it to your needs.

#!/bin/sh

PREP_GPUS="/home/linus/set_overclocking.sh"
MINER_SCRIPT="/home/linus/start_miner.sh"

gpu0_utilization=$(nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits)

if [ "$gpu0_utilization" -lt 50 ]
then
  echo "[alert] GPU seems to be down, restarting."
  $PREP_GPUS
  $MINER_SCRIPT
  echo "Done restarting miner script, going to sleep now"
else
  echo "[info] All normal"
fi
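To cover the "below 50% for several seconds" idea suggested earlier, one option is to count consecutive low readings before restarting. This is only a sketch, assuming a threshold of 50 and three consecutive samples; THRESHOLD, MAX_LOW, next_streak, and should_restart are illustrative names, not part of ethminer or nvidia-smi:

```shell
#!/bin/sh
THRESHOLD=50   # utilization (percent) below which a sample counts as low
MAX_LOW=3      # consecutive low samples required before restarting

# next_streak STREAK SAMPLE
# Prints the new streak length. An empty or non-numeric sample counts as low.
next_streak() {
  case $2 in
    ''|*[!0-9]*) echo $(( $1 + 1 )) ;;
    *)
      if [ "$2" -ge "$THRESHOLD" ]; then
        echo 0
      else
        echo $(( $1 + 1 ))
      fi
      ;;
  esac
}

# should_restart STREAK
# Succeeds once the streak has reached MAX_LOW.
should_restart() {
  [ "$1" -ge "$MAX_LOW" ]
}

# Example loop (commented out so the file can be sourced safely):
# streak=0
# while sleep 5; do
#   util=$(nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
#   streak=$(next_streak "$streak" "$util")
#   if should_restart "$streak"; then
#     /home/linus/start_miner.sh
#     streak=0
#   fi
# done
```

Requiring a streak avoids restarting on a single momentary dip, for example while the miner is still starting up.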

I'm using this with nvidia cards and tmux:

#!/bin/bash

file=/tmp/ethminer-restarts.log
POWER_THRESHOLD=50
PROBE_DELAY=30
STARTUP_DELAY=60
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # no color

while true
do
    sleep $PROBE_DELAY
    power_draw=$(nvidia-smi --id=0 --query-gpu=power.draw --format=csv,noheader,nounits)
    if (( $(echo "$power_draw < $POWER_THRESHOLD" | bc -l) ))
    then
      echo -ne " $RED$(date +'%H:%M') ✘$NC " | tee -a $file
      tmux respawn-pane -k -t ethminer:0.0
      sleep $STARTUP_DELAY
    else
      echo -ne "$(date +'%M') ${GREEN}✔$NC "
    fi
done

This method doesn't work every time. If the GPU fails, nvidia-smi is executed in a loop without producing output. I am currently working on a better way to implement a watchdog function.

@ddobreff When nvidia-smi stops working, the driver logs an Xid error. You can check with:
journalctl _TRANSPORT=kernel | grep NVRM
So far I have not found a reliable way to recover from those failures. I just trigger a reboot on them (https://jjacky.com/journal-triggerd/)
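If you would rather poll for this from a script than use a trigger daemon, a rough sketch could look like the following. has_xid is a made-up helper name, and matching on the text "NVRM: Xid" is an assumption about how the driver words these messages, so check it against your own kernel log first:

```shell
#!/bin/sh
# has_xid: reads kernel log lines on stdin and succeeds if any of them
# looks like an NVIDIA Xid report. (Hypothetical helper; the pattern is
# an assumption about the driver's message wording.)
has_xid() {
  grep -q 'NVRM: Xid'
}

# Example (commented out; reboots the machine when an Xid is found):
# if journalctl -b _TRANSPORT=kernel | has_xid; then
#   sudo /sbin/shutdown -r +2
# fi
```

The -b flag limits journalctl to the current boot. Without it, an Xid logged before the reboot would match again afterwards and could cause a reboot loop.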

We shouldn't be using this function at all; it may cause other difficulties. For example, I forgot that I had stopped the miner, and the system rebooted while I was compiling. A better approach is to have the miner itself instruct the watchdog.

This method doesn't work every time. If the GPU fails, nvidia-smi is executed in a loop without producing output.

True. I haven't tried it, but I think checking the exit code from nvidia-smi should catch this. Another case to account for is nvidia-smi hanging (I think I've seen that happen).
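An untested sketch of that idea: wrap the probe in timeout (from GNU coreutils) so a hang turns into a failure, and treat a non-zero exit code or non-numeric output the same way. PROBE_CMD, PROBE_TIMEOUT, and gpu_probe are illustrative names, and the 10-second limit is an arbitrary assumption:

```shell
#!/bin/sh
# Both settings can be overridden from the environment.
PROBE_TIMEOUT=${PROBE_TIMEOUT:-10}
PROBE_CMD=${PROBE_CMD:-nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits}

# gpu_probe: prints the reading and returns 0 when the probe works.
# Returns 1 when the probe exits non-zero, runs past the time limit
# (GNU timeout then exits with status 124), or prints something that
# is not a plain number.
gpu_probe() {
  # $PROBE_CMD is left unquoted on purpose so it splits into words.
  out=$(timeout "$PROBE_TIMEOUT" $PROBE_CMD 2>/dev/null) || return 1
  case $out in
    ''|*[!0-9]*) return 1 ;;
    *) echo "$out" ;;
  esac
}

# Example:
# if util=$(gpu_probe); then
#   echo "[info] utilization: $util"
# else
#   echo "[alert] probe failed or hung"
# fi
```

A failed probe can then be handled like a low reading instead of being silently treated as "all normal".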

Since #757, which added the --exit parameter to exit whenever an error occurs, you can use a watchdog.
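With the miner exiting on its own, the shell equivalent of the batch loop posted earlier is enough for a basic watchdog. Below is a sketch that adds a delay and a cap on the number of runs. restart_loop is a hypothetical helper name, and like the batch loop it restarts on any exit, because I am not assuming anything about which exit status ethminer returns:

```shell
#!/bin/sh
# restart_loop MAX_RUNS DELAY COMMAND [ARGS...]
# Runs COMMAND again every time it exits, waiting DELAY seconds between
# runs, and gives up with status 1 after MAX_RUNS runs.
restart_loop() {
  max_runs=$1
  delay=$2
  shift 2
  runs=0
  while [ "$runs" -lt "$max_runs" ]; do
    status=0
    "$@" || status=$?
    runs=$(( runs + 1 ))
    echo "miner exited with status $status (run $runs of $max_runs)" >&2
    sleep "$delay"
  done
  return 1
}

# Example (add your usual ethminer options):
# restart_loop 20 10 ethminer --exit
```

The cap keeps a card that fails immediately on every start from restarting forever; when the loop gives up you could send a mail or reboot, as in the scripts above.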

Try ETHminerWatchDogDmW for Windows 7/8/10 (32/64-bit) and Linux (any distribution, version, or architecture) (#735).

Please check it out and give feedback.
Thank you!
