scylla_cpuscaling_setup fails on a clean Centos 7.6

Created on 4 Nov 2020 · 24 Comments · Source: scylladb/scylla

Installation of 4.2

Do you want to set the CPU scaling governor to Performance level on boot?
Yes - sets the CPU scaling governor to performance level. No - skip this step.
[YES/no]yes
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror-hk.koddos.net
 * elrepo: mirrors.thzhost.com
 * epel: fedora.ipserverone.com
 * extras: mirror-hk.koddos.net
 * updates: mirror-hk.koddos.net
eyeota                                                                                                                                                                | 2.9 kB  00:00:00
eyeota-norach                                                                                                                                                         | 2.9 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package kernel-tools.x86_64 0:3.10.0-1127.19.1.el7 will be installed
--> Processing Dependency: kernel-tools-libs = 3.10.0-1127.19.1.el7 for package: kernel-tools-3.10.0-1127.19.1.el7.x86_64
--> Running transaction check
---> Package kernel-tools-libs.x86_64 0:3.10.0-1127.19.1.el7 will be installed
--> Processing Conflict: kernel-ml-tools-5.4.5-1.el7.elrepo.x86_64 conflicts kernel-tools < 5.4.5-1.el7.elrepo
--> Processing Conflict: kernel-ml-tools-libs-5.4.5-1.el7.elrepo.x86_64 conflicts kernel-tools-libs < 5.4.5-1.el7.elrepo
--> Finished Dependency Resolution
Error: kernel-ml-tools conflicts with kernel-tools-3.10.0-1127.19.1.el7.x86_64
Error: kernel-ml-tools-libs conflicts with kernel-tools-libs-3.10.0-1127.19.1.el7.x86_64
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla_cpuscaling_setup", line 82, in <module>
    run('yum install -y cpupowerutils')
  File "/opt/scylladb/scripts/scylla_util.py", line 342, in run
    return subprocess.check_call(cmd, shell=shell, stdout=stdout, stderr=stderr, env=scylla_env)
  File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['yum', 'install', '-y', 'cpupowerutils']' returned non-zero exit status 1.
CPU scaling setup failed. Press any key to continue..
User Request bug onboarding

All 24 comments

Ideally, the issue title should be: a fresh installation of scylla-4.2 completes, but Scylla doesn't start.
The error message is:
`[agaur@eye0701 ~]$ sudo systemctl status scylla-server
● scylla-server.service - Scylla Server
Loaded: loaded (/usr/lib/systemd/system/scylla-server.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/scylla-server.service.d
└─capabilities.conf
Active: failed (Result: exit-code) since Wed 2020-11-04 10:53:40 UTC; 1min 25s ago
Process: 17600 ExecStartPre=/opt/scylladb/scripts/scylla_prepare (code=exited, status=1/FAILURE)

Nov 04 10:53:40 eye0701 scylla_prepare[17600]: return subprocess.check_output(cmd, shell=shell, env=scylla_env, timeout=timeout).strip().decode('utf-8')
Nov 04 10:53:40 eye0701 scylla_prepare[17600]: File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 411, in check_output
Nov 04 10:53:40 eye0701 scylla_prepare[17600]: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
Nov 04 10:53:40 eye0701 scylla_prepare[17600]: File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 512, in run
Nov 04 10:53:40 eye0701 scylla_prepare[17600]: raise CalledProcessError(retcode, process.args,
Nov 04 10:53:40 eye0701 scylla_prepare[17600]: subprocess.CalledProcessError: Command '['/opt/scylladb/scripts/perftune.py', '--tune', 'net', '--nic', 'enp3s0f0', '--mode', '...it status 2.
Nov 04 10:53:40 eye0701 systemd[1]: scylla-server.service: control process exited, code=exited status=1
Nov 04 10:53:40 eye0701 systemd[1]: Failed to start Scylla Server.
Nov 04 10:53:40 eye0701 systemd[1]: Unit scylla-server.service entered failed state.
Nov 04 10:53:40 eye0701 systemd[1]: scylla-server.service failed.
Hint: Some lines were ellipsized, use -l to show in full.`

can you get us output from

PATH=$PATH:/opt/scylladb/bin/ /opt/scylladb/scripts/perftune.py --tune net --nic enp3s0f0 --mode None --dump-options-file --verbose

?

cpuset.conf is

# DO NO EDIT
# This file should be automatically configure by scylla_cpuset_setup
#
# CPUSET="--cpuset 0 --smp 1"
CPUSET="--cpuset 1-27,29-55 "

can you get me also contents of
/etc/scylla.d/perftune.yaml

I think the None comes from there

If you change it from None to sq_split inside the above perftune.yaml, it should get you working, but I don't have a clue why this happened
(so
PATH=$PATH:/opt/scylladb/bin/ /opt/scylladb/scripts/perftune.py --tune net --nic enp3s0f0 --mode sq_split --dump-options-file --verbose will work)
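One plausible way a literal None can end up on the command line, offered only as an illustration and not as the actual scylla_prepare code: if the detected mode is a Python None and it is formatted into the argument list without a check, the string "None" is passed to perftune.py. The helper names below (naive_cmd, build_perftune_cmd) are hypothetical.

```python
PERFTUNE = '/opt/scylladb/scripts/perftune.py'


def naive_cmd(nic, mode=None):
    # Hypothetical: always appends --mode, so a Python None turns into
    # the literal string "None" on the command line.
    return [PERFTUNE, '--tune', 'net', '--nic', nic, '--mode', str(mode)]


def build_perftune_cmd(nic, mode=None):
    # Hypothetical: only pass --mode when a mode was actually chosen.
    cmd = [PERFTUNE, '--tune', 'net', '--nic', nic]
    if mode is not None:
        cmd += ['--mode', mode]
    return cmd


if __name__ == '__main__':
    print(' '.join(naive_cmd('enp3s0f0')))
    print(' '.join(build_perftune_cmd('enp3s0f0')))
    print(' '.join(build_perftune_cmd('enp3s0f0', 'sq_split')))
```

Whether omitting --mode is the right behaviour depends on how perftune.py treats a missing mode; that is not established in this thread.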

also can you get me this output:
cat /sys/class/net/enp3s0f0/queues/*/rps_cpus
and ethtool -i enp3s0f0
?

[agaur@eye0701 ~]$ cat /sys/class/net/enp3s0f0/queues/*/rps_cpus ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe ffffff,effffffe
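For reading the rps_cpus value above: the mask is a comma-separated list of 32-bit hexadecimal words, most significant word first. The small sketch below (helper names are made up for illustration) decodes such a mask into a CPU range string so it can be compared with the CPUSET line in cpuset.conf.

```python
def mask_to_cpus(mask):
    """Decode a sysfs-style CPU mask such as 'ffffff,effffffe' into CPU ids."""
    value = 0
    for word in mask.split(','):
        # Each word is 32 bits wide; the leftmost word is the most significant.
        value = (value << 32) | int(word, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def cpus_to_ranges(cpus):
    """Render a sorted list of CPU ids as a compact range string."""
    ranges = []
    start = prev = None
    for cpu in cpus:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ','.join(str(a) if a == b else f'{a}-{b}' for a, b in ranges)


if __name__ == '__main__':
    print(cpus_to_ranges(mask_to_cpus('ffffff,effffffe')))  # 1-27,29-55
```

So the RPS mask covers CPUs 1-27 and 29-55, which is the same set as the --cpuset value shown in cpuset.conf.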

[agaur@eye0701 ~]$ ethtool -i enp3s0f0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x800003e2
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

`[agaur@eye0609 ~]$ cat /opt/scylladb/scripts/perftune.py

#!/usr/bin/env bash

export LC_ALL=en_US.UTF-8
x="$(readlink -f "$0")"
b="$(basename "$x")"
d="$(dirname "$x")"
CENTOS_SSL_CERT_FILE="/etc/pki/tls/cert.pem"
if [ -f "${CENTOS_SSL_CERT_FILE}" ]; then
c=${CENTOS_SSL_CERT_FILE}
fi
DEBIAN_SSL_CERT_FILE="/etc/ssl/certs/ca-certificates.crt"
if [ -f "${DEBIAN_SSL_CERT_FILE}" ]; then
c=${DEBIAN_SSL_CERT_FILE}
fi
PYTHONPATH="${d}:${d}/libexec:$PYTHONPATH" PATH="${d}/../python3/bin:${PATH}" SSL_CERT_FILE="${c}" exec -a "$0" "${d}/libexec/${b}" "$@"`

There is no file as /etc/scylla.d/perftune.yaml

@penberg this is the issue I mentioned

@gaurarpit12 @tarzanek Yeah, so if scylla_prepare fails to run, like I think it did here:

Process: 17600 ExecStartPre=/opt/scylladb/scripts/scylla_prepare (code=exited, status=1/FAILURE)

The perftune.yaml file will not exist, because scylla_prepare is responsible for running that performance tuning step. Its failure also blocks Scylla from starting (it needs that information).

@tarzanek that --mode None looks pretty fishy:

PATH=$PATH:/opt/scylladb/bin/ /opt/scylladb/scripts/perftune.py --tune net --nic enp3s0f0 --mode None --dump-options-file --verbose

Where did you get that from? For me, the valid options are:

$ /opt/scylladb/scripts/perftune.py --help | grep "\-\-mode"
usage: perftune.py [-h] [--mode {sq_split,sq,mq,no_irq_restrictions}]
--mode {sq_split,sq,mq,no_irq_restrictions}
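The help output above looks like an argparse choices list. Assuming perftune.py does parse --mode that way (an assumption, the script itself is not shown in this thread), a value outside the list makes argparse exit with status 2, which would match the "status 2" in the truncated scylla_prepare log. A minimal sketch with a stand-in parser:

```python
import argparse

# The four modes listed in the --help output above.
VALID_MODES = ['sq_split', 'sq', 'mq', 'no_irq_restrictions']


def build_parser():
    # Stand-in parser for illustration only, not perftune.py's real one.
    parser = argparse.ArgumentParser(prog='perftune-sketch')
    parser.add_argument('--mode', choices=VALID_MODES)
    return parser


def parse_mode(argv):
    """Return (mode, exit_status); argparse raises SystemExit on a bad choice."""
    try:
        return build_parser().parse_args(argv).mode, 0
    except SystemExit as exc:
        return None, exc.code


if __name__ == '__main__':
    print(parse_mode(['--mode', 'sq_split']))
    print(parse_mode(['--mode', 'None']))
```

With an invalid choice, argparse prints a usage error to stderr before exiting.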

Yeah, the autodetect code is failing. I thought this was due to hwloc, but it is something else: it happens when perftune tries to find the smallest common mode from the Tuners.
So I know which method fails in perftune; I'm just trying to figure out how to simulate it and come up with a fix.
I will extract a sample so we can see why it doesn't fall back to either mq.

Fwiw we could hardcode perftune.yaml to sq_split and it will work ...

@gaurarpit12 can you try pasting this into
/etc/scylla.d/perftune.yaml :

cpu_mask: 0x00ffffff,0xffffffff
mode: sq_split
nic: enp3s0f0
tune:

and can you grep for me:
cat /etc/sysconfig/scylla-server | grep SET_NIC_AND_DISKS
?
If it says no, can you flip it to yes?

This should hopefully get you going. Also, can you share what kind of machine this is? VM or hardware, and if hardware, which server is it (type, ...)?

[agaur@eye0707 ~]$ cat /etc/sysconfig/scylla-server | grep SET_NIC_AND_DISKS
SET_NIC_AND_DISKS="yes"

/etc/scylla.d/perftune.yaml

I can't apply this right now, as I have some jobs running on the Scylla nodes that I don't want to be affected.

So after skipping the CPU scaling governor step in scylla_setup, the rest of the steps worked and Scylla starts properly.
Let's move this bug back to the
"CPU scaling governor" topic:

The cpupowerutils package cannot be installed when kernel-ml-tools is installed,
so I guess we should check for this and skip the governor step automatically in that case.
Or rather: cpupowerutils for CentOS 7 is part of kernel-tools ( https://pkgs.org/download/cpupowerutils ),
so https://github.com/scylladb/scylla/blob/master/dist/common/scripts/scylla_cpuscaling_setup#L88
should be smarter and, on CentOS 7, install kernel-tools, or in this case kernel-ml-tools if the elrepo kernel-ml is installed ...

@syuu1228 / @penberg ?

It seems this only happens when kernel-ml-tools is already installed: scylla_cpuscaling_setup tries to install kernel-tools unconditionally, but kernel-ml-tools conflicts with kernel-tools.

I think we can avoid the error by skipping yum install when kernel-ml-tools is already there.

Note: cpupowerutils is an alias of kernel-tools
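The idea of skipping the install when kernel-ml-tools is present could look roughly like the sketch below. This is not the code from scylla_cpuscaling_setup; the function names are hypothetical, and checking for an existing cpupower binary is an extra assumption added here. Only `rpm -q <package>` and `yum install -y cpupowerutils` are taken as real commands.

```python
import shutil
import subprocess


def rpm_installed(pkg):
    """Return True if `rpm -q <pkg>` reports the package as installed."""
    res = subprocess.run(['rpm', '-q', pkg],
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return res.returncode == 0


def should_install_cpupowerutils(has_cpupower, has_kernel_ml_tools):
    # Skip the install if the cpupower tool is already available, or if
    # kernel-ml-tools is installed (it conflicts with kernel-tools).
    return not has_cpupower and not has_kernel_ml_tools


def install_if_needed():
    # Hypothetical wrapper tying the checks to the yum call.
    has_cpupower = shutil.which('cpupower') is not None
    has_ml_tools = rpm_installed('kernel-ml-tools')
    if should_install_cpupowerutils(has_cpupower, has_ml_tools):
        subprocess.check_call(['yum', 'install', '-y', 'cpupowerutils'])
```

Keeping the decision in a pure function makes it easy to check the conflict case without touching the package manager.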

Since we changed the script not to run yum install unconditionally, this issue is fixed by https://github.com/scylladb/scylla/commit/db9e6f50f3826b1e32782b89504b46c2bdc89094

But the fix is only available on master and scylla-4.3; we need to backport it if we want to fix the issue on previous releases.

A workaround is to use
--no-cpuscaling-setup when running scylla_setup
(and then run scylla_cpuscaling_setup manually),
so I don't think this needs a backport.

Another workaround is to remove kernel-ml-tools and kernel-ml-tools-libs and then run scylla_cpuscaling_setup again; it will install kernel-tools.

We should avoid kernel-ml. New deployments should use, in descending order of preference:

  • our AMI/GCP image
  • CentOS 8 / RHEL 8
  • CentOS 7 with centos-kernel
  • RHEL 7 + kernel-ml