Scylla: Docker images are broken

Created on 23 Jan 2020 · 25 Comments · Source: scylladb/scylla

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at [email protected] or in our slack channel.

  • [ x ] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): docker image: scylladb/scylla:3.2.0-202001200218
Cluster size: local machine
OS (RHEL/CentOS/Ubuntu/AWS AMI): ArchLinux (dev environment)

I'm trying to set up ScyllaDB to replace our Cassandra, but so far I have been unable to run a recent Docker image. I tried all 3.2 variants and all of them failed with errors.

The command I used:

$ sudo docker run --name scylla -d scylladb/scylla:3.2.0

The full output is attached; here are the last lines:

ERROR 2020-01-23 18:15:50,431 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
ERROR 2020-01-23 18:15:50,431 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2020-01-23 18:15:50,534 INFO exited: scylla (exit status 1; not expected)
2020-01-23 18:15:51,539 INFO spawned: 'scylla' with pid 86
Scylla version 3.2.0-0.20200115.f9b11c9b30 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.2 --rpc-address 172.17.0.2 --seed-provider-parameters seeds=172.17.0.2 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.2, rpc-address: 172.17.0.2, seed-provider-parameters: seeds=172.17.0.2, blocked-reactor-notify-ms: 999999999]
ERROR 2020-01-23 18:15:51,908 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2020-01-23 18:15:52,003 INFO exited: scylla (exit status 1; not expected)
Connecting to http://localhost:10000
Starting the JMX server
2020-01-23 18:15:53,228 INFO spawned: 'scylla' with pid 117
Scylla version 3.2.0-0.20200115.f9b11c9b30 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.2 --rpc-address 172.17.0.2 --seed-provider-parameters seeds=172.17.0.2 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.2, rpc-address: 172.17.0.2, seed-provider-parameters: seeds=172.17.0.2, blocked-reactor-notify-ms: 999999999]
ERROR 2020-01-23 18:15:53,571 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
ERROR 2020-01-23 18:15:53,571 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2020-01-23 18:15:53,641 INFO exited: scylla (exit status 1; not expected)
JMX is enabled to receive remote connections on port: 7199
2020-01-23 18:15:55,683 INFO spawned: 'scylla' with pid 141
Scylla version 3.2.0-0.20200115.f9b11c9b30 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.2 --rpc-address 172.17.0.2 --seed-provider-parameters seeds=172.17.0.2 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.2, rpc-address: 172.17.0.2, seed-provider-parameters: seeds=172.17.0.2, blocked-reactor-notify-ms: 999999999]
ERROR 2020-01-23 18:15:56,161 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2020-01-23 18:15:56,300 INFO exited: scylla (exit status 1; not expected)
2020-01-23 18:15:59,307 INFO spawned: 'scylla' with pid 165
Scylla version 3.2.0-0.20200115.f9b11c9b30 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.2 --rpc-address 172.17.0.2 --seed-provider-parameters seeds=172.17.0.2 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.2, rpc-address: 172.17.0.2, seed-provider-parameters: seeds=172.17.0.2, blocked-reactor-notify-ms: 999999999]
ERROR 2020-01-23 18:15:59,755 [shard 0] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2020-01-23 18:15:59,824 INFO exited: scylla (exit status 1; not expected)
2020-01-23 18:16:00,825 INFO gave up: scylla entered FATAL state, too many start retries too quickly

There are also some Python errors:

multiprocessing.pool.RemoteTraceback:
 """
 Traceback (most recent call last):
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1317, in do_open
     encode_chunked=req.has_header('Transfer-encoding'))
   File "/opt/scylladb/python3/lib64/python3.7/http/client.py", line 1244, in request
     self._send_request(method, url, body, headers, encode_chunked)
   File "/opt/scylladb/python3/lib64/python3.7/http/client.py", line 1290, in _send_request
     self.endheaders(body, encode_chunked=encode_chunked)
   File "/opt/scylladb/python3/lib64/python3.7/http/client.py", line 1239, in endheaders
     self._send_output(message_body, encode_chunked=encode_chunked)
   File "/opt/scylladb/python3/lib64/python3.7/http/client.py", line 1026, in _send_output
     self.send(msg)
   File "/opt/scylladb/python3/lib64/python3.7/http/client.py", line 966, in send
     self.connect()
   File "/opt/scylladb/python3/lib64/python3.7/http/client.py", line 938, in connect
     (self.host,self.port), self.timeout, self.source_address)
   File "/opt/scylladb/python3/lib64/python3.7/socket.py", line 727, in create_connection
     raise err
   File "/opt/scylladb/python3/lib64/python3.7/socket.py", line 716, in create_connection
     sock.connect(sa)
 OSError: [Errno 99] Cannot assign requested address

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/opt/scylladb/python3/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
     result = (True, func(*args, **kwds))
   File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 64, in get_url
     return urllib.request.urlopen(path).read().decode('utf-8')
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen
     return opener.open(url, data, timeout)
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open
     response = self._open(req, data)
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open
     '_open', req)
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain
     result = func(*args)
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1345, in http_open
     return self.do_open(http.client.HTTPConnection, req)
   File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open
     raise URLError(err)
 urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>
 """

 The above exception was the direct cause of the following exception:

 Traceback (most recent call last):
   File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 197, in <module>
     args.func(args)
   File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 123, in check_version
     current_version = sanitize_version(get_api('/storage_service/scylla_release_version'))
   File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 82, in get_api
     return get_json_from_url("http://" + api_address + path)
   File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 74, in get_json_from_url
     retval = result.get(timeout=5)
   File "/opt/scylladb/python3/lib64/python3.7/multiprocessing/pool.py", line 657, in get
     raise self._value
 urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>
[errors.txt](https://github.com/scylladb/scylla/files/4104787/errors.txt)
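The Python traceback looks like a separate, secondary problem. scylla-housekeeping calls get_api('/storage_service/scylla_release_version') against the REST API, which the log reports at http://localhost:10000. The call fails with Errno 99 (EADDRNOTAVAIL) rather than a plain "connection refused". One possible, unconfirmed, cause is localhost resolving to an IPv6 address the container does not have. For scripts that must wait until the API is reachable, here is a minimal polling sketch. The helper name, and the choice of 127.0.0.1 instead of localhost, are assumptions and not part of Scylla:

```python
import time
import urllib.error
import urllib.request

# Opener with proxies disabled: the REST API is local, so ignore any http_proxy settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_api(url: str, attempts: int = 30, delay: float = 2.0,
                 timeout: float = 5.0) -> bool:
    """Poll `url` until an HTTP server answers; return False after `attempts` failures.

    Hypothetical helper, not part of Scylla or scylla-housekeeping.
    """
    for attempt in range(attempts):
        try:
            with _OPENER.open(url, timeout=timeout) as resp:
                resp.read()
            return True
        except urllib.error.HTTPError:
            # The server replied with an error status, but something is listening.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused, Errno 99 (EADDRNOTAVAIL), timeouts, ...
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```

Inside the container you would call something like wait_for_api("http://127.0.0.1:10000/storage_service/scylla_release_version"). Any HTTP response counts as "up", even an error status, because it shows a server is listening on that port.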

All 25 comments

Some more info:

filesystem: ext4
AIO settings:
$ cat /proc/sys/fs/aio-max-nr
65536

Which version of Docker are you using? Can you supply more details on the machine, such as the number of CPUs and which OS it is?

Just for reference, I'm running Ubuntu 19.10 with the Docker version and /proc/sys/fs/aio-max-nr value below, and I can run the Docker image fine.

$ docker version
Client: Docker Engine - Community
 Version:           19.03.3
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        a872fc2f86
 Built:             Tue Oct  8 01:00:44 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.3
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       a872fc2f86
  Built:            Tue Oct  8 00:59:17 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

$ cat /proc/sys/fs/aio-max-nr
1048576

Running Docker on ArchLinux:

$ uname -a
Linux perseu 5.4.13-arch1-1 #1 SMP PREEMPT Fri, 17 Jan 2020 23:09:54 +0000 x86_64 GNU/Linux

$ docker version
Client:
 Version:           19.03.5-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        633a0ea838
 Built:             Fri Nov 15 03:19:09 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.5-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       633a0ea838
  Built:            Fri Nov 15 03:17:51 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.2.m
  GitCommit:        d50db0a42053864a270f648048f9a8b4f24eced3.m
 runc:
  Version:          1.0.0-rc9
  GitCommit:        d736ef14f0288d6993a1845745d6756cfc9ddd5a
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

The CPU is an i7-7700 with 4 cores (8 threads).

I can confirm this happens for me as well on an up-to-date Arch. Running with --smp 2 works fine. On my 8-core system I can go up to --smp 5 at most; anything higher triggers the error. I have around 65k in /proc/sys/fs/aio-max-nr.
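The numbers in this report allow a rough estimate of how much AIO capacity each shard needs. Here is a back-of-the-envelope sketch. It assumes every shard reserves the same fixed number of AIO requests r and that nothing else on the host holds AIO contexts. Both are simplifications, and the kernel may also charge more against the limit than each io_setup call asks for. The function names are hypothetical:

```python
from typing import Tuple


def per_shard_bounds(aio_max_nr: int, max_working_smp: int) -> Tuple[int, int]:
    """Range of per-shard reservations r consistent with an observation.

    Assumption: --smp N starts iff N * r <= aio_max_nr. "--smp max_working_smp
    works, one more fails" then means
    max_working_smp * r <= aio_max_nr < (max_working_smp + 1) * r.
    """
    lower = aio_max_nr // (max_working_smp + 1) + 1  # smallest r that breaks smp + 1
    upper = aio_max_nr // max_working_smp            # largest r that still fits smp
    return lower, upper


def max_shards(aio_max_nr: int, per_shard: int) -> int:
    """How many shards fit under the limit for a given per-shard reservation."""
    return aio_max_nr // per_shard


lo, hi = per_shard_bounds(65536, 5)  # the 8-core Arch report above
print(lo, hi)  # 10923 13107
print(max_shards(1048576, hi), max_shards(1048576, lo))  # 80 95
```

Under these assumptions, 65536 leaves room for only 5 shards. The 1048576 value used later in this thread leaves room for roughly 80 to 95, which is why raising it helps.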

I can also confirm it starts to work after adding fs.aio-max-nr = 1048576 to /etc/sysctl.conf and reloading it:

➜  ~ sudo emacs /etc/sysctl.conf
➜  ~ sudo sysctl -p
fs.aio-max-nr = 1048576
➜  ~ cat /proc/sys/fs/aio-max-nr
1048576

After this change it works fine; no restart is needed.
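Before starting the container, it can help to check both the limit and how much of it other processes already use. /proc/sys/fs/aio-nr counts the AIO events currently allocated across the whole host. Here is a minimal Python sketch (Linux only; the function name is hypothetical):

```python
from pathlib import Path


def aio_headroom(proc_fs: Path = Path("/proc/sys/fs")) -> dict:
    """Return the host-wide AIO limit, current usage and free headroom (Linux)."""
    limit = int((proc_fs / "aio-max-nr").read_text().strip())
    in_use = int((proc_fs / "aio-nr").read_text().strip())  # allocated host-wide right now
    return {"aio-max-nr": limit, "aio-nr": in_use, "free": limit - in_use}
```

The proc_fs parameter lets you point the function at another directory, for example in tests. On a real host, call aio_headroom() with no arguments.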

Changing fs.aio-max-nr worked for me also.

@dahankzter how did you run with a different --smp? Did you make a custom image?

No, @heitorPB, I did not build a special image. You can append any Scylla argument to the end of the docker run command, and it will be forwarded to Scylla automatically.
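To illustrate the forwarding described in this comment, here is a small Python sketch that builds the docker run argv with Scylla arguments placed after the image name. scylla_docker_cmd is a hypothetical helper name. The forwarding itself is done by the image's entrypoint, as described above:

```python
import shlex
from typing import List, Sequence


def scylla_docker_cmd(image: str, scylla_args: Sequence[str] = (),
                      name: str = "scylla") -> List[str]:
    """Build a `docker run` argv; arguments after the image name go to the
    container's entrypoint, which (per this thread) forwards them to scylla."""
    return ["docker", "run", "--name", name, "-d", image, *scylla_args]


cmd = scylla_docker_cmd("scylladb/scylla:3.2.0", ["--smp", "2"])
print(" ".join(shlex.quote(arg) for arg in cmd))
# docker run --name scylla -d scylladb/scylla:3.2.0 --smp 2
```

To actually start the container, pass cmd to subprocess.run(cmd, check=True).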

Thanks @dahankzter !

I managed to do it inside a docker-compose.yml:

version: "3.4"

services:
  scylla:
    image: scylladb/scylla:3.2.0
    container_name: scylla
    command:
      - "--smp"
      - "2"
    ports:
      - "9042:9042"

@avikivity we have --developer-mode=1 set; is this check being applied even though developer mode is set?

@slivne I reopened and updated scylladb/seastar#640

fixed by da00530464779bb1d3e6757e2e804ca909ee0e5d

Thanks a lot, raising fs.aio-max-nr as described above solved my problem after upgrading from Ubuntu 18 to 20.04, where /etc/sysctl.conf was replaced.

@yckbilly1929 which Scylla version are you running? This was supposed to be fixed in 3.3 and 4.0 and work out of the box.

I am working on a local machine and installed Scylla through the container image, version:
3.2.3-0.20200315.89deac77958

Though it is not caused by Scylla, in case anyone faces a similar problem after upgrading to Ubuntu 20.04, here is what I did to make it work again:

  1. Recreate the SSH host key to fix the connection error (java.net.ConnectException: Connection refused):
     ssh-keygen -f /etc/ssh/ssh_host_ecdsa_key -t ecdsa
  2. Update /etc/sysctl.conf, which is replaced during the upgrade.
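Step 2 is needed because the upgrade replaced /etc/sysctl.conf. One option is to keep the setting in a separate drop-in file under /etc/sysctl.d/ instead. Files you create there are not owned by any package, so an upgrade should not replace them. Here is a minimal sketch. The file name 99-scylla-aio.conf and the helper name are hypothetical choices, and writing under /etc requires root:

```python
from pathlib import Path


def write_aio_dropin(sysctl_d: Path = Path("/etc/sysctl.d"),
                     filename: str = "99-scylla-aio.conf",
                     value: int = 1048576) -> Path:
    """Persist fs.aio-max-nr in a sysctl.d drop-in instead of /etc/sysctl.conf."""
    path = sysctl_d / filename
    path.write_text(f"fs.aio-max-nr = {value}\n")  # same syntax as /etc/sysctl.conf
    return path
```

After writing the file, `sudo sysctl --system` reloads all sysctl configuration files, including drop-ins, without a reboot.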

@yckbilly1929 thanks, it is indeed not fixed in 3.2 (and will not be), so all is clear.

Please note that 3.2 has reached its End Of Life. We only support the two latest releases (currently 3.3 and 4.0).

Already backported; removing label.

After raising fs.aio-max-nr as described above, my issue is fixed, thanks.

Looks like this still doesn't work by default:

# cat /proc/sys/fs/aio-max-nr   
65536
# docker run -it --rm scylladb/scylla:4.3.1
running: (['/opt/scylladb/scripts/scylla_dev_mode_setup', '--developer-mode', '1'],)
...
Scylla version 4.3.1-0.20210210.46650adcd with build-id bfaa518f22098de476b961b707925c71441bb81b starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.10 --rpc-address 172.17.0.10 --seed-provider-parameters seeds=172.17.0.10 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.10, rpc-address: 172.17.0.10, seed-provider-parameters: seeds=172.17.0.10, blocked-reactor-notify-ms: 999999999]
Connecting to http://localhost:10000
Starting the JMX server
ERROR 2021-02-22 09:09:38,501 [shard 2] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2021-02-22 09:09:38,566 INFO exited: scylla (exit status 1; not expected)

If I raise aio-max-nr with sysctl, it works fine.

Tested with Ubuntu 20.04.2 LTS in a VM (8 CPUs, passthrough).

Thanks @tnozicka.
Since all of our U. examples use docker (version 4.1), I'd like to make sure our trainees don't get stuck.
@slivne WDYT?

It is still broken.

Still broken on Scylla v4.4.0.

It worked for me with podman (Fedora 32).
@penberg what's your take on it?

Perhaps we should have Seastar fall back to epoll if there aren't enough AIO slots configured.