ray.init() fails on macOS Big Sur

Created on 17 Nov 2020 · 4 Comments · Source: ray-project/ray

What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS):
OS version: macOS Big Sur
Ray version: should affect all versions.

Issue:
ray.init() fails while starting Redis, with the following traceback:

Traceback (most recent call last):
  File "debugging.py", line 2, in <module>
    ray.init()
  File "/Users/haochen/code/ant_ray/python/ray/worker.py", line 740, in init
    ray_params=ray_params)
  File "/Users/haochen/code/ant_ray/python/ray/node.py", line 200, in __init__
    self.start_head_processes()
  File "/Users/haochen/code/ant_ray/python/ray/node.py", line 801, in start_head_processes
    self.start_redis()
  File "/Users/haochen/code/ant_ray/python/ray/node.py", line 580, in start_redis
    fate_share=self.kernel_fate_share)
  File "/Users/haochen/code/ant_ray/python/ray/_private/services.py", line 720, in start_redis
    fate_share=fate_share)
  File "/Users/haochen/code/ant_ray/python/ray/_private/services.py", line 902, in _start_redis_instance
    ulimit_n - redis_client_buffer)
  File "/Users/haochen/.pyenv/versions/3.7.6/lib/python3.7/site-packages/redis/client.py", line 1243, in config_set
    return self.execute_command('CONFIG SET', name, value)
  File "/Users/haochen/.pyenv/versions/3.7.6/lib/python3.7/site-packages/redis/client.py", line 901, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/Users/haochen/.pyenv/versions/3.7.6/lib/python3.7/site-packages/redis/client.py", line 915, in parse_response
    response = connection.read_response()
  File "/Users/haochen/.pyenv/versions/3.7.6/lib/python3.7/site-packages/redis/connection.py", line 747, in read_response
    raise response
redis.exceptions.ResponseError: The operating system is not able to handle the specified number of clients, try with -33

Digging into this, I found that it happens because resource.getrlimit(resource.RLIMIT_NOFILE)[0] (see here) now returns 9223372036854775807 on Big Sur, whereas it returns 256 on earlier macOS versions.
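A minimal way to check this from Python is to read both the soft and hard RLIMIT_NOFILE values and compare the soft one with resource.RLIM_INFINITY, the resource module's "unlimited" sentinel. The helper name describe_nofile_limit is hypothetical and not part of Ray. On macOS, RLIM_INFINITY is 2**63 - 1, which matches the 9223372036854775807 reported above.

```python
import resource


def describe_nofile_limit():
    """Hypothetical helper: return (soft, hard, soft_is_unlimited) for RLIMIT_NOFILE."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit". On macOS it is 2**63 - 1
    # (9223372036854775807); on other platforms the value may differ.
    soft_is_unlimited = soft == resource.RLIM_INFINITY
    return soft, hard, soft_is_unlimited


if __name__ == "__main__":
    soft, hard, unlimited = describe_nofile_limit()
    print(f"soft={soft} hard={hard} soft_is_unlimited={unlimited}")
```

If soft_is_unlimited is True, any code that derives a client count directly from the soft limit will get an enormous number.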

Removing this line fixes the issue. @ericl @edoakes @rkooo567 Do you know what the purpose of this code is? Is it still needed?

Reproduction (REQUIRED)

Just ray.init().

  • [x] I have verified my script runs in a clean environment and reproduces the issue.
  • [x] I have verified the issue also occurs with the latest wheels.
bug triage

All 4 comments

Oh, interesting. We set it because a limit of 256 is usually not enough to handle all the connections to Redis.

In the cluster setting, you could have many hundreds of thousands of workers, so maxclients needs to be at least that large.
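The oversized value follows directly from the ulimit_n - redis_client_buffer expression in the traceback. The sketch below assumes a buffer of 32, inferred from the numbers in the "Invalid argument" error later in this thread (9223372036854775807 - 9223372036854775775 = 32) rather than read from Ray's source; naive_maxclients is a hypothetical name.

```python
# Assumed buffer: inferred from 9223372036854775807 - 9223372036854775775 in the
# "Invalid argument" error later in this thread, not taken from Ray's source.
REDIS_CLIENT_BUFFER = 32


def naive_maxclients(ulimit_n, redis_client_buffer=REDIS_CLIENT_BUFFER):
    """Hypothetical helper mirroring `ulimit_n - redis_client_buffer` from the traceback."""
    return ulimit_n - redis_client_buffer


if __name__ == "__main__":
    print(naive_maxclients(256))                  # 224 (the pre-Big Sur soft limit)
    print(naive_maxclients(9223372036854775807))  # 9223372036854775775 (RLIM_INFINITY on macOS)
```

With the old 256 limit the result is small and valid; with an unlimited soft limit the subtraction yields a value far above anything Redis accepts for maxclients.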

I just found that the reason resource.getrlimit(resource.RLIMIT_NOFILE)[0] returns 9223372036854775807 is not related to Big Sur; it's because of this line.

It looks like you've already figured out the issue, but I'll post my error message here since it's slightly different and its last line suggests what we could try setting the upper limit to (namely, 4294967295). I don't know if that makes sense, though; I'm not familiar with this part of the code.

[...same as above...]
  File "/Users/archit/anaconda3/envs/ray-py36/lib/python3.6/site-packages/redis/connection.py", line 756, in read_response
    raise response
redis.exceptions.ResponseError: Invalid argument '9223372036854775775' for CONFIG SET 'maxclients' - argument must be between 1 and 4294967295 inclusive
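One option in the direction suggested above is to clamp the computed value into the range Redis reports (1 to 4294967295 inclusive) before calling CONFIG SET. This is a hypothetical sketch, not Ray's fix: clamped_maxclients and the buffer of 32 are assumptions. Clamping only avoids the "Invalid argument" error; as the first traceback shows, Redis can still refuse a maxclients value the operating system cannot actually support.

```python
# Range accepted by CONFIG SET maxclients, as reported in the Redis error above.
REDIS_MAXCLIENTS_MIN = 1
REDIS_MAXCLIENTS_MAX = 4294967295

# Assumed buffer (inferred from the numbers in the error, not from Ray's source).
REDIS_CLIENT_BUFFER = 32


def clamped_maxclients(ulimit_n, redis_client_buffer=REDIS_CLIENT_BUFFER):
    """Hypothetical helper: compute maxclients and clamp it into Redis's valid range."""
    value = ulimit_n - redis_client_buffer
    return max(REDIS_MAXCLIENTS_MIN, min(value, REDIS_MAXCLIENTS_MAX))


if __name__ == "__main__":
    print(clamped_maxclients(256))                  # 224
    print(clamped_maxclients(9223372036854775807))  # 4294967295
```

With redis-py, the result would be passed the same way as in the traceback, e.g. client.config_set("maxclients", clamped_maxclients(ulimit_n)).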