Ray: Abstraction violations in command_runner interface

Created on 10 Sep 2020 · 16 comments · Source: ray-project/ray

It seems we have a few new abstraction violations for the CommandRunnerInterface.

8f0f7371a055257b4c704bad49edccbfbfc01831 (#10097) introduced run_cp_up and run_cp_down.
dc378a80b77685cc45c2b53e2f484ed9c6d8c44c (#9515) introduced run_init.

cc @AmeerHajAli

@richardliaw @krfricke these abstraction violations are blocking @AmeerHajAli's work on proxying the cloud providers.

Can we either (1) revert the features that depend on this or (2) otherwise make the autoscaler / tune etc strictly use only the CommandRunnerInterface, without attempting to peek behind the scenes at the implementation class?

P0 regression

Most helpful comment

@AmeerHajAli SGTM. @wuisawesome I don't imagine changing this (run_init) abstraction would cause many merge conflicts, as it would mean adding 2-4 lines to command_runner.py and removing 2 from updater.py.

All 16 comments

Cc @ijrsvt, I believe the Docker code is relying on this too.

It looks like the rsync/cp addition on the Tune end was a matter of convenience/hiding the error message.

If we switch to raising a warning via the warnings library instead of logging a warning, this would suffice.
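A sketch of what that switch might look like; sync_files, run_cp_up usage, and the message text are hypothetical stand-ins for the Tune-side code, not the actual implementation:

```python
import warnings


def sync_files(runner, source, target):
    """Hypothetical Tune-side sync helper, for illustration only."""
    if not hasattr(runner, "run_cp_up"):
        # Before: logger.warning(...) -- invisible to calling code.
        # After: a UserWarning that callers can catch or escalate.
        warnings.warn("command runner does not support file sync")
        return False
    runner.run_cp_up(source, target)
    return True
```

The difference is that warnings.warn emits a UserWarning that callers can observe programmatically, whereas a log record can only be read by a human.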

@ericl For run_init the solution is super easy: just add run_init to the CommandRunnerInterface and make it a noop for non-Docker command runners.

@ijrsvt, this will make it work for Docker only.

From Ameer in other correspondence:

In my project I use the actual command runner (with node_provider.get_command_runner(..)) and pass the arguments as-is. Breaking the CommandRunnerInterface by adding more functions means that my project will not have support for these functions.

For example, DockerCommandRunner adding a run_init function, or k8s calling run_cp_down, will not work because they are not part of the CommandRunnerInterface.

Perhaps we have two options (open to more):

  1. Strictly follow and implement only the CommandRunnerInterface.
  2. Have my project subclass DockerCommandRunner only.
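To make the concern in option (1) concrete, here is a minimal sketch of a proxying command runner that forwards only the methods the interface declares. The class bodies and method names below are illustrative, not the actual Ray implementations:

```python
class CommandRunnerInterface:
    """Only what the interface guarantees: a plain run()."""
    def run(self, cmd):
        raise NotImplementedError


class DockerCommandRunner(CommandRunnerInterface):
    def run(self, cmd):
        return f"docker exec container {cmd}"

    def run_init(self):  # extra method, NOT part of the interface
        return "container initialized"


class ProxyCommandRunner(CommandRunnerInterface):
    """Proxies a provider's runner, forwarding only interface methods."""
    def __init__(self, inner):
        self._inner = inner

    def run(self, cmd):
        return self._inner.run(cmd)


proxy = ProxyCommandRunner(DockerCommandRunner())
result = proxy.run("echo ok")  # fine: run() is part of the interface
# proxy.run_init() would raise AttributeError: the proxy never forwards
# methods that are not declared on CommandRunnerInterface.
```

This is why callers that go through a proxy break as soon as other code starts depending on non-interface methods of a concrete runner.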

@AmeerHajAli how so?
It would look something like this:

    class CommandRunnerInterface:
        def run_init(self):
            pass


    class DockerCommandRunner(CommandRunnerInterface):
        def run_init(self):
            ...  # <setup logic>

NodeUpdater will just call self.cmd_runner.run_init(), and any CommandRunner that does not want to implement run_init() will use the interface's noop version.

@ijrsvt thanks for the clarification. That looks like it would work, I just have to implement run_init in my class too.
We should just make sure to not violate the CommandRunnerInterface.
@richardliaw , how about k8s run_cp_up and run_cp_down?

@AmeerHajAli Glad we are on the same page!
@ericl Is this something we should cherrypick into 1.0

I totally see why this abstraction is important, but why cherry-pick it into 1.0 unless there are known bugs?

@AmeerHajAli we can get rid of it (https://github.com/ray-project/ray/issues/10691#issuecomment-689923942)

@richardliaw I see, I can remove the external calls to run_cp_ and make run_init part of the command runner interface.

Are you all ok with that? @ericl @richardliaw @ijrsvt @wuisawesome ?

That'd be fine; can you also please change the logger warning to a user warning? https://docs.python.org/3/library/warnings.html

That'll make it possible for us to catch.
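With a UserWarning in place, the standard warnings machinery lets callers catch it or escalate it to a hard error. A minimal sketch (the warning message here is illustrative):

```python
import warnings

with warnings.catch_warnings():
    # Escalate UserWarning to an exception so it cannot be missed,
    # e.g. in a test that must fail if the fallback path is hit.
    warnings.simplefilter("error", UserWarning)
    try:
        warnings.warn("command runner does not support file sync")
        escalated = False
    except UserWarning:
        escalated = True
```

The same filter could be installed in CI so that any silent fallback in the sync path turns into a test failure.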

sgtm

Can we put this on hold until after 1.0 is actually released?

This doesn't affect the end user, and we have a bunch of release-blocker bugs in the autoscaler right now that this will almost certainly cause merge conflicts with (unless you believe that fixing this issue will resolve bugs in 1.0).

@AmeerHajAli SGTM. @wuisawesome I don't imagine changing this (run_init) abstraction would cause many merge conflicts, as it would mean adding 2-4 lines to command_runner.py and removing 2 from updater.py.

Made a PR! #10715
