Ray: Abstraction violations in command_runner interface

Created on 10 Sep 2020 · 16 comments · Source: ray-project/ray

It seems we have a few new abstraction violations for the CommandRunnerInterface.

8f0f7371a055257b4c704bad49edccbfbfc01831 (#10097) introduced run_cp_up and run_cp_down.
dc378a80b77685cc45c2b53e2f484ed9c6d8c44c (#9515) introduced run_init.

cc @AmeerHajAli

@richardliaw @krfricke these abstraction violations are blocking @AmeerHajAli's work on proxying the cloud providers.

Can we either (1) revert the features that depend on this or (2) otherwise make the autoscaler / tune etc strictly use only the CommandRunnerInterface, without attempting to peek behind the scenes at the implementation class?

P0 regression

Most helpful comment

@AmeerHajAli SGTM. @wuisawesome I don't imagine changing this (run_init) abstraction would cause many merge conflicts, as it would mean adding 2-4 lines to command_runner.py and removing 2 from updater.py.

All 16 comments

Cc @ijrsvt, I believe the Docker code is relying on this too.

It looks like the rsync/cp addition on the Tune end was a matter of convenience/hiding the error message.

If we switch to raising a warning via the warnings library instead of logging a warning, this would suffice.
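A sketch of what that switch might look like; sync_files, run_cp_up usage, and the message text are hypothetical stand-ins for the Tune-side code, not the actual implementation:

```python
import warnings


def sync_files(runner, source, target):
    """Hypothetical Tune-side sync helper, for illustration only."""
    if not hasattr(runner, "run_cp_up"):
        # Before: logger.warning(...) -- invisible to calling code.
        # After: a UserWarning that callers can catch or escalate.
        warnings.warn("command runner does not support file sync")
        return False
    runner.run_cp_up(source, target)
    return True
```

The difference is that warnings.warn emits a UserWarning that callers can observe programmatically, whereas a log record can only be read by a human.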

@ericl For run_init the solution is super easy: just add run_init to the CommandRunnerInterface and make it a noop for non-Docker command runners.

@ijrsvt, this will make it work for Docker only.

From Ameer in other correspondence:

In my project I use the actual command runner (with node_provider.get_command_runner(..)) and pass the arguments as-is. Breaking the CommandRunnerInterface by adding more functions means that my project will not have support for these functions.

For example, DockerCommandRunner adding a run_init function, or k8s calling run_cp_down, will not work because they are not part of the CommandRunnerInterface.

Perhaps we have two options (open to more):

  1. Strictly follow and implement only the CommandRunnerInterface.
  2. Have my project subclass DockerCommandRunner only.
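To make the concern in option (1) concrete, here is a minimal sketch of a proxying command runner that forwards only the methods the interface declares. The class bodies and method names below are illustrative, not the actual Ray implementations:

```python
class CommandRunnerInterface:
    """Only what the interface guarantees: a plain run()."""
    def run(self, cmd):
        raise NotImplementedError


class DockerCommandRunner(CommandRunnerInterface):
    def run(self, cmd):
        return f"docker exec container {cmd}"

    def run_init(self):  # extra method, NOT part of the interface
        return "container initialized"


class ProxyCommandRunner(CommandRunnerInterface):
    """Proxies a provider's runner, forwarding only interface methods."""
    def __init__(self, inner):
        self._inner = inner

    def run(self, cmd):
        return self._inner.run(cmd)


proxy = ProxyCommandRunner(DockerCommandRunner())
result = proxy.run("echo ok")  # fine: run() is part of the interface
# proxy.run_init() would raise AttributeError: the proxy never forwards
# methods that are not declared on CommandRunnerInterface.
```

This is why callers that go through a proxy break as soon as other code starts depending on non-interface methods of a concrete runner.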

@AmeerHajAli how so?
It would look something like this:

    class CommandRunnerInterface:
        def run_init(self):
            pass


    class DockerCommandRunner(CommandRunnerInterface):
        def run_init(self):
            ...  # <setup logic>

NodeUpdater will just call self.cmd_runner.run_init(), and any CommandRunner that does not want to implement run_init() will use the interface's noop version.

@ijrsvt thanks for the clarification. That looks like it would work, I just have to implement run_init in my class too.
We should just make sure to not violate the CommandRunnerInterface.
@richardliaw , how about k8s run_cp_up and run_cp_down?

@AmeerHajAli Glad we are on the same page!
@ericl Is this something we should cherrypick into 1.0

I totally see why this abstraction is important, but why cherry-pick it into 1.0 unless there are known bugs?

@AmeerHajAli we can get rid of it (https://github.com/ray-project/ray/issues/10691#issuecomment-689923942)

@richardliaw I see, I can remove the external calls to run_cp_ and make run_init part of the command runner interface.

Are you all ok with that? @ericl @richardliaw @ijrsvt @wuisawesome ?

That'd be fine; can you also please change the logger warning to a user warning? https://docs.python.org/3/library/warnings.html

That'll make it possible for us to catch.
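With a UserWarning in place, the standard warnings machinery lets callers catch it or escalate it to a hard error. A minimal sketch (the warning message here is illustrative):

```python
import warnings

with warnings.catch_warnings():
    # Escalate UserWarning to an exception so it cannot be missed,
    # e.g. in a test that must fail if the fallback path is hit.
    warnings.simplefilter("error", UserWarning)
    try:
        warnings.warn("command runner does not support file sync")
        escalated = False
    except UserWarning:
        escalated = True
```

The same filter could be installed in CI so that any silent fallback in the sync path turns into a test failure.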

sgtm

Can we put this on hold until after 1.0 is actually released?

This doesn't affect the end user, and we have a bunch of release-blocker bugs in the autoscaler right now that this will almost certainly cause merge conflicts with (unless you believe that fixing this issue will resolve bugs in 1.0).

@AmeerHajAli SGTM. @wuisawesome I don't imagine changing this (run_init) abstraction would cause many merge conflicts, as it would mean adding 2-4 lines to command_runner.py and removing 2 from updater.py.

Made a PR! #10715
