It seems we have a few new abstraction violations for the CommandRunnerInterface.
8f0f7371a055257b4c704bad49edccbfbfc01831 (#10097) introduced run_cp_up and run_cp_down.
dc378a80b77685cc45c2b53e2f484ed9c6d8c44c (#9515) introduced run_init
cc @AmeerHajAli
@richardliaw @krfricke these abstraction violations are blocking @AmeerHajAli 's work proxying the cloud providers.
Can we either (1) revert the features that depend on this or (2) otherwise make the autoscaler / tune etc strictly use only the CommandRunnerInterface, without attempting to peek behind the scenes at the implementation class?
Cc @ijrsvt i believe docker code is relying on this too
It looks like the rsync/cp addition on the Tune end was a matter of convenience/hiding the error message.
If we switch to raising a warning via the warnings library instead of logging a warning, this would suffice.
@ericl For run_init the solution is super easy, just add run_init to the CommandRunnerInterface and have it as a noop for non Docker Command Runners.
@ijrsvt , This will make it work only in dockers only.
From Ameer in other correspondence:
In my project I use the actual command runner (with node_provider.get_command_runner(..)) and passes the arguments as is. Breaking CommandRunnerInterface by adding more functions means that my project will not have support for these functions.
For example, DockerCommandRunner adding run_init function or k8s calling run_cp_down will not work because they are not part of CommandRunnerInterface.
Perhaps we have two options (open to more):
@AmeerHajAli how so?
It would look something like this
CommandRunnerInterace:
def run_init():
pass
DockerCommandRunner(CommandRunnerInterace):
def run_init():
<setup logic>
NodeUpdater will just call self.cmd_runner.run_init() and for any CommandRunner that does not want to implement run_init() it will use the Interace's noop version.
@ijrsvt thanks for the clarification. That looks like it would work, I just have to implement run_init in my class too.
We should just make sure to not violate the CommandRunnerInterface.
@richardliaw , how about k8s run_cp_up and run_cp_down?
@AmeerHajAli Glad we are on the same page!
@ericl Is this something we should cherrypick into 1.0
I totally see why this this abstraction is important, but why cherry pick it into 1.0 unless there are known bugs?
@AmeerHajAli we can get rid of it (https://github.com/ray-project/ray/issues/10691#issuecomment-689923942)
@richardliaw I see, I can remove the external calls to run_cp_run_init part of the command runner interface.
Are you all ok with that? @ericl @richardliaw @ijrsvt @wuisawesome ?
that'd be fine; can you also please change the logger warning to a user warning? https://docs.python.org/3/library/warnings.html
That'll make it possible for us to catch.
sgtm
Can we put this on hold until after 1.0 is actually released?
This isn't doesn't affect the end user and we have a bunch of release blocker bugs in the autoscaler right now that this will almost certainly cause merge conflicts with. (unless you believe that fixing this issue will resolve bugs in 1.0)
@AmeerHajAli SGTM, @wuisawesome I don't imagine changing this (run_init) abstraction would cause many merge conflicts as it would be adding a 2-4 lines to command_runner.py and removing 2 from updater.py.
Made a PR! #10715
Most helpful comment
@AmeerHajAli SGTM, @wuisawesome I don't imagine changing this (
run_init) abstraction would cause many merge conflicts as it would be adding a 2-4 lines tocommand_runner.pyand removing 2 fromupdater.py.