@matthewrmshin has suggested removing ssh comms as a part of #2966.
Is there still a strong requirement to support ssh as a communication method?
@cylc/core what use cases for SSH exist, would we be able to rely on ZMQ over TCP in all situations?
Being nosy here, even though I don't know anything about the use cases for SSH, sorry... But would an alternative be to support both by tunnelling 0mq over ssh?
@kinow - in our "ssh task communication" method, remote job-invoked CLI commands (mostly "cylc message") use ssh to re-invoke themselves on the suite host, instead of connecting directly to the suite API from the job host. (An option if system admins won't open the right ports for normal job status messaging but do allow ssh back, and if you don't want to use one-way polling to track job status).
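As a rough illustration of the re-invocation described above (not Cylc's actual implementation), a job-side wrapper could build an ssh command that runs the same CLI command on the suite host. The helper name `reinvoke_via_ssh_argv`, the host name, and the `cylc message` argument shown are all hypothetical placeholders.

```python
import shlex


def reinvoke_via_ssh_argv(suite_host, command_argv):
    """Return an argv list that re-runs command_argv on suite_host via ssh.

    Hypothetical helper for illustration only; not Cylc's real code.
    """
    # ssh joins its trailing arguments into one string for the remote shell,
    # so quote each argument to survive that shell's word splitting.
    remote_command = " ".join(shlex.quote(arg) for arg in command_argv)
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which matters for non-interactive jobs.
    return ["ssh", "-oBatchMode=yes", suite_host, remote_command]


if __name__ == "__main__":
    # Placeholder arguments; the real "cylc message" arguments may differ.
    argv = reinvoke_via_ssh_argv(
        "suite-host.example.com", ["cylc", "message", "hello world"]
    )
    print(argv)
```

The only network access the job host needs in this scheme is outbound ssh to the suite host, which is why it works where the suite's own ports are blocked.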
If we have to continue to maintain such a mechanism, I suppose 0mq tunnel over ssh might be better. If the client interface at the remote end can open an ssh tunnel on demand and then proceed as normal, presumably that would be simpler and/or cleaner than what I just described above. Is that what you mean?
Maybe we can just drop support though...
@oliver-sanders @matthewrmshin - I'd certainly be happy to drop support for this. I'll ping the mail forum to see if anyone is relying on it...
Thanks for the explanation @hjoliver! Yup, we would maintain just the 0mq interface, and the client would have to configure SSH on their end to make sure the 0mq tunnel works.
At NCI there is a firewall between the compute nodes and the cloud system where we run Cylc. The sysadmins have added an exception for ssh communication to the cylc server. Asking for another port or two in the firewall probably wouldn't be an issue, but my understanding is that cylc currently opens a new port for each running suite.
Using polling isn't desirable as the PBS queue system is already rather overloaded, and the sysadmins prefer we don't add to the load on the system.
Tunnelling the 0mq connections over ssh sounds fine to me.
PML too (from mail forum):
We also are using this at PML as we couldn't get direct messaging to work with our cluster. We didn't try very hard for more elegant solutions once this worked though so we can probably work out a way without if we take another look.
@kinow - in light of @ScottWales's comment above on the number of open ports needed, would "0mq tunnel over ssh" use one port (i.e. a single ssh tunnel for multiple cylc job clients)?
(I suppose cylc-8 "single point of access" hub/proxy would fix the many ports problem (well, "problem"-ish) too, but I don't think we've decided yet for sure if job clients can go via the hub/proxy, or direct to the target workflow service.)
And from @dvalters -
We use this method with many of the Edinburgh Uni servers as I couldn't seem to get direct messaging configured to work properly. I guess we could use one-way polling as a fallback...
To be fair, we are not huge Cylc users in the grand scheme of things, but would be nice to keep it as an option if possible. (Saves me an afternoon re-configuring things!)
> would "0mq tunnel over ssh" use one port (i.e. a single ssh tunnel for multiple cylc job clients)?
I haven't used 0mq with an SSH tunnel, so I can't say with 100% certainty that it would work, @hjoliver, but in theory I believe it should.
Unless they did something different in their implementation, it should use a normal SSH tunnel connecting a local TCP socket to a remote TCP socket via SSH. So the only port that needs to be open in the firewall would be tcp/22 for SSH (though it should be possible to change that port too).
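To make the "normal SSH tunnel" idea concrete, here is a minimal sketch that only builds the OpenSSH command for a local port forward and the address a 0mq client would then connect to. The helper names, host name, and port numbers are assumptions for illustration; `-N`, `-p` and `-L` are standard OpenSSH client options.

```python
def ssh_tunnel_argv(ssh_server, local_port, remote_host, remote_port, ssh_port=22):
    """Build an `ssh -L` command forwarding a local TCP port through ssh_server.

    Connections to 127.0.0.1:local_port on the job host are carried over the
    ssh connection and delivered to remote_host:remote_port, as resolved on
    the ssh_server side. Hypothetical helper, for illustration only.
    """
    forward = f"{local_port}:{remote_host}:{remote_port}"
    # -N: do not run a remote command, just hold the tunnel open.
    return ["ssh", "-N", "-p", str(ssh_port), "-L", forward, ssh_server]


def tunnelled_endpoint(local_port):
    """Address a 0mq client would connect to instead of the real suite port."""
    return f"tcp://127.0.0.1:{local_port}"


if __name__ == "__main__":
    # 43001 stands in for whatever port the suite is listening on.
    print(ssh_tunnel_argv("suite-host.example.com", 5555, "localhost", 43001))
    print(tunnelled_endpoint(5555))
```

Several clients on the job host can connect to the same forwarded local port, and one ssh connection can carry more than one `-L` forward, so in principle only the ssh port has to be reachable through the firewall even when several suites are running.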
Once Python3 is merged, I can play with it using those Docker containers to confirm (and also to learn about this SSH communication method in Cylc, which I don't think I've used).
The ssh tunnel offered by pyzmq has the potential to make the problem somewhat nicer. It's also pretty easy to implement our own ssh tunnel if necessary (I have already [crudely] done this whilst working on #2966).
So, how should we proceed here? Can we drop this for now, and come back to implement the ssh tunnel solution on top of the finished product if necessary (e.g. if some key sites can't get their system admins to support vanilla cylc-8)?
That would be my default position, but it does not appear to be popular at the moment. So we may just have to restore SSH support the old way, or document how to do it via SSH tunnelling.
Superseded by #3327