Cylc-flow: zmq: network layer improvements

Created on 7 Mar 2019  ·  14 Comments  ·  Source: cylc/cylc-flow

Master issue serving as a follow-on from #2966 for the ZMQ interface.

  • [x] Update the site configuration to the new min/max port system https://github.com/cylc/cylc/pull/3082#discussion_r273419929.
  • [ ] Tidy endpoints (e.g. rename ping_suite to ping)
  • [ ] Implement TCP via SSH functionality? #3327 (see #2975)
  • [ ] Address encryption #3298

    • The current implementation uses JSON Web Tokens (JWT); this takes a few lines of Python and is a quick placeholder.

      • Are we happy moving forward with JWT? It is perhaps sufficient.

      • ZMQ has recently(ish) acquired "curve"-based auth; might this be better?

    • The current implementation encrypts all messages with the suite passphrase.

      • The encryption should change with time and offer forward security.

  • [ ] Convert ZMQ server "main loop" from its current (temporary) Thread implementation to asyncio #3328

    • Should the zmq server main loop run in the same thread/process as the scheduler?

      • If so this requires #2866

        • Server actions would be able to suspend zmq server interaction, so perhaps this isn't the best idea

  • [ ] Move to REQ-REP-REP-REP... pattern (REQ-SUB) #3329

    • Add an option allowing the client to subscribe for updates on queued commands e.g.:

      • $ cylc stop suite
      • contacting...
      • command queued...
      • waiting for running tasks to complete before shutting down...
      • DONE

    • Upgrade the zmq client to handle simultaneous async requests without having to spin up threads/processes

    • See also #423

  • [ ] Open interface for suite polling (relies on previous)

    • Efficient inter-suite triggering using PUSH notification.

    • Should be possible with minimal code.

  • [x] Document endpoints

    • Add docstrings documenting input and return types

    • Use this to auto-generate API reference pages in Sphinx (see sphinx.ext.napoleon)
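
As a rough illustration of the "Document endpoints" item, here is what a documented endpoint might look like, assuming Google-style docstrings (one of the styles sphinx.ext.napoleon can parse). The endpoint name `ping`, its parameters and its return shape are hypothetical and are not taken from the Cylc code base.

```python
def ping(suite_name: str, timeout: float = 5.0) -> dict:
    """Check that a suite server is responding.

    Args:
        suite_name (str): Registered name of the suite to contact.
        timeout (float): Seconds to wait for a reply before giving up.

    Returns:
        dict: A mapping with the keys ``suite`` (str) and ``alive`` (bool).

    Raises:
        ValueError: If ``timeout`` is not positive.

    """
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    # Placeholder body (assumption): a real endpoint would query the suite.
    return {"suite": suite_name, "alive": True}
```

With `sphinx.ext.autodoc` and `sphinx.ext.napoleon` listed under `extensions` in `conf.py`, docstrings in this shape can be rendered into API reference pages without maintaining the input and return types in a separate document.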

superseded

All 14 comments

Assigning myself as I've looked into the REQ-REP-REP-REP... matter; however, there is plenty of room for more people.

Is this the kind of thing you are talking about? https://realpython.com/async-io-python/#chaining-coroutines

Is this the kind of thing you are talking about?

In relation to what?

Your REQ-REP-REP-REP... pattern.

No.

Care to explain?

REQ-REP-REP-REP...
  • ATM we have a REQ-REP setup (as inherited from the old REST API).
  • The REQ-REP pattern restricts you to only one response, so we are unable to update the client as to the status of their request; they are left with an unhelpful "command queued" message.
  • A new pattern is needed to handle this.
  • Add an option allowing the client to subscribe for updates on queued tasks e.g.:
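
A minimal sketch of the "one request, several replies" idea described above. It uses asyncio queues as a stand-in for ZMQ sockets so that it runs with the standard library alone; the names `server`, `request`, `inbox` and the message strings are hypothetical. With ZMQ itself, REQ-REP sockets enforce strict one-to-one alternation, so a pattern like this would need something else, for example DEALER/ROUTER sockets or a separate subscription channel.

```python
import asyncio

DONE = "DONE"  # final message marking the end of the reply stream


async def server(inbox: asyncio.Queue) -> None:
    """Answer each request with several replies instead of just one."""
    while True:
        command, replies = await inbox.get()
        if command is None:  # shutdown signal used only by this sketch
            break
        await replies.put("command queued...")
        await asyncio.sleep(0)  # stand-in for the scheduler doing the work
        await replies.put(f"{command}: in progress...")
        await replies.put(DONE)


async def request(inbox: asyncio.Queue, command: str) -> list:
    """Send one request and collect replies until the final one arrives."""
    replies = asyncio.Queue()
    await inbox.put((command, replies))
    received = []
    while True:
        message = await replies.get()
        received.append(message)
        if message == DONE:
            return received


async def main() -> list:
    inbox = asyncio.Queue()
    server_task = asyncio.create_task(server(inbox))
    received = await request(inbox, "stop_suite")
    await inbox.put((None, None))  # ask the server loop to exit
    await server_task
    return received


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Because the client waits on its own reply queue rather than on a single response, it can display progress as the command moves through the scheduler, which is the behaviour a plain REQ-REP exchange cannot provide.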

I might be off base here, but could the client be another suite, so that this could serve as a mechanism for inter-suite triggering #2798?

I might be off base here, but could the client be another suite, so that this could serve as a mechanism for inter-suite triggering #2798?

Bad wording on my part: by "tasks" I was referring to "commands", e.g. stop_suite or ping_suite. Though, yes, this could potentially be utilised for efficient inter-suite triggers.

Bad wording on my part: by "tasks" I was referring to "commands", e.g. stop_suite or ping_suite. Though, yes, this could potentially be utilised for efficient inter-suite triggers.

In that case it's good wording on your part :wink:

If others agree, could we not lose sight of this? I really hate polling solutions and hope we can get something better that is also operationally acceptable on our site (there is a permissions issue to consider with research suites triggering off operational ones).

I'll add inter-suite subscription to the issue description. This is much the same functionality as the subscription the UI server will require so we will be developing this set-up anyway.

I really hate polling solutions ...

I've never entirely understood the common aversion to polling. Sometimes it is the only viable solution (when the upstream system has no ability to interact with the dependent system). And if the polling mechanism has no significant impact on performance or network traffic, then what's the problem exactly? (Maybe it does have a significant negative impact in your cases @TomekTrzeciak ?)

That said, when two-way interaction is possible I certainly agree that is better than one-way polling, and that is (now) a possibility for inter-suite triggering ... so that's fine.

I've never entirely understood the common aversion to polling

It shouldn't really be an issue for most cases.

One issue is that it doesn't work well if you have lots of short tasks. This is actually a baby step towards sub-suites.

Closing this issue as superseded by:

  • #3298
  • #3327
  • #3328
  • #3329
  • #3330
  • #3331