Flux-core: connector: add version negotiation for tools connecting to broker

Created on 21 Aug 2020  路  13Comments  路  Source: flux-framework/flux-core

We've had a few mishaps involving version-mismatched flux components.

One thought is to add version negotiation to flux_open(). For performance, we could have the broker end send an unsolicited version message to each new connection, and then the client could accept or reject the connection, according to the rules of semantic versioning. I guess flux_open() would block until this message is received.

It may be useful to allow this message to contain arbitrary key-value pairs, so that in the future we could use it to transmit other info from server to a fresh client, instead of using the environment.

I'm not sure if this covers all cases. Some areas to think more carefully about:

  • flux-proxy, which uses a remote process to open local connector
  • mismatch of client-side connector plugin with client
  • what happens during a system instance rolling upgrade
  • how to get a reasonable human-readable error to application flux_open() failure
  • are there situations where we might want to relax this check? How to signal that to lfux_open()?

All 13 comments

It may be useful to allow this message to contain arbitrary key-value pairs, so that in the future we could use it to transmit other info from server to a fresh client, instead of using the environment.

Very good idea!

are there situations where we might want to relax this check?

Eventually don't we want a set of flux-core versions to be operable with each other? In reality the problems with version mismatch are more subtle than just the flux-core semantic version number. However, I do agree for now a strict version check would help a lot. At some future time we could relax the strict version check to be a range of acceptable versions (which would be a pain to keep up to date).

Would it be sufficient to say that, for major version 0, an exact match is required, but for major version > 0, the major version must match? Roughly following semantic versioning rules of

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

but substituting protocol for API?

Even if that works within flux-core, we still have the problem of checking other framework projects. Maybe those projects could record the version of flux-core they were built with and check that as well? Somehow?

Yeah, I was thinking of something like that.

For major version > 0 we'll also have to be a lot better at keeping the ABI versions up-to-date.

It seems like for now we can do something simple and have a nice impact.

Even if that works within flux-core, we still have the problem of checking other framework projects.

Can you give an example here? Don't all framework projects have to go through libflux-core in some way to access the instance and/or provide services via modules within that instance? (Sorry, I'm probably missing something obvious)

It does seem like more "services" should have versioned protocols as well, though...

It does seem like more "services" should have versioned protocols as well, though...

Yeah that's what I was thinking of, and that may be the solution. For example, sched assumes a particular job-manager protocol. It could be out of date with respect to the job manager in flux-core, but compiled against a compatible flux-core that would pass the flux_open() negotiation.

It's a slightly interesting data point that libschedutil wraps up the job manager / scheduler _protocol_, so in that case, we _could_ use shared library versioning to manage change.

Thanks! We're on the same page then.

I think in the latest failing case, the actual error encountered was in parsing the job exec eventlog context. Just as a reminder that eventlogs are a kind of protocol as well.

Great point. Since the eventlogs might persist in an archive long after flux versions have rolled on, those definitely need long term support and versioning.

It's a slightly interesting data point that libschedutil wraps up the job manager / scheduler protocol, so in that case, we could use shared library versioning to manage change.

A quick note: qmanager module does this but not fluxion-resource piece.

Yep. I put it out there as food for thought. I suppose it's one way to strengthen resiliency to protocol changes, since some protocol changes might not need to break the API.

At first glance, I have to think that having a way to check the version explicit in the protocol would be more future proof than wrapping them into libraries.

If we ever have to wrap the services for outside like REST or grpc, I am not sure if the simple library approach would be flexible enough.

Cloud native applications often wrap the target "version" into their "resource" and call different service paths...
https://kubernetes.io/docs/concepts/overview/kubernetes-api/

Specific use cases missing though. @milroy will soon play with creating a mock REST adapter within Flux to talk to k8s api service. So we can have a bit more concete cases on how this will pan out.

It's a slightly interesting data point that libschedutil wraps up the job manager / scheduler protocol, so in that case, we could use shared library versioning to manage change.

I really liked the fact that libschedutil simplified how fluxion talks to job manager. However, one disadvantage was that I (will) have to put different exception handing logics for the schedutil callbacks vs. native rpc callbacks.

This would be a unique problem with a service written in C++. But my guess is services written in different languages may see similar problems.

One of the considerations for wrapping services for framework projects.

Sounds good. I definitely am on board with versioning protocols, just pointing out that there are sometimes other options that may be appropriate to deploy in critical situations.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

grondo picture grondo  路  7Comments

garlick picture garlick  路  8Comments

garlick picture garlick  路  4Comments

cmoussa1 picture cmoussa1  路  6Comments

chu11 picture chu11  路  3Comments