Community: Kubernetes LTS Brainstorming

Created on 28 Sep 2018 · 29 Comments · Source: kubernetes/community

Placeholder to tie together some of the threads on the potential of having Kubernetes LTS (Long-term support) releases:

This should coalesce into a KEP and we should find SIG / subproject ownership for it. Arch? Release? PM?

/kind feature
/sig architecture release pm
/lifecycle frozen

cc: @timothysc @dims @jimangel @tpepper @BenTheElder @detiber @neolit123


Most helpful comment

While I think we need some story around LTS, I also do not think it is
something that an open community of volunteers naturally gravitates towards.

If done poorly (actually, anything less than nearly-perfect) LTS may become
an actively harmful thing. It is the thing users ask for, but I still
question whether it is what they need.


All 29 comments

/cc @imkin
Dhawal Bhanushali is a VMware engineer interested in LTS

KubeCon 2017 talk on kernel vs. distro and the need for different release cadences (EDIT: sched is annoying when it comes to copy-pasting URIs):

Kubernetes: Kernels and Distros

@thockin and I have been in violent agreement on the idea, although we differ on how to get there.

I'm not a huge fan of LTS > 1 year for cluster managers, and this has been set forth by precedent in multiple projects.

While I think we need some story around LTS, I also do not think it is
something that an open community of volunteers naturally gravitates towards.

If done poorly (actually, anything less than nearly-perfect) LTS may become
an actively harmful thing. It is the thing users ask for, but I still
question whether it is what they need.


Based on Tim's prompt "I still question whether it is what they need",
here's what I would ask the folks who say they want LTS:

  • Do they want us to support more than 3 versions at a time?
  • Do they want in-place upgrades? (instead of creating new clusters)
  • Do they want to skip versions (say, go from 1.9 to 1.11)?
  • When we drop support for a 1.x version (say 1.9), if we provide
    documentation and scripts(?) to upgrade directly from that version to
    the latest (1.9 -> 1.12), is that enough?
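
The skip-version question above can be made concrete with a small sketch (the helper is hypothetical, not a real Kubernetes tool): since the control plane today only supports upgrading one minor version at a time, "going from 1.9 to 1.12" really means a sequence of upgrades.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Minor releases a control plane must step through to get from
    `current` to `target`, assuming upgrades can only move one minor
    version at a time (the constraint discussed in this thread)."""
    major, cur_minor = (int(p) for p in current.split("."))
    tgt_major, tgt_minor = (int(p) for p in target.split("."))
    if tgt_major != major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within a major are handled here")
    return [f"{major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]

# "Skipping" 1.9 -> 1.12 really means three sequential control-plane upgrades:
assert upgrade_path("1.9", "1.12") == ["1.10", "1.11", "1.12"]
```

Documentation and scripts for dropping a version would essentially have to automate each hop in that list, which is why the "is that enough?" question matters.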

I'm sure we can formulate more questions like the ones above. We can take a
poll and figure out what exactly folks want instead of the
full-kitchen-sink-that-comes-with-an-LTS-label.

We have 2 upcoming kubecons to get this feedback right?

Thanks,
Dims


@dims -- Yep. This sounds like it could benefit from a survey. Let's see what we can do to mock something up and then gather community feedback.

Another important question, IMO:

In an ideal world, what does compatibility really mean? How much does the
Kubernetes version matter vs the individual API versions?


I totally agree this is not something a community of volunteers gravitates towards. I see a more likely outcome being a vendor coalition. If there are people the vendors are paying today, or will be paying in the future, to do long-term support anyway, what portion of their work is redundant and could better be done in a shared fashion?

One recent concrete example: check out Greg KH's talks on the Linux kernel, LTS, and Spectre/Meltdown patches (e.g. https://www.youtube.com/watch?v=lQZzm9z8g_U). He argues the distros on their own don't get the most optimal support outcomes, and where they're not working in common to solve thorny problems once, the backports are slow, fraught, and tremendously expensive to accomplish. Yet they attempt it because there is user demand for it. Can we channel that to a common cause?

If one accepts there is some valid user demand for longer-term support than 3–6 months of stable production (what I'd argue you get today on a 9-month support cycle, given the time to fully get onto a new release and later to get off of it before it is EOL), then a well-run single source of long-term support for the core is the least expensive approach for the whole of the ecosystem, can encourage conformance, can diminish fragmentation, and is the most likely path to pragmatically achieve high quality in the effort, compared to each vendor doing their own thing.

A couple of us are hashing out a proposal for the next SIG Release meeting: spin off a WG which pulls in broad stakeholder representation from our own k8s devs, the user/operator ranks, and vendors. If there's something there for requirements and the compromises can be sufficiently balanced, then turn that into a KEP for SIG Release to implement.

I'm concerned that formal support for LTS versions of Kubernetes is doing a disservice to customers. I see customers often migrating from some legacy private cloud environment, lured by the promises of Cloud Native technologies. They spend a year containerizing everything, get it all just right, and then get burned when patch releases contain more than security fixes, or upgrades across minor releases are not backwards compatible. But stagnation / avoiding upgrades is also risky. The real benefit to Cloud Native technologies is in the dynamic stability, which requires embracing some degree of evolution. Rather than picking LTS versions of Kubernetes, I'd like to suggest a different way of looking at the concern, and a different approach to solving the issues.

I would argue the concerns are: 1/ upgrades break things; and 2/ compliance concerns.

A proposal to address this without an LTS strategy might look something like the following:
1/ Really, truly, don't put anything in a patch release that is not a critical security fix or critical bug patch, and raise the bar for testing.
2/ test the upgrade/downgrade across patch releases.
3/ invest in testing upgrade/downgrade from last stable patch version of one minor release to the next minor release.
4/ bring critical APIs to v1, increasing confidence in the backward compatibility guarantees.

Were upgrades more reliable, and alpha/beta APIs not required for reasonable use of Kubernetes, the demand for LTS versions would be lower.

To address the second point (compliance) I'd like to propose that Kubernetes 1.x is the LTS version. Investing effort into safe upgrades and support for downgrade / rollback seems like a better strategy for pulling together the entire community than LTS of a single minor release.

For us (running on bare metal CoreOS), the desire for LTS boils down to not having to spend a lot of effort and risk breaking changes when upgrading.

For now we're still stuck on 1.5 as the effort for upgrading beyond it snowballed (etcd3, Docker, CoreOS, ingress, TLS, RBAC, kubeadm, etc). Once upgrading becomes just replacing binaries/images and reading a few release notes, then LTS loses its appeal.

/cc

Please count me in - I've been on the other side (a customer of K8s), and I've also had versioning problems with my last company's core product (Mule ESB at @mulesoft). We had countless discussions through the years, and broke many customers upgrading to "theoretically" backward-compatible patch versions.

I think the problem also extends to what was mentioned on the side: a bunch of beta APIs are becoming part of the default set of APIs, and we need to invest in making them GA (sometimes rather than investing in new features).

FTR, I too personally bias towards rolling upgrades and the promise of the cloud-native model. A key part of these discussions (regardless of whether they culminate in "LTS" as typically known, or other TBD changes that make k8s more deployable/consumable/manageable) must be defining some "customer" personas and seeking that information from folks who are running, or aspire soon to run, production clusters. What are their requirements? Are there addressable needs unmet? Do they just need to upgrade more often, feed their breakage observations back to us, and trust we'll do better?

Some questions @jagosan :

To address the second point (compliance) I'd like to propose that Kubernetes 1.x is the LTS version. Investing effort into safe upgrades and support for downgrade / rollback seems like a better strategy for pulling together the entire community than LTS of a single minor release.

  1. Are you saying the "1" is the MAJOR? I.e., we aspire to actually doing semver? That is to say, 1 is what it is, stable and compatible, until we release 2.0.0? In this case, to which 1.x's would critical patches be backported? All, or only some? For how long would backports be done? What's the upgrade path? 1.x to 1.(x+1) only, and only the newest 1.x gets to upgrade to 2.0? Or do some 1.x's get to upgrade to some other 2.x's? Semver basically implies the latter. But if we're to invest in testing and ensuring it works ahead of users attempting it... that could be a big matrix.
  2. Or are you saying we have specific 1.x's, for some values of 'x', and that those are supported for some definition of "Long"? More concretely, are you saying that we already are doing LTS if we define that to mean there are three concurrently supported 1.x LTS releases active today, and "Long" is defined as 9 months? I do believe this is how we operate today...we're not semver at all but rather 1.MAJOR.minor.

In the latter case, and say, like @bgrant0607 argues, we move towards more rapid releases (say 2 weeks for the sake of concrete argument, without going into the specifics of how to operationalize that, because it's totally possible and a demonstrated pattern in the art)... how many of these do you think would be good to support? E.g., continue like today and support 3-ish prior releases (let's call this "N"), giving 6 weeks of support? Or continue like today and provide support for any release first shipping in the prior 9 months (let's call this "M"), giving 18 support streams to which to backport? Is there a particular value of N or M which is "better" than others, for some definable criteria?
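
The N-vs-M arithmetic above can be sketched directly (the cadence and window numbers are the hypothetical ones from this comment, not a proposal):

```python
def concurrent_streams(support_window_months: float, cadence_months: float) -> int:
    """Number of release streams needing backports at once, given that
    each release is supported for `support_window_months` and a new
    release ships every `cadence_months`."""
    return int(support_window_months // cadence_months)

# Roughly today: quarterly releases, ~9-month support window -> 3 streams ("N" = 3).
assert concurrent_streams(9, 3) == 3
# Hypothetical 2-week (~0.5-month) cadence, keep the 9-month window ("M"):
assert concurrent_streams(9, 0.5) == 18   # 18 streams to backport to
# Hypothetical 2-week cadence, keep a fixed N = 3 releases instead:
# the support window collapses to 3 releases * 2 weeks = 6 weeks.
assert concurrent_streams(1.5, 0.5) == 3
```

The point of the sketch: with a fixed support duration ("M"), faster releases multiply the number of branches to patch; with a fixed release count ("N"), faster releases shrink the support window users actually get.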

Most of the work required to achieve releases of higher quality and stability is independent of release frequency, such as:

  • Patches only contain critical fixes
  • Critical APIs are stable (vN)
  • kubernetes/kubernetes/master kept in a releasable state
  • Clearer testing signal
  • Higher test coverage
  • Meaningful version-skew, upgrade, and downgrade tests that people fix promptly when they are broken
  • Reliable API schema upgrades and downgrades
  • Greater supported version skew for kubectl and client-go

And so on.

Also, I am skeptical that it will be feasible for control-plane upgrades to skip minor releases in the foreseeable future.
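
On the kubectl/client-go skew bullet above: today's documented policy supports kubectl within one minor version of the API server, and "greater supported version skew" would mean widening that window. A tiny sketch (the helper and its `allowed_skew` parameter are illustrative, not a real API):

```python
def kubectl_skew_ok(kubectl_minor: int, apiserver_minor: int,
                    allowed_skew: int = 1) -> bool:
    """Whether a kubectl at minor version `kubectl_minor` is within the
    supported skew of an API server at `apiserver_minor`. The default of
    one minor version (newer or older) matches the documented Kubernetes
    policy; `allowed_skew` only exists here to show what a relaxed
    policy would change."""
    return abs(kubectl_minor - apiserver_minor) <= allowed_skew

assert kubectl_skew_ok(12, 11)                 # 1.12 kubectl vs 1.11 server: OK
assert not kubectl_skew_ok(12, 9)              # three minors apart: unsupported today
assert kubectl_skew_ok(12, 9, allowed_skew=3)  # the hoped-for wider window
```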

It sounds like a working group to explore this space comprehensively and properly document it in a white paper:

  1. a set of clear problem statements
  2. a set of possible solutions to the stated problems
  3. known pros and cons of the possible solutions

... would be useful.

The outcome will probably not be "we do LTS and this is how it works", but rather something more along the lines of "these are the most important problems with the current release process faced by type-X users/operators, and here are some concrete proposals for trying to address them".

Based on that we could solicit people to work on one or more of those efforts, which might be more stable releases, the option to adopt releases into production less frequently, LTS or whatever other approaches come out of the working group.

just some quick 2c.

i proposed to @tpepper to expose all important related topics as proposals and available options in a list.
once the list is created, discussions have to occur in SIG-arch and/or WG-LTS meetings covering the topics.

once the discussions are in place a voting system has to be established where a list of SIG-chairs and possibly WG chairs need to vote.
in terms of who would be eligible can end up being a decision of the steering committee - e.g. bring in active contributors or tech leads even if not SIG chairs.

the project is lacking a voting mechanism to promote ideas and move away from endless discussions.

@neolit123 re: voting. We've used the following:
- CIVS
- the CNCF SurveyMonkey account, which has voting-style questions with a collaborative interface that allows you to lead it vs. CNCF
- discuss.kubernetes.io, which has capabilities to do polling, likes, etc., depending on how you set up the thread.

I suggest that we give the working group the proposed bounded amount of
time to come up with one or more recommendations, and prioritize them via
our preferred consensus mechanism, including relevant SIG and working group
leaders. In the unlikely event that consensus cannot be reached, we can
fall back to voting, but I sincerely hope that that will not be necessary.


A WG is fine as a forum for collaboration, but it isn't a decision-making entity.

I don't understand what purpose of the proposed vote would be.

I can't speak for @neolit123 but I read his comment to be in a similar direction as: https://github.com/kubernetes/community/issues/2833

When I think about governance, I think about the Steering Committee. Steering's charter includes:

  • Decide how and when official releases of Kubernetes artifacts are made and what they include
  • Declare a release, so that the committee can ensure quality/feature/other requirements are met.

If it's Steering who makes these choices (and the associated support-stance declarations?), some might feel they have insufficient voice, even though Steering is elected by all of us.

Similar perhaps for SIG Architecture and conformance definition.

In the end though I don't see this as an issue specific to WG LTS. At most WG LTS will bubble up proposals. These likely take the form of KEPs. There is a process for KEP approval which takes into account stakeholders.

LTS is a necessity because with major updates come regressions that you'll only find out about after weeks or months (at a point after which you can't roll back).

So for a stable, successful project, you need LTS. LTS is known to be a success across distros for reasons of stability.

We have WG LTS which is the focal point for this brainstorming.

/close

@tpepper: Closing this issue.

In response to this:

We have WG LTS which is the focal point for this brainstorming.

/close


The long and the short of it is that less than 0.0000001% of engineering/ops teams have the bandwidth to screw around with deploying, supporting, and upgrading a moving target.

It's common sense 101: you don't make your livelihood dependent on the bleeding edge, yet anyone who adopts Kubernetes is doing exactly that.

The moment someone releases either a true competitor or a proper LTS, everyone will switch to it.

My $0.02 is that little things like renaming APIs, changing their version, etc. are conceptually necessary to evolve, but having to update all of your configs and Helm charts to deal with deprecation, renaming, etc. is burdensome and ridiculous.

The k8s curators need to realize this and take a more backwards-compatible approach. This doesn't mean that infinite versions need to be supported, but FFS, why should something called an Ingress moving from foo/v1beta1 to some bar/v1 address in the API tree have to be reflected in our config files?

Create an Ingress; if you try to use options that are incompatible with your cluster, you'll get an error. If how you configure an option changes, shame on the authors for not doing a better job and being more forward-looking.
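
To make the Ingress complaint concrete: the kind has in fact been served under several API groups over the years, and each move forced manifest, chart, and client edits even though the object itself barely changed. A minimal sketch (the helper function is hypothetical; the version history is the real one):

```python
# Successive API group/versions under which "Ingress" has been served.
# The kind and most of the spec stayed recognizably the same, but every
# manifest pinned to an older apiVersion eventually had to be edited.
INGRESS_API_HISTORY = [
    "extensions/v1beta1",         # original beta location
    "networking.k8s.io/v1beta1",  # moved to the networking API group
    "networking.k8s.io/v1",       # promoted to GA
]

def manifest_needs_update(api_version: str) -> bool:
    """True if a manifest pinned to `api_version` must be edited to run
    on a cluster that only serves the newest Ingress version."""
    return api_version != INGRESS_API_HISTORY[-1]

assert manifest_needs_update("extensions/v1beta1")
assert not manifest_needs_update("networking.k8s.io/v1")
```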

This is the same kind of lack of overall experience that led to Golang's horrible problems around libraries, forking, etc. You know, stuff that has been solved for 50 years, but people running too fast with too little experience get supported by unicorn companies that can afford to let them run amok. I speak from first-hand experience.

@gnydick thanks! we are aware of the issues and talk about it all the time, we have a WG for LTS as well. Since you feel so strongly, please help us do better:
https://github.com/kubernetes/community/tree/master/wg-lts

There's a slack channel, weekly meeting, notes/videos from previous meetings if you want to dig into what we have already looked at in the url above.

there is no right way to do API versioning.
backwards compatibility between v1 and beta is plausible but not always the case out there.

the general response to this problem is that k8s needs to enter a more widely present v1 state.
non-v1 APIs are simply WIP and most likely the k8s project needs help with moving them faster to v1 too.

