Jx: support Istio based user group promotion in Production

Created on 6 Apr 2018 · 30 comments · Source: jenkins-x/jx

it would be awesome to be able to support istio based rollouts of versions.

e.g. in Production it'd be awesome to rollout to different user groups:

Test -> Staging -> Production (internal developers -> early adopters -> rolling upgrade across everyone else)

Conceptually we could treat all of these things as kinds of Environment, e.g. so we could promote to production-internal-developers or production-early-adopters etc.

Then the current promotion model in Jenkins X could work well across Preview environments, namespace/cluster environments or graduated user group environments - whether that's done in app release CD pipelines or when manually promoting things.

It would also help the tooling (CLI, web, chatbot) show the progress of a version through all the different kinds of environments. e.g. the jx get apps command could list which version is running for which user group in Production.

Implementation thoughts

We'd need to ensure that Istio is installed in the Production environment; we could check the environment's requirements.yaml file and, if the istio chart is not there, automate adding it via a Pull Request.
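As a sketch, such an automated PR might add an istio dependency to the environment repository's env/requirements.yaml alongside the app charts (the app name, istio chart version and repository URLs below are illustrative assumptions):

```yaml
# env/requirements.yaml in the environment's git repository (sketch;
# chart names, versions and repository URLs are illustrative)
dependencies:
- name: myapp
  version: 1.2.0
  repository: http://jenkins-x-chartmuseum:8080
- name: istio
  version: 1.1.7
  repository: https://storage.googleapis.com/istio-release/releases/1.1.7/charts
```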

Then we'd need to apply the necessary Istio resources to implement which user group gets which version; which the jx promote command could turn into the necessary Pull Requests on the environment git repositories.

I guess we need to start off with a sample somewhere of doing graduated rollouts of a version across user groups with Istio then see how we could generalise that for an app version. We may need to also modify the application's chart to add the necessary istio stuff?

area/istio area/promote help wanted kind/enhancement lifecycle/rotten priority/important-soon

All 30 comments

Wonder if it makes sense to separate the different concepts of "deployment" and "release" (see this blog: https://blog.turbinelabs.io/deploy-not-equal-release-part-one-4724bc1e726b) and make them explicit in the model. That way your Environments can contain both deployments and releases with the transition between them managed by Istio. Then you can have specific "classes" or "graduation groups" for a particular pipeline.

Graduation of users to new versions of a deployment can be controlled in Istio with header-based routing. Something (a service or component close to the edge that understands the request, or the app itself) would tag the request with the appropriate header, e.g. x-user-group=non-paying, and Istio can route with routing rules: https://istio.io/docs/reference/config/istio.networking.v1alpha3.html#HTTPMatchRequest
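For instance, a minimal Istio VirtualService routing requests that carry the header to a newer subset (service and subset names here are hypothetical; the subsets would be defined in a matching DestinationRule):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp
  http:
  # requests tagged x-user-group=early-adopters go to the new version
  - match:
    - headers:
        x-user-group:
          exact: early-adopters
    route:
    - destination:
        host: myapp
        subset: v2
  # everyone else stays on the current version
  - route:
    - destination:
        host: myapp
        subset: v1
```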

Agreed. Thanks for the links; will noodle on them some more.

So we separate releases from promotion already in Jenkins X - a ‘promote’ action can be triggered via a pipeline step, CLI or UI/chat bot or some micro service.

Today we have 2 kinds of promote action based on the metadata in the Environment CRD - either do a GitOps PR of a helm chart version change - or run helm upgrade directly if folks want to opt out of GitOps.

So I guess we need to add another promotion strategy - e.g. have some metadata in an Environment CRD to define it as an Istio user group graduation group instead, so that when a promote is triggered on it for a version we do the Istio routing changes instead of a vanilla helm chart upgrade?
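A sketch of what that metadata could look like on the Environment CRD. Note the istioPromotion section below is hypothetical - it does not exist today - while label, namespace and promotionStrategy are existing fields:

```yaml
apiVersion: jenkins.io/v1
kind: Environment
metadata:
  name: production-canary
spec:
  label: Canary Prod
  namespace: jx-production
  # existing field; current values are Auto / Manual / Never
  promotionStrategy: Manual
  # hypothetical new metadata for the istio strategy discussed above
  istioPromotion:
    userGroup: canary
```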

so a "promote" sounds like it changes a version in an environment from v1 to v2, and using vanilla k8s this would be some kind of rolling upgrade?

so then yeah, a "promote" between environments would possibly have phases/graduations and the rollover would be controlled by Istio. But you could possibly have multiple versions running simultaneously in an environment, each with inbound traffic controlled by Istio routing rules.

Yeah - so we could have, say, Canary Prod and Others Prod CRDs of kind Environment - they could both point to the same cluster, namespace & git repo but refer to different Istio user groups and use the 'Istio promotion strategy'.

Then promoting 1.2 to Canary Prod could make the necessary changes in helm to ensure there is a 1.2 version running in addition to 1.1, and use Istio to point the canary user group at 1.2.
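A sketch of the Istio resources such a promotion could generate, assuming pods are labelled with their chart version (service name, labels and header value are illustrative): a DestinationRule defining a subset per running version, and a VirtualService pointing the canary user group at 1.2 while everyone else stays on 1.1:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
  - name: v1-1
    labels:
      version: 1.1.0
  - name: v1-2
    labels:
      version: 1.2.0
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp
  http:
  - match:
    - headers:
        x-user-group:
          exact: canary
    route:
    - destination:
        host: myapp
        subset: v1-2
  - route:
    - destination:
        host: myapp
        subset: v1-1
```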

In parallel someone could rollback the Others Prod environment from 1.1 to 1.0.

So it sounds relatively simple from a high level ;) but am sure the devil’s in the details!

E.g. getting Istio set up with the right headers so it can do the routing properly. Plus we want to tear down old release versions that are no longer required and deal with shared data between versions etc.

This all sounds perfectly awesome, exactly what we are looking for. Thanks for the complete Jenkins-X initiative! In addition, we would also love the ability to use the Istio mirror tag (well described in this post) in a production environment, in order to gather A/B-style data on builds and releases before they are made. I guess this would be some additional type of promotion?

@klercker thanks for your comments & the link to that post. We definitely want to support that use case. So I'd say, yeah doing a Canary or A/B test could be a kind of promotion; we could use the A/B test as a quality gate/check to determine if that version should roll forward or get reverted etc.

One thing we could look at is performing A/B testing in Previews on Pull Requests as well - if you are doing experiments we could A/B test before approving the merge to master - it would avoid the revert if we decide a particular A/B test wasn't worth pursuing?

Yes, that's reasonable. I see us as having pretty much a constant A/B running for certain parts of our systems, in the way you are describing above. For those components, real world transactions are very valuable for evaluating build quality and regressions, and it is hard to cover them with unit tests or test suites.

Hi,

This feature is really useful. Is anyone still actively working on this?

Best Regards
VietNC

not right now AFAIK. Ray had a great idea at JBCNConf a month back: rather than re-implementing promotion based on Istio user groups and namespaces, a really simple solution could be:

  • keep the Environments & namespaces as they are using namespaces to segment the applications + versions
  • define an Istio set of load balancing rules to route different user groups to different environments (and so different version sets). e.g. you could use environments Employees, EarlyAccess, Production - then route traffic to one of those 3 environments (namespaces) using Istio user group headers etc.
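A sketch of that idea: a single VirtualService at the edge routing by header to the same service in different environment namespaces (host, gateway and namespace names below are assumptions):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.example.com
  gateways:
  - myapp-gateway
  http:
  - match:
    - headers:
        x-user-group:
          exact: employees
    route:
    - destination:
        host: myapp.jx-employees.svc.cluster.local
  - match:
    - headers:
        x-user-group:
          exact: early-access
    route:
    - destination:
        host: myapp.jx-early-access.svc.cluster.local
  # default: everyone else goes to Production
  - route:
    - destination:
        host: myapp.jx-production.svc.cluster.local
```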

It sounds fairly straightforward I hope! :) The hardest bit is installing istio really ;) I've tried a few times via helm and failed: https://jenkins-x.io/commands/jx_create_addon_istio/

I would be happy to help support whoever is working on this. One of our deployments is on top of istio.

@gabeduke ah great!

You install istio via kubectl apply, right? It doesn't seem like the helm charts work - or at least I couldn't figure them out :-(

@jstrachan we're in the process of upgrading from 0.7.1 which was via kubectl. The helm charts did not work w automatic sidecar injection. We're pretty close to getting things working with 0.8 using helm :)

awesome! Let us know if you figure it out

@jstrachan I came across this medium post earlier today, and it reminded me of this issue. The post is from December so you may well have seen it already.

https://medium.com/@joatmon08/blue-green-examples-for-istio-linkerd-on-kubernetes-9ac7535f3764

Hi, what is the status of istio support? Is this still in progress as shown on the roadmap?

@marianpetrikesg yeah, it's still pending - hope we can find someone to drive it forward soon though. help always welcome! :) We basically want help automating setting up an istio load balancer across a few Environments' services (i.e. exposing all the external services in N namespaces and load balancing between Canary, EarlyAdopters and Everyone environments based on the user group, or a percentage of traffic).

We will be deploying our services on top of istio, and would definitely want to use this istio support in jx.

@xuanzhong istio can automatically inject its service proxies into all applications deployed on kubernetes; so you should be able to reuse istio service discovery & load balancing with Jenkins X without any real change to Jenkins X.
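For reference, automatic sidecar injection is enabled per namespace by labelling it; a deployed app in that namespace then gets the proxy injected without changes. The namespace name here is illustrative:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: jx-production
  labels:
    # tells the Istio sidecar injector webhook to inject proxies
    # into pods created in this namespace
    istio-injection: enabled
```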

Though adding Canary-based promotion is going to require a little bit of work; mostly it's adding a canary Environment then providing an Istio load balancer between, say, Canary and Production based on the user role (or a percentage of traffic).
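The percentage-of-traffic variant would use weighted routes instead of header matches; a sketch, with assumed names (weights must sum to 100):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.example.com
  gateways:
  - myapp-gateway
  http:
  - route:
    # send 10% of traffic to the Canary environment's service
    - destination:
        host: myapp.jx-canary.svc.cluster.local
      weight: 10
    # the remaining 90% goes to Production
    - destination:
        host: myapp.jx-production.svc.cluster.local
      weight: 90
```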

Looking forward to this. I think this feature is one of Spinnaker's most critical benefits over Jenkins X at this point.

As was mentioned in #573 a somewhat similar solution would be useful for testing an application in a preview environment together with applications in, say, staging.

A similar thing for istio service mesh might be nice so that services inside Staging or Production could invoke back into a Preview environment for a particular kind of traffic / user / feature flag

Some support for this exists now with canary deployments with flagger https://blog.csanchez.org/2019/03/05/progressive-delivery-with-jenkins-x-automatic-canary-deployments/

@carlossg regarding the following:

Some support for this exists now with canary deployments with flagger https://blog.csanchez.org/2019/03/05/progressive-delivery-with-jenkins-x-automatic-canary-deployments/

I get a "too many open files" error when executing 'jx create addon istio'

$ jx create addon istio
Downloading https://github.com/istio/istio/releases/download/1.1.7/istio-1.1.7-osx.tar.gz to /Users/jwtodd/.jx/bin/istio-osx.tar.gz...
Downloaded /Users/jwtodd/.jx/bin/istio-osx.tar.gz
error: open /Users/jwtodd/.jx/bin/istio-1.1.7/install/kubernetes/helm/istio/charts/grafana/templates/configmap-dashboards.yaml: too many open files

I am running jx v2.0.160 on OSX

Yes, I got that too. A workaround is to do ulimit -S -n 512 IIRC.
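i.e. raise the soft open-file limit for the current shell session before re-running the addon command (512 is just the value that worked; any value at or below your hard limit should do):

```shell
# raise the soft limit on open file descriptors for this shell session
ulimit -S -n 512
# confirm the new soft limit, then re-run: jx create addon istio
ulimit -S -n
```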

@carlossg I am on OSX, where ulimit is reportedly unlimited. Regardless, I will try again with the above ulimit applied. I also have a linux box available to me.

the too many open files error is fixed in #4226

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://jenkins-x.io/community.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://jenkins-x.io/community.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://jenkins-x.io/community.
/close

@jenkins-x-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://jenkins-x.io/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.
