Kops: Running script on instance startup

Created on 30 Aug 2016 · 52 Comments · Source: kubernetes/kops

I have a requirement that certain scripts need to be run whenever a new instance is brought up (node or master). For example, for compliance purposes let's say I have to ensure that all packages are up to date. I need to run:

sudo yum update

Whenever a new instance is started, whether as part of the initial kops turn-up or because the ASG has launched a new instance.

How can this be accomplished with kops?

P1 lifecycle/rotten

Most helpful comment

Using our own AMI would give us the most control, but it also introduces a management headache that we don't necessarily want to incur just to run a script on instance startup.

All 52 comments

It's an anti-pattern, but I'll do this anyway. We can make clear in the docs the alternatives that users should be using instead, and ask them to open issues.

I'm looking to start dnsmasq and modify resolv.conf on boot; it could be done with an arbitrary command.

+1 :) This is important :)

That'd be nice for chef bootstrapping! +1

I would recommend another approach. I would encourage users to use their own AWS AMI instead of injecting a script. Thoughts?

I kind of agree with @chrislovecnm. I build a custom AMI so I can add my puppet bootstrapping scripts and other modifications I need.

Using our own AMI would give us the most control, but it also introduces a management headache that we don't necessarily want to incur just to run a script on instance startup.

A custom AMI also doesn't solve my personal issue of needing to install a different version of docker. Some components get overwritten by nodeup in a way that makes it difficult to customize. The reality of docker stability and bugs means that people will sometimes want to upgrade it or other components without having to upgrade all of k8s.

@hubt you should be able to override the docker tag on yaml / edit. If not we need to fix that.

I am happy to have someone design and submit a PR for this support. We need to get a standard design process, as well, but I digress. I understand that we will have edge cases for pre and post scripts.

Here is the challenge ;)

Supporting non-OOB installs can be a ton of fun. We know that not everyone can run OOB, but creating an ecosystem that enables OOB installs is really our vision.

What are you guys' thoughts? How do we do this well?!

@chrislovecnm is there a way to override the docker tag in the yaml without rebuilding nodeup? We are struggling to get the docker version update reliably automated. Chef is problematic, because if a Chef recipe is applied simultaneously with nodeup they could deadlock each other. Any other hints?

@OleksandrBerezianskyi at this point there is not. Now it should not install if it is already installed. Is this not the case?

it is not the case - if docker is already installed then the previous version will be uninstalled during the nodeup

@OleksandrBerezianskyi bummer. That should not be the case. You want to file an issue? Or shall I?

@chrislovecnm filed an issue #908

This feels strikingly similar to the Terraform provisioners discussions.. Personally I am of the mentality that kops should offer plugin capabilities - but no support.. Despite it being an anti pattern..

In other words, giving the user a clean way of hooking into Nodeup either with an interface/pattern in go or with an executable such as a bash script would be _fine_..

Of course our support goes out the window once users start hacking onto their clusters, and that might be a community nightmare dealing with the potential issues this could bring..

Another concern would be failures.. if we do a fire and forget the user would have no way of knowing if their provisioner failed. We could have perfectly valid kops clusters floating around - that are perfectly _invalid_ clusters to the user because their plugin failed.

On the other hand, if we did a wait and react approach - we introduce a few other pain points as well as kops could start failing - with the cluster technically online - which sounds dangerous..

Just my 2 cents :)

See my growing proposal on a kops plugin library.. I think creating an open ended library for the community to interface with might be worthwhile..

https://github.com/kubernetes/kops/issues/958

I think the simplest solution would be to just allow users to provide an arbitrary script that gets appended to the AWS User Data. Actually, we probably would want to allow them to provide different scripts for the masters vs nodes.

What are the problems with this approach? Would this be accepted if somebody put together a PR for it?

@yissachar always my man ... start with a quick PR based design write up. And rock and roll!!

@chrislovecnm Do we have a template or example for PR design proposal?

We need to document how to do that.... Oh, the fun of a growing project.

So we could steal the pattern from the main repo, or, as I like doing, update the docs, do a quick design in the PR, and start coding.

I do like velocity

@yissachar this will not help. We need a hook not at the end of the AWS User Data but after nodeup has finished, because nodeup will override everything that is done by your custom script, including downgrading versions of packages.

That custom plugin library I am dreaming up is starting to sound pretty good at this point.. https://github.com/kubernetes/kops/issues/958

We have a use case for this as well: installing https://github.com/gluster/gluster-kubernetes requires some changes to the base image (installing the server software, etc.).

We recently had three PRs go in around a hook implementation, including:

https://github.com/kubernetes/kops/pull/2381 <- pre-pulling containers
https://github.com/kubernetes/kops/pull/2298 <- installing a nvidia driver

This will be in the 1.6 release and is in the current master branch. If you are using master, you require a full build including protokube and nodeup. Testing and breaking it so we can make it better would be awesome!!!!

The 1.6.0 beta1 got tagged 2 days ago, do we still need to do a full build?

@yissachar can we close this?

Where can I find hooks doc?

Added some info here https://github.com/kubernetes/kops/pull/2795 and we have examples in the source tree as well
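
For anyone looking for the shape of the API before reading the docs: a hook is declared under the cluster or instance group spec. A minimal sketch, assuming the hooks docs linked above (the image name below is a made-up placeholder, not a real bootstrap image):

spec:
  hooks:
  - name: prepull-images.service   # surfaced as a systemd unit on each node
    before:
    - kubelet.service
    execContainer:
      # hypothetical one-shot container that pre-pulls the images you care about
      image: example.com/prepull-bootstrap:latest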

I know this thread hasn't been commented on in a couple of months, however I still feel like the hooks functionality isn't quite going to work for some use cases. Here's my example: we prebake AMIs using a bunch of LVM mount points. Looking at hooks.go, it only maps /, dbus, and systemd. If I have a script that I add to a hook that installs stuff in /opt, and that is an LVM mount point on the host, it seems like that could cause problems. Additionally, I'm trying to use hooks to install stuff like Splunk forwarders that require an accept-license flag on first start, which in this case would have to be done from a chroot jail?

For what it's worth, our use case (after a few days of testing) has proven that the docker hook is insufficient for us.

In our case, we seek to install an active security agent (Nessus, by Tenable) as part of node startup. We'd rather not bake this into our base image, for a variety of reasons.

If we had a simple startup script, as @yissachar's PR implements, we would use this.

It turns out that the docker hook, even though it has scary permissions, still doesn't work for the case of installing software. That's because though you can install the files, you cannot use service or systemctl to actually start the service. If you try to use these commands from within a docker container, even with the host mounted, it will fail.

We tried experimentally adding --pid=host and --ipc=host when the docker container is run, with the idea that if that worked, it would be a trivial change to the existing container. But that doesn't work either.

TL;DR: if you are reading this because you need to run some script on your host-- think twice before trying to use the docker container hook. There are still a LOT of things that will not work unless you are really on the host.
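
For anyone hitting the same wall: in more recent kops versions a hook can also be declared with a raw systemd manifest instead of execContainer, in which case the unit runs directly on the host, so service/systemctl are available. A hedged sketch only (the agent name and installer path below are placeholders, not the actual Nessus install steps):

spec:
  hooks:
  - name: install-security-agent.service   # plain systemd unit, runs on the host itself
    before:
    - kubelet.service
    manifest: |
      Type=oneshot
      # hypothetical installer staged on the image or fetched from S3 beforehand
      ExecStart=/opt/agent/install.sh --accept-license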

We thought about using CloudWatch triggers, but in AWS, CloudWatch triggers do not really work well, because you need the instance ID to register them. I.e., this approach would work best as a kops mod.

We use Ansible for deployments. Normally, we would use Ansible to install our things. But with ASGs in play, it's difficult to find all of the instances. Even then, it would be even harder to hook it into the scaling lifecycle.

That leaves us with two solutions that require quite a lot of work:

(1) bake Nessus into our base image.
(2) bake into our base image a 'phone home' call that installs our software. In this case, we'll probably query for a tag that lists what stuff to install, and then fetch a script. If you think this sounds a lot like Puppet, you're right-- it is. We'd probably do that if we used Puppet instead of Ansible.

I mention these mainly to illustrate that, while solving these problems is easily described as 'not a kops problem', supporting them would certainly save a LOT of work for users with a VERY small amount of effort (simply merging a PR that @yissachar has basically already written).

In our use case, we need to change the apt sources to a mirror, since the connection is not that stable and download speeds are unsatisfactory in China. nodeup will do apt-get update and install some essential packages when it launches, so the container hook is not suitable for this kind of usage. For various reasons, we'd rather not bake this change into the base AMI, though.

Since it doesn't technically run right at node startup time, this might not cover all use cases mentioned here, but for many folks the startup-script DaemonSet approach (see https://github.com/kubernetes/contrib/tree/master/startup-script) might work. It's basically just a privileged container that runs an arbitrary script you pass in as an env var. We used it to install xfsprogs on kops-provisioned k8s nodes.
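
For reference, a trimmed-down sketch of that DaemonSet (adapted from the linked repo's README; the xfsprogs install is just an example payload, and the image tag may have moved on):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: startup-script
  namespace: kube-system
  labels:
    app: startup-script
spec:
  selector:
    matchLabels:
      app: startup-script
  template:
    metadata:
      labels:
        app: startup-script
    spec:
      hostPID: true
      containers:
      - name: startup-script
        image: gcr.io/google-containers/startup-script:v1
        securityContext:
          privileged: true
        env:
        - name: STARTUP_SCRIPT
          value: |
            #!/bin/bash
            set -o errexit
            # example payload: install xfsprogs on every node
            apt-get update
            apt-get install -y xfsprogs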

We will keep this open, as hooks meet some needs, but it seems that people still need a startup script.

While I'd like native support for a post-startup script, there may be an easier hack than doing nodeup builds.

I noticed that in the LaunchConfiguration on AWS ASGs, there is a script that kops puts into place. The script is generated by the kops binary. I think I can modify the template in the kops source code and run "make" to get my own kops binary with a modified template. The template is located here:

https://github.com/kubernetes/kops/blob/master/pkg/model/resources/nodeup.go

I'll try adding my own shell commands at the end of it, recompile kops, and see if it works. In addition, I'll have it download a script from s3 and execute it so that I don't have to make a new kops build each time. I can just modify the script in s3 instead.

If it does, it will be a lot easier than making my own AMI or building my own nodeup binary.
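
The appended snippet could be as small as something like this (the bucket name and script path are hypothetical, and it assumes the instance profile can read the bucket and the AWS CLI is present on the image):

# appended to the generated user data, after the stock nodeup bootstrap
aws s3 cp s3://my-bootstrap-bucket/instance-startup.sh /tmp/instance-startup.sh
chmod +x /tmp/instance-startup.sh
/tmp/instance-startup.sh || echo "custom startup script failed" >&2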

I'll report back my findings.

@mr-rick so, do you have any interesting news? The same task is relevant for us as well

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

/remove-lifecycle stale

We would love this feature as well. We're baking AMIs here, and we can't easily append to the EC2 startup scripts when using kops.

Our use-case is the same as @qqshfox. In China we need to replace the default apt sources so that we are using a China mirror, otherwise when nodeup attempts to apt update it is likely to fail or be painfully slow. Hooks can't help with this as nodeup runs apt update before running hooks. I've baked these changes into the base AMI for now, but this would have been a lot simpler if some custom script could be run before nodeup.

It seems like the result of the discussion here is that hooks solve many of the issues mentioned above, but there are still a variety of things that are valid and can't be done with hooks. So the ability to inject some kind of custom script into the userdata is seen as a desirable and valid feature. There was some WIP in #1766 to achieve this, but that PR has now been closed in favour of hooks.

So I guess we're back to needing a new PR/design proposal?

@joelittlejohn, hooks seem to solve our problem for us and we are using them now. apt update is expected to be the first thing run when bootstrapping new AMIs anyway. Sounds like nodeup would be a blocker for you in any case, because it will always try to run apt update.

@Cryptophobia Yes, nodeup will always run apt update, which is why a startup script is required to replace the default apt sources before nodeup runs. Hooks can't be used for this.

Yeah, but what I said is that you should be bootstrapping your sources when making the AMI images. Bootstrapping, as an idea, is getting things like apt sources ready before deploying a virtual machine...

@Cryptophobia I think what we're talking about here is a solution that allows this kind of minimal customisation without the requirement to introduce an AMI bootstrapping step. This is the gist of this comment if I understand it correctly:

https://github.com/kubernetes/kops/issues/387#issuecomment-256681848

We have a bootstrap process for the kops.io AMIs every time we deploy new ones. Do you work in an organization or a team where you don't have any bootstrapping and security tools deployed with your instances?

The tradeoff here is a lot of work to be done on something that will create little value and can mostly be done with hooks.

I don't want the kops team to work on things that are easily done by config management tools like Ansible and AMI build pipelines. That's a lot of effort that could be applied to other, more worthwhile features in kops.

@Cryptophobia Sorry, we seem to be rehashing the same discussion about which use cases for this feature are valid and which aren't. There seems to be some agreement here that kops should (and could fairly easily) support injecting a custom step into the user data. I'm not trying to suggest that the use case I have is the one that justifies implementation of this feature, only that I would find the feature useful if it existed (and there have been a bunch of other commenters here, each with their own problem to solve, who have said the same).

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

The issue was auto-closed, but I didn't get it — was the feature declined, or is it just not implemented yet? What's the best workaround? Is creating your own AMI the only way?

The creation of a custom AMI is one way. It can be done with an Ansible script or a shell script and automated. @joelittlejohn was making the case that this requires too much work and that providing some way to inject a startup script would be a nice feature for the kops team to implement. Not sure what the kops team decided in the end.

Run kops edit ig master (and do the same for the nodes instance group) and add the step below:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: {{DATE}}
  labels:
    kops.k8s.io/cluster: {{CLUSTER_NAME}}
  name: master
spec:
  image: {{KOPE_IMAGE}}
  additionalUserData:    # add a date command or any other commands here
  - name: ntpdate.sh
    type: text/x-shellscript
    content: |
      #!/bin/bash
      sudo apt-get update
      echo date > /date.txt
  machineType: m5.large
  maxSize: 10
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-west-2a
  - us-west-2b
  - us-west-2c
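
To actually roll this out, the usual kops workflow applies; roughly (cluster name and state store are whatever you normally use, shown here as placeholder environment variables):

kops edit ig master --name $CLUSTER_NAME --state $KOPS_STATE_STORE
kops update cluster $CLUSTER_NAME --yes
# existing instances only pick up the new user data once they are replaced
kops rolling-update cluster $CLUSTER_NAME --yes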
