We currently use the store to persist machine information (IP, credentials, name, etc.). This mostly works, but it has some flaws. First, if the infrastructure is modified (machine name, size, IP, etc.) or removed, we drift: the store almost has to be assumed to be in drift as soon as we create a machine. Nor do we update the store with the changed machine info.
This proposal brings together a few concepts that have come up before: a machine configuration file and discovery-based machine information. It would have a docker-compose-like feel, and it pulls some ideas from Terraform. Here is the idea:

- Machine configuration lives in a `docker-machine.yml` file
- Commands (e.g. `machine create -d ec2 ...`) would not act on infrastructure directly, but simply update / create the config
- `apply` would allow machine to apply the configuration (launching any instances needed, etc.)

I think this would also play well with the idea of Machine Server. It would be nice to see Machine Server create new node(s) if they crash, etc. A great integration would be where a Swarm node dies and Machine Server automatically launches a new instance and adds it back to the cluster.
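To make the idea concrete, here is a minimal sketch of what such a flow might look like (the file layout and all keys shown are hypothetical at this stage):

```yaml
# docker-machine.yml -- hypothetical sketch
dev:
  driver: virtualbox

staging:
  driver: ec2
  instances: 3
```

Under this model, `machine create` would only edit this file, and `apply` would converge the real infrastructure to match it.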
Questions / Thoughts
Huge thanks to @gabrtv for the discussion and idea :)
/cc @sthulb @nathanleclaire @bfirsh
I'm really in favor of this. "Read in a text file, spit out the system in the desired configuration" is a good goal.
However, I think we need to carefully design this before we start implementing. Some concerns I can think of off the top of my head:
- How would this tie in with config if we put that in (`docker-machine config set Drivers.DigitalOcean.Image docker` to globally set the default DO image, for instance)? Is the file just the highest priority? Or, if you do an `apply`, does it chuck all the other stuff (env vars, config) out the window and always start from a clean slate?

Like I said, I'm in favor; let's consider carefully before implementing, though.
Absolutely. As titled "proposal" this is for discussion :)
> I create a machine based on my file, then go change some settings remotely. Does machine detect that and update the file? Or simply converge the system back to the original state from the file?
This is what we need to discuss. I can see pros and cons for both ways.
> I'm pretty sure that we will need at least two separate files, like how Terraform has tfvars, so that one can be easily kept out of version control.
I am leaning this way too.
> Something we need to think about is, we probably don't want everyone to have lots of per-project VMs sprouting up
I don't think we should impose that. Who are we to decide how people use it? For example, there is nothing restricting compose users from spinning up lots of containers. I think we should make people very aware of what it is doing without imposing restrictions or decisions on how they design their infrastructure.
I could see a workflow similar to this:

- Ops uses `docker-machine.yml` to configure a staging environment
- Dev uses `docker-compose.yml` to build their stacks on the environment

At this point, the environment is simply a service that dev consumes and ops can ensure how it is run. I don't necessarily think there would be a `docker-machine.yml` per project.
> How would this tie in with config if we put that in (`docker-machine config set Drivers.DigitalOcean.Image docker` to globally set the default DO image, for instance)? The file is just the highest priority I suppose? Or if you do an apply it chucks all the other stuff (env vars, config) out the window and always starts from a clean slate?
I am still not convinced of the "global config file". I can see the advantage of having a single place for everything, but like git, most of us use localized settings, and I'm not sure about having several configs in place. I like simplicity, and a "staging" environment that carries all of its own definition is very appealing. What I didn't like about config mgmt systems was all of the inheritance spread throughout.
> Stuff like re-creating failed hosts starts to get into the turf of tools like Mesos. Where are the boundaries and how do we avoid duplicating effort / reinventing the wheel?
I don't think so. Ensuring a host is up, or that action is taken upon failure, would be a huge benefit for machine server, and it could actually work in tandem with projects like Mesos. For example, instead of some config mgmt tool or a vendor-provided solution, you could use machine server to ensure 5 nodes are always up. Mesos would then ensure the containers that are supposed to be on those nodes are there.
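To illustrate that division of labor, the machine config could declare only the pool's size and shape, leaving container scheduling to Mesos (keys here are hypothetical):

```yaml
# docker-machine.yml -- hypothetical sketch
mesos-agents:
  driver: ec2
  instances: 5   # machine server re-creates failed hosts to keep 5 up
```

Machine server keeps the five hosts alive; Mesos decides what runs on them.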
> I like Terraform's "plan-before-you-apply" model. Is that included in scope?
Absolutely :)
+1
This makes a lot of sense - it would be nice in a UI sense for docker-machine and docker-compose to have more parallels. For people just starting to use and try to understand the Docker tools, it would probably be a huge help to have this kind of parity.
A few random thoughts:

- Why `apply` and not `up`? `docker-machine up` feels more natural if taking cues from docker-compose.
- It would also be useful to be able to set a driver as default for all hosts defined in the file.
- Secrets could be kept in a separate file referenced from the main one (e.g. `external_config: my_secret_file`), so that docker-machine doesn't have to be told about multiple different files, and so that I can share files that other systems might use or control.
- In the case where there's an existing set of hosts and I change something in the .yml (like swap out a t1.micro for a t1.small), I think either I should have to add a `--yes-i-really-want-to-destroy-a-vm-and-bring-up-a-new-one` flag, or there should be a separate command altogether.
- Should there be some extra metadata telling me which config a host listed in `docker-machine ls` came from?
- It'd be neat if I could set a `region` attribute to an array of different regions, and machine would spread my instances across each region (i.e. I set `instances: 9` and `softlayer-region: [ tor01, dal05, sjc01 ]`, and end up with 3 hosts in each).
- Hosts should be brought up in parallel as much as possible, except that swarm nodes should only come up after their master is available.

I'm thinking a file might look like this:
```yaml
# docker-machine.yml
osswarmmaster:
  driver: openstack
  openstack-flavor-name: tiny
  openstack-image-name: Ubuntu 14.04 LTS
  openstack-floatingip-pool: myfloatingips
  swarm-master: true
  swarm-discovery: token://1234

myawesomevm:
  driver: openstack
  openstack-flavor-name: large
  openstack-image-name: Ubuntu 14.04 LTS
  openstack-floatingip-pool: myfloatingips
  instances: 4
  swarm-discovery: token://1234

slbigbox:
  external_file: softlayer-secrets.yml
  driver: softlayer
  softlayer-cpu: 4
  softlayer-disk-size: 100
  softlayer-memory: 8192
  softlayer-region: [ tor01, dal05, sjc01 ]
  instances: 15
```

```yaml
# softlayer-secrets.yml
softlayer-user: fred
softlayer-api-key: 1234-5678-9012
```
The hosts resulting from this could then be named something like:

- `osswarmmaster`
- `myawesomevm_1`, `myawesomevm_2`, etc.
- `slbigbox_1`, `slbigbox_2`, etc.

@hairyhenderson great feedback! thanks!
> Why `apply` and not `up`? `docker-machine up` feels more natural if taking cues from docker-compose
I'm not set on the command names -- I think `apply` makes sense if we make it declarative, as in: if there are 6 instances with the identifying tag but the definition says 5, we remove one to match the definition. However, if we just operate like compose does (it will ignore additional containers, I believe) then `up` would make sense too.
> It would also be useful to be able to set a driver as default for all hosts defined in the file
+1
> In the case where there's an existing set of hosts and I change something in the .yml (like swap out a t1.micro for a t1.small), I think either I should have to add a `--yes-i-really-want-to-destroy-a-vm-and-bring-up-a-new-one` flag, or there should be a separate command altogether.
Yeah, I'm not sure how we would handle this. Perhaps the driver would have to support a "Modify" operation that performs some rolling modification. In the case of EC2, it would simply stop the instance and change the type (assuming we use EBS, which we currently do). However, not all drivers support this, so we would have to figure those out. We could also take a cue from Terraform and support in-place modification for operations that allow it, and a create/destroy routine for those that don't.
> Should there be some extra metadata telling me which config a host listed in `docker-machine ls` came from?
I'm leaning towards machine only using a single config to show what that environment looks like. We could also have a `--config` option or similar to specify certain ones (like compose).

> It'd be neat if I could set a region attribute to an array of different regions, and machine would spread my instances across each region (i.e. I set instances: 9 and softlayer-region: [ tor01, dal05, sjc01 ], and end up with 3 hosts in each)

Absolutely!!

> Hosts should be brought up in parallel as much as possible, except that swarm nodes should only come up after their master is available

+1. Actually, swarm nodes do not need their master to be available -- you can start them all together, and when the master is up it will query the discovery service for the nodes that are members.
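For illustration, the region-spreading idea might render in the file like this (array-valued region keys and the round-robin placement behavior are both assumptions, not settled design):

```yaml
# hypothetical: 9 instances spread across 3 regions -> 3 hosts per region
workers:
  driver: softlayer
  instances: 9
  softlayer-region: [ tor01, dal05, sjc01 ]
```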
> I'm pretty sure that we will need at least two separate files, like how Terraform has tfvars, so that one can be easily kept out of version control.
+1. In future this could be extended to allow other storage mechanisms than a file.
FYI; this proposal in docker-compose is leaning toward having two separate files as well: https://github.com/docker/compose/issues/846 (a "definition" and a "configuration" file)
I like the concept.
There are probably a few more behaviour issues to work out.
@thaJeztah cool thx!
@ehazlett Can we make this actually support compose syntax? So people can get swarms/machines up running containers?
@sthulb i would love to see that :) I think it would also be a good integration with compose as well.
:+1:
Something I would be concerned about with a declarative file is the handling of sensitive information. Presumably someone will want or need to check their config into source control, and I've seen too many horror stories of people being charged hundreds of dollars because bots constantly scan GitHub and other sites for keys. Possible solutions include taking the key from an environment variable, prompting for the key, or keeping the key in an encrypted file (e.g. Ansible Vault) and prompting for the password (or taking it from an environment variable) to unlock it.
@UserTaken - very good point. If we take a cue from `docker-compose.yml`, then we could use an `env_file` property, which enables users to keep secrets in files but out of source control. Obviously, that adds an extra step in CI builds, since secrets need to be written to temporary files and then deleted afterwards.
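As a sketch (borrowing compose's `env_file` convention; whether machine would use the same key name is an open question):

```yaml
# docker-machine.yml -- hypothetical sketch
slbigbox:
  driver: softlayer
  env_file: softlayer-secrets.env   # kept out of version control
```

where `softlayer-secrets.env` would contain lines like `SOFTLAYER_API_KEY=...`, written by CI just before the run and deleted afterwards.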
Handling of secrets is still a hot topic. If an env-file is supported, tools such as HashiCorp Vault, Keywhiz, or Sneaker could be useful.
Also, I requested the Docker security maintainers to write up their thoughts / recommendations here; https://github.com/docker/docker/issues/13490
To be clear, in terms of secrets such as API tokens which might be needed in such a `docker-machine.yml` file, I would like to support either inheriting them from the environment or keeping them in some other secondary "var" file which is deliberately meant to be kept out of version control. Either way, we should actively discourage having them in whichever file is meant explicitly to be checked into version control.
Bringing this back from the dead (July 10th was the last response). I really like this concept:

> Ops uses docker-machine.yml to configure a staging environment
> Dev uses docker-compose.yml to build their stacks on the environment
I would also like to see a number somewhere in these descriptor files, like "I want 10 of type X and 5 of type Y". The reason is that the underlying infrastructure may need to be tailored to the apps, networking, or storage access. As @ehazlett said before, let's not limit what a user wants to do.
I hope that my PR #1881 shows a working concept of using additional configuration options. Would like to see that functionality added down the road.
@kacole2 I agree that something like `count: 5` would be useful, and I have made moves to support it in the past with flags like `--n-instances` (never successfully merged), so I'd like to add something of the sort if we implement functionality like this.
Likewise, hopefully the new driver plugin model will also open the doors to extensible functionality in other areas.
@nathanleclaire where can i learn more about the plugin model? Assuming #1626? would like to contribute where possible to make it a reality.
@kacole2 Yep, that's the proposal, and https://github.com/docker/machine/pull/1902 is the PR
Is this on hold?
@vipconsult Sort of. @kunalkushwaha has a POC here: https://github.com/docker/machine/pull/2422 and we're talking about possibly trying to implement it for 0.6.0 (January), but we can't make any promises -- it's a very big thing to commit to implementing such a feature, and we would need to get feedback from a variety of other Docker teams (for instance, is this encroaching on Compose territory?) and users before making moves.
I'd really like to see this.
Why don't you split this into a separate project like docker/docker-compose?
For reference: https://github.com/efrecon/machinery
good idea
I think a separate project would be more of a wrapper around docker-machine and compose. I think the best way is to add a few features to libcompose, like https://github.com/docker/libcompose/issues/157, and integrating them with machine could be better.
If implementing this feature in a different project makes sense, could implementing it as a Terraform plugin (either a `docker_machine` resource and/or a `docker_machine` provisioner), or adopting HCL as the configuration language for docker-machine, be considered? I think re-implementing the whole of Terraform for docker-machine is too wide a scope. And if we go with plain YAML, people will start to complain about the lack of string interpolation, which HCL already implements.

Say AWS launches a new EC2 feature, and the Terraform development team and the docker-machine development team both work on adopting that same feature. That is nothing but duplicated effort and a waste of open source development resources. If docker-machine project members are freed from cloning a portion of Terraform or HCL, they can focus on Swarm compatibility or Swarm integration, which are the things specific to Docker. AWS adds around 1,000 new features per year, and the pace accelerates every year. The Terraform community has so far kept up by adopting features as soon as they are released. Given current development activity, we should consider whether keeping up with that pace is a realistic goal for the docker-machine community.
Code for launching 16 Docker Swarm nodes as EC2 instances could look like this:

`docker_host.hcl`:

```hcl
resource "aws_instance" "docker_host" {
  count         = 16
  ami           = "${data.aws_ami.docker_host.id}"
  instance_type = "t2.medium"
  vpc_security_group_ids = [
    "${aws_security_group.docker_host.id}"
  ]
}

resource "docker_machine" "docker_host" {
  count        = 16
  swarm        = true
  swarm-master = "${count.index < 3 ? true : false}"
  # pick this instance's matching EC2 instance out of the list
  aws_instance_id = "${element(aws_instance.docker_host.*.id, count.index)}"
  ssh_key         = "${var.docker_host.ssh_key}"
}
```

And the command could be `terraform apply` or `docker-machine create --config docker_host.hcl`.
I hope the Unix philosophy is also applicable to docker-machine: do the Docker thing, and do it well.
@joelhandwell I totally agree - would love to see this sort of thing. There's some definite crossover with projects like Docker for AWS/Azure/GCP, and Infrakit (see especially https://github.com/docker/infrakit/tree/master/examples/instance/terraform).
To be honest, one of the reasons that I haven't spent much time with Docker Machine lately is because Docker for AWS meets my needs _much_ better. I've been using Terraform to apply the D4AWS CloudFormation template mostly. ¯\_(ツ)_/¯