Nomad: Docker driver on Windows Server 2016 TP4 fails

Created on 11 Mar 2016 · 6 Comments · Source: hashicorp/nomad

I'm new to Nomad and wanted to see whether it would work with the new Docker support in Windows Server 2016 TP4. Since this environment is not yet production-ready, the problem may lie somewhere else (I'm not sure). In general, though, it would be great for Nomad to support Docker on Windows once Server 2016 is released!

Nomad version

Nomad v0.3.0

Operating system and Environment details

Windows Server 2016 TP4 (on Azure)

Issue

Task with docker driver fails

Reproduction steps

  1. Apply this fix to Windows Server 2016 TP4
  2. Run nomad in -dev mode
  3. Generate an example nomad file with nomad init
  4. Remove the constraint section and replace image redis:latest with microsoft/redis:latest
  5. Run nomad run example.nomad

Relevant errors:

    2016/03/11 21:05:39 [DEBUG] driver.docker: using 268435456 bytes memory for microsoft/redis:latest
    2016/03/11 21:05:39 [DEBUG] driver.docker: using 500 cpu shares for microsoft/redis:latest
    2016/03/11 21:05:39 [DEBUG] driver.docker: binding directories []string{"C:\\Users\\paul\\AppData\\Local\\Temp\\2\\NomadClient708162966\\71030d43-8622-8c2c-2bec-b5ff91e4c320\\alloc:/alloc:rw,z", "C:\\Users\\paul\\AppData\\Local\\Temp\\2\\NomadClient708162966\\71030d43-8622-8c2c-2bec-b5ff91e4c320\\redis:/local:rw,Z"} for microsoft/redis:latest
    2016/03/11 21:05:39 [DEBUG] driver.docker: networking mode not specified; defaulting to bridge
    2016/03/11 21:05:39 [DEBUG] driver.docker: allocated port 10.0.0.4:37417 -> 6379 (mapped)
    2016/03/11 21:05:39 [DEBUG] driver.docker: exposed port 6379
    2016/03/11 21:05:39 [DEBUG] driver.docker: setting container name to: redis-71030d43-8622-8c2c-2bec-b5ff91e4c320
    2016/03/11 21:05:39 [ERR] driver.docker: failed to create container from image microsoft/redis:latest: API error (500): Invalid bind mount spec "C:\\Users\\paul\\AppData\\Local\\Temp\\2\\NomadClient708162966\\71030d43-8622-8c2c-2bec-b5ff91e4c320\\alloc:/alloc:rw,z": volumeinvalid: Invalid volume specification: 'C:\Users\paul\AppData\Local\Temp\2\NomadClient708162966\71030d43-8622-8c2c-2bec-b5ff91e4c320\alloc:/alloc:rw,z'
    2016/03/11 21:05:39 [DEBUG] plugin: C:\Users\paul\nomad.exe: plugin process exited
    2016/03/11 21:05:39 [ERR] client: failed to start task 'redis' for alloc '71030d43-8622-8c2c-2bec-b5ff91e4c320': Failed to create container from image microsoft/redis:latest: API error (500): Invalid bind mount spec "C:\\Users\\paul\\AppData\\Local\\Temp\\2\\NomadClient708162966\\71030d43-8622-8c2c-2bec-b5ff91e4c320\\alloc:/alloc:rw,z": volumeinvalid: Invalid volume specification: 'C:\Users\paul\AppData\Local\Temp\2\NomadClient708162966\71030d43-8622-8c2c-2bec-b5ff91e4c320\alloc:/alloc:rw,z'
Labels: theme/driver/docker, theme/platform-windows, type/bug

All 6 comments

@pofallon We are definitely going to support Windows! We will look into this and get this fixed when we have some bandwidth.

We're experimenting with this, too. With some local changes we can now actually start a Docker container, but Nomad panics about nil data shortly afterwards and then seems to kill the container.
The required changes we've identified so far:

  • Generate alloc mountpoint with drive identifier (the problem @pofallon encountered)
  • Don't set SELinux flags on alloc mountpoint
  • Log to "json-file" instead of syslog (this seems to be the only log driver available as of TP4, at least)
  • Set HostIP to "" in port binding configurations. Otherwise Docker fails with a notice about the Windows NAT implementation
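To make the list above concrete, here is a minimal Go sketch of how those per-platform differences could be expressed. It is not Nomad's code: `bindSpec` and `platformDefaults` are hypothetical helpers, and the `c:` drive letter used for the container-side path is an assumption for illustration. The Linux branch simply reproduces the bind format visible in the debug log.

```go
package main

import "fmt"

// bindSpec builds a Docker bind-mount string for one task directory.
// Hypothetical helper: goos is passed in (instead of reading runtime.GOOS)
// so that both branches can be exercised on any platform.
func bindSpec(goos, hostDir, name, selinuxLabel string) string {
	if goos == "windows" {
		// Assumption: the container-side path needs a drive identifier;
		// "c:" is chosen here purely for illustration.
		// No SELinux label is appended on Windows.
		return hostDir + `:c:\` + name
	}
	// Linux form as seen in the log, e.g. "<host>:/alloc:rw,z".
	return hostDir + ":/" + name + ":rw," + selinuxLabel
}

// platformDefaults returns the log driver and the host IP to use in port
// bindings. Hypothetical helper mirroring the last two bullets.
func platformDefaults(goos, hostIP string) (logDriver, bindIP string) {
	if goos == "windows" {
		// json-file appeared to be the only log driver in TP4, and an
		// explicit HostIP was rejected by the Windows NAT implementation.
		return "json-file", ""
	}
	return "syslog", hostIP
}

func main() {
	for _, goos := range []string{"linux", "windows"} {
		host := "/tmp/NomadClient/alloc"
		if goos == "windows" {
			host = `C:\Temp\NomadClient\alloc`
		}
		logDriver, ip := platformDefaults(goos, "10.0.0.4")
		fmt.Printf("%s: bind=%s log=%s hostIP=%q\n",
			goos, bindSpec(goos, host, "alloc", "z"), logDriver, ip)
	}
}
```

Keeping the platform decision in one place like this would avoid simply trading "only works on Linux" for "only works on Windows".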

The errors we're seeing now seem to happen during service registration (e.g. exec.SyncServices(consulContext(d.config, container.ID))), or at least commenting this part out stops the panic from happening immediately... We're running with nomad agent -dev, and consul agent -dev -bind 127.0.0.1.

If we manage to get it working somewhat reliably, we'll try to tidy up our changes into a PR, but currently we've mostly just changed "stuff that only works on Linux" to "stuff that only works on Windows"... :P

So, after a fair bit of digging, it seems this is the line of code that crashes:

    cs, err := consul.NewConsulService(ctx.ConsulConfig, e.logger, e.ctx.AllocID)

Source: https://github.com/hashicorp/nomad/blob/master/client/driver/executor/executor.go#L426

e.ctx is nil here, at least when called via the RPC executor (the same code is called as part of startup, and at that point there is no issue). My Go-fu is pretty primitive, so I haven't yet understood why this happens, but I will continue looking tomorrow, unless it is trivial for someone to figure out.

We're basing our edits on the v0.3.1-rc2 tag, btw.
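For readers less familiar with Go: reading a field through a nil struct pointer, as in e.ctx.AllocID when e.ctx is nil, causes a runtime panic. The sketch below shows one defensive pattern, returning an error instead of dereferencing. The `executor` and `executorContext` types here are simplified, hypothetical stand-ins, and this is not necessarily how the problem was resolved in Nomad.

```go
package main

import (
	"errors"
	"fmt"
)

// executorContext and executor are hypothetical stand-ins for the real
// types; only the field relevant to the crash is modelled.
type executorContext struct {
	AllocID string
}

type executor struct {
	ctx *executorContext // may be nil if the executor was never given a context
}

var errNoContext = errors.New("executor context is not set")

// allocID returns the allocation ID, or an error when ctx is nil,
// instead of panicking on e.ctx.AllocID.
func (e *executor) allocID() (string, error) {
	if e.ctx == nil {
		return "", errNoContext
	}
	return e.ctx.AllocID, nil
}

func main() {
	started := &executor{ctx: &executorContext{AllocID: "71030d43"}}
	viaRPC := &executor{} // ctx left nil, as described for the RPC path

	for _, e := range []*executor{started, viaRPC} {
		id, err := e.allocID()
		if err != nil {
			fmt.Println("cannot sync services:", err)
			continue
		}
		fmt.Println("alloc:", id)
	}
}
```

A guard like this only turns the crash into a reportable error; the underlying question of why the context is missing on the RPC path would still need answering.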

I'm also interested in running Nomad on Windows Server 2016.
@carlpett - did you manage to get it running in the end? If not, could you share the code for your partial solution?

I think this issue can be closed. The problems described by @carlpett and @pofallon are resolved, and the current status is tracked in https://github.com/hashicorp/nomad/issues/1488

Sounds good! Good work @mwieczorek
