Nomad: Example fails when using Docker for Mac beta

Created on 14 Apr 2016  ·  16 Comments  ·  Source: hashicorp/nomad

Nomad version

$ nomad version
Nomad v0.3.1

Operating system and Environment details

OS X El Capitan / Docker for Mac Beta v1.11.0-beta6 (build: 5404) / Docker 1.11.0-rc3, build eabf97a

Issue

The example fails to run with Docker for Mac. I ran nomad init and changed the kernel name constraint to "darwin". Then I ran nomad run example.nomad.

$ nomad run example.nomad
==> Monitoring evaluation "195e9d44"
    Evaluation triggered by job "example"
    Allocation "40a81f2f" created: node "f2f29ff4", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "195e9d44" finished with status "complete"
$ nomad status
ID       Type     Priority  Status
example  service  50        dead
$ nomad status example
ID          = example
Name        = example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = dead
Periodic    = false

==> Evaluations
ID        Priority  Triggered By  Status
195e9d44  50        job-register  complete

==> Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status
40a81f2f  195e9d44  f2f29ff4  cache       run      failed
$ nomad alloc-status 40a81f2f
ID              = 40a81f2f
Eval ID         = 195e9d44
Name            = example.cache[0]
Node ID         = f2f29ff4
Job ID          = example
Client Status   = failed
Evaluated Nodes = 1
Filtered Nodes  = 0
Exhausted Nodes = 0
Allocation Time = 81.184µs
Failures        = 0

==> Task "redis" is "dead"
Recent Events:
Time                   Type               Description
13/04/16 20:49:00 EDT  Restarts Exceeded  Task exceeded restart policy
13/04/16 20:49:00 EDT  Driver Failure     Failed to start container db06a8a475749d7853c9bcab1a6fd21189bdf60c2e48de6efa27a269cb06b24f: API error (500): stat /tmp/NomadClient725463834/40a81f2f-4a82-0415-dc0f-91b1912f32d5/alloc: permission denied
13/04/16 20:48:58 EDT  Received           Task received by client

==> Status
Allocation "40a81f2f" status "failed" (0/1 nodes filtered)
  * Score "f2f29ff4-ee2d-a0c7-7edd-f138e6ae973f.binpack" = 3.566715

==> Task Resources
Task: "redis"
CPU  Memory MB  Disk MB  IOPS  Addresses
500  256        300      0     db: 127.0.0.1:51423

Reproduction steps

  1. Install Docker for Mac
  2. Start the Nomad agent in -dev mode
  3. Run nomad init
  4. Change the constraint value to darwin, or remove the constraint block entirely
  5. Start the example job with nomad run example.nomad
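
Steps 3 to 5 could be scripted roughly as follows. This is only a sketch: it assumes the generated example.nomad contains a constraint line of the form value = "linux", and set_kernel_constraint is a hypothetical helper, not a Nomad command.

```shell
# Rewrite the kernel-name constraint value in a job file.
# Assumption: the job file contains a line like:  value = "linux"
# set_kernel_constraint is a hypothetical helper for illustration only.
set_kernel_constraint() {
    job_file=$1
    kernel=$2
    tmp_file="${job_file}.tmp"
    # Write to a temporary file first, because `sed -i` takes different
    # arguments in BSD sed (macOS) and GNU sed.
    sed "s/value[[:space:]]*=[[:space:]]*\"linux\"/value = \"${kernel}\"/" \
        "$job_file" > "$tmp_file" && mv "$tmp_file" "$job_file"
}

# Usage, left commented out because it needs a running dev agent:
#   nomad init
#   set_kernel_constraint example.nomad darwin
#   nomad run example.nomad
```

Removing the constraint block entirely, as step 4 also allows, avoids depending on the exact layout of the generated file.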

All relevant logs, the job file, etc. can be found at https://gist.github.com/Toady00/980dd5b8b5e6d94c04e60bf8acf649c1

Labels: theme/driver/docker, theme/platform-darwin, type/bug

All 16 comments

I had some other issues with just running Docker. I stumbled across a bug where pow.cx and Docker for Mac don't play well together. Uninstalling pow resolved my initial issue, the 500 errors that Nomad appeared to be getting in the logs. But the example still doesn't work. Hopefully this is an easier problem to fix.

The current problem is that when I try to run the example job, the Nomad log says there is no such image as redis:latest, even though the image has clearly been pulled, according to both the output of docker images and the Nomad log. Here are the Nomad logs:

    2016/04/14 22:08:44 [DEBUG] http: Request /v1/jobs (797.948µs)
    2016/04/14 22:08:44 [DEBUG] worker: dequeued evaluation 964962a1-9c22-a2ec-f283-5dd01645ff64
    2016/04/14 22:08:44 [DEBUG] sched: <Eval '964962a1-9c22-a2ec-f283-5dd01645ff64' JobID: 'example'>: allocs: (place 1) (update 0) (migrate 0) (stop 0) (ignore 0)
    2016/04/14 22:08:44 [DEBUG] http: Request /v1/evaluation/964962a1-9c22-a2ec-f283-5dd01645ff64 (86.282µs)
    2016/04/14 22:08:44 [DEBUG] worker: submitted plan for evaluation 964962a1-9c22-a2ec-f283-5dd01645ff64
    2016/04/14 22:08:44 [DEBUG] sched: <Eval '964962a1-9c22-a2ec-f283-5dd01645ff64' JobID: 'example'>: setting status to complete
    2016/04/14 22:08:44 [DEBUG] client: updated allocations at index 28 (pulled 1) (filtered 2)
    2016/04/14 22:08:44 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 2)
    2016/04/14 22:08:44 [DEBUG] worker: updated evaluation <Eval '964962a1-9c22-a2ec-f283-5dd01645ff64' JobID: 'example'>
    2016/04/14 22:08:44 [DEBUG] worker: ack for evaluation 964962a1-9c22-a2ec-f283-5dd01645ff64
    2016/04/14 22:08:44 [DEBUG] client: starting task runners for alloc 'df1afcec-9625-a96d-af08-ce8effd69c27'
    2016/04/14 22:08:44 [DEBUG] client: starting task context for 'redis' (alloc 'df1afcec-9625-a96d-af08-ce8effd69c27')
    2016/04/14 22:08:44 [DEBUG] http: Request /v1/evaluation/964962a1-9c22-a2ec-f283-5dd01645ff64/allocations (82.049µs)
    2016/04/14 22:08:45 [DEBUG] http: Request /v1/evaluation/964962a1-9c22-a2ec-f283-5dd01645ff64 (100.349µs)
    2016/04/14 22:08:45 [DEBUG] http: Request /v1/evaluation/964962a1-9c22-a2ec-f283-5dd01645ff64/allocations (104.735µs)
    2016/04/14 22:08:45 [DEBUG] driver.docker: docker pull redis:latest succeeded
    2016/04/14 22:08:45 [DEBUG] driver.docker: identified image redis:latest as sha256:4f5f397d4b7ca414891bd2959ef71c83bb7010d095efb2497f0b2f407cb50f0d
    2016/04/14 22:08:45 [DEBUG] plugin: starting plugin: /usr/local/Cellar/nomad/0.3.1/bin/nomad []string{"/usr/local/Cellar/nomad/0.3.1/bin/nomad", "syslog", "/var/folders/p9/r1gb7tyd5nnc2s45rv1h4hzm0000gn/T/NomadClient461482095/df1afcec-9625-a96d-af08-ce8effd69c27/redis/redis-syslog-collector.out"}
    2016/04/14 22:08:45 [DEBUG] plugin: waiting for RPC address for: /usr/local/Cellar/nomad/0.3.1/bin/nomad
    2016/04/14 22:08:45 [DEBUG] plugin: nomad: 2016/04/14 22:08:45 [DEBUG] plugin: plugin address: unix /var/folders/p9/r1gb7tyd5nnc2s45rv1h4hzm0000gn/T/plugin082085563
    2016/04/14 22:08:45 [DEBUG] driver.docker: using 268435456 bytes memory for redis:latest
    2016/04/14 22:08:45 [DEBUG] driver.docker: using 500 cpu shares for redis:latest
    2016/04/14 22:08:45 [DEBUG] driver.docker: binding directories []string{"/var/folders/p9/r1gb7tyd5nnc2s45rv1h4hzm0000gn/T/NomadClient461482095/df1afcec-9625-a96d-af08-ce8effd69c27/alloc:/alloc:rw,z", "/var/folders/p9/r1gb7tyd5nnc2s45rv1h4hzm0000gn/T/NomadClient461482095/df1afcec-9625-a96d-af08-ce8effd69c27/redis:/local:rw,Z"} for redis:latest
    2016/04/14 22:08:45 [DEBUG] driver.docker: networking mode not specified; defaulting to bridge
    2016/04/14 22:08:45 [DEBUG] driver.docker: allocated port 127.0.0.1:52047 -> 6379 (mapped)
    2016/04/14 22:08:45 [DEBUG] driver.docker: exposed port 6379
    2016/04/14 22:08:45 [DEBUG] driver.docker: setting container name to: redis-df1afcec-9625-a96d-af08-ce8effd69c27
    2016/04/14 22:08:45 [ERR] driver.docker: failed to create container from image redis:latest: no such image
    2016/04/14 22:08:45 [DEBUG] plugin: /usr/local/Cellar/nomad/0.3.1/bin/nomad: plugin process exited
    2016/04/14 22:08:45 [ERR] client: failed to start task 'redis' for alloc 'df1afcec-9625-a96d-af08-ce8effd69c27': Failed to create container from image redis:latest: no such image
    2016/04/14 22:08:45 [INFO] client: Not restarting task: redis for alloc: df1afcec-9625-a96d-af08-ce8effd69c27
    2016/04/14 22:08:45 [DEBUG] client: updated allocations at index 30 (pulled 0) (filtered 3)
    2016/04/14 22:08:45 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 3)

and the images

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
redis               latest              4f5f397d4b7c        6 weeks ago         177.6 MB

Appears to be the same problem, but I'm running Nomad natively, not in a container. Should I close this issue and comment on that one?

On Friday, April 15, 2016, Alex Dadgar wrote:

Related? #1080 https://github.com/hashicorp/nomad/issues/1080



Let's leave both issues open.

Same issue here. I have a container with everything needed to reproduce it:
https://github.com/eBayClassifiedsGroup/KomPaaS
(It happens on both macOS and Linux, so it does not depend on the host OS.)

Easily reproducible by running go test github.com/hashicorp/nomad/client/driver -run TestDockerDriver_Start_LoadImage

It doesn't seem to be the case that binding /tmp (from #1080) fixes the problem here. (At least, if I am doing things right. I was adding a /tmp:/tmp:rw bind to DockerDriver.containerBinds().)

The "no such image" error message is a red herring. See fsouza/go-dockerclient#528.

The underlying issue is that the LogConfig syslog-address is pointing to a Mac temp directory, which isn't being passed into the container. (In my case: LogConfig:{Type:syslog Config:map[syslog-address:unix:///var/folders/r0/m8hpb98s4xj8z9q5vfd6f1w102j22s/T/plugin563018527]})

The container binding thing I mentioned above doesn't work because by default TMPDIR on the Mac isn't /tmp but a folder in /var/folders. If I explicitly set TMPDIR to /tmp and add the binding above, I get a little farther. The erroneous "no such image" error goes away but I now get:

--- FAIL: TestDockerDriver_Start_LoadImage (0.67s)
    docker_test.go:290: err: Failed to start container 4b67dba6f84d342b5e68b9141dbcd61bb986c66d53e31650430caa498389b516: API error (500): Failed to initialize logging driver: dial unix /tmp/plugin150580377: connect: connection refused

I guess that's progress. 😄
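
A minimal shell sketch of the two facts this diagnosis relies on: the temp directory only falls back to /tmp when TMPDIR is unset or empty, and the syslog-address value is a unix:// URL wrapping a host path. Both function names are hypothetical helpers for illustration, not part of Nomad or Docker.

```shell
# Print the directory that programs started from this shell would normally
# use for temporary files: $TMPDIR when set and non-empty, otherwise /tmp.
effective_tmpdir() {
    printf '%s\n' "${TMPDIR:-/tmp}"
}

# Strip the unix:// scheme from a syslog-address value, leaving the host
# path of the socket. syslog_socket_path is a hypothetical helper.
syslog_socket_path() {
    printf '%s\n' "${1#unix://}"
}

# Usage:
#   effective_tmpdir
#   syslog_socket_path "unix:///tmp/plugin150580377"
```

On a Mac with the default environment, the first call would be expected to print a folder under /var/folders rather than /tmp, which matches the paths seen in the logs in this thread.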

Also encountering this problem in v0.4.0. Might be worth noting that I removed the constraint entirely.

docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 21:49:11 2016
 OS/Arch:      darwin/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:20:08 2016
 OS/Arch:      linux/amd64

Recent Events:
Time                   Type            Description
07/05/16 00:42:34 EDT  Not Restarting  Error was unrecoverable
07/05/16 00:42:34 EDT  Driver Failure  Failed to create container from image redis:latest: no such image
07/05/16 00:42:33 EDT  Received        Task received by client

I've been trying to get Nomad to run on the Docker for Mac beta. Here are some findings:

The first issue is the temporary directory location, as noted by @joeshaw. Docker shares /tmp and /private from the host on the VM, but not /var/folders. There are 2 ways around this:

1) Run Nomad as TMPDIR=/tmp nomad agent .... This changes where Nomad creates its temp files on the host and, because /tmp is shared by default, the Docker process in the virtual machine can find them.

2) Mount /var/folders from the host in the virtual machine. This is currently not possible through the Docker UI (it follows symlinks, so it mounts /private/var/folders instead, which is redundant), but it can be done by connecting to the VM and running ln -s /private/var/folders /var/folders. After this, Nomad runs without configuration changes.
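
To check whether a given host path would be visible inside the VM, something like the following sketch could be used. It assumes only what is stated above, namely that /tmp and /private are shared by default and /var/folders is not; is_shared_by_default is a hypothetical helper, not a Docker or Nomad command.

```shell
# Return 0 if the path lies under /tmp or /private (the two host directories
# reported above as shared with the VM), and 1 otherwise.
# is_shared_by_default is a hypothetical helper for illustration only.
is_shared_by_default() {
    case $1 in
        /tmp|/tmp/*|/private|/private/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: warn before starting the agent if its temp directory is not shared.
#   is_shared_by_default "${TMPDIR:-/tmp}" || echo "temp dir is not shared with the VM"
```

The match is done on path prefixes only, so it says nothing about symlinks such as the /var/folders link described in the second workaround.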

The second issue is the "connection refused" error: Failed to initialize logging driver: dial unix /var/folders/1f/spcnmq1s3qq0gslhp8cn0f8w0000gn/T/plugin842313528: connect: connection refused

This is due to osxfs (the FUSE driver Docker uses to mount the host filesystem) not supporting socket files. This is a known issue, as explained in the official docs.

I can't find the osxfs repository or issue tracker to follow the status of this issue though.

EDIT: see docker/for-mac#483
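
To confirm on the host that the file named in the error is a Unix socket rather than a regular file, a check along these lines could help. path_kind is a hypothetical helper for illustration; it relies only on the standard test -S and test -e operators.

```shell
# Classify a host path so that a Unix socket can be told apart from other
# files. path_kind is a hypothetical helper, not a Docker or Nomad command.
path_kind() {
    if [ -S "$1" ]; then
        echo socket
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# Usage: pass the path from the "dial unix ..." error message.
#   path_kind /tmp/plugin150580377
```

If this reports socket on the host while the container start still fails with connection refused, that would be consistent with the osxfs limitation described above.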

The go test ... example by @joeshaw was reproducible on my Mac (OS X) after setting my $GOPATH environment variable and running go get github.com/hashicorp/nomad. (I spell this out because it was non-obvious to me as a Go newbie.)

Any update on this issue? We are still not able to run the example (or one of our own containers) using Nomad on OS X, both on El Capitan and Yosemite (not tested on Sierra yet).

The containers run fine started with Docker directly.

Hey,

No update on this. It isn't really a production use case, so it is fairly low priority.

@dadgar for us (@bownty), being able to run Nomad across development, staging and production is a high priority. Having a different tool set for development than for the rest of the environment is far from optimal, so being blocked by this issue is a real pain point for us at this time :(

Is there any workaround, or an ETA for when this can be mitigated?

The referenced PR #1806 has made inroads on this!

This should work now 👍
