Nomad v0.3.2
Ubuntu 14.04 on AWS, a working Nomad cluster with two clients and one server on three different VMs, all in the ready state.
I want to schedule a Docker job with an image from a private registry. It schedules fine (evaluation successful) but fails at allocation with "No image found".
I tried a docker pull from the allocated node, and it fails the same way until I docker login. The problem is that I already have an auth block in my job configuration, with ssl = true, username, password, and server_address set to "dev.mydomain.com:5000", and the image set to "dev.mydomain.com:5000/myusername/myimage:latest". I should also mention that my Nomad server and agents run inside Docker containers, with the Docker socket, the Nomad binary, and its configuration mounted as volumes.
nomad run myjob.nomad
nomad alloc-status -address=http://10.0.3.66:4646 -verbose df0a35d9
ID = df0a35d9-7dae-46e1-0dbc-40d996de6a35
Eval ID = 3ab2f8e5-b6be-ef27-bb08-a9b55f7f4a13
Name = website.cache[0]
Node ID = 6f438c0e-5677-481a-c38b-b14b43ff1543
Job ID = website
Client Status = failed
Evaluated Nodes = 2
Filtered Nodes = 0
Exhausted Nodes = 0
Allocation Time = 61.445µs
Failures = 0
==> Task Resources
Task: "website"
CPU Memory MB Disk MB IOPS Addresses
250 256 300 0 web: 10.0.3.66:80
==> Task "website" is "dead"
Recent Events:
Time Type Description
02/07/16 17:44:52 CEST Not Restarting Error was unrecoverable
02/07/16 17:44:52 CEST Driver Failure Failed to create container from image dev.mydomain.com:5000/username/myimage:latest: no such image
02/07/16 17:44:52 CEST Received Task received by client
==> Status
Allocation "df0a35d9-7dae-46e1-0dbc-40d996de6a35" status "failed" (0/2 nodes filtered)
It's a working Ansible template:
job "website" {
# Run the job in the global region, which is the default.
# region = "global"
# Specify the datacenters within the region this job can run in.
datacenters = ["dc1"]
# Service type jobs optimize for long-lived services. This is
# the default but we can change to batch for short-lived tasks.
type = "service"
# Priority controls our access to resources and scheduling priority.
# This can be 1 to 100, inclusively, and defaults to 50.
# priority = 50
# Restrict our job to only linux. We can specify multiple
# constraints as needed.
constraint {
attribute = "${attr.kernel.name}"
value = "linux"
}
# Configure the job to do rolling updates
update {
# Stagger updates every 10 seconds
stagger = "10s"
# Update a single task at a time
max_parallel = 1
}
# Create a 'cache' group. Each task in the group will be
# scheduled onto the same machine.
group "cache" {
# Control the number of instances of this group.
# Defaults to 1
# count = 1
# Configure the restart policy for the task group. If not provided, a
# default is used based on the job type.
restart {
# The number of attempts to run the job within the specified interval.
attempts = 10
interval = "5m"
# A delay between a task failing and a restart occurring.
delay = "25s"
# Mode controls what happens when a task has restarted "attempts"
# times within the interval. "delay" mode delays the next restart
# till the next interval. "fail" mode does not restart the task if
# "attempts" has been hit within the interval.
mode = "delay"
}
# Define a task to run
task "website" {
# Use Docker to run the task.
driver = "docker"
# Configure Docker driver with the image
config {
image = "{{ registry_host }}:{{ registry_port }}/username/myimage:latest"
port_map {
web = 80
}
ssl = true
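# Registry credentials, equivalent to what `docker login` would use.
# server_address should match the registry host and port used in the
# image reference above.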
auth {
username = "{{ registry_user }}"
password = "{{ registry_password }}"
server_address = "{{ registry_host }}:{{ registry_port }}"
}
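# Point the container's DNS at the host's Docker bridge IP, where a
# resolver forwarding *.consul queries to Consul is presumably
# listening, so the service.consul names in the env block resolve.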
dns_servers = ["172.17.0.1"]
}
// service {
// name = "${TASKGROUP}-website"
// tags = ["global", "cache"]
// port = "web"
// check {
// name = "alive"
// type = "tcp"
// interval = "10s"
// timeout = "2s"
// }
// }
env {
"ENDPOINT_API_IP" = "appfirewall-80.service.consul"
"ENDPOINT_API_PORT" = "9007"
}
# We must specify the resources required for
# this task to ensure it runs on a machine with
# enough capacity.
resources {
cpu = 250 # 250 Mhz
memory = 256 # 256MB
network {
mbits = 1
port "web" {
static = 80
}
}
}
# The artifact block can be specified one or more times to download
# artifacts prior to the task being started. This is convenient for
# shipping configs or data needed by the task.
# artifact {
# source = "http://foo.com/artifact.tar.gz"
# options {
# checksum = "md5:c4aa853ad2215426eb7d70a21922e794"
# }
# }
# Specify configuration related to log rotation
# logs {
# max_files = 10
# max_file_size = 15
# }
# Controls the timeout between signalling a task it will be killed
# and killing the task. If not set a default is used.
# kill_timeout = "20s"
}
}
}
@alanzanattadev Can you try https:// in front of the image name? Also, do the username and password change over time?
I know that with some private registries the credentials have to be refreshed over time. Also, please share the actual logs of the Nomad client process.
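For clarity, that suggestion amounts to prefixing the scheme onto the image reference in the config block, something like:

config {
  image = "https://dev.mydomain.com:5000/myusername/myimage:latest"
}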
I have the very same problem; adding https:// did not help.
nomad alloc-status
Driver Failure failed to create image: Failed to pull host/project/frontend:production: Error: image project/frontend not found
If I manually pull the image, the job works.
We have the following in our job files:
image = "docker-registry.service.consul:5000/debian:latest"
and /etc/default/docker has --insecure-registry=docker-registry.service.consul:5000 set. Works great!
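For comparison, a minimal sketch of that workaround (registry host and port taken from the comment above; the --insecure-registry flag has to be set on the Docker daemon of every Nomad client):

task "website" {
  driver = "docker"
  config {
    # No auth block: the client's Docker daemon was started with
    # --insecure-registry=docker-registry.service.consul:5000
    # via /etc/default/docker.
    image = "docker-registry.service.consul:5000/debian:latest"
  }
}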
Closing, as https://github.com/hashicorp/nomad/pull/2190 brings our auth handling in line with Docker's. If this is still an issue, please re-open. This will be available in 0.5.3.
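For anyone landing here later, a minimal sketch of the auth block as it works from 0.5.3 onward, assuming the field names from the Docker driver documentation (values are placeholders):

config {
  image = "dev.mydomain.com:5000/myusername/myimage:latest"
  auth {
    username       = "myusername"
    password       = "mypassword"
    # Registry host without a protocol prefix; Docker Hub is
    # assumed when this is omitted.
    server_address = "dev.mydomain.com:5000"
  }
}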