Nomad v0.3.2
Ubuntu 14.04 on AWS, a working Nomad cluster with two clients and one server on three different VMs, all in the ready state.
I want to schedule a Docker job with an image from a private registry. It schedules fine (evaluation successful) but fails at allocation with "No image found".
I tried a docker pull from the allocated node, and it fails the same way until I docker login. The problem is that I already have an auth block in my job configuration, with ssl = true, username, password, and server_address set to "dev.mydomain.com:5000", and the image set to "dev.mydomain.com:5000/myusername/myimage:latest". I should also mention that my Nomad server and agents run inside Docker containers, with the Docker socket, the Nomad binary, and its configuration mounted as volumes.
nomad run myjob.nomad
nomad alloc-status -address=http://10.0.3.66:4646 -verbose df0a35d9
ID = df0a35d9-7dae-46e1-0dbc-40d996de6a35
Eval ID = 3ab2f8e5-b6be-ef27-bb08-a9b55f7f4a13
Name = website.cache[0]
Node ID = 6f438c0e-5677-481a-c38b-b14b43ff1543
Job ID = website
Client Status = failed
Evaluated Nodes = 2
Filtered Nodes = 0
Exhausted Nodes = 0
Allocation Time = 61.445µs
Failures = 0
==> Task Resources
Task: "website"
CPU Memory MB Disk MB IOPS Addresses
250 256 300 0 web: 10.0.3.66:80
==> Task "website" is "dead"
Recent Events:
Time Type Description
02/07/16 17:44:52 CEST Not Restarting Error was unrecoverable
02/07/16 17:44:52 CEST Driver Failure Failed to create container from image dev.mydomain.com:5000/username/myimage:latest: no such image
02/07/16 17:44:52 CEST Received Task received by client
==> Status
Allocation "df0a35d9-7dae-46e1-0dbc-40d996de6a35" status "failed" (0/2 nodes filtered)
It's a working Ansible template:
job "website" {
# Run the job in the global region, which is the default.
# region = "global"
# Specify the datacenters within the region this job can run in.
datacenters = ["dc1"]
# Service type jobs optimize for long-lived services. This is
# the default but we can change to batch for short-lived tasks.
type = "service"
# Priority controls our access to resources and scheduling priority.
# This can be 1 to 100, inclusively, and defaults to 50.
# priority = 50
# Restrict our job to only linux. We can specify multiple
# constraints as needed.
constraint {
attribute = "${attr.kernel.name}"
value = "linux"
}
# Configure the job to do rolling updates
update {
# Stagger updates every 10 seconds
stagger = "10s"
# Update a single task at a time
max_parallel = 1
}
# Create a 'cache' group. Each task in the group will be
# scheduled onto the same machine.
group "cache" {
# Control the number of instances of this group.
# Defaults to 1
# count = 1
# Configure the restart policy for the task group. If not provided, a
# default is used based on the job type.
restart {
# The number of attempts to run the job within the specified interval.
attempts = 10
interval = "5m"
# A delay between a task failing and a restart occurring.
delay = "25s"
# Mode controls what happens when a task has restarted "attempts"
# times within the interval. "delay" mode delays the next restart
# till the next interval. "fail" mode does not restart the task if
# "attempts" has been hit within the interval.
mode = "delay"
}
# Define a task to run
task "website" {
# Use Docker to run the task.
driver = "docker"
# Configure Docker driver with the image
config {
image = "{{ registry_host }}:{{ registry_port }}/username/myimage:latest"
port_map {
web = 80
}
ssl = true
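# Registry credentials, equivalent to what `docker login` would use.
# server_address should match the registry host and port used in the
# image reference above.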
auth {
username = "{{ registry_user }}"
password = "{{ registry_password }}"
server_address = "{{ registry_host }}:{{ registry_port }}"
}
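# Point the container's DNS at the host's Docker bridge IP, where a
# resolver forwarding *.consul queries to Consul is presumably
# listening, so the service.consul names in the env block resolve.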
dns_servers = ["172.17.0.1"]
}
// service {
// name = "${TASKGROUP}-website"
// tags = ["global", "cache"]
// port = "web"
// check {
// name = "alive"
// type = "tcp"
// interval = "10s"
// timeout = "2s"
// }
// }
env {
"ENDPOINT_API_IP" = "appfirewall-80.service.consul"
"ENDPOINT_API_PORT" = "9007"
}
# We must specify the resources required for
# this task to ensure it runs on a machine with
# enough capacity.
resources {
cpu = 250 # 250 Mhz
memory = 256 # 256MB
network {
mbits = 1
port "web" {
static = 80
}
}
}
# The artifact block can be specified one or more times to download
# artifacts prior to the task being started. This is convenient for
# shipping configs or data needed by the task.
# artifact {
# source = "http://foo.com/artifact.tar.gz"
# options {
# checksum = "md5:c4aa853ad2215426eb7d70a21922e794"
# }
# }
# Specify configuration related to log rotation
# logs {
# max_files = 10
# max_file_size = 15
# }
# Controls the timeout between signalling a task it will be killed
# and killing the task. If not set a default is used.
# kill_timeout = "20s"
}
}
}
@alanzanattadev Can you try https:// in front of the image name? Also, do the username and password change over time?
I know that with some private registries the credentials have to be refreshed over time. Also, please share the actual logs of the Nomad client process.
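For clarity, that suggestion amounts to prefixing the scheme onto the image reference in the config block, something like:

config {
  image = "https://dev.mydomain.com:5000/myusername/myimage:latest"
}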
I have the very same problem; adding https:// did not help.
nomad alloc-status
Driver Failure failed to create image: Failed to pull host/project/frontend:production: Error: image project/frontend not found
If I manually pull the image, the job works.
We have the following in our job files:
image = "docker-registry.service.consul:5000/debian:latest"
and /etc/default/docker has --insecure-registry=docker-registry.service.consul:5000 set. Works great!
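For comparison, a minimal sketch of that workaround (registry host and port taken from the comment above; the --insecure-registry flag has to be set on the Docker daemon of every Nomad client):

task "website" {
  driver = "docker"
  config {
    # No auth block: the client's Docker daemon was started with
    # --insecure-registry=docker-registry.service.consul:5000
    # via /etc/default/docker.
    image = "docker-registry.service.consul:5000/debian:latest"
  }
}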
Closing, as https://github.com/hashicorp/nomad/pull/2190 brings our auth handling in line with Docker's. If this is still an issue, please re-open. This will be available in 0.5.3.
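For anyone landing here later, a minimal sketch of the auth block as it works from 0.5.3 onward, assuming the field names from the Docker driver documentation (values are placeholders):

config {
  image = "dev.mydomain.com:5000/myusername/myimage:latest"
  auth {
    username       = "myusername"
    password       = "mypassword"
    # Registry host without a protocol prefix; Docker Hub is
    # assumed when this is omitted.
    server_address = "dev.mydomain.com:5000"
  }
}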