Nomad: Container runs fine when using docker CLI but fails when run through Nomad

Created on 28 Dec 2015 · 12 Comments · Source: hashicorp/nomad

I've been having a weird issue with an app. It runs fine when started with the Docker CLI but fails when I schedule it with Nomad. I created a simpler test case with the app-specific details removed and pushed the image to the public Docker registry as c4milo/nomad-test:1.0.0.

Pulling the test image

docker pull c4milo/nomad-test:1.0.0

Running the container using Docker CLI

docker run --net=host -ti image_id uwsgi --env ENV=dev --die-on-term --master --http 9090 --workers 2 --threads 2 --need-app --callable app --thunder-lock --chdir /app --file app.py

Job definition:

job "test-job" {
    datacenters = ["dc1"]
    distinct_hosts = true
    type = "service"
    priority = 50
    constraint {
        attribute = "$attr.kernel.name"
        value = "linux"
    }

    # Configure the job to do rolling updates
    update {
        # Stagger updates every 10 seconds
        stagger = "10s"

        # Update a single task at a time
        max_parallel = 1
    }

    group "instances" {
        count = 1

        restart {
            interval = "1m"
            attempts = 2
            delay = "15s"
            on_success = true
            mode = "delay"
        }

        # Define a task to run
        task "test" {
            # Use Docker to run the task.
            driver = "docker"

            config {
                image = "c4milo/nomad-test:1.0.0"
                server_address = "registry.docker.com:443"
                network_mode = "host"
                command = "uwsgi"
                args = [
                    "--env ENV=dev",
                    "--die-on-term",
                    "--master",
                    "--http ${NOMAD_PORT_http}",
                    "--workers 1", "--threads 1",
                    "--need-app", "--callable app",
                    "--chdir /app",
                    "--file app.py"
                ]
            }

            resources {
                cpu = 500 # 500 MHz
                memory = 512 # 512 MB
                network {
                    mbits = 10
                    port "http" {}
                }
            }

            service {
                port = "http"
                check {
                    name = "alive"
                    type = "http"
                    path = "/"
                    interval = "10s"
                    timeout = "2s"
                }
            }
        }
    }
}

Dockerfile

FROM centos:6

RUN yum install -y \
    epel-release \
    wget \
    rpm \
    unzip \
    centos-release-SCL \
    ca-certificates \
    gsl \
    blas-devel \
    lapack-devel \
    libxslt-devel

RUN yum install -y \
    python-pip \
    python-devel

RUN pip install --upgrade pip Flask
RUN pip install uwsgi

WORKDIR /app
COPY . /app

RUN python -m compileall /app
CMD ["python /app/app.py"]

Allocation status output

$ nomad alloc-status c12205ab-f6c8-1113-7878-eae0f532cfe5
ID                = c12205ab-f6c8-1113-7878-eae0f532cfe5
EvalID            = 7ec6468e-aed9-34a0-41ed-254d234afc7e
Name              = test-job.instances[0]
NodeID            = 2beee4ee-1339-646d-fb85-537327e998f9
JobID             = test-job
ClientStatus      = running
NodesEvaluated    = 3
NodesFiltered     = 0
NodesExhausted    = 0
AllocationTime    = 59.62µs
CoalescedFailures = 0

==> Task "test" is "pending"
Recent Events:
Time               Type        Description
16:26:35 12/28/15  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
16:26:35 12/28/15  Started     <none>
16:26:20 12/28/15  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
16:26:20 12/28/15  Started     <none>
16:25:50 12/28/15  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
16:25:50 12/28/15  Started     <none>
16:25:35 12/28/15  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
16:25:35 12/28/15  Started     <none>
16:25:20 12/28/15  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
16:25:20 12/28/15  Started     <none>

==> Status
Allocation "c12205ab-f6c8-1113-7878-eae0f532cfe5" status "running" (0/3 nodes filtered)
  * Score "62a553b3-b829-0897-c8d4-d14056b9366c.binpack" = 8.147441
  * Score "f0fd7eaf-69d8-934e-d0ea-922585c92c97.binpack" = 1.841511
  * Score "2beee4ee-1339-646d-fb85-537327e998f9.binpack" = 14.610137

All 12 comments

Here is the repo with all the files: https://github.com/c4milo/nomad-test

@c4milo Were you able to see the logs from the container?

@diptanu no, it seems to be terminating right away. Docker daemon logs don't show anything abnormal either. My next step triaging the issue was to put some additional logging in the Docker task driver but I've spent a good deal of time already ruling out other possibilities, and I would like to use a second pair of eyes now :/

@c4milo In the past I have usually triaged issues like this by comparing the JSON the Docker daemon receives when the container is run via the CLI with what it receives from the cluster manager. Can you see the difference?

If you have the container IDs, then sharing the JSON returned by docker inspect should tell us how the containers are getting set up in both cases.

I'm not sure I'm following, how am I supposed to run docker inspect on a container that immediately terminates upon scheduling with Nomad?

@c4milo this might be redundant information but as long as the container is being created it is possible to inspect it. Just look it up with docker ps -a and run an inspect on the appropriate container. The fact that it terminates directly does not prevent you from inspecting the json configuration.
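Once the inspect JSON is in hand, the "Args" array is a good first place to look. Below is a minimal Python sketch, assuming the output of docker inspect has been captured as a string; the helper name fused_flag_args and the trimmed sample data are hypothetical and only illustrate the idea of flagging arguments where a flag and its value ended up in a single string.

```python
import json
from typing import List


def fused_flag_args(inspect_json: str) -> List[str]:
    """Return args that look like a flag and its value packed into one string.

    `inspect_json` is expected to be the JSON array printed by `docker inspect`.
    """
    containers = json.loads(inspect_json)
    args = containers[0].get("Args") or []
    # An argument that starts with "-" and contains a space was probably
    # meant to be two separate argv entries.
    return [a for a in args if a.startswith("-") and " " in a]


# Trimmed, hypothetical sample shaped like the inspect output of a
# container started by a scheduler with unsplit arguments.
sample = json.dumps([{
    "Path": "uwsgi",
    "Args": ["--env ENV=dev", "--die-on-term", "--master", "--http 23933"],
}])

print(fused_flag_args(sample))  # ['--env ENV=dev', '--http 23933']
```

An empty result does not prove the arguments are right, but a non-empty one is a strong hint that the process is receiving a different argv than it does from the CLI.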

It's not redundant at all. Thanks a lot for the additional clarification @iverberk

Here it goes, the one with "ExitCode": 1, is when run through Nomad:

diff --git a/nomad.json b/cli.json
index 4f8e4a6..7c3744d 100644
--- a/nomad.json
+++ b/cli.json
@@ -1,19 +1,27 @@
 [
 {
-    "Id": "15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60",
-    "Created": "2015-12-30T20:52:55.922662765Z",
+    "Id": "c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555",
+    "Created": "2015-12-30T20:59:28.190230218Z",
     "Path": "uwsgi",
     "Args": [
-        "--env ENV=dev",
+        "--env",
+        "ENV=dev",
         "--die-on-term",
         "--master",
-        "--http 23933",
-        "--workers 1",
-        "--threads 1",
+        "--http",
+        "9090",
+        "--workers",
+        "2",
+        "--threads",
+        "2",
         "--need-app",
-        "--callable app",
-        "--chdir /app",
-        "--file app.py"
+        "--callable",
+        "app",
+        "--thunder-lock",
+        "--chdir",
+        "/app",
+        "--file",
+        "app.py"
     ],
     "State": {
         "Status": "exited",
@@ -23,17 +31,17 @@
         "OOMKilled": false,
         "Dead": false,
         "Pid": 0,
-        "ExitCode": 1,
+        "ExitCode": 0,
         "Error": "",
-        "StartedAt": "2015-12-30T20:52:55.977034702Z",
-        "FinishedAt": "2015-12-30T20:52:55.981442507Z"
+        "StartedAt": "2015-12-30T20:59:28.266952875Z",
+        "FinishedAt": "2015-12-30T20:59:31.319496838Z"
     },
     "Image": "2ada554a624b0469e95fc98577271d83754c59ead37ea52888d70e11b4b03c02",
-    "ResolvConfPath": "/var/lib/docker/containers/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/resolv.conf",
-    "HostnamePath": "/var/lib/docker/containers/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/hostname",
-    "HostsPath": "/var/lib/docker/containers/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/hosts",
-    "LogPath": "/var/lib/docker/containers/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60-json.log",
-    "Name": "/test-3001caa8-7c81-a0c1-4d04-af3e5496be41",
+    "ResolvConfPath": "/var/lib/docker/containers/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/resolv.conf",
+    "HostnamePath": "/var/lib/docker/containers/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/hostname",
+    "HostsPath": "/var/lib/docker/containers/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/hosts",
+    "LogPath": "/var/lib/docker/containers/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555-json.log",
+    "Name": "/trusting_almeida",
     "RestartCount": 0,
     "Driver": "overlay",
     "ExecDriver": "native-0.2",
@@ -42,47 +50,31 @@
     "AppArmorProfile": "",
     "ExecIDs": null,
     "HostConfig": {
-        "Binds": [
-            "/var/lib/nomad/alloc/3001caa8-7c81-a0c1-4d04-af3e5496be41/alloc:/alloc:rw,z",
-            "/var/lib/nomad/alloc/3001caa8-7c81-a0c1-4d04-af3e5496be41/test:/local:rw,Z"
-        ],
+        "Binds": null,
         "ContainerIDFile": "",
-        "LxcConf": null,
-        "Memory": 536870912,
+        "LxcConf": [],
+        "Memory": 0,
         "MemoryReservation": 0,
-        "MemorySwap": -1,
+        "MemorySwap": 0,
         "KernelMemory": 0,
-        "CpuShares": 500,
+        "CpuShares": 0,
         "CpuPeriod": 0,
         "CpusetCpus": "",
         "CpusetMems": "",
         "CpuQuota": 0,
         "BlkioWeight": 0,
         "OomKillDisable": false,
-        "MemorySwappiness": null,
+        "MemorySwappiness": -1,
         "Privileged": false,
-        "PortBindings": {
-            "23933/tcp": [
-                {
-                    "HostIp": "10.213.42.22",
-                    "HostPort": "23933"
-                }
-            ],
-            "23933/udp": [
-                {
-                    "HostIp": "10.213.42.22",
-                    "HostPort": "23933"
-                }
-            ]
-        },
+        "PortBindings": {},
         "Links": null,
         "PublishAllPorts": false,
-        "Dns": null,
-        "DnsOptions": null,
-        "DnsSearch": null,
+        "Dns": [],
+        "DnsOptions": [],
+        "DnsSearch": [],
         "ExtraHosts": null,
         "VolumesFrom": null,
-        "Devices": null,
+        "Devices": [],
         "NetworkMode": "host",
         "IpcMode": "",
         "PidMode": "",
@@ -91,7 +83,7 @@
         "CapDrop": null,
         "GroupAdd": null,
         "RestartPolicy": {
-            "Name": "",
+            "Name": "no",
             "MaximumRetryCount": 0
         },
         "SecurityOpt": null,
@@ -112,62 +104,47 @@
         "Name": "overlay",
         "Data": {
             "LowerDir": "/var/lib/docker/overlay/2ada554a624b0469e95fc98577271d83754c59ead37ea52888d70e11b4b03c02/root",
-            "MergedDir": "/var/lib/docker/overlay/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/merged",
-            "UpperDir": "/var/lib/docker/overlay/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/upper",
-            "WorkDir": "/var/lib/docker/overlay/15cf649405971447c25ca85f8dbbd054f473bc8a17d0fe35178ff49ef14acc60/work"
+            "MergedDir": "/var/lib/docker/overlay/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/merged",
+            "UpperDir": "/var/lib/docker/overlay/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/upper",
+            "WorkDir": "/var/lib/docker/overlay/c345768419bfb041acc78d5b95760070b6282d33391d7c280ae6295c8c810555/work"
         }
     },
-    "Mounts": [
-        {
-            "Source": "/var/lib/nomad/alloc/3001caa8-7c81-a0c1-4d04-af3e5496be41/alloc",
-            "Destination": "/alloc",
-            "Mode": "rw,z",
-            "RW": true
-        },
-        {
-            "Source": "/var/lib/nomad/alloc/3001caa8-7c81-a0c1-4d04-af3e5496be41/test",
-            "Destination": "/local",
-            "Mode": "rw,Z",
-            "RW": true
-        }
-    ],
+    "Mounts": [],
     "Config": {
         "Hostname": "per-nomad-worker07",
         "Domainname": "",
         "User": "",
-        "AttachStdin": false,
-        "AttachStdout": false,
-        "AttachStderr": false,
-        "ExposedPorts": {
-            "23933/tcp": {},
-            "23933/udp": {}
-        },
-        "Tty": false,
-        "OpenStdin": false,
-        "StdinOnce": false,
+        "AttachStdin": true,
+        "AttachStdout": true,
+        "AttachStderr": true,
+        "Tty": true,
+        "OpenStdin": true,
+        "StdinOnce": true,
         "Env": [
-            "NOMAD_CPU_LIMIT=500",
-            "NOMAD_IP=10.213.42.22",
-            "NOMAD_PORT_http=23933",
-            "NOMAD_ALLOC_DIR=/alloc",
-            "NOMAD_TASK_DIR=/local",
-            "NOMAD_MEMORY_LIMIT=512",
             "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
         ],
         "Cmd": [
             "uwsgi",
-            "--env ENV=dev",
+            "--env",
+            "ENV=dev",
             "--die-on-term",
             "--master",
-            "--http 23933",
-            "--workers 1",
-            "--threads 1",
+            "--http",
+            "9090",
+            "--workers",
+            "2",
+            "--threads",
+            "2",
             "--need-app",
-            "--callable app",
-            "--chdir /app",
-            "--file app.py"
+            "--callable",
+            "app",
+            "--thunder-lock",
+            "--chdir",
+            "/app",
+            "--file",
+            "app.py"
         ],
-        "Image": "c4milo/nomad-test:1.0.0",
+        "Image": "2ada554a624b",
         "Volumes": null,
         "WorkingDir": "/app",
         "Entrypoint": null,
@@ -175,7 +152,8 @@
         "Labels": {
             "License": "GPLv2",
             "Vendor": "CentOS"
-        }
+        },
+        "StopSignal": "SIGTERM"
     },
     "NetworkSettings": {
         "Bridge": "",

So, this change in my Nomad job definition seemed to make it work:

diff --git a/app.nomad b/app.nomad
index bd35830..a21e5e3 100644
--- a/app.nomad
+++ b/app.nomad
@@ -1,5 +1,5 @@
 job "test-job" {
-   datacenters = ["dc1"]
+   datacenters = ["iad2", "dc1"]
    distinct_hosts = true
    type = "service"
    priority = 50
@@ -39,14 +39,14 @@ job "test-job" {
                network_mode = "host"
                command = "uwsgi"
                args = [
-                   "--env ENV=dev",
+                   "--env", "ENV=dev",
                    "--die-on-term",
                    "--master",
-                   "--http ${NOMAD_PORT_http}",
-                   "--workers 1", "--threads 1",
-                   "--need-app", "--callable app",
-                   "--chdir /app",
-                   "--file app.py"
+                   "--http", "${NOMAD_PORT_http}",
+                   "--workers", "1", "--threads", "1",
+                   "--need-app", "--callable", "app",
+                   "--chdir", "/app",
+                   "--file", "app.py"
                ]
            }

Since this was just an issue of splitting the command line arguments, I am going to close it. If something else comes up we can reopen it.
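For anyone hitting the same thing: a shell splits the command line on whitespace before Docker ever sees it, whereas each string in the job's args list is passed through as one argument. A minimal Python sketch of that splitting, using the standard shlex module as a stand-in for the shell (the helper name split_for_args is hypothetical, and shlex only approximates POSIX shell rules; it does not expand variables):

```python
import shlex
from typing import List, Tuple


def split_for_args(cli_line: str) -> Tuple[str, List[str]]:
    """Split a shell-style command line into (command, args), one token per entry."""
    argv = shlex.split(cli_line)
    return argv[0], argv[1:]


command, args = split_for_args(
    "uwsgi --env ENV=dev --die-on-term --master --http ${NOMAD_PORT_http}"
)
print(command)  # uwsgi
print(args)     # ['--env', 'ENV=dev', '--die-on-term', '--master', '--http', '${NOMAD_PORT_http}']
```

Each element of the resulting list corresponds to one entry in the args array of the job definition; the ${NOMAD_PORT_http} placeholder is left untouched for Nomad to interpolate.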

Sure, however I still think it could be more user friendly and less prone to this kind of mistake.

Stumbled upon the same issue, where the container exits and is cleaned up immediately. @diptanu gave the tip of checking the task's stdout and stderr logs, which worked very well in diagnosing the issue.
