Docker-py: Pulling an image for run much slower than commandline

Created on 13 Mar 2017 · 5 Comments · Source: docker/docker-py

Running the same command on an image that hasn't been downloaded yet is much slower with the SDK than with the CLI. Does the CLI have access to some cache that the SDK doesn't? Or are the two tests not really comparable? I'm new to this, so apologies if I'm not using it correctly.

Context:

$ pip freeze | grep docker && python --version && docker version
docker==2.1.0
docker-pycreds==0.2.1
Python 2.7.13 :: Continuum Analytics, Inc.
Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 10:40:59 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 07:52:04 2017
 OS/Arch:      linux/amd64
 Experimental: true
$ sw_vers -productVersion
10.11.6 # MacOS

Reproducer:

import docker
import unittest
from subprocess import Popen, PIPE

class DockerTests(unittest.TestCase):
    def test_hello_world_sdk(self):
        # Slow if image not already downloaded
        client = docker.from_env()
        output = client.containers.run("ubuntu", "echo hello world")
        self.assertEqual(output, 'hello world\n')

    def test_hello_world_popen(self):
        # Even if download is necessary, runs in a couple seconds
        p = Popen(['docker', 'run', 'ubuntu', 'echo', 'hello', 'world'],
                  stdout=PIPE)
        output = p.stdout.read()
        self.assertEqual(output, 'hello world\n')

Using the SDK is fast if the image has already been downloaded:

$ python -m unittest -v docker_engine_app.tests.DockerTests.test_hello_world_sdk
test_hello_world_sdk (docker_engine_app.tests.DockerTests) ... ok

----------------------------------------------------------------------
Ran 1 test in 1.557s

OK

But much slower starting from scratch:

$ docker rmi ubuntu
Untagged: ubuntu:latest
$ python -m unittest -v docker_engine_app.tests.DockerTests.test_hello_world_sdk
test_hello_world_sdk (docker_engine_app.tests.DockerTests) ... ok

----------------------------------------------------------------------
Ran 1 test in 57.183s

OK

But if the CLI needs to download the image, it isn't anywhere near as slow:

$ docker rmi ubuntu
Untagged: ubuntu:latest
$ python -m unittest -v docker_engine_app.tests.DockerTests.test_hello_world_popen
test_hello_world_popen (docker_engine_app.tests.DockerTests) ... Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
Digest: sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535
Status: Downloaded newer image for ubuntu:latest
ok

----------------------------------------------------------------------
Ran 1 test in 2.298s

OK
group/documentation kind/question

All 5 comments

The difference in behavior can be isolated to the pull:

    def test_hello_world_sdk_with_cli_pull(self):
        client = docker.from_env()
        # call is subprocess.call (needs: from subprocess import call)
        call(['docker', 'pull', 'ubuntu'])
        output = client.containers.run("ubuntu", "echo hello world")
        self.assertEqual(output, 'hello world\n')

    def test_hello_world_sdk_with_sdk_pull(self):
        client = docker.from_env()
        client.images.pull('ubuntu')
        output = client.containers.run("ubuntu", "echo hello world")
        self.assertEqual(output, 'hello world\n')

$ docker rmi ubuntu
Untagged: ubuntu:latest
$ python -m unittest -v docker_engine_app.tests.DockerTests.test_hello_world_sdk_with_cli_pull
test_hello_world_sdk_with_cli_pull (docker_engine_app.tests.DockerTests) ... Using default tag: latest
latest: Pulling from library/ubuntu
Digest: sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535
Status: Downloaded newer image for ubuntu:latest
ok

----------------------------------------------------------------------
Ran 1 test in 2.188s

OK



$ docker rmi ubuntu
Untagged: ubuntu:latest
$ python -m unittest -v docker_engine_app.tests.DockerTests.test_hello_world_sdk_with_sdk_pull
test_hello_world_sdk_with_sdk_pull (docker_engine_app.tests.DockerTests) ... ok

----------------------------------------------------------------------
Ran 1 test in 63.027s

OK

docker pull ubuntu is actually translated into docker pull ubuntu:latest. The same goes for your rmi command, which only untags the ubuntu:latest image (you probably have ubuntu:16.04 tagged as well, which prevents the CLI from actually removing the associated layers). So your CLI command is very fast because it isn't downloading any new data; it just checks that the tag matches the version you already have locally and re-tags it accordingly.

On the other hand, the API (and the Python API client), when asked to pull ubuntu, actually pulls the entire repository (all images tagged in the official ubuntu repository, of which there are a lot).

If you change your code to use equivalent pull commands, I believe you will see comparable execution times:

    def test_hello_world_sdk_with_cli_pull(self):
        client = docker.from_env()
        # call is subprocess.call (needs: from subprocess import call)
        call(['docker', 'pull', 'ubuntu'])
        output = client.containers.run("ubuntu", "echo hello world")
        self.assertEqual(output, 'hello world\n')

    def test_hello_world_sdk_with_sdk_pull(self):
        client = docker.from_env()
        client.images.pull('ubuntu:latest')
        output = client.containers.run("ubuntu", "echo hello world")
        self.assertEqual(output, 'hello world\n')
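
To avoid pulling the whole repository by accident, one option is to normalize image names before handing them to the SDK, the same way the CLI appends :latest. The following is only a sketch: with_default_tag and pull_single_tag are hypothetical helper names, not part of docker-py, and it assumes the untagged-pull behavior described above for docker==2.1.0 (other SDK versions may behave differently).

```python
def with_default_tag(image, default_tag="latest"):
    """Return image with an explicit tag, mirroring the CLI's default.

    Hypothetical helper, not part of the docker SDK.
    """
    # A digest reference (name@sha256:...) already pins a single image.
    if "@" in image:
        return image
    # Only look for a tag after the last "/", so that a registry port
    # such as "localhost:5000/ubuntu" is not mistaken for a tag.
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        return image
    return "{}:{}".format(image, default_tag)


def pull_single_tag(client, image):
    """Pull one tag only; client is expected to come from docker.from_env()."""
    return client.images.pull(with_default_tag(image))


if __name__ == "__main__":
    for name in ("ubuntu", "ubuntu:16.04", "localhost:5000/ubuntu"):
        print("{} -> {}".format(name, with_default_tag(name)))
```

With a real client this would be called as pull_single_tag(docker.from_env(), 'ubuntu'). Keeping the name handling in a separate pure function means it can be checked without a Docker daemon.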

Thanks! Yes: changing client.containers.run('ubuntu', 'echo hello world') to client.containers.run('ubuntu:latest', 'echo hello world') fixes this, even without an explicit pull. The difference in behavior surprises me coming from the command line, but it's probably just me.

In your defense, our SDK docs say "similar to docker pull" - it should probably clarify in which ways it's not similar!

Anyway, glad I could help!

I still see docker pull in bash running much faster than the Python code. I guess it's multithreaded?
