Docker-py: _stream_helper returns invalid JSON or cannot decode

Created on 7 May 2016  ·  18 Comments  ·  Source: docker/docker-py

docker-py: 1.8.0

python: 3.5.0

docker: Version 1.11.1-beta10 (build: 6662)

OS: Mac OS X 10.11 Beta 15A279b


While using https://github.com/6si/shipwright, I found that docker-py was returning invalid JSON. According to the docs, the output of client.build should be an iterable of valid JSON objects encoded as strings (or the decoded JSON objects if decode=True is passed), e.g.:

'{"stream":"Step 1 : FROM ubuntu:xenial\\n"}'

However, with the setup listed above, I have found that it returns strings like:

'{"stream":"Step 1 : FROM ubuntu:xenial\\n"}\r\n{"stream":"Step 2 : COPY foo\\n"}\r\n'

(Note the trailing CRLFs as well, which differ from the documentation.)
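To illustrate why such a string is a problem for callers, here is a minimal sketch using only the standard library. It takes the string reported above, shows that `json.loads` rejects it as a whole, and then parses it line by line. The variable names are illustrative, and the line-by-line workaround assumes each JSON object sits on its own CRLF-terminated line, as in the sample.

```python
import json

# The string reported above: two JSON objects, each followed by CRLF.
chunk = '{"stream":"Step 1 : FROM ubuntu:xenial\\n"}\r\n{"stream":"Step 2 : COPY foo\\n"}\r\n'

# Parsing the whole string fails because a second object follows the first.
error = None
try:
    json.loads(chunk)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError on Python 3
    error = str(exc)

# Workaround sketch: parse each non-empty line separately.
objects = [json.loads(line) for line in chunk.splitlines() if line.strip()]
```

This only helps when a chunk ends exactly on an object boundary; a chunk that stops mid-object would still fail.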

I suspect this issue is being triggered by the Docker side of things. It looks like the chunks in the chunked response used to align exactly with JSON objects but no longer do, although I'm not familiar enough with the Docker API to know whether this is correct.

I also suspect I hit this issue because of the pre-release Docker installation. However, docker-py doesn't list compatible Docker versions, which makes it difficult to know whether this is a bug or unimplemented functionality.

kind/enhancement

All 18 comments

I think there's an issue with requests concatenating the entire response when the connection closes before we start reading it. Do you think that could be what you're seeing?

If the objects are separated by CRLFs, we can easily parse that somewhere.

@shin- I don't think this is an issue with requests. It _looks_ like docker-py assumes chunks will end at a JSON object boundary, and that something at the Docker API level has broken that previously true assumption. Obviously there is no such guarantee from Requests, Docker (according to the docs), or HTTP. It looks like Docker is using "almost" line-delimited JSON, with the difference that it sends CRLFs instead of just LFs.

I think the best solution to this would be to continually receive chunks into a buffer, and then attempt to consume CRLF delimited JSON objects out of that buffer. docker-py should be able to handle receiving a chunk that contains multiple JSON objects, or a chunk that doesn't contain any valid JSON and must be concatenated with past or future blocks for the JSON to be valid.
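The buffering approach described above could be sketched roughly as follows. This is not docker-py code: `iter_json_objects` is a hypothetical helper name, and it assumes the chunks are bytes and that objects are delimited by CRLF, as observed in this thread.

```python
import json


def iter_json_objects(chunks):
    """Yield decoded JSON objects from an iterable of byte chunks.

    Hypothetical helper: assumes CRLF-delimited JSON, with no guarantee
    that chunk boundaries line up with object boundaries.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last CRLF is complete; the tail may be a
        # partial object (or a lone "\r") and stays in the buffer.
        *lines, buffer = buffer.split(b"\r\n")
        for line in lines:
            if line.strip():
                yield json.loads(line.decode("utf-8"))
    # Flush a final object that was not terminated by CRLF.
    if buffer.strip():
        yield json.loads(buffer.decode("utf-8"))
```

For example, `list(iter_json_objects([b'{"a": 1}\r\n{"b"', b': 2}\r\n']))` handles both a chunk with a complete object plus a fragment and a chunk that only completes an earlier fragment.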

Quick note: I don't think this is an enhancement (as I understand them); for me this is a bug fix.

After some investigation about ways to solve this, it looks like _stream_raw_result and _stream_raw_result_old would be the correct way of doing this. I don't have the context for why these aren't being used here, but it looks like they solve this problem exactly. Much better to delegate the chunking to Requests I think.

Hi,

seems like this issue describes a problem I'm seeing with docker-compose.

When doing docker restart <container>, where the container is started by docker-compose which is tailing the log, the docker-compose CLI _sometimes_ emits the following:

CONTAINER exited with code 143
Exception in thread Thread-5:
Traceback (most recent call last):
  File "threading.py", line 810, in __bootstrap_inner
  File "threading.py", line 763, in run
  File "compose/cli/log_printer.py", line 190, in watch_events
  File "compose/project.py", line 343, in events
  File "site-packages/docker/client.py", line 253, in _stream_helper
  File "json/__init__.py", line 338, in loads
  File "json/decoder.py", line 369, in decode
ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 689 - 1382)

Then it fails to output any more log data for that container (which is now running, and also outputs proper logs if checked with docker logs).
From the trace, I guess this is not an actual docker-compose issue, but I just wanted to report it.

Env:

Mac OS 10.11.3

Docker For Mac Version 1.12.0-a (build: 11213, ad6ab836187e4111082447b7c0a6a74d01929a5c)

docker-compose version 1.8.0, build f3628c7
Docker version 1.12.0, build 8eab29e

This is still an issue for me as well (I'm having this in Ansible's docker_image module and got here through ansible/ansible-modules-core#4116 and then #4116).

The fix proposed in #1081 works perfectly for me. I suggest that gets merged.

I would also like to see this released; I'm having some trouble working around this right now. 👍

Thanks @shin-, is there an expected release date?

@pmarques A fix has been merged in master and will be in the upcoming 1.10. Feel free to try it out.

Also, since this is now in master, I'll go ahead and close this issue.

I don't want to commit to a date yet at this point, but it will be soon.

Ok, Thanks!

1.10.0 is now available on pypi.

Hi shin, I am still seeing this issue with docker-py 1.10.6. What can I do about it?

I am also still seeing this issue with 1.10.6. I am able to reproduce it reliably using Docker for Mac, both 1.13.0 and the 1.12.x version I had installed prior. It does not seem to occur with Docker 1.12.1 that I have installed on an Ubuntu OpenStack instance.

We implemented another fix in #1389 which is available in docker==2.0.2. It should solve this issue definitively.

Just updated the package I'm working on to use the new APIClient in 2.0.0+ and am still seeing multiple JSON objects concatenated together with newlines using 2.1.0-dev, which I checked out after my comment above:

STRIPPED RESPONSE: '{"status":"Pulling from library/ubuntu","id":"16.04"}
{"status":"Digest: sha256:71cd81252a3563a03ad8daee81047b62ab5d892ebbfbf71cf53415f29c130950"}
{"status":"Status: Image is up to date for ubuntu:16.04"}'
ERROR:kolla.image.build.base:Unknown error when building
Traceback (most recent call last):
  File "/Users/erhudy/Documents/OpenStack/kolla/kolla/image/build.py", line 428, in builder
    stream = json.loads(response.strip().decode('utf-8'))
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 3 column 58 (char 55 - 206)

@erhudy - Yes, unfortunately that is the way the response is streamed by the Engine API. The fix we implemented in 2.0.2 uses an improved JSON decoder that is able to parse those artifacts. You need to use decode=True when calling APIClient.build to take advantage of it.
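For callers that decode the raw response themselves, as in the traceback above, one way to cope with several objects in one string is `json.JSONDecoder.raw_decode` from the standard library. The sketch below is an assumption about how such parsing could look, not the decoder docker-py actually ships; `decode_concatenated` is a hypothetical name.

```python
import json


def decode_concatenated(text):
    """Return all JSON objects found back to back in `text`.

    Hypothetical helper: objects may be separated by any whitespace
    (LF, CRLF, spaces) or by nothing at all.
    """
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects
```

With docker-py itself, passing decode=True as described above should make this kind of manual parsing unnecessary.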

Okay, thanks for the clarification.
