Jaeger: "Unexpected end of JSON input" (empty body response from server)

Created on 8 Oct 2020 · 15 Comments · Source: jaegertracing/jaeger

Describe the bug
What happens is that the UI receives a 200 status code with an empty body (no JSON) from the server.
Strangely, this seems to be caused by some particular spans.
When querying for spans other than these, the UI displays them correctly.

I originally thought this was the same bug as #412.

To Reproduce
Steps to reproduce the behavior:

  1. git clone https://github.com/gautaz/moleculer-tests.git
  2. cd moleculer-tests
  3. ./init.sh
  4. docker-compose up -d
  5. curl "http://127.0.0.1:$(docker-compose port gateway 10000 | cut -d: -f2)/calculator/add?left=1&right=a"
  6. open the Jaeger GUI on "http://127.0.0.1:$(docker-compose port jaeger 16686 | cut -d: -f2)"
  7. find traces for the calculator service

Expected behavior
Even if some spans are corrupted, the server should at least return the valid ones to the UI.

Screenshots
N/A

Version

  • OS: Linux
  • Jaeger version: 1.17
  • Deployment: Docker

What troubleshooting steps did you try?
We tried increasing the Jaeger log level, but nothing useful has come out of it so far.

Additional context
To generate spans without causing the issue:
curl "http://127.0.0.1:$(docker-compose port gateway 10000 | cut -d: -f2)/calculator/add?left=1&right=1"

bug


All 15 comments

@rubenvp8510 are you able to take a look at this one?

@jpkrohling sure.

Same for me on Fedora.

OK, after one more attempt I was able to reproduce this. I copied my /etc/passwd into moleculer-tests and renamed it to .passwd in order to make it work.

I'll take a look to see what the problem is.

"I copied my /etc/passwd into moleculer-tests and renamed it to .passwd in order to make it work."

Wait, what? At least it's not your /etc/shadow :-)

Hahahaha, I know, I know... security first ;)

Why are these passwd shenanigans needed in the first place? Docker containers normally run without any of that.

@yurishkuro Not sure. I think it's because the program inside the container expects that file and exits with status 1 if it isn't found. This is not usual, though.

Which program? I highly doubt that nats or mongo or jaeger containers will depend on this file, and everything else is just part of this repository.

It is because the docker-compose file tries to mount the .passwd file:

    volumes:
      - ./.passwd:/etc/passwd

Not sure how this works, so I can't tell you why the container needs this.

Regarding the issue itself: it happens because one of the tags contains a NaN value, and Go can't marshal NaN into JSON (see https://github.com/golang/go/issues/3480).

Hello, sorry for the delayed response.

The password file is mounted because these are development containers that directly use the services' source code located on your host (in the cloned project).
Without it, the root user would, for instance, run things like npm install, leaving a bunch of root-owned files on your host filesystem inside the project.

Sorry for the constraint, but we mostly use Linux hosts here, and in fact I forgot to mention the init step in the "Steps to reproduce" (I'll fix that now).

@rubenvp8510 Looking at your latest comment regarding golang/go#3480, where does this failing marshaling step occur?

@rubenvp8510

  • how does NaN get into the trace?
  • do we ignore marshaling errors, if this is causing an empty response with a 200 status code?

@gautaz It fails when writing the JSON response (https://github.com/jaegertracing/jaeger/blob/master/cmd/query/app/http_handler.go#L457); the culprit is a tag that has a NaN value:

{
  "key": "result",
  "type": "float64",
  "value": "NaN"
}

@yurishkuro

  • Not sure how that happens; I'll look into this more closely.
  • Yes, we are ignoring the marshaling errors, and that is what causes the empty response with a 200 status code. Maybe we could return valid JSON for an empty search result, or some kind of error message that the UI can display?

The thing is, the marshal function serializes the whole response structure in one operation, which either entirely fails or entirely succeeds. I'm wondering if there is a way to just skip problematic tag values.

As a quick fix, if marshaling fails we should just return an error. Once we have accepted the trace data into the backend, it should be serializable. The NaN case needs to be treated on ingestion, e.g. with a sanitizer that replaces numeric NaN tags with string "NaN" tags.
