data_format = "json"
tag_keys = ["DEVICEID"]
json_time_key = "measuredDtm"
json_time_format = "2006-01-02T15:04:05+00:00"
Telegraf 1.14 on Ubuntu 18.04
I am using Telegraf to consume the JSON input format. I have a tag key set to transform DEVICEID into an InfluxDB tag. The problem is that the tag's precision is truncated before it becomes a string.
Source JSON containing ...,"DEVICEID":2882429806056571124,... should yield a metric with the tag DEVICEID=2882429806056571124.
Instead, the same input yields the tag DEVICEID=2882429806056571000. Note the zeroes in the least significant digits.
It looks like the problem is in the Go encoding/json library, specifically in json.Unmarshal. I believe all JSON numbers are parsed into float64 by default.
It is possible to opt in to parsing them as a json.Number, but that isn't a type we support in the Telegraf metric. We could expose an option to select whether the value should be converted to an int64, uint64, or float64.
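For reference, here is a minimal standalone sketch (plain standard-library Go, not Telegraf code) of the two decoding paths:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

func main() {
	data := []byte(`{"DEVICEID":2882429806056571124}`)

	// Default path: json.Unmarshal stores every JSON number in an
	// interface{} as a float64, whose 53-bit mantissa cannot hold
	// this 19-digit integer, so the low-order digits are lost.
	var lossy map[string]interface{}
	_ = json.Unmarshal(data, &lossy)
	fmt.Printf("%.0f\n", lossy["DEVICEID"]) // low-order digits are gone

	// Opt-in path: UseNumber keeps the literal verbatim as a
	// json.Number (a string type), so it can become an int64 or a
	// string without ever passing through float64.
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber()
	var exact map[string]interface{}
	_ = dec.Decode(&exact)
	n := exact["DEVICEID"].(json.Number)
	fmt.Println(n.String()) // 2882429806056571124
	i, _ := n.Int64()
	fmt.Println(i) // 2882429806056571124
}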
Because I am using it as a tag, I ultimately want it parsed into a string.
But in general, I could see wanting to parse signed 64-bit integers as well, since they are supported by InfluxDB. I think it would make sense to use whatever options the JSON decoder offers to preserve information, and then convert to Influx-compatible types within parsers/json/parser.go. This would allow giving priority to lossless parsing to string for keys that have been flagged as tags or string_fields (sketched below).
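To make that concrete, here is a rough sketch of the conversion priority; convertNumber and stringKeys are hypothetical names for illustration (assuming encoding/json is imported), not existing code in parsers/json/parser.go:

// Keys flagged as tags or string_fields keep the raw JSON literal;
// other numbers try int64 first and fall back to float64 only when
// the literal is not an integer. Hypothetical helper, for illustration.
func convertNumber(key string, n json.Number, stringKeys map[string]bool) interface{} {
	if stringKeys[key] {
		return n.String() // lossless: json.Number is the original literal text
	}
	if i, err := n.Int64(); err == nil {
		return i
	}
	f, _ := n.Float64()
	return f
}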
Wondering if this is causing a loss of precision for timestamps as well?
I ran a local test and noticed the millisecond precision was dropped. It might not be related, and it's quite possible I've missed something.
Tested via the following command:
./telegraf --config jsonfile.conf --test
jsonfile.conf:
# Reload and gather from file[s] on telegraf's interval.
[[inputs.file]]
files = ["output.json"]
data_format = "json"
json_strict = true
tag_keys = ["Path", "TimeStamp"]
json_name_key = "Path"
json_time_key = "TimeStamp"
json_time_format = "unix_ms"
output.json:
[
  {
    "TimeStamp": "1590009582002",
    "Path": "processor_time",
    "Value": 10.656527371660751
  },
  {
    "TimeStamp": "1590009582002",
    "Path": "memory_committed_bytes",
    "Value": 79.157851910953838
  }
]
Results (truncated slightly to highlight the timestamp value):
Starting Telegraf
> processor_time,Value=10.65652737166075 1590009582000000000
> memory_committed_bytes,Value=79.15785191095384 1590009582000000000
The expected timestamp was 1590009582002000000.
That is probably rounded due to the agent precision setting:
[agent]
precision = "1s"
BTW, you can check whether that is the case by setting `precision = "1ms"` in the agent. This takes effect across all of Telegraf, but I'm planning to make it configurable per plugin in 1.15.
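For example, the same agent block with millisecond precision:
[agent]
precision = "1ms"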
That was the issue, aka my fault. Thanks for the quick response, and that would be a really nice addition to 1.15
@danielnelson is there any fix for this on the horizon? Alternatively, can you recommend a workaround, or a strategy I could pursue for implementing a fix in a fork? I'm happy to work on the implementation, particularly if the direction has some endorsement and the potential to make its way back into the main line.
@nathanpegram Could you try setting "DEVICEID" in json_string_fields and see whether the precision is preserved? You could then convert it to a tag, since it would be reported as a field.
The problem looks to be that the value is converted to a float64 and loses precision. As long as you skip the float step, this shouldn't be a problem: json -> int64 -> string or json -> string, just _not_ json -> float64 -> string.
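An untested sketch of that workaround (the file name is illustrative; the converter processor promotes the string field to a tag):

[[inputs.file]]
files = ["input.json"]
data_format = "json"
json_string_fields = ["DEVICEID"]

[[processors.converter]]
[processors.converter.fields]
tag = ["DEVICEID"]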
A valuable bug fix would likely be around int64 handling, possibly adding a json_int64_fields = [] option. There might be further work we could do with the JSON parser for tags.
@reimda @ssoroka