Telegraf: Add the ability to replace or ignore values in the CSV parser

Created on 23 Sep 2019 · 6 Comments · Source: influxdata/telegraf

Feature Request

Opening a feature request kicks off a discussion.

Proposal:

The data at https://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt uses MM as a placeholder for fields that are missing a value. The CSV parser cannot parse this data: when it hits an MM, it throws a field conversion error.

It would be nice to be able to define a set of values to "ignore" when parsing CSV.

Current behavior:

There is no way to configure ignoring missing values.

Desired behavior:

A way to ignore specific values when parsing CSV data in which a placeholder is substituted for a missing value, as in the link above.
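
A minimal sketch of the requested behavior, written in Python rather than Telegraf's own Go code: a hypothetical `parse_rows` helper takes a set of placeholder strings (here `MM`) and drops any field whose value is in that set, instead of failing on conversion. The function name, the `ignore_values` parameter, and the sample rows are assumptions for illustration only; they are not an existing Telegraf option and not real NDBC data.

```python
import csv
import io

def parse_rows(text, ignore_values=frozenset({"MM"}), delimiter=","):
    """Parse CSV text into dicts of floats, skipping placeholder values.

    ignore_values is a hypothetical setting: any cell whose stripped text is
    in this set is treated as missing and left out of the resulting record.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        fields = {}
        for name, raw in row.items():
            value = (raw or "").strip()
            if value in ignore_values:
                continue  # placeholder for a missing value: skip the field
            fields[name] = float(value)  # still raises on genuinely bad input
        records.append(fields)
    return records

# Made-up sample in the spirit of the data described above.
SAMPLE = "wind_speed,pressure\n5.0,MM\nMM,1013.2\n"

print(parse_rows(SAMPLE))
```

Skipping the field, rather than substituting zero or a string, keeps the remaining fields on the row usable and avoids writing a misleading value.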

Use case:

This would allow me to parse additional CSV data without needing to pre-process it or write a custom script.


All 6 comments

Seconded!

@pierwill Would you be able to add a quick description of how this works in your dataset? If by luck it is a public dataset, a link would be great too.

I'm using data from https://metrics.torproject.org/bandwidth.csv with the following config:

[[inputs.file]]
  files = ["data/raw/bandwidth.csv"]
  data_format = "csv"
  csv_header_row_count = 1
  csv_column_names = ["date", "advbw", "bwhist"]
  csv_column_types = []
  csv_skip_rows = 0
  csv_skip_columns = []
  csv_delimiter = ","
  csv_comment = "#"
  csv_trim_space = false
  csv_tag_columns = []
  name_override = "bandwidth"
  csv_timestamp_column = "date"
  csv_timestamp_format = "2006-01-02"

A typical error I'm getting as a result of missing data is

[inputs.file] Error in plugin: column type: parse float error strconv.ParseFloat: parsing "": invalid syntax

Your case is even more clear-cut to me. Any time a column is empty, it shouldn't be an error; instead, we should just skip the field on that line. I think we could do this safely across the board.

So this:

date,advbw,bwhist
2007-10-27,1.917726488,
2019-09-01,,196.773939248
2019-09-02,408.762870552,194.191607076

Leaving off the timestamp, the output should look like:
```
bandwidth advbw=1.917726488
bandwidth bwhist=196.773939248
bandwidth advbw=408.762870552,bwhist=194.191607076
```
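
The skip-empty-fields behavior proposed above can be sketched as follows. This is an illustration in Python, not Telegraf's implementation: `rows_to_lines` is a hypothetical helper that renders each CSV row as a line-protocol-style string, leaves out the timestamp column, and omits any field whose cell is empty. Values are kept as the original text so the digits are reproduced exactly.

```python
import csv
import io

def rows_to_lines(text, measurement, timestamp_column):
    """Render CSV rows as line-protocol-style strings, omitting empty fields."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        fields = [
            f"{name}={value}"
            for name, value in row.items()
            if name != timestamp_column and value  # drop empty or absent cells
        ]
        if fields:  # a row with no usable fields produces no line at all
            lines.append(f"{measurement} {','.join(fields)}")
    return lines

# The sample rows quoted in the comment above.
SAMPLE = """date,advbw,bwhist
2007-10-27,1.917726488,
2019-09-01,,196.773939248
2019-09-02,408.762870552,194.191607076
"""

for line in rows_to_lines(SAMPLE, "bandwidth", "date"):
    print(line)
```

Run against the sample, this prints the three `bandwidth` lines shown above, one per row, with only the fields that are present.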

Would this resolve an issue that I'm currently encountering?

I'm parsing a CSV that's generated from an application's logs. Some cells are populated with a float, and others are blank when no data was recorded. As a result, when a row containing a blank cell is fed into a database column that's set to a float type, I get the error below.

2020-04-01T17:01:50Z E! [outputs.influxdb] When writing to [http://database:8086]: received error partial write: field type conflict: input field "speed" on measurement "file" is type string, already exists as type float dropped=4; discarding points
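
One way to picture this conflict, assuming a parser that infers each cell's type and falls back to string when the cell is not numeric (an assumption about the general mechanism, not a description of Telegraf's code): a blank `speed` cell becomes an empty string field, which clashes with a field already stored as float. Skipping blank cells means the field is never emitted for that row. `infer_value` and `parse_cell` are hypothetical helper names.

```python
def infer_value(cell):
    """Naive type inference: try int, then float, otherwise keep the string."""
    for convert in (int, float):
        try:
            return convert(cell)
        except ValueError:
            continue
    return cell

def parse_cell(cell, skip_empty):
    """Return (keep, value); with skip_empty set, blank cells are dropped."""
    if skip_empty and cell.strip() == "":
        return False, None
    return True, infer_value(cell)

# A populated cell is inferred as a float; a blank one falls back to string.
print(type(infer_value("12.5")).__name__)
print(type(infer_value("")).__name__)

# With blank cells skipped, no string-typed field is produced for the row.
print(parse_cell("", skip_empty=True))
```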

@astro-arphid Yes, I believe so.
