When trying to parse data from https://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt , the provider substitutes `MM` for fields that are missing a value. The CSV parser cannot handle this data: when it hits an `MM`, it throws a field conversion error.
It would be nice to be able to define a set of values to "ignore" when parsing CSV.
There is currently no way to configure the parser to ignore missing-value placeholders.
What I'd like is a way to ignore specific values when parsing CSV data where a placeholder is substituted for a missing value, such as in the link above.
This would allow me to parse additional CSV data without needing to pre-process it or write a custom script.
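Until such an option exists, one workaround is to pre-process the file and blank out the placeholder values before the parser sees them. A minimal sketch (not part of the plugin; it assumes a comma-delimited file and uses `MM` as the placeholder, per the NDBC feed above):

```python
import csv
import io

# Placeholder values the hypothetical "ignore list" would cover.
IGNORE_VALUES = {"MM"}

def scrub(raw_text, ignore=IGNORE_VALUES):
    """Replace placeholder cells with empty strings so downstream
    type conversion can skip them instead of erroring."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(raw_text))
    writer = csv.writer(out)
    for row in reader:
        writer.writerow("" if cell in ignore else cell for cell in row)
    return out.getvalue()

print(scrub("WDIR,WSPD\n120,MM\n"))
```

This only moves the problem from "unparseable token" to "empty cell", which is exactly the case discussed below.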
Seconded!
@pierwill Would you be able to add a quick description of how this works in your dataset, if by luck it is a public dataset a link would be great too.
I'm using data from https://metrics.torproject.org/bandwidth.csv with the following config:
```toml
[[inputs.file]]
  files = ["data/raw/bandwidth.csv"]
  data_format = "csv"
  csv_header_row_count = 1
  csv_column_names = ["date", "advbw", "bwhist"]
  csv_column_types = []
  csv_skip_rows = 0
  csv_skip_columns = []
  csv_delimiter = ","
  csv_comment = "#"
  csv_trim_space = false
  csv_tag_columns = []
  name_override = "bandwidth"
  csv_timestamp_column = "date"
  csv_timestamp_format = "2006-01-02"
```
A typical error I'm getting as a result of missing data is:

```
[inputs.file] Error in plugin: column type: parse float error strconv.ParseFloat: parsing "": invalid syntax
```
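The error is simply strict numeric conversion rejecting an empty cell. A Python analogue of what Go's `strconv.ParseFloat` is doing here:

```python
# An empty cell cannot be converted to a float; this is the same
# failure mode the parser reports (Go's strconv.ParseFloat behaves
# the same way on "").
try:
    float("")
except ValueError as err:
    print(err)  # could not convert string to float: ''
```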
Your case is even more clear-cut to me. Any time a column is empty it shouldn't be an error; instead we should just skip the field on that line. I think we could do this safely across the board.
So this:

```
date,advbw,bwhist
2007-10-27,1.917726488,
2019-09-01,,196.773939248
2019-09-02,408.762870552,194.191607076
```

leaving off the timestamp, should look like:

```
bandwidth advbw=1.917726488
bandwidth bwhist=196.773939248
bandwidth advbw=408.762870552,bwhist=194.191607076
```
Would this resolve an issue that I'm currently encountering?
I'm parsing a CSV generated from an application's logs. Some cells are populated with a float, and when no data was recorded they're blank. As a result, when a row containing a blank cell is written to a field that is set to a float type, I'm getting the error below.

```
2020-04-01T17:01:50Z E! [outputs.influxdb] When writing to [http://database:8086]: received error partial write: field type conflict: input field "speed" on measurement "file" is type string, already exists as type float dropped=4; discarding points
```
@astro-arphid Yes, I believe so.