Dlib: Example: reading csv files

Created on 14 Sep 2019 · 10 Comments · Source: davisking/dlib

Hello,

My goal is to read csv files using the dlib framework. I searched the web and did not find any satisfactory result. For example, this repo works, but in a hard-coded manner (it makes strong assumptions about the data types and the number of features).

I am wondering whether there is a "mighty" csv reader that could handle all kinds of csv files.

Cheers

All 10 comments

You can read csv data into a matrix by just doing in >> your_matrix;. Aside from that, you are on your own. In any case, it's pretty easy to read csv data with regular C++ iostreams.
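As a minimal sketch of reading csv data with plain iostreams, assuming a header-less file of numbers separated by commas, with no quoted fields (read_numeric_csv is a hypothetical helper name, not part of dlib):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a header-less, purely numeric csv from any
// input stream into one std::vector<double> per line.
// Assumes ',' separators and no quoted fields.
inline std::vector<std::vector<double>> read_numeric_csv(std::istream &in) {
  std::vector<std::vector<double>> rows;
  std::string line;
  while (std::getline(in, line)) {  // one csv record per line
    if (line.empty()) continue;     // skip blank lines
    std::vector<double> row;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ','))  // split the line on commas
      row.push_back(std::stod(field));        // throws std::invalid_argument on non-numeric text
    rows.push_back(row);
  }
  return rows;
}
```

For a file, an std::ifstream can be passed instead of a string stream, since it is also an std::istream.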

I am not 100% comfortable with C++ iostreams yet. Could you please tell me what type in is?

I find your example code extremely well documented and perfectly suited for beginners. I am very grateful for your work. It would be really nice, however, if a simple example about (or including) reading csv data could be added.

Sorry to interrupt: in is a stream you can read data from. It can come from a file, a string...
In your case, it would probably be a file stream, which would look like:

std::ifstream in("path/to/file.csv");
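To illustrate what type in can be: std::ifstream and std::istringstream both derive from std::istream, so a function that takes std::istream& accepts either. A small sketch (sum_numbers and sum_file are hypothetical names; the numbers are assumed to be whitespace-separated, because operator>> does not skip commas):

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Works with any input stream: a file, a string, std::cin, ...
inline double sum_numbers(std::istream &in) {
  double total = 0.0;
  double x = 0.0;
  while (in >> x)  // operator>> skips whitespace; stops at EOF or a non-number
    total += x;
  return total;
}

// The same function fed from a file stream, like std::ifstream in("path/to/file.csv");
inline double sum_file(const std::string &path) {
  std::ifstream in(path);
  if (!in) throw std::runtime_error("cannot open " + path);
  return sum_numbers(in);
}
```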

I don't know if I'm doing you any favor by sharing this simple, generic snippet I wrote to read csv files in C++, but here it is anyway:

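#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// parse a token from a csv line: read up to the next separator, then convert it to T
template<typename T>
auto parse_token(std::istringstream& ss, char sep = ',') -> T
{
    T result{};                      // value-initialized, so a missing field yields T{}
    std::string token;
    std::getline(ss, token, sep);
    std::istringstream stoken(token);
    stoken >> result;
    return result;
}

// load a csv file: T must be constructible from a std::string holding one csv line
template<typename T>
auto load_csv_file(const std::string& csv_path, bool has_header = true) -> std::vector<T>
{
    std::vector<T> items;
    std::ifstream data(csv_path);
    if (!data) throw std::runtime_error("could not open " + csv_path);
    std::string line;
    if (has_header) std::getline(data, line);  // skip the header row
    while (std::getline(data, line))
    {
        items.emplace_back(line);             // constructs T from the line
    }
    return items;
}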

Now, let's assume our csv file contains the fields label, x_min, y_min, x_max, y_max.
Then I would make my objects constructible from a std::string, i.e. one line of the csv file. That would look like:

struct sample_info
{
    sample_info() = default;
    sample_info(const std::string& csv_line)
    {
        std::istringstream iss(csv_line);
        label = parse_token<std::string>(iss);
        x_min = parse_token<int>(iss);
        y_min = parse_token<int>(iss);
        x_max = parse_token<int>(iss);
        y_max = parse_token<int>(iss);
    }
    std::string label{};
    int x_min{}, y_min{}, x_max{}, y_max{};
};

And with that set up, you can just read the csv file from its path like:

auto samples = load_csv_file<sample_info>("path/to/file.csv");
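Because load_csv_file only needs lines of text, the same loop can be written over any std::istream, which also makes it easy to try out on an in-memory string. A sketch of that variant (load_csv is a hypothetical name; parse_token and sample_info are repeated so the block compiles on its own):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// parse a token from a csv line (same idea as parse_token above)
template <typename T>
T parse_token(std::istringstream &ss, char sep = ',') {
  T result{};
  std::string token;
  std::getline(ss, token, sep);
  std::istringstream stoken(token);
  stoken >> result;
  return result;
}

struct sample_info {
  sample_info() = default;
  sample_info(const std::string &csv_line) {
    std::istringstream iss(csv_line);
    label = parse_token<std::string>(iss);
    x_min = parse_token<int>(iss);
    y_min = parse_token<int>(iss);
    x_max = parse_token<int>(iss);
    y_max = parse_token<int>(iss);
  }
  std::string label{};
  int x_min{}, y_min{}, x_max{}, y_max{};
};

// Hypothetical stream-based variant of load_csv_file: reads from any
// std::istream (file, string, std::cin) instead of opening a path itself.
template <typename T>
std::vector<T> load_csv(std::istream &data, bool has_header = true) {
  std::vector<T> items;
  std::string line;
  if (has_header) std::getline(data, line);  // skip the header row
  while (std::getline(data, line))
    if (!line.empty()) items.emplace_back(line);  // skip blank lines
  return items;
}
```

With a file, std::ifstream in("path/to/file.csv"); followed by load_csv<sample_info>(in) gives the same result as load_csv_file.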

I hope this example helps you understand how streams work in C++ and how amazing templates are.


@arrufat There is nothing to be sorry about. Thanks a lot for your detailed "csv + streams 101"!

Edit: following your code, I created an example that uses std::tuple to simplify the constructor of a sample_info-like struct (SensorData below).

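// Parses successive tokens of one csv line; the argument only selects the type to parse.
class LineParser {
public:
  LineParser(const std::string &csv_line) : iss(csv_line) {}
  // Taking const T& guarantees T is deduced as a plain value type, never a reference.
  template <typename T> T operator()(const T &) const { return parse_token<T>(iss); }

private:
  mutable std::istringstream iss; // mutable: operator() is const but consumes the stream
};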

struct SensorData {
  SensorData() = default;
  SensorData(const std::string &csv_line) {
    LineParser line_parser(csv_line);
    data = execute_all(line_parser, data);
  }
  std::tuple<std::string, int, int, double, double, double, double, double,
             double, double, double, double, double>
      data{};
};

The execute_all function runs line_parser on each element of the tuple and stores the results in a new tuple.

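// This is where execute_all is defined.
#include <cstddef>     // std::size_t
#include <tuple>       // std::tuple, std::get, std::tuple_size
#include <type_traits> // std::remove_cv, std::remove_reference
#include <utility>     // std::move, std::index_sequence, std::make_index_sequence (C++14)

/**
 * Indices trick (a detailed solution)
 */

template <typename T>
using Bare =
    typename std::remove_cv<typename std::remove_reference<T>::type>::type;

// Builds a new tuple whose i-th element is op(std::get<i>(t)). Elements of a
// braced initializer are evaluated left to right, so the csv columns are
// parsed in order. Each element is passed as an rvalue; its value is unused,
// it only tells op which type to parse.
template <typename Op, typename Tuple, std::size_t... Is>
Bare<Tuple> f_all_dispatch(Op &op, Tuple &t, std::index_sequence<Is...>) {
  return Bare<Tuple>{op(std::move(std::get<Is>(t)))...};
}

template <typename Op, typename Tuple>
Bare<Tuple> execute_all(Op &&op, Tuple &&t) {
  return f_all_dispatch(
      op, t, std::make_index_sequence<std::tuple_size<Bare<Tuple>>::value>{});
}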
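If C++17 is available, std::apply can replace the explicit index_sequence dispatch. A minimal sketch, assuming C++17 (execute_all_17 is a hypothetical name; LineParser and parse_token are repeated from above so the block is self-contained):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <tuple>

template <typename T>
T parse_token(std::istringstream &ss, char sep = ',') {
  T result{};
  std::string token;
  std::getline(ss, token, sep);
  std::istringstream stoken(token);
  stoken >> result;
  return result;
}

class LineParser {
public:
  explicit LineParser(const std::string &csv_line) : iss(csv_line) {}
  template <typename T> T operator()(const T &) const { return parse_token<T>(iss); }

private:
  mutable std::istringstream iss;
};

// std::apply unpacks the tuple into a parameter pack; the braced initializer
// evaluates op(elems)... left to right, so the columns are parsed in order.
template <typename Op, typename Tuple>
Tuple execute_all_17(Op &op, const Tuple &t) {
  return std::apply(
      [&op](const auto &...elems) { return Tuple{op(elems)...}; }, t);
}
```

The design is the same as execute_all; std::apply just hides the index_sequence plumbing.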

Based on this version, I am going to create a dlib matrix from a csv file. In case I run into related issues, I will leave this open for now. Once I have the dlib csv version working, I will post my solution and close this issue.

By the way, does anyone know whether some template magic exists to simplify std::tuple<int, int, double, double, double, double> t; to something like std::tuple<int, int, make_multiple<double, 4>> t;?
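One possible answer, as a sketch assuming C++14: a single template argument like make_multiple<double, 4> cannot expand into several arguments in place, but a helper can build the whole tuple type and concatenate it with the fixed part. repeat_tuple_t and tuple_cat_t are hypothetical names, not a standard or dlib facility:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Maps any index to the same type T; used to repeat T once per index.
template <typename T, std::size_t>
struct always {
  using type = T;
};

template <typename T, typename Seq>
struct repeat_tuple;

template <typename T, std::size_t... Is>
struct repeat_tuple<T, std::index_sequence<Is...>> {
  using type = std::tuple<typename always<T, Is>::type...>;
};

// repeat_tuple_t<double, 4> is std::tuple<double, double, double, double>.
template <typename T, std::size_t N>
using repeat_tuple_t = typename repeat_tuple<T, std::make_index_sequence<N>>::type;

// Type returned by std::tuple_cat for the given tuple types (unevaluated context).
template <typename... Tuples>
using tuple_cat_t = decltype(std::tuple_cat(std::declval<Tuples>()...));

using example_t = tuple_cat_t<std::tuple<int, int>, repeat_tuple_t<double, 4>>;

static_assert(std::is_same<example_t,
                           std::tuple<int, int, double, double, double, double>>::value,
              "two ints followed by four doubles");
```

For SensorData, the member could then be declared as tuple_cat_t<std::tuple<std::string, int, int>, repeat_tuple_t<double, 10>> data{};.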


Maybe the repeat layer in dlib's dnn module can give you some inspiration:
http://dlib.net/dlib/dnn/core_abstract.h.html#repeat

Regarding "just doing in >> your_matrix;":

I don't know what the expected way to use that is. I guess in contains all of the matrix data instead of just a single element, right? Below is my test code:

  std::ifstream ifs(argv[1]);
  dlib::matrix<double, 5, 1> my_matrix;
  dlib::set_all_elements(my_matrix, 0);
  std::cout << "test matrix: " << my_matrix << std::endl;
  ifs >> my_matrix;
  std::cout << "test matrix: " << my_matrix << std::endl;

It seems the line ifs >> my_matrix; did not change my_matrix at all. It might be a silly mistake somewhere, but I just can't find it :(

What are the contents of the file pointed to by argv[1]?
I just ran a small test with the file containing:

1
1
1
1

and the program works fine.


I tried it with your test file. Same result. The two prints are both all zeros.

Thanks! I don't know what was wrong with my configuration, but now it works without any problem.

One thing to note: if a matrix has a fixed size, e.g. dlib::matrix<double, 5, 1>, the csv file must contain exactly that amount of data; otherwise, the matrix is not assigned. Using dlib::matrix<double>, however, works every time. Maybe I had the wrong size in my previous tests.
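The fixed-size behavior described above can be mimicked with standard containers to make the pitfall concrete. This sketch is only an analogy, not dlib's implementation: read_exactly is a hypothetical helper that fills a std::array<double, N> only when the stream contains exactly N numbers (commas are treated as whitespace) and leaves the array untouched otherwise:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not dlib's code): copies the numbers from `in` into
// `out` only if there are exactly N of them; otherwise returns false and
// leaves `out` unchanged.
template <std::size_t N>
bool read_exactly(std::istream &in, std::array<double, N> &out) {
  std::string text, line;
  while (std::getline(in, line)) text += line + '\n';
  for (char &c : text)
    if (c == ',') c = ' ';  // accept both "1,2,3" and one value per line
  std::istringstream values(text);
  std::vector<double> buffer;
  double x = 0.0;
  while (values >> x) buffer.push_back(x);
  if (!values.eof() || buffer.size() != N) return false;  // bad token or wrong count
  for (std::size_t i = 0; i < N; ++i) out[i] = buffer[i];
  return true;
}
```

Reading into a growable container first (the analogue of dlib::matrix<double>) avoids the size mismatch entirely.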

Closing the issue.
