Hello,
my goal is to read csv files using the dlib framework. I searched the web and did not find a satisfactory solution. For example, this repo works, but in a hard-coded manner (it makes strong assumptions about the data types and the number of features).
I am wondering whether there is a "mighty" csv reader that can handle all kinds of csv files.
Cheers
You can read csv data into a matrix by just doing in >> your_matrix;. Aside from that, you are on your own. It's pretty easy to read csv data with the regular C++ iostreams though in any case.
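To make the iostreams route concrete, here is a minimal sketch of a generic reader that assumes nothing about column types: it returns every row as a vector of strings and leaves the conversion to the caller. The names split_csv_line and read_csv_rows are made up for this illustration (they are not part of dlib), and quoted fields or separators embedded inside a field are not handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split one line into fields on `sep`.
// Assumption: no quoted fields and no separators embedded in a field.
std::vector<std::string> split_csv_line(const std::string& line, char sep = ',')
{
    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    while (std::getline(iss, field, sep))
        fields.push_back(field);
    // std::getline drops a trailing empty field ("a,b," yields two fields),
    // so add it back to keep the column count consistent.
    if (!line.empty() && line.back() == sep)
        fields.emplace_back();
    return fields;
}

// Read every non-empty line of `in` into a row of string fields.
std::vector<std::vector<std::string>> read_csv_rows(std::istream& in, char sep = ',')
{
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // tolerate Windows (CRLF) line endings
        if (line.empty())
            continue;         // skip blank lines
        rows.push_back(split_csv_line(line, sep));
    }
    return rows;
}
```

Each field can then be converted with std::stod, std::stoi, or a string stream once you know what the column holds.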
I am not 100% comfortable with c++ iostream yet. Could you please tell me what type in is?
I find your example code extremely well documented and perfectly suited for beginners. I am very grateful for your work. It would be really nice, however, if a simple example about (or including) reading csv data could be added.
Sorry to interrupt, in is a stream where you can read data from. It can be from a file, a string...
Probably, in your case, it would be a stream from a file, which would look like:
std::ifstream in("path/to/file.csv");
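To illustrate the point that the source of the stream does not matter, here is a small sketch of a function written against std::istream&. The function name read_numbers is hypothetical; the only assumption is that the input holds whitespace-separated numbers (commas are not whitespace, so extraction would stop at the first comma).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated numbers from any input stream.
// Because the parameter is std::istream&, the caller may pass a
// std::ifstream, a std::istringstream, or std::cin unchanged.
std::vector<double> read_numbers(std::istream& in)
{
    std::vector<double> values;
    double v;
    while (in >> v)  // stops at end of input or at the first non-number
        values.push_back(v);
    return values;
}
```

With a file you would write std::ifstream in("path/to/file.csv"); auto values = read_numbers(in); and with a string std::istringstream in("1 2 3"); followed by the same call.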
I don't know if I'm doing any favor by sharing this simple generic snippet code I made to read CSV files in C++, but here it is anyway:
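#include <fstream>  // std::ifstream
#include <sstream>  // std::istringstream, std::stringstream
#include <string>   // std::string, std::getline
#include <vector>   // std::vector

// parse a token from a csv line and convert it to T
template<typename T>
auto parse_token(std::istringstream& ss, char sep = ',') -> T
{
    T result{};
    std::string token;
    std::getline(ss, token, sep);     // read up to the next separator
    std::stringstream stoken(token);
    stoken >> result;                 // convert the token text to T
    return result;
}

// load a csv file; T must be constructible from a std::string (one csv line)
template<typename T>
auto load_csv_file(const std::string& csv_path, bool has_header = true) -> std::vector<T>
{
    std::vector<T> items;
    std::ifstream data(csv_path);
    std::string line;
    if (has_header) std::getline(data, line);  // skip the header line
    while (std::getline(data, line))
    {
        items.emplace_back(line);
    }
    return items;
}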
Now, let's assume our csv file contains fields like: label, x_min, y_min, x_max, y_max.
Then, I would create my objects to be constructible from a std::string, that is a line from the csv file. That would look like:
struct sample_info
{
    sample_info() = default;
    sample_info(const std::string& csv_line)
    {
        std::istringstream iss(csv_line);
        label = parse_token<std::string>(iss);
        x_min = parse_token<int>(iss);
        y_min = parse_token<int>(iss);
        x_max = parse_token<int>(iss);
        y_max = parse_token<int>(iss);
    }
    std::string label{};
    int x_min{}, y_min{}, x_max{}, y_max{};
};
And with that set up, you can just read the csv file from its path like:
auto samples = load_csv_file<sample_info>("path/to/file.csv");
I hope this example helps you understand how streams work in C++ and how amazing templates are.
@arrufat There is nothing to be sorry about. Thanks a lot for your detailed "csv + streams 101"!
Edit: following your code, I created an example that uses a std::tuple to simplify the constructor of a struct like sample_info (called SensorData below).
class LineParser {
public:
    LineParser(const std::string &csv_line) : iss(csv_line) {}
    // The argument is only used to deduce T; its value is ignored.
    template <typename T> T operator()(T &&) const { return parse_token<T>(iss); }

private:
    mutable std::istringstream iss;  // mutable so the const operator() can read from it
};

struct SensorData {
    SensorData() = default;
    SensorData(const std::string &csv_line) {
        LineParser line_parser(csv_line);
        data = execute_all(line_parser, data);
    }
    std::tuple<std::string, int, int, double, double, double, double, double,
               double, double, double, double, double>
        data{};
};
The execute_all function runs line_parser on each element of the tuple and stores the results in a new tuple.
// This is where execute_all is defined.
#include <cstddef>     // std::size_t
#include <tuple>       // std::tuple
#include <type_traits> // std::remove_cv, std::remove_reference
#include <utility>     // std::index_sequence, std::make_index_sequence

/**
 * Indices trick (a detailed solution)
 */
template <typename T>
using Bare =
    typename std::remove_cv<typename std::remove_reference<T>::type>::type;

template <typename Op, typename Tuple, std::size_t... Is>
Bare<Tuple> f_all_dispatch(Op &&op, Tuple &&t, std::index_sequence<Is...>) {
    // Elements of a braced initializer list are evaluated left to right,
    // so op is called on the tuple elements in order.
    return Bare<Tuple>{
        std::forward<Op>(op)(std::get<Is>(std::forward<Bare<Tuple>>(t)))...};
}

template <typename Op, typename Tuple>
Bare<Tuple> execute_all(Op &&op, Tuple &&t) {
    return f_all_dispatch(
        op, t, std::make_index_sequence<std::tuple_size<Bare<Tuple>>::value>{});
}
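For comparison, here is a more compact sketch of the same indices idea that looks up each element type with std::tuple_element instead of passing a dummy argument to the parser. It assumes C++14; parse_field, parse_line_impl and parse_line are names invented for this illustration, and it is a variation on the code above rather than a replacement for it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Read one separator-delimited field from `iss` and convert it to T.
// Note: for T = std::string, operator>> stops at the first whitespace.
template <typename T>
T parse_field(std::istringstream& iss, char sep = ',')
{
    std::string token;
    std::getline(iss, token, sep);
    std::istringstream field(token);
    T value{};
    field >> value;
    return value;
}

template <typename Tuple, std::size_t... Is>
Tuple parse_line_impl(std::istringstream& iss, std::index_sequence<Is...>)
{
    // Elements of a braced initializer list are evaluated left to right,
    // so the fields are consumed in column order.
    return Tuple{parse_field<typename std::tuple_element<Is, Tuple>::type>(iss)...};
}

// Parse a whole csv line into the given tuple type.
template <typename Tuple>
Tuple parse_line(const std::string& line)
{
    std::istringstream iss(line);
    return parse_line_impl<Tuple>(
        iss, std::make_index_sequence<std::tuple_size<Tuple>::value>{});
}
```

A call would look like auto t = parse_line<std::tuple<std::string, int, double>>("cat,3,2.5"); after which std::get<1>(t) holds the int column.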
Based on this version, I am going to build a dlib matrix from a csv file. In case I run into other related issues, I will leave this open for now. Once I have the dlib csv version working, I will post my solution and close this issue.
By the way, does anyone know if some template magic exists to simplify std::tuple<int, int, double, double, double, double> t; to something like std::tuple<int, int, make_multiple<double, 4>> t;?
Maybe the repeat layer in dlib dnn can help you get some inspiration:
http://dlib.net/dlib/dnn/core_abstract.h.html#repeat
Regarding "just doing in >> your_matrix;": I don't know what the expected way to use that is. I guess in has to contain all the data of the matrix rather than just a single element, right? Below is my test code:
std::ifstream ifs(argv[1]);
dlib::matrix<double, 5, 1> my_matrix;
dlib::set_all_elements(my_matrix, 0);
std::cout << "test matrix: " << my_matrix << std::endl;
ifs >> my_matrix;
std::cout << "test matrix: " << my_matrix << std::endl;
It seems the line with ifs >> my_matrix; did not change the value of my_matrix at all. I know it might be a stupid error somewhere but just failed to find it :(
What are the contents of the file pointed to by argv[1]?
I just ran a small test with the file containing:
1
1
1
1
and the program works fine.
I tried it with your test file. Same result. The two prints are both all zeros.
Can you try with this self-contained example?
https://send.firefox.com/download/caf6987d3ef33a4e/#mjomEyCjvxr0DIW4OffJOw
Thanks! I don't know what I did wrong with my configuration. But now it works without any problem.
One thing to note: if a matrix is declared with a fixed size, e.g. dlib::matrix<double, 5, 1>, the data in the csv file must have the same dimensions. Otherwise, the matrix will not be assigned. Using the dynamically sized dlib::matrix<double>, however, works regardless of the data size. Maybe I had the wrong size in my previous tests.
Closing the issue.