DataFrame.ReadCsv at the moment loops over all the rows in a file twice: once to find the number of rows and schema, and the 2nd time to append the data. Each line is read in as a string and string.Split is called on it. Profiling shows that a sizeable chunk of time is spend in the Split calls. We should explore using a ReadOnlySpan
Is there a reason to not make api ReadStream async?
Nope. This is not the most performant implementation. This exists at the moment so folks can work on real data. Making this method more efficient is on my backlog.
Could we also make it such that it can handle escaping using quoted strings.
Most helpful comment
Could we also make it such that it can handle escaping using quoted strings.