CsvHelper: Performance degradation compared to ver. 2.16.3

Created on 27 Nov 2017 · 11 Comments · Source: JoshClose/CsvHelper

I've been using CsvHelper since ver. 2. I recently upgraded to 6.0.0.0 and noticed a large performance degradation for writing and a somewhat smaller one for reading (using ClassMap).
My data class:

```c#
class Data
{
    public Data(Int32 num1, Byte num2, String word1, String word2)
    {
        Num1 = num1;
        Num2 = num2;
        Word1 = word1;
        Word2 = word2;
    }

    public Data()
    {
    }

    public Int32 Num1 { get; set; }
    public Byte Num2 { get; set; }
    public String Word1 { get; set; }
    public String Word2 { get; set; }
}
```

My mapper class:

```c#
class DataCsvMap : CsvHelper.Configuration.ClassMap<Data>
{
    public DataCsvMap()
    {
        Map(m => m.Num1).Index(0);
        Map(m => m.Num2).Index(1);
        Map(m => m.Word1).Index(2);
        Map(m => m.Word2).Index(3);
    }
}
```

My test methods for v.6.0.0. The 2.16.3 version is the same, except without the `csvWriter.NextRecord();` call:
```c#
using System;
using System.Diagnostics;
using System.IO;
using CsvHelper;

class Program
{
    const Int32 RECORD_COUNT = 10000000;

    static void WriteStaticTest()
    {
        using (Stream fileStream = File.Create("V6.txt"))
        using (StreamWriter streamWriter = new StreamWriter(fileStream))
        using (CsvWriter csvWriter = new CsvWriter(streamWriter))
        {
            streamWriter.NewLine = "\r\n";
            csvWriter.Configuration.Delimiter = "\t";
            csvWriter.Configuration.HasHeaderRecord = false;
            csvWriter.Configuration.QuoteNoFields = true;
            csvWriter.Configuration.RegisterClassMap<DataCsvMap>();

            // Time only the record-writing loop, not the setup above.
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (Int32 i = 0; i < RECORD_COUNT; i++)
            {
                csvWriter.WriteRecord<Data>(new Data(i, 1, "word for line", "line for word"));
                csvWriter.NextRecord();
            }
            stopwatch.Stop();
            Console.WriteLine("V6 write of {0} records takes {1} ms", RECORD_COUNT, stopwatch.Elapsed.TotalMilliseconds);
        }
    }

    static void ReadStaticTest()
    {
        using (Stream fileStream = File.OpenRead("V6.txt"))
        using (StreamReader streamReader = new StreamReader(fileStream))
        using (CsvReader csvReader = new CsvReader(streamReader))
        {
            csvReader.Configuration.Delimiter = "\t";
            csvReader.Configuration.HasHeaderRecord = false;
            csvReader.Configuration.RegisterClassMap<DataCsvMap>();

            // Time only the record-reading loop, not the setup above.
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            Data d = null;
            while (csvReader.Read())
                d = csvReader.GetRecord<Data>();

            stopwatch.Stop();
            Console.WriteLine("V6 read of {0} records takes {1} ms", RECORD_COUNT, stopwatch.Elapsed.TotalMilliseconds);
        }
    }

    static void Main(string[] args)
    {
        WriteStaticTest();
        ReadStaticTest();

        Console.ReadKey();
        File.Delete("V6.txt");
    }
}
```

Here is my setup: .NET 4.6.1, Release build, 5 runs for each CsvHelper version (alternating between the two versions).
Here are my results (average):

| Operation | Records | v.2.16.3 (ms) | v.6.0.0 (ms) |
|---|---|---|---|
| Write | 1,000,000 | 1103 | 3867 |
| Read | 1,000,000 | 2008 | 3454 |
| Write | 10,000,000 | 10723 | 39084 |
| Read | 10,000,000 | 20066 | 34895 |

So in 6.0.0, writing is roughly 3.5 times slower and reading is roughly 1.7 times slower.
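
For anyone who wants to reproduce the comparison, here is a minimal sketch of the measurement procedure described above: run a benchmark several times, average the elapsed time, and divide the two averages to get the slowdown factor. The names `BenchmarkSketch`, `AverageMilliseconds`, and `Slowdown` are hypothetical helpers made up for illustration; they are not part of CsvHelper or of the original test program, and only `System.Diagnostics.Stopwatch` from the base class library is used.

```c#
using System;
using System.Diagnostics;

// Hypothetical helper, not part of CsvHelper or the original test program.
static class BenchmarkSketch
{
    // Runs `action` `runs` times and returns the mean wall-clock time in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        double totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            totalMs += stopwatch.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }

    // Returns how many times slower the candidate is than the baseline.
    public static double Slowdown(double baselineMs, double candidateMs)
    {
        if (baselineMs <= 0) throw new ArgumentOutOfRangeException(nameof(baselineMs));
        return candidateMs / baselineMs;
    }
}
```

With the averages reported above, `BenchmarkSketch.Slowdown(1103, 3867)` comes out at about 3.5 for writing 1,000,000 records and `BenchmarkSketch.Slowdown(2008, 3454)` at about 1.7 for reading them. Something like `BenchmarkSketch.AverageMilliseconds(WriteStaticTest, 5)` would give the per-version average, assuming the test method is accessible from the caller.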

performance

All 11 comments

Thanks! I'll look into it.

How many properties does the class that's being written/read have?

The class is Data from source code, it has 4 properties (of different types)

I tested just parsing and that's twice as slow. I made major improvements to the speed of parsing, so this is very concerning to me. Here are some results I see.

Version 2.16.3
Write: 00:00:59.4421266
Parse: 00:00:24.3046222
Read: 00:00:48.5294866

// Parser speed improvements

Version 3.0.0-beta5
Write: 00:00:58.0688798
Parse: 00:00:13.8719079
Read: 00:00:38.2780997

Version 3.0.0-beta6
Write: 00:00:54.0468400
Parse: 00:00:12.3032375
Read: 00:00:36.4623984

// Write degradation

Version 3.0.0-beta7
Write: 00:01:13.9142060
Parse: 00:00:12.3463482
Read: 00:00:36.3777355

Version 3.0.0-chi5
Write: 00:01:18.3042761
Parse: 00:00:14.0855853
Read: 00:00:38.9330303

// Parsing/Reading degradation

Version 3.0.0-chi6
Write: 00:01:21.7548163
Parse: 00:00:34.5194271
Read: 00:01:01.3978627

Version 6.0.0
Write: 00:01:43.9613862
Parse: 00:00:44.0045795
Read: 00:01:15.1836657

Parsing and reading were down considerably at one point.

I've narrowed down the commits that caused the problems and will have to dig and see what can be done.

I was able to cut the reading time in half. Writing looks like it's going to be much harder to find significant gains.

Using this benchmark: https://www.codeproject.com/Articles/1175263/Why-to-build-your-own-CSV-parser-or-maybe-not

Reading 1,000,000 lines:

  • CsvHelper 3.0.0-beta7: 6.0445 seconds
  • CsvHelper 6.1.1: 22.9620 seconds

I haven't delved into where the difference lies.

I've figured out the issue with reading and can get it down to that speed again. For writing, I've only figured out a few of the things that are causing the slowdown. I should just put the reading code out there so at least that can be faster for now. The fix is a breaking change for the reader and writer.

I released version 7.0.0 on github which has the reading performance improvements. The writing performance is not improved yet.

I've tried ver. 7.0.1. In my tests, reading is twice as fast as in ver. 6.0, and writing is a little faster than in ver. 6.0. Thanks!

You're welcome! Write performance is going to be a little harder to get large gains from.
