CsvHelper 🚀 - Add async/await from *Async methods

Do you have any preferences on how that would work; like what methods it would be implemented on?

JoshClose on 13 Nov 2013

Ideally an async version would exist for all the reading and writing logic.

For example, we have to do this currently:

public Task<Record> GetRecordAsync(string recordId)
{
    Guard.That(RecordId).IsNotNullOrEmpty();

    Record result = null;

    using (var reader = this._reader())
    {
        var relatedRecords = reader.GetRecords<Record>();

        if (relatedRecords != null)
        {
            result =
                relatedRecords.FirstOrDefault(
                    x => string.Equals(x.RecordId, recordId, StringComparison.OrdinalIgnoreCase));
        }
    }

    return Task.FromResult(result);
}

It would be great if this was async right down to the IO so that calling code could do the following:

public await Task<Record> GetRecordAsync(string recordId)
{
    Guard.That(RecordId).IsNotNullOrEmpty();

    Record result = null;

    using (var reader = this._reader())
    {
        var relatedRecords = await reader.GetRecordsAsync<Record>().ConfigureAwait(false);

        if (relatedRecords != null)
        {
            result =
                relatedRecords.FirstOrDefault(
                    x => string.Equals(x.RecordId, recordId, StringComparison.OrdinalIgnoreCase));
        }
    }

    return result;
}

roryprimrose on 13 Nov 2013

I agree that this would be a fantastic feature to have in CsvHelper. I use this library almost whenever I need to parse CSV file since it makes life so much easier, but unfortunately in a project that's going async all the way I had to resort to manual parsing. Has any work been started on implementing this? I actually just forked the repo and am going to start implementing the Task Async pattern for .NET 4.5 when I have some spare time.

jblacker on 18 Mar 2014

I haven't started anything on this yet. I should just do it, 'cause I don't think it'll be too many lines of code.

JoshClose on 18 Mar 2014

On second thought, let me know what your plans are on this.

The simple case would be to just run the current stuff as a Task.

c# public async Task<IEnumerable<T>> GetRecordsAsync<T>() { return Task.Run( () => { return GetRecords<T>().ToList(); } ); }

This would have to call .ToList() because it returns an IEnumerable that will get a row on each iteration.

The more complex scenario is to make it async all the way down to the TextReader/TextWriter level, so that each iteration of the IEnumerable is async, and you would be able to call GetRecordAsync<T>, and use the parser async also. Doing this would be quite a large change into the bowels of the parser.

JoshClose on 18 Mar 2014

I actually thought about doing that as well, but the IO calls made by the underlying TextWriter/TextReader are still made using synchronous blocking calls. I would do what you mentioned, but in a file IO intensive application the long term solution is probably to go async all the way down to the parser.

jblacker on 18 Mar 2014

👍2

Yeah. That will be a little tricky to get those lower level calls to work in an async manner without having to duplicate code.

JoshClose on 18 Mar 2014

So after looking into this a little, I'm not sure it's worth having the async passed all the way down. Will that actually save any time? Just having the whole thing run on a background thread will probably be fine. There is no opportunity to have multiple things running at the same time.

I will probably make async Task<T> GetRecordAsync<T> methods.

I will probably also do a async Task<IEnumerable<T>> GetRecordsAsync<T> also, but that will not yield results; it'll just return everything when it's finished.

For the IEnumerable<T> scenario, I will probably make a IObservable<T> GetRecordsObservable<T> that will have the same effect as what you'd think you'd want with a Task<IEnumerable<T>>.

JoshClose on 19 Mar 2014

It looks like implementing IObservable is a big no-no, and pulling in parts of Rx to implement would be huge. I got up to around 30 classes and just gave up. You pretty much have to pull the whole framework in to do Observable.Create.

JoshClose on 20 Mar 2014

Maybe it's not worth it and wrapping in an async Task object is enough as long as it's configured to always run on a background thread.

jblacker on 20 Mar 2014

@roryprimrose Do you have a reason that you'd like the async to go down to the IO level? I think having a task run at a higher level won't lose any speed. As long as the IO, parsing, and converting to an object is done in a task, I would think it would be fine. That whole part could be on a single separate thread.

JoshClose on 22 Mar 2014

It sounds ok because you would still be awaiting a task to complete that is in itself doing more than just the IO work. All the way down would be better but this is a good trade-off.

roryprimrose on 22 Mar 2014

The reader pulls a bunch of data into a buffer at once, and most of the work is done elsewhere. I'll continue with just putting tasks at a higher level.

I'll shelve something in the next day or so to see what you guys think.

JoshClose on 22 Mar 2014

Will this affect apps built with ASP.NET MVC or Web API?

Sent from my Windows Phone

From: Josh Close
Sent: 3/21/2014 8:44 PM
To: JoshClose/CsvHelper
Subject: Re: [CsvHelper] Add async/await from *Async methods (#202)

The reader pulls a bunch of data into a buffer at once, and most of the work is done elsewhere. I'll continue with just putting tasks at a higher level.

I'll shelve something in the next day or so to see what you guys think.

Reply to this email directly or view it on GitHubhttps://github.com/JoshClose/CsvHelper/issues/202#issuecomment-38341018.

mkrain on 22 Mar 2014

What do you mean? Affect them how?

There will be new methods called GetRecord(s)Async and WriteRecord(s)Async, so it will be new stuff and shouldn't change any existing code.

JoshClose on 22 Mar 2014

Alright. Can everyone have a look at this? It's pretty simple but should do that job. Any suggestions are welcome, but I would like to keep it simple if possible. https://github.com/JoshClose/CsvHelper/commit/2c3f5336db501d71e2b2ef937883f0a56437c662

JoshClose on 22 Mar 2014

Just want to make sure extra threads aren't being created. In a web app that's not ideal. However, if you're going to use task factory it should be ok.

Sent from my Windows Phone

From: Josh Close
Sent: 3/21/2014 9:02 PM
To: JoshClose/CsvHelper
Cc: Jeremiah Medina
Subject: Re: [CsvHelper] Add async/await from *Async methods (#202)

What do you mean? Affect them how?

There will be new methods called GetRecord(s)Async and WriteRecord(s)Async, so it will be new stuff and shouldn't change any existing code.

Reply to this email directly or view it on GitHubhttps://github.com/JoshClose/CsvHelper/issues/202#issuecomment-38341306.

mkrain on 22 Mar 2014

I'm leaning towards not putting this feature in. The reasoning is in the discussion in the commit. https://github.com/JoshClose/CsvHelper/commit/2c3f5336db501d71e2b2ef937883f0a56437c662

JoshClose on 25 Mar 2014

Putting Task.Run() around everything (and everywhere) is indeed not the way to go.

@JoshClose was there no way to get the async done at a lower level? E.g. where you read the stream? (Stream has many async methods) I'm about to parse a 45MB stream provided by WebClient. It will be a slow stream...

9Rune5 on 28 Feb 2017

👍2

It would have to be in the parser, which means basically every single method in the entire library would have to change to Async. I don't think there would be a performance improvement at all from it. It's simple enough for a consumer to do Task.Run where they want to.

JoshClose on 28 Feb 2017

Task.Run is a simple workaround, but the whole point of using async is to avoid using yet another worker thread (which cannot easily be canceled in a asp.net app) . If every library implementer out there responds the same, then we are back to square one. A lot of effort went into C# to avoid resorting to Task.Run() for IO bound operations.

In my current case however, I will only read my input file once per day, so hogging an extra worker thread for what is a very limited time is not much of an issue, other than violating my project's coding guidelines and being the odd duck out. Next time I might not be so lucky.

But yeah, the core would then need to be async, otherwise there is no point in doing it.

9Rune5 on 1 Mar 2017

What stinks about this is way deep down there is a single call that happens. https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper/FieldReader.cs#L108

I'll look into doing it again all the way down to the parser reading.

JoshClose on 3 Mar 2017

I think there will be a lot of code duplication. I can't have the sync methods call the async methods and wait on them because this needs to work in .NET 2.0.

JoshClose on 3 Mar 2017

Making things async would only make sense when an operation takes more than 10-15 ms IMHO. Reading a char doesn't make it faster.

What we are doing in our library is reading / writing the file stream into memory (async). Then we let the parser do its magic.

The only thing that could make (a bit) of sense is if CsvHelper would read batches from the memory stream async, but I don't think it's worth the effort.

GeertvanHorrik on 2 Jun 2017

If everything is ultimately chucked into a memory stream, then Geert has a point.

But is this really the case? If I start streaming a huge csv file from a slow website (actually my main usage of CsvHelper thus far), I first have to wait for everything to be stuffed into a MemoryStream..? That does not sound healthy.

9Rune5 on 2 Jun 2017

But that can be done async (since you can pass in a stream to CsvHelper). So we wrote a method like this (of course this doesn't compile, but it's for explanation purposes only):

public async Task<IEnumerable<TRecord>> ReadCsvAsync<TRecord, TMap>(Uri uri)
{
    var downloadedBytes = await _downloader.DownloadSlowStuffAsync();

    using (var memoryStream = new MemoryStream(downloadedBytes))
    {
        return CsvHelper.ParseSync(memoryStream);
    }
}

The slow part in your code isn't the parsing, it's the retrieval of the file.

You could even split this up:

Store the first line (as header)
Download 1MB at a time. While the download continues, you can already start parsing the code (since the download is async anyway).

It's a bit more complex, but then you would have a true async method.

GeertvanHorrik on 2 Jun 2017

No data is read into a buffer of size chosen in config (2048 by default).
No memory stream inside, at least not anymore.

jamesbascle on 2 Jun 2017

Is there any place other than TextReader.Read where an async call would occur? https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper/FieldReader.cs#L109 Same would go for writing.

JoshClose on 2 Jun 2017

If you are really reading char by char, I would not even bother making this async (since thread switching, creating tasks, etc is more expensive).

GeertvanHorrik on 2 Jun 2017

👍1

It's reading a buffer of Configuration.BufferSize = 2048 size. That buffer is then read char by char and keeps track of positions so it can be pulled out of the buffer all at once for performance.

// Read from TextReader into buffer
charsRead = Reader.Read( buffer, bufferPosition, configuration.BufferSize );

// Pull data out of buffer into string
field.Append( new string( buffer, fieldStartPosition, length ) );

JoshClose on 2 Jun 2017

👍1

@GeertvanHorrik @JoshClose Agreed! While it's possible to add async functionality the only async function that could be called in the stacktrace from a reading standpoint is reading a character which gets called a lot so this story is probably not a good idea. I wanted to find a reason it would make sense to add async but there really isn't any.

kibbled on 22 Jun 2017

I agree. There would have to be duplicate methods all the way up the tree too. I don't think it makes sense in this library. If someone just wants reading to not block, they can start a new task. I'll need some good convincing and a solid plan to open this one back up.

JoshClose on 23 Jun 2017

To jump in here- I don't think the right solution is to not offer *Async methods.

If the stream is being read character by character then I think the solution is to not read character by character but rather chunks at a time.

IO operations absolutely need async overloads. And the implementation should be appropriate for async. Giving up isn't the right solution.

MgSam on 26 Jun 2017

The characters themselves are read on chunks into a buffer, but inside of
the read character method. Thus we would be creating a task for each
character. That would be highly inefficient.

On Jun 26, 2017 11:21 AM, "Sam" notifications@github.com wrote:

To jump in here- I don't think the right solution is to not offer *Async
methods.

If the stream is being read character by character then I think the
solution is to not read character by character but rather chunks at a time.

IO operations absolutely need async overloads. And the implementation
should be appropriate for async. Giving up isn't the right solution.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/JoshClose/CsvHelper/issues/202#issuecomment-311091798,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AD_ohW9ZDhHOhNlHHQ1bjM3qvJtTRUJ9ks5sH8yDgaJpZM4BLKDJ
.

jamesbascle on 26 Jun 2017

👍1

@jamesbascle I understand that is the current implementation- I'm saying it should be rewritten so that it's done efficiently and not reading per character.

Every IO library nowadays is offering Async overloads. An answer that is essentially "it's impossible" seems extremely unlikely.

MgSam on 26 Jun 2017

I see a way of refactoring the buffer reads so every CsvFieldReader.GetChar() doesn't have to be async.

Is there an easy way to have the same code ran for methods so I don't have to duplicate all the code for every method? Most of the code all the way down will be identical. There will just be Read vs ReadAsync at the bottom. I don't want to use any sort of wrapper that just does Wait or ConfigureWait.

JoshClose on 1 Jul 2017

I suggest putting a feature request in for an implementation of the parser which is not character based. There may be scenarios where one performs better than another. ICsvReader with implementations such as CsvCharReaderImpl, CsvLineReaderImpl, etc. I'm simplifying this of course but by only point is not to overload the existing classes where it doesn't fit. A number of other stories would be required for this effort including decoupling some of these classes and coming up with an interface design that would work for different implementations. Whether or not this effort would be worth it is another story. :)

kibbled on 3 Jul 2017

I'm not sure I see the idea behind that post clearly.

The data is not loaded into memory char by char, but on a continuous basis
according to however large you set the buffer size to.

The parser, regardless of how the data is loaded into that buffer, has to
operate on a character by character basis, to detect control characters,
and other such things.

jamesbascle on 3 Jul 2017

@jamesbascle I was thinking of using ReadLine (async) then parsing each line char by char after that. Just brainstorming mind you. If someone were to write an async parser that's where I'd start.

kibbled on 3 Jul 2017

What if there is a \r\n in a field? ReadLine may not be a whole CSV line.

Currently it's reading into a buffer. The char by char reading is done from that buffer. I was thinking of just having ReadBuffer and ReadBufferAsync methods so the ReadChar wouldn't have to be async.

My current issue is finding away to not duplicate all the code in all the methods.

JoshClose on 3 Jul 2017

If you reached bufferlength - 10, you can start reading the next batch anyway. Since it's async it doesn't matter ;-)

GeertvanHorrik on 3 Jul 2017

I can't see a way of making async methods without making a duplicate async method for every method in CsvParser. CsvFieldReader.GetChar would not be async, but there would need to be an async method to load the buffer every time it ran out. A check to see if the buffer needs loading would have to happen any time before a char is retrieved. CsvParser gets a char in every method in that class. I'm not seeing a way around this. I don't want to duplicate the code in every method.

I've read some people talking about how you should either create a sync method or an async method, but not both. If the code is IO bound or is going to take a long time, then it should be async, otherwise sync. CsvHelper isn't IO bound by default. The user decides whether that is the case or not. If their TextReader is a network stream, then it's going to be downloading as it's parsing. The user could also choose to download it asynchronously ahead of time, then run it through CsvHelper.

Every IO library nowadays is offering Async overloads.

What are some examples of this? I'd like to see how they overcome all the duplication of code.

The thing that's really holding me back here is code duplication. I need a way to run the same code sync and async without duplication. If the TextReader isn't attached to network, it's going to be a lot faster to no use async methods.

Also, method wrapping won't work. Should I expose asynchronous wrappers for synchronous methods? Should I expose synchronous wrappers for asynchronous methods?

JoshClose on 22 Aug 2017

👍2

What are some examples of this? I'd like to see how they overcome all the duplication of code.

JSON.NET and Entity Framework are the two biggest examples. I'm sure there are lots of others as well.

If you don't want to duplicate code, one thing you could do is make every method async, and make the synchronous versions just call .Result on the async versions. This might not be "recommended" but it will effectively re-use code. (Just make sure all your async methods are all doing ConfigureAwait(false) or else this will cause big problems)

In WinRT, Microsoft actually eliminated the non-async overloads altogether, though this would be breaking so I don't recommend this.

For reference, Stephen Toub wrote a blog post about this topic.

MgSam on 5 Sep 2017

So, looks like Json.NET just copies all the methods exactly and replaces calls with async calls. I will look into what that would entail here and how I can make it have minimal amounts of code duplication. This means the reader will probably only have the Read() method be async and the consumer would have to loop the records manually, which isn't a big deal.

await csv.Read();
csv.ReadHeader();
while( await csv.Read() )
{
    var record = csv.GetRecord<MyType>();
}

I'll add this to the 3.0 milestone.

JoshClose on 5 Sep 2017

The IDE experience isn't as nice, but another thing I've done here is use T4 templates to promote code re-use. (Alternative templating engines would also work, of course) That way you move all the code into templates and just use <# if(isAsync) #> or something similar to output essentially the same code but with the async tweaks.

MgSam on 5 Sep 2017

lol, oh man. I really don't want to do the T4 thing, but I'll look into it. If it makes it easier and is still readable, maybe I'll do that.

Got any good T4 doc sources you recommend?

JoshClose on 5 Sep 2017

I learned how to use it from the MSDN docs. It's similar to writing ASP.NET, if you've ever done that.

The only tricky part is if you want it to be invoked as part of the build (it doesn't need to be, if you're ok just running it once via Visual Studio and checking in the generated files).

Entity Framework also uses it for their code generators, I believe.

MgSam on 5 Sep 2017

👍1

Well, seems like T4 will work as I have all the parsing working using async methods. I found a VS plugin, tangible T4 that is helping a lot. I guess this is going to be a thing. ;)

JoshClose on 5 Sep 2017

Got all the async reading done. Now on to writing, which shouldn't take long at all.

Thanks @MgSam for the direction.

Here is a unit test to show how reading will work. Obviously returning a Task<IEnumerable<T>> won't work, so you have to write a couple more lines.

using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

namespace CsvHelper.Tests.Async
{
    [TestClass]
    public class ReadingTests
    {
        [TestMethod]
        public async Task ReadingTest()
        {
            using( var stream = new MemoryStream() )
            using( var writer = new StreamWriter( stream ) )
            using( var reader = new StreamReader( stream ) )
            using( var csv = new CsvReader( reader ) )
            {
                writer.WriteLine( "Id,Name" );
                writer.WriteLine( "1,one" );
                writer.WriteLine( "2,two" );
                writer.Flush();
                stream.Position = 0;

                var records = new List<Simple>();
                await csv.ReadAsync();
                csv.ReadHeader();
                while( await csv.ReadAsync() )
                {
                    records.Add( csv.GetRecord<Simple>() );
                }

                Assert.AreEqual( 2, records.Count );

                var record = records[0];
                Assert.AreEqual( 1, record.Id );
                Assert.AreEqual( "one", record.Name );

                record = records[1];
                Assert.AreEqual( 2, record.Id );
                Assert.AreEqual( "two", record.Name );
            }
        }

        private class Simple
        {
            public int Id { get; set; }

            public string Name { get; set; }
        }
    }
}

JoshClose on 6 Sep 2017

Done. db5945439d8213c38006df4e8a57fc4a55364f52

JoshClose on 6 Sep 2017

This is on NuGet now. https://www.nuget.org/packages/CsvHelper/3.0.0-chi08

JoshClose on 6 Sep 2017

👍1

I was afraid you were going for the .Result solution, but just checked the code and it looks good! Well done, thanks for implementing this.

GeertvanHorrik on 6 Sep 2017

👍1

I read too many articles telling me not to do it. ;)

I was just waiting for a good solution to not reproduce code. T4 seemed to work well. The only hard part is doing active development on a template. I'm basically doing active development on the result file, and then will have to merge the changes back in later.

JoshClose on 6 Sep 2017

👍1

What happened to this? I cant find WriteRecordsAsync from https://github.com/JoshClose/CsvHelper/commit/2c3f5336db501d71e2b2ef937883f0a56437c662 in 6.0 version.

nawfalhasan on 29 Nov 2017

There is no WriteRecordsAsync. That part doesn't need to be asynchronous. There is FlushAsync and NextRecordAsync though. NextRecordAsync will call FlushAsync for you.

JoshClose on 29 Nov 2017

WriteRecords will never block? It holds the records until I call FlushAsync? Is that effective, when I call WriteRecords many times, with only few records in every write?

hogan on 25 May 2018

WriteRecords will write several objects at once and will call NextRecord which in turn calls Flush. This doesn't use the async stuff at all. If you want to write async methods, you'll need to loop your records and call WriteRecord, then NextRecordAsync.

JoshClose on 25 May 2018

@JoshClose - can you confirm that the following works when reading in a physical file?

 private static async Task<List<string>> ReadCsvAsync(string filename)
 {
      using (var streamReader = File.OpenText(filename))
      {
           using (var reader = new CsvReader(streamReader))
           {
                var list = new List<string>();

                while (await csv.ReadAsync())
                {
                     list.Add(csv.GetField(0));
                }
           }
      }
 }

As soon as my break point hits the WHILE loop with the ReadAsync() call, it goes into lala land and never moves to the next line. If I change it to Read() then everything works.

brett-burkhart on 5 Jun 2018

My guess is that brett-burkhart encountered a deadlock. A quick grep indicates that .ConfigureAwait(false); is not called anywhere in the code base? (which means deadlocks could be encountered in Winforms/WPF applications)

9Rune5 on 5 Jun 2018

Good call rune. That is an important thing to so to async/await calls on
libraries so it works in the UI frameworks

On Tue, Jun 5, 2018, 5:07 PM 9Rune5 notifications@github.com wrote:

My guess is that brett-burkhart encountered a deadlock. A quick grep
indicates that .ConfigureAwait(false); is not called anywhere in the code
base? (which means deadlocks could be encountered in Winforms/WPF
applications)

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/JoshClose/CsvHelper/issues/202#issuecomment-394860726,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AD_ohRQBrfdEYU_Ydoohk6grVyMUJz25ks5t5vMmgaJpZM4BLKDJ
.

jamesbascle on 6 Jun 2018

Are you saying CsvHelper should be using ConfigureAwait(false) or the caller of CsvHelper should be?

JoshClose on 7 Jun 2018

Yes.

Apply .ConfigureAwait(false) liberally within the library code. Basically every time you do not need to resume the same context after the await. For non-GUI code, this pretty much means 'always'.

It seems counterintuitive at first (why, oh why, isn't it the default?!), but eventually it becomes second nature.

BTW: It may seem unnecessary to do this for every await in any given method, but some tasks may end up executing synchronously (e.g. if you mock a certain function or a task finishes much sooner than anticipated), so AFAICT best practices dictates showering every await with a corresponding ConfigureAwait().

Tip: Read everything written by Stephen Cleary. Start here: https://msdn.microsoft.com/en-us/magazine/jj991977.aspx?f=255&MSPPError=-2147217396

9Rune5 on 7 Jun 2018

Thanks for the link to Stephen Cleary. I would have asked for that. I'll change all async calls to use ConfigureAwait(false).

JoshClose on 8 Jun 2018

I added issue #1047 for this.

JoshClose on 8 Jun 2018

👍2

Done.

JoshClose on 8 Jun 2018

Csvhelper: Add async/await from *Async methods

Most helpful comment

All 65 comments

Related issues