Csvhelper: Commas within comma separated values issue

Created on 4 Aug 2014  路  10Comments  路  Source: JoshClose/CsvHelper

sample input csv: "a", "b", "c", "d1,d2,d3" (4 columns)
csvHelper reads it as: "a", "b", "c", "d1", "d2", "d3" (6 columns)

any way around this?
IgnoreQuotes = true/false does not help
Delimiter = ','

All 10 comments

I'm not seeing this happen...

Running this code:

``` c#
void Main()
{
var data = "\"a\",\"b\",\"c\",\"d1,d2,d3\"";
var stream = new MemoryStream();
var reader = new StreamReader( stream );
var writer = new StreamWriter( stream );
var parser = new CsvParser( reader );
writer.WriteLine( data );
writer.Flush();
stream.Position = 0;

parser.Read().Dump();

}

Gives this result:

a
b
c
d1,d2,d3
```

I don't know if you're having the same issue I just debugged, but

var data = "\"a\",\"b\",\"c\",\"d1,d2,d3\""; will pass, while
var data = "\"a\",\"b\",\"c\", \"d1,d2,d3\""; will fail

This happens because the CSV Parser does not enter the fieldIsEscaped mode unless the quote character is the very first character after the delimiter or CRLF:

https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper/CsvParser.cs#L274

``` c#
// [prevCharWasDelimiter is false here, even if Configuration.TrimField = true]
if( !fieldIsEscaped && ( prevCharWasDelimiter || cPrev == '\r' || cPrev == '\n' || cPrev == null ) )
{
// The field is escaped only if the first char of
// the field is a quote.
fieldIsEscaped = true;
}

// [so we just continue unescaped here, and the quote is ignored]
if( !fieldIsEscaped )
{
// If the field isn't escaped, the quote
// is like any other char and we should
// just ignore it.
continue;
}
```

The reason I bring this up is that I noticed in the original question, there are spaces between the delimiting commas and the escaping quotation marks.

So there is a space after the comma before the quote? It is handling this situation as it should. That is not a valid CSV format as you are expecting it to be. For invalid formats, CsvHelper will do the same thing as MS Excel. If you open that up in Excel, you will see the same result.

That is (was) my situation. Looking at the grammar, it's easy to see that a quote character must immediately follow a field delimiter or a newline. However, it's not necessarily intuitive that whitespace within lines is important when you're hand-crafting CSV strings or files. (This is not a problem for you to solve, just exposition.)

Aside from that question, my post was primarily to point out that the sample data in @shadargee's question contains spaces between the fields, and suggest that that might be the cause of the behavior he or she noted.

Oh man, I didn't even realize you didn't post the original question. Ha ha ha.

So yes, the problem is the spaces in between the , and the ". Thanks @arootbeer for pointing that out.

@shadargee There is nothing you can do in CsvHelper to solve the problem. You will have to sanitize your data because it's invalid. You could write a simple regex to remove the spaces in between the , and the ".

@JoshClose
(v12.1.2 here)
However, for code:

var csvConfig = new Configuration()
{
     Encoding = Encoding.UTF8,
     Delimiter = ",",
     IgnoreQuotes = true
};
using (var streamReader = new StreamReader(fileStream))
{
     using (var parser = new CsvParser(streamReader, csvConfig))
      {
           //values: "CE272A","15,000.0","1","1.0","141.8"
           var row = parser.Read();
           //"15,000.0" is now  "\"15" and "000.0\"" - row has 6 members instead of 5
           ...
      }
}

I keep getting "15,000.0" as ""15" and "000.0"", and the file opens up in Excel with no issues.
I was looking to replace the VisualBasic.FileIO parser (not .NET Standard compatible) which is handling this case as expected (value stays as "15,000.0").

Am I missing something in the configuration?

Remove IgnoreQuotes = true. This is expected behavior. See the Configuration doc.

@marcrocny Indeed, that solved the issue! Thank you.

Porting some older code to .NET Standard and I was looking at using this library to replace the use of Microsoft.VisualBasic.FileIO.TextFieldParser. This was actually in one of the unit tests, so it's a scenario that must have occurred.

What exactly is the issue with writing some code that looks forward to see if the quote is there? This isn't really "Bad Data", so I'm not sure I understand the reasoning behind not fixing this.

After some testing, I found that:

Considered Invalid Data
VAL1,VAL2,VAL3, "VA,L4",VAL6

Not Considered Invalid Data
VAL1,VAL2,VAL3,"VA,L4" ,VAL6

Was this page helpful?
0 / 5 - 0 ratings

Related issues

NeilMeredith picture NeilMeredith  路  4Comments

Wagimo picture Wagimo  路  4Comments

muzzamo picture muzzamo  路  5Comments

RizwanAhmedJutt picture RizwanAhmedJutt  路  5Comments

rh78 picture rh78  路  3Comments