Issue:
We are using AWS Redshift and its UNLOAD command:
http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
It outputs data in the following format:
"1"|"343439"|"524136"|""|""|"127.0.0.1"|"2017-09-26 13:51:04"|"{\"websiteId\":\"someId\",\"websiteName\":\"myNewMedia\",\"deviceType\":\"desktop\"}"
More and more Amazon tools expose the same set of options: quote character, escape character, and delimiter.
I know that CsvHelper currently only supports doubling the quote character. It would be nice to be able to specify an escape character.
Is it reasonable?
The thought is reasonable. The implementation is quite difficult. ;)
You can currently change the quote character, but that character is used for escaping also, since the RFC says to just double the character, not put an escape character in front of it.
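To make the RFC 4180 rule concrete, here is a small illustration using Python's built-in csv module (not CsvHelper). The sample line is made up for the example; a quote inside a quoted field is written as two quotes and read back as one.

```python
import csv
import io

# RFC 4180 style: a quote inside a quoted field is written as two quotes.
line = '"2344","COMMENT","The ""CsvHelper"" package is a great tool"'

# doublequote=True is the csv module's default; it is spelled out for clarity.
reader = csv.reader(io.StringIO(line), quotechar='"', doublequote=True)
row = next(reader)

print(row[2])  # The "CsvHelper" package is a great tool
```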
The escape character is only used to escape a quote then? Meaning, it's not used to wrap the field, nor is it used to escape the delimiter or newline.
Yeah. But unfortunately I can't change the quote character in Redshift.
I think the escape character is used only to escape the quote symbol inside a field. It could probably also be used to escape the delimiter when the field is not quoted. I need to test that.
But if we can configure 'quote char', 'escape char', and 'delimiter char', we are good.
I might be wrong, but it looks like this is becoming something of an industry standard:
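As a rough illustration of what those three settings look like together, here is the UNLOAD line from the top of the thread parsed with Python's built-in csv module. This is not CsvHelper and says nothing about how CsvHelper would expose the option; it only shows the quote/escape/delimiter combination working on that data.

```python
import csv
import io
import json

# The sample row from the UNLOAD output; a raw string keeps the backslashes.
line = (
    r'"1"|"343439"|"524136"|""|""|"127.0.0.1"|"2017-09-26 13:51:04"|'
    r'"{\"websiteId\":\"someId\",\"websiteName\":\"myNewMedia\",\"deviceType\":\"desktop\"}"'
)

reader = csv.reader(
    io.StringIO(line),
    delimiter="|",      # field separator
    quotechar='"',      # wraps each field
    escapechar="\\",    # escapes a quote inside a field
    doublequote=False,  # do not treat "" as an escaped quote
)
row = next(reader)

# The last field is now plain JSON with the backslashes removed.
payload = json.loads(row[7])
print(len(row), payload["websiteName"])
```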
http://docs.aws.amazon.com/athena/latest/ug/csv.html
I'll look into how much work it will be to use an escape char instead of just doubling the quote char. Writing is simple. Parsing is where it gets complicated.
+1 on this - have the exact same problem reading a dump out of a mainframe. Is there any mechanism for handling this via the BadDataFound delegate? I can't seem to find anything in the documentation. Would be good if we were able to process the string and resubmit it - failing this case being natively handled, of course...
edit: leafing through the unit tests, I just discovered that setting BadDataFound to null does the trick for us
I have always used the "" to escape quotes in CSV files (as per RFC 4180) as well, but it has become common to encounter \" as the escape sequence for a quote within a string in the CSV files I am processing. As @hmvs stated it is becoming more common and it would be nice if CsvHelper handled it.
As @moconnell mentioned, setting BadDataFound seems to work and is a viable workaround, but that is really not what I want to do since that might cover up other issues.
Here is an example of the escaping:
"2344","COMMENT","The \"CsvHelper\" package is a great tool"
Field1: 2344
Field2: COMMENT
Field3: The "CsvHelper" package is a great tool
It seems like during parsing, once you encounter the quoting character after the field delimiter, you should be able to assume that any time you encounter that escape sequence, it should be replaced with the quoting character, until you reach the unescaped quoting character.
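A minimal sketch of that rule in Python, just to show the idea (this is not CsvHelper's parser; parse_line and its parameters are hypothetical names). It assumes a single-line record and treats whatever character follows the escape character inside a quoted field as literal, which is an assumption on my part; under that assumption a doubled escape character comes out as a single one.

```python
def parse_line(line, delimiter=",", quote='"', escape="\\"):
    """Split one record into fields, honoring an escape character in quoted fields."""
    fields = []
    buf = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == escape and i + 1 < len(line):
                # Assumption: the character after the escape is taken literally.
                buf.append(line[i + 1])
                i += 2
                continue
            if ch == quote:
                # Unescaped quote closes the quoted field.
                in_quotes = False
            else:
                buf.append(ch)
        elif ch == quote and not buf:
            # A quote at the start of a field opens a quoted field.
            in_quotes = True
        elif ch == delimiter:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


fields = parse_line(r'"2344","COMMENT","The \"CsvHelper\" package is a great tool"')
print(fields)
```

Embedded newlines, a quote appearing after a closing quote, and escapes outside quoted fields are all left out on purpose to keep the sketch short.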
Thanks for all your hard work on this project and keep coding!
How do you escape the escape character?
I'm working through this and I think I can do it, but it has potential to be flaky because of things like that.
I have this implemented on a branch. https://github.com/JoshClose/CsvHelper/commit/9af9eb14c0c181d41eff061ef84afb672652221d
This is on nuget as version 10.0.0-beta01. Please test this and make sure it works for you.
This is in 10.0.0 on nuget now.