CsvHelper: Can't Read Special Characters

Created on 13 Dec 2013 · 10 Comments · Source: JoshClose/CsvHelper

I'm working on a Turkish CSV file here, so there are special letters like ç, ş, ğ, ü...

But CsvReader can't read them properly and shows those characters as a "?" symbol.

I tried to configure it as:

using(var reader = new CsvReader(new StreamReader(rootDirS + "/Stok.csv", Encoding.UTF8))) {
    reader.Configuration.Encoding = Encoding.UTF8;
    reader.Configuration.CountBytes = true;
    reader.Configuration.CultureInfo = new CultureInfo("tr-TR");
}

But it just doesn't work. How can I achieve this?

All 10 comments

That is a very good question! I haven't dealt with it myself, but I know there are several people that use it in different cultures. Here is a unit test that someone added to address some culture stuff. https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper.Tests/LocalCultureTests.cs

If you could send me a sample CSV file, I could take a look at it.

Oh, thanks for the reply. I'm not sure if this is an issue about culture settings. It seems more related to the encoding.

Here is the file: http://we.tl/Bsdnb5YlkA

Encoding isn't handled by CsvHelper at all. That's all done externally, in the TextReader/TextWriter that's passed in. The only thing CsvHelper uses the encoding for is Configuration.CountBytes: you set Configuration.Encoding so that the byte count comes out correct. It should be set to the same encoding as the TextReader.
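
To illustrate why the encoding matters for byte counting: the same text occupies a different number of bytes depending on the encoding, so a count computed with the wrong encoding would be off. The following is only a minimal sketch that does not call CsvHelper; the `ByteCountDemo` class and its `CountBytes` helper are hypothetical names, and it assumes the Windows code pages are available to the runtime.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    // Returns how many bytes `text` occupies in the named encoding.
    public static int CountBytes(string text, string encodingName)
    {
        // Assumption: on .NET Core / .NET 5+ the Windows code pages must be
        // registered before GetEncoding can find "windows-1254".
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetByteCount(text);
    }

    public static void Main()
    {
        // The letters ç, ş, ğ, ü written as escapes so the source file's own
        // encoding cannot interfere with the demonstration.
        const string sample = "\u00E7\u015F\u011F\u00FC";

        Console.WriteLine(CountBytes(sample, "utf-8"));        // 8: two bytes per letter
        Console.WriteLine(CountBytes(sample, "windows-1254")); // 4: one byte per letter
    }
}
```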

Was the file written with UTF8 and "tr-TR"?

Hmm, I actually don't know, it was just a guess. The only thing I know is that it has Turkish characters and it displays correctly in Excel.

So should I mess with the TextReader instead?

I just tried an encoding of UTF7, and the text looked the same as Excel or Notepad++.

You should be able to read the text correctly without CsvHelper. CsvHelper will only take that text and chunk it up based on some rules.

Let me know if it doesn't work for you.
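
One way to check this without CsvHelper is to read the first line with a plain StreamReader and compare the result under different encodings. This is a sketch only: `PlainReaderCheck` and `ReadFirstLine` are hypothetical names, the sample line is made up, and an in-memory stream stands in for the real file.

```csharp
using System;
using System.IO;
using System.Text;

public static class PlainReaderCheck
{
    // Reads the first line of a stream with the given encoding, using no CSV library.
    public static string ReadFirstLine(Stream stream, Encoding encoding)
    {
        stream.Position = 0;
        // leaveOpen: true so the same stream can be re-read with another encoding.
        using (var reader = new StreamReader(stream, encoding, false, 1024, true))
        {
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        // Assumption: needed on .NET Core / .NET 5+ to make windows-1254 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding turkish = Encoding.GetEncoding("windows-1254");

        // Made-up sample line ("şeker;çay"), stored the way a windows-1254 file would store it.
        byte[] fileBytes = turkish.GetBytes("\u015Feker;\u00E7ay");

        using (var stream = new MemoryStream(fileBytes))
        {
            Console.OutputEncoding = Encoding.UTF8;
            // Decoded with the matching encoding, the Turkish letters survive.
            Console.WriteLine(ReadFirstLine(stream, turkish));
            // Decoded as UTF-8, the Turkish bytes are invalid sequences and come
            // back as U+FFFD replacement characters (often displayed as "?").
            Console.WriteLine(ReadFirstLine(stream, Encoding.UTF8));
        }
    }
}
```

If the text is already wrong at this point, the problem lies in the encoding passed to the reader and not in the CSV parsing.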

Ahh ok, I set the encoding of the reader as follows:

using(var reader = new CsvReader(new StreamReader(rootDirS + "/Stok.csv", Encoding.GetEncoding("windows-1254"))))

...which corresponds to Turkish (Windows). So it works fine now. Thanks and sorry for pointing the finger at CsvHelper.

Btw, setting it as UTF7 shows Icelandic character set instead (Like "þ" instead of "Ş"). But that's a known clash since DOS times.

So, for any future readers, use the GetEncoding method (http://msdn.microsoft.com/en-us/library/t9a3kf7c(v=vs.110).aspx), and here's the code page list: http://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
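
The Turkish/Icelandic clash mentioned above comes from the two code pages assigning different letters to the same byte values. A small sketch of that, using windows-1252 as the Western European code page for comparison (an assumption; the thread does not say which table the UTF7 attempt effectively fell back to). `CodePageClash` and `DecodeByte` are hypothetical names.

```csharp
using System;
using System.Text;

public static class CodePageClash
{
    // Decodes a single byte with the named single-byte encoding.
    public static string DecodeByte(byte value, string encodingName)
    {
        // Assumption: needed on .NET Core / .NET 5+ to make the Windows code pages available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetString(new[] { value });
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(DecodeByte(0xFE, "windows-1254")); // ş (Turkish)
        Console.WriteLine(DecodeByte(0xFE, "windows-1252")); // þ (Icelandic thorn)
    }
}
```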

Thanks for the info!

Help me please!!! The files are not read or written. Thank you.
https://yadi.sk/i/bnIMo_Ww3RdY8p
https://yadi.sk/i/nvfGc7oU3RdYAN

Try using Open command as below. It worked for me.
df = pd.read_csv(open(filename, 'r'))
