CsvHelper.3.0.0-RC05 does not properly serialize negative numbers

Created on 5 Oct 2017  ·  13Comments  ·  Source: JoshClose/CsvHelper

The following test shows how negative number are prepended with whitespace characters
Worked in RC03

    [TestClass]
    public sealed class IntegerTests
    {
        private sealed class SampleClass
        {
            [CanBeNull]
            public int IntValue { get; }

            public SampleClass([CanBeNull] int intValue)
            {
                this.IntValue = intValue;
            }
        }

        private sealed class SampleClassMap : ClassMap<SampleClass>
        {
            public SampleClassMap()
            {
                this.Map(m => m.IntValue).Name("IntValue");
            }
        }

        [TestMethod]
        public void DirectMapTest()
        {
            StringWriter writer = new StringWriter();
            using (CsvWriter csvWriter = new CsvWriter(writer))
            {
                csvWriter.Configuration.RegisterClassMap<SampleClassMap>();
                csvWriter.WriteHeader<SampleClass>();
                csvWriter.NextRecord();
                csvWriter.WriteRecords(new[]
                    {
                        new SampleClass(1),
                        new SampleClass(int.MaxValue),
                        new SampleClass(int.MinValue),
                    });
            }
            Assert.AreEqual(@"IntValue
1
2147483647
-2147483648
", writer.ToString());
        }
    }

Testresult

Assert.AreEqual failed. Expected:<IntValue
1
2147483647
-2147483648
>. Actual:<IntValue
1
2147483647
    -2147483648
>. 

Most helpful comment

Sounds good, and thanks for taking the issue seriously!

On Fri, Oct 6, 2017 at 1:50 PM, Josh Close notifications@github.com wrote:

I'm thinking it through, I don't want it on by default. I want the round
trip scenario of writing and reading to work. I will put a lot of warnings
all over the place about the issue. I'll think about putting it as a
parameter in the constructor. This would probably give it the most
visibility.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/JoshClose/CsvHelper/issues/771#issuecomment-334839687,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAECZsvXqeAeo7PSeGr157dA5LhZhADGks5spnZbgaJpZM4Puu7D
.

All 13 comments

There was a new feature added that's turned on by default to prevent Excel/OpenOffice/Sheets injection attacks. If the field starts with a = @ + - it will get a \t inserted in front of it. You can turn this off by doing Configuration.SanitizeForInjection = false

I will put a big note at the top of the writing portion of the documentation so people know this is there and on by default.

@JoshClose Thanks for the hint. Will try that.
One thing that comes to my mind though is "should this really be the default?"
I mean your library is called CsvHelper, not ExcelHelper and it seems very unintuitive that this will be done.

I stated to love the work you have done here and I am fond of using it. As a first time user I would probably not spend that much time and just label the software as buggy if such a simple thing "is not working correctly"

I completely agree with you. I will turn if off by default and put some big warnings in the documentation and github readme.

What do you think about this though... By default having it throw an exception if an injection character is detected? The exception message would be large explaining why it's happening and how to turn it off. This way any new person would see the message and be able to turn it off if they need.

@togakangaroo I'd like your opinion on this also.

@Mertsch it's not an excel thing. The same exact thing happens in google sheets and I believe open office. and it's a really really nasty vulnerability. An attacker can use it to steal all the data in the sheet without any warnings or popups or anything. Worse, if the attacker is fine with a popup that someone will click through anyways then with excel at least they can go even further, executing any code they want on the target's machine. But again, the basic vulnerability happens in just about any spreadsheet program because they all try to execute cells that they believe might contain formulas. This is a mistake of spreadsheet programs, but since that's something that affects almost all csv users I think it makes sense to handle that here and to have it be secure by default.

I actually think the test above is incorrect. While yes, it is true that the output has this extra character, both google sheets and excel don't actually show that character so I don't think you should be thrown by it's presence.

Another example of the issue - consider you have

class SomeClass {
    public string Value1 {get; set;} = "2+5";
    public string Value2 {get; set;} = "-2+5";
}

you would expect that csv, when opened in a spreadsheet program to appear as a row with two columns

| 2+5 | -2+5 |

right? But it won't! Instead you'll get

| 2+5 | 3 |

Again, this happens in both excel and google sheets and (I haven't tested but I have heard) open/libre office. Very unexpected results. I think its correct to adjust for this by default (and of course document the alternative approach heavily and prominently)

This is only when viewing in Excel/etc. If you're using it for data, you don't want that extra bit there, and by default having it there will cause issue when reading. What are your thoughts on throwing an exception by default if one of those characters is detected with a large message explaining what's going on, how to turn off the exception, and how to turn on injection sanitization if they want? Educate them when it happens instead of silently re-writing the file.

A bit weird, but not the end of the world - what about having a non-optional boolean parameter when instantiating CsvWriter that requires setting true/false for data or user mode? (And if we make it optional I think the implications are bad enough that the default should be user mode)

To be clear, I like the explicit option better because say in testing you never happen to run through data with a negative sign or an equals prefix, but then in prod...you start getting this occasional error. And of course, most people don't log errors very well.

That would make people chuck their PCs out the window.

lol My issue is this is a CSV library and I want it to write RFC4180 compliant files by default. This would be the same issue if someone wrote a file that was sanitized, then read it right back in. It would fail because it couldn't parse \t-1 by default. I see having extra chars silently written by default potentially being a bigger issue than getting an unexpected exception. Keep in mind I'm planning on putting a large WARNING!!! on the docs and README so people will see it. Maybe even in the NuGet description... not sure about that one.

Yeah, I get why its a difficult decision, my argument is that its better to err on the side of security because you know....like everything going on in that field lately. But I understand why its difficult.

I think far fewer people read docs than should and they're the ones most likely to be popped by this. so yeah, my vote is still for an explicit option in the CsvWriter constructor to specify whether you want

  • user mode - this is meant to be read by people using a spreadsheet style application and should preform proper security sanitization steps
  • data mode - this is meant to be read by programs that will import data from a CSV format

After thinking it through, I don't want it on by default. I want the round trip scenario of writing and reading to work. I will put a lot of warnings all over the place about the issue. I'll think about putting it as a parameter in the constructor. This would probably give it the most visibility.

Sounds good, and thanks for taking the issue seriously!

On Fri, Oct 6, 2017 at 1:50 PM, Josh Close notifications@github.com wrote:

I'm thinking it through, I don't want it on by default. I want the round
trip scenario of writing and reading to work. I will put a lot of warnings
all over the place about the issue. I'll think about putting it as a
parameter in the constructor. This would probably give it the most
visibility.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/JoshClose/CsvHelper/issues/771#issuecomment-334839687,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAECZsvXqeAeo7PSeGr157dA5LhZhADGks5spnZbgaJpZM4Puu7D
.

The data in/out compatibility was also my primary thought.
I am also in favor of an explicit and required parameter for the constructor / a mandatory Configuration.Format (maybe an enum with Rfc4180, Default = Rfc4180, OptimizedForPopularReaders)

Thanks for the constructive discussion 👍

Was this page helpful?
0 / 5 - 0 ratings