Csvhelper: No exception thrown if csv file has no record but header doesn't match map

Created on 7 Nov 2016  Â·  27Comments  Â·  Source: JoshClose/CsvHelper

I was rather surprised to see that CsvHelper would not throw an exception when reading a file which, instead of the usual CSV, was just a single line of error code (the file could not be generated), even though a proper ClassMap is registered.
I tried setting ThrowOnBadData to true, but it does not help the header fields get validated against the ClassMap.

This can cause serious issues when a CSV file has no record because something wrong happened. It looks like a bug, unless I am missing a configuration setting.

I am using the following code:

 using (var csv = new CsvReader(new StringReader("Something terrible happened"), new CsvConfiguration { CultureInfo = CultureInfo.InvariantCulture, ThrowOnBadData = true, Delimiter = "|", ShouldSkipRecord = cols => string.IsNullOrWhiteSpace(string.Join("", cols)) })) {
  csv.Configuration.RegisterClassMap<Activity.Map>();
  return csv.GetRecords<Activity>();
}
feature

Most helpful comment

There are some configuration and exceptions that occur when actually getting the row, so this would need to happen in ReadHeaders. There is already a list of the header names to match against, so there is no need to loop through the types property names as in @Arithmomaniac 's example

There will probably be another flag to turn this check on/off.

All 27 comments

Looks like csv.ReadHeader should enable a workaround:

csv.ReadHeader();
if (typeof(MyType).GetProperties().Select(x => x.Name)
    .Except(csv.FieldHeaders)
    .Any())
{
    throw new InvalidOperationException("Missing Columns");
}

There is no way for the library to tell that the file is an error message instead of real data. It's just looking for commas and line feeds. I would think you'd get a missing field exception or something though.

If your files will sometimes not be a CSV file and instead be an error message, I would suggest looking at the first line of the file to determine if it's an error message before running it through CsvHelper.

Yes, we should for sure get a missing field exception in such cases. A lot of people are probably rightly relying on CsvHelper to do this kind of basic CSV validation. I was quite frightened to find out no exception was thrown.

Determining if the first line of the file is an error message is not reliable as in most cases we cannot guess what the error message will be. With field mapping (file validation is one of its key purposes), CsvHelper has everything it needs to quickly and efficiently do the check, so it wouldn't make sense to have to duplicate the mapping info for the purpose of validating the header (e.g. comparing the header column names with the list of expected column names).

Indeed, it seems csv.ReadHeader should implement this check (maybe conditional on ThrowOnBadData if this is the meaning of this option).

If your error message is a valid CSV file that your definition supports, I'm not sure how that is appropriate for CSVHelper's logic. Maybe some better examples would help? Or at least the map for your example Activity class.

The error message is not a valid CSV file. It is just an unpredictable phrase saying the file could not be generated. One key purpose of CsvHelper to me is to validate that columns are present in the CSV file, the same way it validates a record is valid and parseable (when there is a record). As Arithmomaniac suggested it seems just a matter of checking for this basic consistency when reading the header. I'd be happy to submit a PR but I am not very familiar with the library's inner working to ensure optimal speed.

If you could just give a description of the logic that is needed to figure out of a file is a valid CSV file, that would be good enough.

The problem I'm seeing is that a file with only a single character or a full error message in it, is a valid CSV file. It would be a CSV with a single header field, and no rows (assuming you have headers turned on). That is why it doesn't fail. If you have to rows of garbage, you'll then get the CsvMissingFieldException throw. You need to actually have data for that to happen.

I don't really say a way to validate what is considered a valid CSV file.

In version 3.0 an exception could possibly be thrown after ReadHeader is called and the file has a header that isn't a part of the mapping, or the file is missing a header that is in the mapping.

You are completely right that the file is indeed valid CSV. However, when using a mapping class, the file should be considered invalid if any field in the mapping is not present as a column name in the Header. The same way, an exception is already thrown if any field in the mapping is not present in a Record.

Arithmomaniac's suggestion captures this, but it may not be the fastest implementation or the one that takes advantage of operations that may have already been done (I would expect the MyType reflection to be done only once as it is then also used when parsing records).

Baleeted

Yeah. Trying to fix it right now ;)

Had to make a new one. Doesn't let you edit an email comment with markdown/images. Woops!

Here's a LINQPad snippet to replicate this behavior on V2. We can see that
there is no exception thrown and there is an empty list to come back. So
yeah, it would seem there needs to be some kind of validation logic in
place when reading the headers, either via the incoming ReadHeader method,
or via the bulk GetRecords, as I am doing below.

void Main()
{
    using (var csv = new CsvReader(new StringReader("Something terrible
happened"),
    new CsvConfiguration
    {
        ThrowOnBadData = true,
        Delimiter = "|",
        ShouldSkipRecord = cols => string.IsNullOrWhiteSpace(string.Join("", cols)),
        HasHeaderRecord = true}
    ))
    {
        csv.Configuration.RegisterClassMap<aMap>();
        csv.GetRecords<a>().ToList().Dump();
    }
}

public class a
{
    public string b { get; set; }
    public string c { get; set; }
}

public class aMap : CsvClassMap<a>
{
    public aMap()
    {
        Map(a=>a.b).Name("b");
        Map(a=>a.c).Name("c");
    }
}

image

There are some configuration and exceptions that occur when actually getting the row, so this would need to happen in ReadHeaders. There is already a list of the header names to match against, so there is no need to loop through the types property names as in @Arithmomaniac 's example

There will probably be another flag to turn this check on/off.

Definitely would need a flag to turn it on or off...we have a lot of
high-frequency high-reliability CSV files we parse that sometimes just come
completely blank, which is totally fine and indicates no output for that
period of time. I'm sure others are in the same scenario.

On Thu, Dec 15, 2016 at 3:54 PM, Josh Close notifications@github.com
wrote:

There are some configuration and exceptions that occur when actually
getting the row, so this would need to happen in ReadHeaders. There is
already a list of the header names to match against, so there is no need to
loop through the types property names as in @Arithmomaniac
https://github.com/Arithmomaniac 's example

There will probably be another flag to turn this check on/off.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/JoshClose/CsvHelper/issues/601#issuecomment-267441313,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AD_ohWTBW74t3kYVOTS6Y-_FLC3j-_ytks5rIakZgaJpZM4KrU7v
.

What is the logic for throwing the exception here? Does it only throw if no headers match?

I would say that if HasHeaderRecord = true and ThrowOnBadData = true, and any field defined in the classMap is not present in the columns (such as if the file is blank), one should reasonably expect CsvHelper to throw an exception. These two flags already have a strong meaning.

Adding a new flag (what would be its name?) with a default value set to not break existing use cases relying on this problematic behavior would break the principle of least surprise (a new user specifying HasHeaderRecord = true and ThrowOnBadData = true probably does not expect a blank file to be fine)...
I would suggest adding a ThrowOnBadHeader = true flag if needed; code from jamesbascle would need to set this flag to false to keep working. Not sure if there are many use cases beyond the one presented by jamesbascle (empty file) that would benefit from this flag however, so it would be good to give it more thought to see if another kind of flag scope would not be more appropriate.

I can't just validate against the mapping file, because you can have multiple mapping files registered. This means I could only automatically do header validation against mapped properties on GetRecords<T>. Is this the only scenario where people want this feature?

The definition of what is bad here is different for different people. Some are blank files, which is ok for others, sometimes there is an error code, which others don't have. It might be best to have a Action<ICsvReaderContext> Configuration.ValidateHeader that can be set. This would allow for that. With the way things are setup now though, it would be just as easy to do this:

csv.Read();
csv.ReadHeader();
Validate(csv.Context);
var records = csv.GetRecords<MyType>();

I really think properly-formatted CSV files should either always have headers even if there are no rows, or never have headers. So I am not sure why enforcing HasHeaderRecord = true, and if necessary adding a ThrowOnBadHeader = true or ValidateHeader = true flag would not be the way to go, encouraging the use of properly formatted files (respecting the principle of least surprise) while still allowing dirtier use cases. It would be much simpler than having to call a ValidateHeader function for all files before reading them.

Are you ok with that only working on a call to GetRecords<T>() then?

Yes, that would be fine by me. But shouldn't calls to Read(), ReadHeader() and GetRecord<T> also check for the presence of the correct headers (ReadHeader() may have to throw an InvalidOperationException if HasHeaderRecord = false)?

The only way to check for a correct header is to have the mapping to check against. There are a list of mappings and at that point you don't know which mapping is being used. Only if the type T is specified is that possible.

True, so indeed only GetRecord<T> would be relevant.

Done. ccefc1d7493fa41a23dffc67fe94df660b3187b9

There are 2 methods. Validate<T>() and Validate( Type type ). You can call this manually if you want, or calling any of the GetRecord and GetRecords methods will also call it. For the Get methods to work, you need to set Configuration.ThrowOnBadHeader = true.

Great, thanks. In a future major release, it may be good to decide if the default value for ThrowOnBadHeader should not be set to true, to not break the principle of least surprise (I was personally surprised that CsvHelper didn't enforce the header mapping by default; not sure if it is just me or it is indeed what normal people would expect). Of course this should then be documented in the release notes so people relying on the current behavior like @jamesbascle could set the flag to false as part of the migration to the newer version.

What about when auto mapping? Maybe if i just put an explanation in the exception that it's a setting you can turn off, then it won't be as big of a deal.

I am all in for putting an explanation in the exception that it's a setting you can turn off.
Not sure about auto-mapping, I have never used it myself. But I suspect it should apply as well (my understanding is that auto-mapping generates a classmap from a custom class, so the header should be checked against it as with any other classmap).

Auto mapping is creating a default class map when one isn't supplied. If you don't register a map and you get records, a map is created for you. The validation happens here then also. I can see why someone would want to always know when a header value is missing. I think putting in a nice long message in the exception explaining what's going on will work, and this can be on by default.

Sounds good to me. Thanks for making CsvHelper so great!

Was this page helpful?
0 / 5 - 0 ratings