Pandas: read_csv, integer dtype and empty cells

Created on 3 Jan 2013 · 4 Comments · Source: pandas-dev/pandas

Reading a CSV file with an integer column that has empty cells casts that column to float (which in the end results in problems when merging this dataframe on that column with a dataframe where the corresponding column is int).

It would be nice if a warning could be printed when such a conversion takes place (maybe only when an explicit dtype={"col":np.int64} setting is passed to read_csv), and if I could optionally specify that such rows should be dropped (isn't there an NA value for int columns...?).

from io import StringIO

import numpy as np
import pandas

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""
# The second data row has an empty DOY cell.
f = pandas.read_csv(StringIO(data), sep=",", dtype={'DOY': np.int64})
f.dtypes
YEAR      int64
 DOY    float64
 a        int64
Enhancement IO Data

Most helpful comment

Now that we have nullable integers since 0.24.0, wouldn't it be a good idea to add a parameter to read_csv like 'use_nullable_ints' to enable inference of Int64 columns?
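
For anyone landing here later, a minimal sketch of what already works without a new parameter: the nullable extension dtype can be requested per column by name. The 'use_nullable_ints' parameter mentioned above is only a proposal, not an existing read_csv option; the sketch assumes pandas 0.24 or newer and uses a header without the stray spaces from the original example.

```python
from io import StringIO

import pandas as pd

data = """YEAR,DOY,a
2001,106380451,10
2001,,11
2001,106380451,67"""

# "Int64" (capital I) is the nullable extension dtype, not NumPy's int64.
df = pd.read_csv(StringIO(data), dtype={"DOY": "Int64"})

print(df.dtypes)               # DOY stays an integer column
print(df["DOY"].isna().sum())  # the empty cell is kept as a missing value
```

As far as I know, pandas 2.0 and later also accept dtype_backend="numpy_nullable" in read_csv to get nullable dtypes for all columns, which is close to what the proposed flag would do.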

All 4 comments

There are no integer NA values, unfortunately. I plan to fix this one of these days (a big project; it probably requires circumventing NumPy).
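
A small sketch of the underlying limitation, for illustration only: a NumPy integer array has no slot for NaN, so pandas falls back to float64 as soon as a value is missing.

```python
import numpy as np
import pandas as pd

ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[1] = np.nan  # NaN is a float; an int64 array cannot store it
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)

# With a missing value present, default inference gives a float column.
s = pd.Series([1, None, 3])
print(s.dtype)
```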

I don't mind that it is not possible (yet), but I do mind that read_csv changed the datatype even though I specified it and didn't say anything (throw an exception or print a warning).

pandas/src/parser.pyx has a commented-out exception raise at line 900, which seems to do what I expected...?

Would it be possible to add a param to specify a strategy (drop row, throw exception, cast to float) for what should happen in such cases? I tried to understand the code and it seems that it operates on columns, so dropping rows if an int is NA seems not an easy option :-(
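
The three strategies can be approximated after the read, outside the parser. This is only a sketch under assumptions: read_csv_int_column and its on_na argument are hypothetical names, not part of pandas, and the column passes through float64 first, so integers above 2**53 could lose precision.

```python
from io import StringIO

import pandas as pd


def read_csv_int_column(buf, column, on_na="raise", **kwargs):
    """Hypothetical helper (not a pandas API): read a CSV, then apply a
    strategy to `column` if it contains empty cells."""
    df = pd.read_csv(buf, **kwargs)
    missing = df[column].isna()
    if not missing.any():
        return df.astype({column: "int64"})
    if on_na == "raise":
        count = int(missing.sum())
        raise ValueError(f"column {column!r} has {count} empty cell(s)")
    if on_na == "drop":
        # Drop the offending rows, then the cast to int64 is safe.
        return df.loc[~missing].astype({column: "int64"})
    if on_na == "float":
        return df.astype({column: "float64"})
    raise ValueError(f"unknown on_na strategy: {on_na!r}")


data = """YEAR,DOY,a
2001,106380451,10
2001,,11
2001,106380451,67"""

dropped = read_csv_int_column(StringIO(data), "DOY", on_na="drop")
as_float = read_csv_int_column(StringIO(data), "DOY", on_na="float")
print(dropped.dtypes)
print(as_float.dtypes)
```

Dropping rows is easy here because the whole frame is already in memory; inside the column-oriented parser it would be a different story, as noted above.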

Done. Thanks for the suggestion; I agree raising the exception is the right move. Note that in your example you need to pass skipinitialspace=True.
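
To illustrate why skipinitialspace matters in that example (a sketch, assuming a reasonably recent pandas): the header "YEAR, DOY, a" has a space after each comma, so without the flag the column is named " DOY" and the dtype key 'DOY' does not match it.

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# Without the flag, names after the first keep their leading space.
plain = pd.read_csv(StringIO(data))
print(list(plain.columns))

# With the flag, the names are clean and dtype={'DOY': ...} can match.
stripped = pd.read_csv(StringIO(data), skipinitialspace=True)
print(list(stripped.columns))

# Now the int64 request applies to the column and the empty cell is an error.
try:
    pd.read_csv(StringIO(data), skipinitialspace=True, dtype={"DOY": np.int64})
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```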
