Pandas: read_sql should convert Numeric to decimals

Created on 28 Mar 2018 · 4 comments · Source: pandas-dev/pandas

pd.read_sql converts columns of type sqlalchemy.Numeric to floats, as the code linked below appears to show. This seems counterintuitive, since the NUMERIC and DECIMAL types are interchangeable. Also, the NUMERIC type has an asdecimal option that defaults to True (http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Numeric). I would expect sqlalchemy.Numeric columns to be converted to decimal.Decimal.
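For background on why the distinction matters: coercing DECIMAL values to binary floats silently loses the exactness the type exists to provide. A minimal standard-library illustration:

```python
from decimal import Decimal

# Exact decimal arithmetic, which is what DECIMAL/NUMERIC columns promise.
exact = Decimal("0.1") + Decimal("0.2")   # Decimal('0.3')

# The same values after coercion to binary floats.
coerced = float(Decimal("0.1")) + float(Decimal("0.2"))  # 0.30000000000000004

print(exact, coerced)
```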

https://github.com/pandas-dev/pandas/blob/master/pandas/io/sql.py#L1187-L1190

Dtypes Enhancement IO SQL


All 4 comments

coerce_float=False skips the conversion, so this may be a non-issue.
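To illustrate that workaround, here is a sketch using the standard-library sqlite3 driver with a hypothetical table (SQLite has no native DECIMAL storage, so the adapter/converter pair below just round-trips Decimal values as text; the DECTEXT type name is invented to get TEXT affinity):

```python
import decimal
import sqlite3

import pandas as pd

# Store Decimal as text and parse it back on read; "DECTEXT" is a made-up
# declared type so SQLite gives the column TEXT affinity.
sqlite3.register_adapter(decimal.Decimal, str)
sqlite3.register_converter("DECTEXT", lambda b: decimal.Decimal(b.decode()))

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE t (price DECTEXT)")
con.execute("INSERT INTO t VALUES (?)", (decimal.Decimal("1.10"),))
con.commit()

# Default: coerce_float=True turns the Decimal objects into float64.
floats = pd.read_sql("SELECT price FROM t", con)

# coerce_float=False keeps the Decimal objects (object dtype).
decimals = pd.read_sql("SELECT price FROM t", con, coerce_float=False)

print(floats["price"].dtype)
print(type(decimals["price"].iloc[0]))
```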

decimal.Decimal is not a first-class type, so we generally don't do things like this. But I suppose it could be done if a SQL dtype is unambiguously a decimal type.

For the reason that @jreback mentions (decimals cannot be stored natively in a pandas dataframe, only as object dtype), I don't think we want to change this by default. But it could be a good idea to at least let the user specify explicitly that they want to keep the decimal objects. That would be a welcome contribution, I think.

There is a very old, closed issue that raises some good points about dtypes:
https://github.com/pandas-dev/pandas/issues/6798

If read_sql allowed a parameter for specifying the dtypes of the resulting table, that could help people a lot.

Alternatively, add an example to the documentation where:

- read_sql is called with a chunksize
- the first chunk is read into a dataframe
- the dtypes are specified
- the rest of the data is appended with the same dtypes
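The steps above might be sketched like this (table and column names are hypothetical; pinning the categories with a fixed CategoricalDtype keeps pd.concat from falling back to object dtype when a chunk happens to miss a category):

```python
import sqlite3

import pandas as pd

# Illustrative table; names and contents are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, qty INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 else "south", i % 100) for i in range(1000)],
)
con.commit()

# Memory-friendly dtypes, applied to every chunk.
dtypes = {"region": pd.CategoricalDtype(["north", "south"]), "qty": "int16"}

chunks = pd.read_sql("SELECT * FROM sales", con, chunksize=400)

# Read the first chunk and cast it, then append the remaining chunks
# cast to the same dtypes.
first = next(chunks).astype(dtypes)
df = pd.concat([first, *(c.astype(dtypes) for c in chunks)], ignore_index=True)

print(df.dtypes)
```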

It would be very useful for large tables. Casting to categories and smaller-width dtypes saves a lot of memory, which in turn significantly decreases data-processing run time.
