pd.read_sql converts columns of type sqlalchemy.Numeric to floats as the below selection appears to suggest. This seems counter intuitive since in the NUMERIC and DECIMAL types are interchangeable. Also the NUMERIC type has an option asdecimal that defaults to true (http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Numeric). I would expect that sqlalchemy.Numeric types get converted to decimal.Decimal.
https://github.com/pandas-dev/pandas/blob/master/pandas/io/sql.py#L1187-L1190
coerce_float=False skips the conversion so maybe a non-issue.
decimal.Decimal is not a first class type, so we generally don't do things like this. But I suppose it could be done if a SQL dtype is unambiguously a decimal type.
For the reason that @jreback mentions (decimals cannot be stored natively in a pandas dataframe, only as object dtype), I don't think we want to change this by default. But, it could be a good idea to at least enable the user the specify explicitly they want to keep the decimal objects. That would be a welcome contribution I think.
There is a very old, closed issue that raises some good points about dtypes:
https://github.com/pandas-dev/pandas/issues/6798
If you allow read_sql a parameter for specifying the dtypes of the resulting table then that could help people a lot.
Alternatively, add an example to the documentation where
read_sql has a chunk_size
the first chunk is read into a dataframe
the dtypes are specified
the rest of the data is added
It would very useful for large tables. Casting to categories and smaller bit dtypes saves so much memory which in turn decreases data processing run-time significantly.
Most helpful comment
There is a very old, closed issue that raises some good points about dtypes:
https://github.com/pandas-dev/pandas/issues/6798
If you allow read_sql a parameter for specifying the dtypes of the resulting table then that could help people a lot.
Alternatively, add an example to the documentation where
read_sql has a chunk_size
the first chunk is read into a dataframe
the dtypes are specified
the rest of the data is added
It would very useful for large tables. Casting to categories and smaller bit dtypes saves so much memory which in turn decreases data processing run-time significantly.