Pandas: `to_sql` dies on columns with closed parens

Created on 9 Nov 2014  路  11Comments  路  Source: pandas-dev/pandas

In [3]: pd.show_versions()

INSTALLED VERSIONS

commit: 9a66dbb
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.0-62-g9a66dbb
nose: None
Cython: 0.20.1post0
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: None
patsy: None
dateutil: 2.2
pytz: 2014.7
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)

Bug IO SQL

Most helpful comment

.key won't work right now in every case. passing "paramstyle='format'" to create_engine() for Postgresql engines will do it.

All 11 comments

Here's an example stack trace when trying to insert data from a data frame with a column named CT Volume (cc):

Traceback (most recent call last):
  File "infer_schemas.py", line 21, in <module>
    df.to_sql(table, engine, index=False)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/core/generic.py", line 956, in to_sql
    index=index, index_label=index_label, chunksize=chunksize)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/io/sql.py", line 534, in to_sql
    chunksize=chunksize)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/io/sql.py", line 1134, in to_sql
    table.insert(chunksize)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/io/sql.py", line 712, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/io/sql.py", line 687, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 1162, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/util/compat.py", line 182, in reraise
    raise value
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 928, in _execute_context
    context)
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 433, in do_executemany
    cursor.executemany(statement, parameters)
KeyError: 'CT Volume (cc'

@hammer Thanks for the report. Can you give a bit more details? What database flavor? Small example?

A small test with sqlite works for me on python 2:

In [6]: engine = create_engine('sqlite:///:memory:')

In [7]: df = pd.DataFrame({'CT Volume (cc)':[1,2,3]})

In [8]: df
Out[8]:
    CT Volume (cc)
0   1
1   2
2   3

In [9]: df.to_sql('test_parantheses', engine)

In [10]: pd.read_sql_table('test_parantheses', engine)
Out[10]:
    index   CT Volume (cc)
0   0   1
1   1   2
2   2   3

I'm using the postgres engine

OK, can reproduce it with postgres

Hmm, I am thinking this is possibly a sqlalchemy bug, as until the latest step in pandas in the stacktrace, everything seems fine.

If I print data, self.insert_statement in before execute (https://github.com/pydata/pandas/blob/v0.15.1/pandas/io/sql.py#L687), I get:

[{'index': 0L, 'CT Volume (cc)': 1L}, {'index': 1L, 'CT Volume (cc)': 2L}, {'index': 2L, 'CT Volume (cc)': 3L}]
INSERT INTO test_parantheses (index, "CT Volume (cc)") VALUES (%(index)s, %(CT Volume (cc))s)

so that looks all OK. The table is also correctly created in postgres.

I reported it at sqlalchemy: https://bitbucket.org/zzzeek/sqlalchemy/issue/3245/postgres-insert-into-a-table-whith-column

So closing this, as for now, I think this is not a bug in pandas. But thanks for the report!

OK, reopening, as we _can_ fix this from the pandas side if we want by using 'key's (see detailed response at https://bitbucket.org/zzzeek/sqlalchemy/issue/3245/postgres-insert-into-a-table-whith-column)

.key won't work right now in every case. passing "paramstyle='format'" to create_engine() for Postgresql engines will do it.

Is this fixable from the pandas side?

hey wes -

So the approach taken on my end in https://bitbucket.org/zzzeek/sqlalchemy/issues/3245/postgres-insert-into-a-table-whith-column now works, so you'd use .key on the Column so that ends up being the parameter name and no problem:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    ct_volume = Column("CT Volume (cc)", Integer, key="ct_volume")

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

s = Session(e)
s.add(A(ct_volume=12))
s.commit()

print(s.query(A.ct_volume).scalar())

so... that will work, it is probably possible in theory for SQLAlchemy to even do something like that automatically as the Oracle dialect does some translation of parameter names too for weird chars.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Ashutosh-Srivastav picture Ashutosh-Srivastav  路  3Comments

ebran picture ebran  路  3Comments

scls19fr picture scls19fr  路  3Comments

marcelnem picture marcelnem  路  3Comments

nathanielatom picture nathanielatom  路  3Comments