Incubator-superset: Oracle backend: pandas Exception : KeyError

Created on 16 Aug 2016 · 30Comments · Source: apache/incubator-superset

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/views.py", line 827, in explore
    payload = obj.get_json()
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/viz.py", line 294, in get_json
    'data': self.get_data(),
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/viz.py", line 479, in get_data
    return self.get_df().to_html(
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/viz.py", line 474, in get_df
    margins=True,
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/tools/pivot.py", line 120, in pivot_table
    grouped = data.groupby(keys)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/generic.py", line 3778, in groupby
    **kwargs)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/groupby.py", line 1427, in groupby
    return klass(obj, by, **kwds)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/groupby.py", line 354, in __init__
    mutated=self.mutated)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/groupby.py", line 2383, in _get_grouper
    in_axis, name, gpr = True, gpr, obj[gpr]
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
  File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'agent_name'

#bug inactive

Source

PickLiDan

Most helpful comment

I just met exactly the same bug.
My case is I'm connectting caravel AKA superset to Apache Kylin and use pykylin as sqlachemy dialect.
Dimensions generated by Kylin are all uppercase and columns generated by caravel are lowercase. Thus when I use pivot table or any other charts the bug appears.
Screenshot

And it's weird when I uppercase all columns and save, the lowercase columns appear again. It seems pykylin sync from kylin

auroralinan on 28 Dec 2016

👍3

All 30 comments

My caravel version is : 0.10.0
Download url is : https://github.com/airbnb/caravel/tarball/0.10.0

piovt table:
When I was chosen 『 Group by 』、『Columns』、
『Metrics』、『Aggregation function』
then Tap enter button

PickLiDan on 16 Aug 2016

subjet: # column_name ---> upper(column_name) cause exceptions

1.# test.py

!/usr/bin/python

import pandas as pd
import numpy as np

d={}
d['a']=['1','2']
d['b']=['3','4']
d['c']=['2','2']

df = pd.DataFrame(d)

print('--------- 1 ----------')
print(df.pivot_table( values='c', index=['a'],columns=['b'], aggfunc=np.sum,margins=True))

print('--------- 2 ----------')
print(pd.pivot_table(df, values='c', index=['a'],columns=['b'], aggfunc=np.sum,margins=True))

print('--------- 3 ----------')
print(df.pivot_table( values='c', index=['a'],columns=['B'], aggfunc=np.sum,margins=True))

print('--------- 4 ----------')
print(pd.pivot_table(df, values='c', index=['a'],columns=['B'], aggfunc=np.sum,margins=True))

# change column_name 'b' to 'B' then run test.py

--------- 1 ----------
b 3 4 All
a
1 2 None 2
2 None 2 2
All 2 2 22
--------- 2 ----------
b 3 4 All
a
1 2 None 2
2 None 2 2
All 2 2 22
--------- 3 ----------
Traceback (most recent call last):
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/indexes/base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/test/a.py", line 19, in
print(df.pivot_table( values='c', index=['a'],columns=['B'], aggfunc=np.sum,margins=True))
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/tools/pivot.py", line 158, in pivot_table
margins_name=margins_name)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/tools/pivot.py", line 205, in _add_margins
margins_name)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/tools/pivot.py", line 283, in _generate_marginal_results
piece[all_key] = margin[key]
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'a'

PickLiDan on 18 Aug 2016

@PickLiDan that may or may be not be the cause :) To find out, just print the keys available in the dataframe so we can't see what's wrong here.

xrmx on 18 Aug 2016

@xrmx # i change my column to upper case,then i work well,so i am sure.

_lower case :_

2016-08-18 20:08:14,614:INFO:root: DT COUNT_DISTINCT__ORDER_ID
0 2016-08-13 930613
1 2016-08-14 877308
2 2016-08-12 791172
3 2016-08-15 630725
4 2016-08-16 4595
2016-08-18 20:08:14,616:INFO:root:Serving from groupby:dt
2016-08-18 20:08:14,616:INFO:root:Serving from columns:
2016-08-18 20:08:14,616:INFO:root:Serving from metrics:COUNT_DISTINCT__ORDER_ID
2016-08-18 20:08:14,616:INFO:root:Serving from pandas_aggfunc : s,u,m
2016-08-18 20:08:14,617:ERROR:root:'dt'
Traceback (most recent call last):
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/indexes/base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'dt'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/views.py", line 923, in explore
payload = obj.get_json()
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/viz.py", line 299, in get_json
'data': self.get_data(),
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/viz.py", line 500, in get_data
return self.get_df().to_html(
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/caravel/viz.py", line 495, in get_df
margins=True
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/tools/pivot.py", line 120, in pivot_table
grouped = data.groupby(keys)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/generic.py", line 3778, in groupby
*_kwargs)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/groupby.py", line 1427, in groupby
return klass(obj, by, *_kwds)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/groupby.py", line 354, in init
mutated=self.mutated)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/groupby.py", line 2383, in _get_grouper
in_axis, name, gpr = True, gpr, obj[gpr]
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/Python-3.5.2/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'dt'

_upper case :_

2016-08-18 20:03:51,109:INFO:root: DT COUNT_DISTINCT__ORDER_ID
0 2016-08-13 930613
1 2016-08-14 877308
2 2016-08-12 791172
3 2016-08-11 712151
4 2016-08-15 630725
5 2016-08-16 4595
2016-08-18 20:03:51,111:INFO:root:Serving from groupby:DT
2016-08-18 20:03:51,111:INFO:root:Serving from columns:
2016-08-18 20:03:51,111:INFO:root:Serving from metrics:COUNT_DISTINCT__ORDER_ID
2016-08-18 20:03:51,111:INFO:root:Serving from pandas_aggfunc : s,u,m
2016-08-18 20:03:51,132:INFO:root:Caching for the next None seconds

PickLiDan on 18 Aug 2016

@PickLiDan so to sum up, you have a field in your table called "DT" all uppercase, but on the select widget of group by you have something that has value "dt" lowercase?

xrmx on 18 Aug 2016

yeah .
I suggest there should be a set, you can set case-insensitive。
My English is poor ，sorry.

PickLiDan on 19 Aug 2016

I don't see where we could be doing that in the code. Are you sure you just don't have a column in your table model with the wrong name?

xrmx on 23 Aug 2016

@PickLiDan - could you please provide the screenshots ?

bkyryliuk on 22 Sep 2016

@PickLiDan - feel free to reopen is the issue persists. I cannot reproduce it.

bkyryliuk on 3 Oct 2016

And it's weird when I uppercase all columns and save, the lowercase columns appear again. It seems pykylin sync from kylin

auroralinan on 28 Dec 2016

👍3

Hi,

I think I have a similar error.

SELECT date_format( created_at,'%%d-%%m-%%Y')as DT,count(*) as CT
FROM user
group by DT.

This the query I ran in SQL lab. Got the data accordingly. But when I try to visualize it in Time series line chart,
screen shot 2017-02-13 at 3 57 50 pm

I get the following error in logs.

2017-02-13 15:55:10,887:ERROR:root:'CT'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/pandas/indexes/base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'CT'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/superset/views.py", line 1406, in explore_json
payload = viz_obj.get_payload()
File "/usr/local/lib/python3.6/site-packages/superset/viz.py", line 360, in get_payload
data = self.get_data()
File "/usr/local/lib/python3.6/site-packages/superset/viz.py", line 1590, in get_data
def get_data(self):
File "/usr/local/lib/python3.6/site-packages/superset/viz.py", line 1575, in get_df
File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3778, in groupby
*kwargs)
File "/usr/local/lib/python3.6/site-packages/pandas/core/groupby.py", line 1427, in groupby
return klass(obj, by, *kwds)
File "/usr/local/lib/python3.6/site-packages/pandas/core/groupby.py", line 354, in __init__
mutated=self.mutated)
File "/usr/local/lib/python3.6/site-packages/pandas/core/groupby.py", line 2383, in _get_grouper
in_axis, name, gpr = True, gpr, obj[gpr]
File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python3.6/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'CT'

This is what is being displayed on screen.
screen shot 2017-02-13 at 3 58 56 pm

pardhasm on 13 Feb 2017

Column names become lower case regardless of forcing it to upper case in query. Oracle SQL database is used. If I make column names lowercase, then after selecting things in 'group by', the additional sql surround the original sql will not recognize that lowercase column name.
There is no such problem using MySQL database.

yileic on 27 Apr 2017

Has anyone found a work-around for this yet? This is a blocker for me applying it at work because we have to deal with an Oracle DB.

TryHarder01 on 28 Aug 2017

@TryHarder01 my current workaround is editing the column name in the UI after the table is created.

yileic on 28 Aug 2017

@TryHarder01 @yileic I tried casting the column names to upper case and lower case in the UI but it still has the 'KeyError: [column_name]' issue

The error doesn't come up when I choose 'Table View' but when I change the visualization type to any other view, it doesn't recognize the key

nikitathampi on 28 Aug 2017

Are all the ongoing issues here happening on an Oracle db?

mistercrunch on 29 Aug 2017

@mistercrunch Yes, I haven't had any issues on MySQL. Its just been Oracle so far.

nikitathampi on 29 Aug 2017

Would it be helpful to close and reopen as a bug specifically with Oracle DB?

TryHarder01 on 29 Aug 2017

I renamed this issue to make it specific to Oracle. So it appeared to me like the SQLAlchemy dialect for Oracle returns lower case but then Oracle wants upper case at query time?

Can someone with an Oracle database laying around confirm that the issue is with the SQLALchemy driver returning lower case column names?

mistercrunch on 30 Aug 2017

This seems very relevant:
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/dialects/oracle/base.py#L56

Damn Oracle! and don't get me started on the the 30 character limit on object names. Fully committed to bad decisions made in the 80s.

mistercrunch on 30 Aug 2017

😄1

@nikitathampi works with me. We can help test out some things w/ Oracle at the office tomorrow if it's helpful to you folks.

If so, help me scope out exactly what you want to see as far as testing.

TryHarder01 on 30 Aug 2017

@mistercrunch well that's fun wrinkle in their syntax.

TryHarder01 on 30 Aug 2017

Perhaps it might be resolved by enclosing the name of the item containing uppercase and lowercase letters in double quotes in the column name definition.

yusecase on 29 Sep 2017

Adding more context on what I've found out.
Oracle DBs / SQLAlch will return a lower case column EXCEPT if the column has mixed case.

under the hood Superset renames the "min" , "max", etc.... metrics calls by querying the data and renaming the columns min(XXXX) as min__XXXX

I traced this all the way up to the models.core.Database.get_df()

One idea I've thought about is all the calculated metrics could also be in all caps.

For example, min(x) as MIN__X and then we could have .upper() to make sure everything is upper.

Right now, we'll need to apply something to make sure these calc'd metrics aren't made upper and that any future metrics don't become upper, but yet we also catch the names of any other columns from the database.

This seems like a dangerous tightrope to walk.

screen shot 2017-11-06 at 12 57 27 pm

TryHarder01 on 6 Nov 2017

@TryHarder01 could you please reference a line in the file where you are adding these prints?

xrmx on 6 Nov 2017

@xrmx , sure. But I feel good about a solution I just tested in models.py

I'll be sending up a PR in a few hours.

starting on line 615 (aprox) of core.py

    def get_df(self, sql, schema):

        sql = sql.strip().strip(';')
        eng = self.get_sqla_engine(schema=schema)
        df = pd.read_sql(sql, eng)

        print('\n\n------->>>>>>>')
        print('\nget_df in core.py\n')
        print('\n\nPARAM: sql')
        print(sql)
        #print('\nPARAM: eng\n')
        #print(eng)
        print('\n\nPARAM: schema\n')
        print(schema)
        print('\n\npd.read_sql(sql,eng)\n')
        print(df)
        print('\n\n<<<<<-------')

TryHarder01 on 6 Nov 2017

On the front-end, notice how the capitalization is different depending on if a PostGres or an Oracle DB is queried.

Oracle DB

columns come back in all caps

screen shot 2017-11-06 at 4 08 40 pm

PostGres DB

columns come back in all lower case

screen shot 2017-11-06 at 4 08 31 pm

TryHarder01 on 6 Nov 2017

👍1

Could you please check your user tablespace please? https://github.com/pandas-dev/pandas/issues/8820#issuecomment-63644241

xrmx on 6 Nov 2017

@xrmx I'm going to be connecting multiple Oracle DBs, but here's the tablespace for one that I'm testing.

For all Oracle DBs I'll be connecting I won't have any admin authority over them to change any of the permission structure.

screen shot 2017-11-06 at 5 31 55 pm

TryHarder01 on 6 Nov 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.