Hi,
to improve the usability of the python query runner I created two python functions to convert the result (of execute_query and get_query_result) to a pandas.DataFrame (incl. type conversions) and a DataFrame to result.
Both functions only iterate the columns (not the rows) and convert on pd.Series - so they should be pretty fast. They should also handle all possible redash-types and pandas-dtypes (incl. np.nan, datetime and date).
I am pretty sure that other people might find this useful, so my questions are:
I have two ideas in mind:
execute_query and get_query_result to directly return DataFrames)If you find it worth adding, I will create a PR.
Here are the two functions:
def df_to_redash(df_orig, index_to_col=False):
import numpy as np
df = df_orig.copy()
if index_to_col:
df.reset_index(inplace=True)
result = {'columns': [], 'rows': []}
conversions = [
{'pandas_type': np.integer, 'redash_type': 'integer',},
{'pandas_type': np.inexact, 'redash_type': 'float',},
{'pandas_type': np.datetime64, 'redash_type': 'datetime', 'to_redash': lambda x: x.strftime('%Y-%m-%d %H:%M:%S')},
{'pandas_type': np.bool_, 'redash_type': 'boolean'},
{'pandas_type': np.object, 'redash_type': 'string'}
]
labels = []
for dtype, label in zip(df.dtypes, df.columns):
for conversion in conversions:
if issubclass(dtype.type, conversion['pandas_type']):
result['columns'].append({'name': label, 'friendly_name': label, 'type': conversion['redash_type']})
labels.append(label)
func = conversion.get('to_redash')
if func:
df[label] = df[label].apply(func)
break
result['rows'] = df[labels].replace({np.nan: None}).to_dict(orient='records')
return result
def redash_to_df(result, col_to_index=False):
import pandas as pd
conversions = [
{'redash_type': 'datetime', 'to_pandas': lambda x: pd.to_datetime(x, infer_datetime_format=True)},
{'redash_type': 'date', 'to_pandas': lambda x: pd.to_datetime(x, infer_datetime_format=True)},
]
df = pd.DataFrame.from_dict(result['rows'], orient='columns')
labels = []
for column in result['columns']:
label = column['name']
labels.append(label)
for conversion in conversions:
if conversion['redash_type'] == column['type']:
func = conversion.get('to_pandas')
if func:
df[label] = df[label].apply(func)
break
df = df[labels]
if col_to_index and labels:
df.set_index(labels[0], inplace=True)
return df
Requirements:
Looks useful!
How about you create a Python (PyPi) package with these utilities (and maybe others in the future)? Then whoever wants to use it, can install them and enable in the data source settings.
Of course we can add the relevant documentation on how to install and enable this.
OK, I will do that and keep this thread updated.
Thanks again, @dersphere. I think the forum is a better venue for this, so I posted it there: https://discuss.redash.io/t/converting-redash-query-results-to-pandas-dataframe-and-vice-versa/3015 (if you create a user there, I will transfer the post ownership to you).
This solved my pain in using redash perfectly, thank you!
Most helpful comment
OK, I will do that and keep this thread updated.