Redash: conversion pandas.DataFrame to result and vice versa in python query (code attached)

Created on 7 Nov 2017 · 4Comments · Source: getredash/redash

Hi,

to improve the usability of the python query runner I created two python functions to convert the result (of execute_query and get_query_result) to a pandas.DataFrame (incl. type conversions) and a DataFrame to result.
Both functions only iterate the columns (not the rows) and convert on pd.Series - so they should be pretty fast. They should also handle all possible redash-types and pandas-dtypes (incl. np.nan, datetime and date).

I am pretty sure that other people might find this useful, so my questions are:

Is it worth adding?
How is the best way to integrate it with redash?

I have two ideas in mind:

add both methods as staticmethods to the current python query runner, add numpy and pandas to requirements and improve the docs
add a new "Python Pandas" Query runner for this purpose (and alter execute_query and get_query_result to directly return DataFrames)

If you find it worth adding, I will create a PR.

Here are the two functions:

def df_to_redash(df_orig, index_to_col=False):
    import numpy as np
    df = df_orig.copy()
    if index_to_col:
        df.reset_index(inplace=True)
    result = {'columns': [], 'rows': []}
    conversions = [
        {'pandas_type': np.integer, 'redash_type': 'integer',},
        {'pandas_type': np.inexact, 'redash_type': 'float',},
        {'pandas_type': np.datetime64, 'redash_type': 'datetime', 'to_redash': lambda x: x.strftime('%Y-%m-%d %H:%M:%S')},
        {'pandas_type': np.bool_, 'redash_type': 'boolean'},
        {'pandas_type': np.object, 'redash_type': 'string'}
    ]
    labels = []
    for dtype, label in zip(df.dtypes, df.columns):
        for conversion in conversions:
            if issubclass(dtype.type, conversion['pandas_type']):
                result['columns'].append({'name': label, 'friendly_name': label, 'type': conversion['redash_type']})
                labels.append(label)
                func = conversion.get('to_redash')
                if func:
                    df[label] = df[label].apply(func)
                break
    result['rows'] = df[labels].replace({np.nan: None}).to_dict(orient='records')
    return result

def redash_to_df(result, col_to_index=False):
    import pandas as pd
    conversions = [
        {'redash_type': 'datetime', 'to_pandas': lambda x: pd.to_datetime(x, infer_datetime_format=True)},
        {'redash_type': 'date', 'to_pandas': lambda x: pd.to_datetime(x, infer_datetime_format=True)},
    ]
    df = pd.DataFrame.from_dict(result['rows'], orient='columns')
    labels = []
    for column in result['columns']:
        label = column['name']
        labels.append(label)
        for conversion in conversions:
            if conversion['redash_type'] == column['type']:
                func = conversion.get('to_pandas')
                if func:
                    df[label] = df[label].apply(func)
                break
    df = df[labels]
    if col_to_index and labels:
        df.set_index(labels[0], inplace=True)
    return df

Requirements:

Python query runner enabled
pandas and numpy installed (global or virtualenv) and enabled (in datasource settings)

Source

dersphere

👍4

Most helpful comment

OK, I will do that and keep this thread updated.

dersphere on 14 Nov 2017

👍6

All 4 comments

Looks useful!

How about you create a Python (PyPi) package with these utilities (and maybe others in the future)? Then whoever wants to use it, can install them and enable in the data source settings.

Of course we can add the relevant documentation on how to install and enable this.

arikfr on 14 Nov 2017

OK, I will do that and keep this thread updated.

dersphere on 14 Nov 2017

👍6

Thanks again, @dersphere. I think the forum is a better venue for this, so I posted it there: https://discuss.redash.io/t/converting-redash-query-results-to-pandas-dataframe-and-vice-versa/3015 (if you create a user there, I will transfer the post ownership to you).

arikfr on 15 Jan 2019

This solved my pain in using redash perfectly, thank you!