Pandas: ENH: Construct pandas dataframe from function

Created on 30 Apr 2020 · 10 Comments · Source: pandas-dev/pandas

It would be great if we could construct a pd.DataFrame from a function.

Describe the solution you'd like

import pandas as pd
import numpy as np

def my_random():
    return np.random.rand() + 1

df = pd.DataFrame(my_random, index=range(10), columns=range(15))

The output would be equivalent to

index_len = 10
col_len = 15
# One call to my_random() per cell: index_len rows, col_len columns
data = np.array([[my_random() for _ in range(col_len)] for _ in range(index_len)])
df = pd.DataFrame(data, index=range(index_len), columns=range(col_len))

Alternatively, we could do the same with something like

df = pd.DataFrame.from_function(my_random, *args, **kwargs)

args and kwargs are passed to the df constructor.
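
As a rough illustration of the behaviour being requested, here is a minimal sketch written as a standalone helper. The name `frame_from_function` and its signature are hypothetical; nothing like it exists in pandas, and it only assumes that the function takes no arguments and is called once per cell.

```python
import numpy as np
import pandas as pd


def frame_from_function(func, index, columns):
    """Hypothetical helper: call func() once per cell and build a DataFrame."""
    index = pd.Index(index)
    columns = pd.Index(columns)
    # One row per index label, one func() call per column label
    data = [[func() for _ in range(len(columns))] for _ in range(len(index))]
    return pd.DataFrame(data, index=index, columns=columns)


def my_random():
    return np.random.rand() + 1


df = frame_from_function(my_random, index=range(10), columns=range(15))
print(df.shape)
```

The helper needs only the labels, so the shape follows from `index` and `columns` rather than being computed by hand.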

EDIT

I just realized I can easily obtain what I want with

df = pd.DataFrame(index=range(10), columns=range(15)).applymap(
    lambda x: my_random()
)

It's quite compact and easy to read. It's slowish but I don't expect it to be used in high performance tasks.
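
A note for newer pandas versions: `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`. The sketch below is the same workaround, using whichever method is available. The `cell_wise` name and the final `astype(float)` are illustrative additions, not part of the original post.

```python
import numpy as np
import pandas as pd


def my_random():
    return np.random.rand() + 1


# An empty frame: every cell starts as NaN and only the labels matter
empty = pd.DataFrame(index=range(10), columns=range(15))

# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions
cell_wise = empty.map if hasattr(pd.DataFrame, "map") else empty.applymap

# The lambda ignores the NaN placeholder and returns a fresh value per cell
df = cell_wise(lambda _: my_random()).astype(float)
print(df.shape)
```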

API Design Enhancement

All 10 comments

Thanks @giuliobeseghi for the suggestion

TBH I don't see how the example you've given is different from initialising a DataFrame with a numpy array. In general, I don't see how this could work with arbitrary functions. Could you please provide some more details?

How is this an improvement over having the function return a DataFrame? I don't see why it would need to be a classmethod.

Sorry, I haven't explained myself well. I'll correct the original post.

The idea is that the function is applied to every "cell" of the dataframe. Something like an applymap but used when building the df. The advantage is that we don't have to go through the numpy array, and we don't have to know the shape of the dataframe first.

How is this an improvement over having the function return a DataFrame? I don't see why it would need to be a classmethod.

My example was just to explain that it should work with a general function. The main reason why I proposed it is that I'd like to create dataframes like this:

df = pd.DataFrame(np.random.rand, index=range(10), columns=range(15))

using a function to create each value of a DataFrame. It looks clean and easy to read, but if you don't think it's an improvement we can forget about it. I thought about it because many pandas methods accept functions as arguments (I'm thinking of lambdas with loc, assign and rename for example).
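
For context, these are the kinds of existing callable-accepting methods the comment refers to. This is a small sketch on made-up data; the frame and column names are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# loc accepts a callable that receives the frame and returns a boolean mask
filtered = df.loc[lambda d: d["a"] > 1]

# assign accepts a callable that receives the frame and returns the new column
with_b = df.assign(b=lambda d: d["a"] * 2)

# rename accepts a function applied to each column label
renamed = df.rename(columns=str.upper)

print(filtered, with_b, renamed, sep="\n\n")
```

In all three cases the callable receives an existing frame or label, whereas the proposal would call the function to generate values that do not exist yet.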

For your particular use-case, can't you just pass the dimensions directly to random.rand?

pd.DataFrame(np.random.rand(10, 15) + 1)

EDIT

OK, I see what you mean now, you'd want to be able to do this even if you have some arbitrary function which doesn't take dimension arguments

Yeah, imagine for example that the index is pd.date_range('2020', '2021', freq='45T').
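
To make the date-range case concrete, here is a sketch of building such a frame today without hard-coding the shape. The column labels are made up for illustration, and the frequency is written as "45min", which is the same 45-minute step as the "45T" alias in the comment.

```python
import numpy as np
import pandas as pd


def my_random():
    return np.random.rand() + 1


index = pd.date_range("2020", "2021", freq="45min")
columns = ["a", "b", "c"]  # hypothetical column labels

# The number of cells is derived from the labels, not typed in by hand
n_cells = len(index) * len(columns)
values = np.fromiter((my_random() for _ in range(n_cells)), dtype=float, count=n_cells)

df = pd.DataFrame(values.reshape(len(index), len(columns)), index=index, columns=columns)
print(df.shape)
```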

I see that the use case for my request is quite specific. Implementing it only makes sense if it's easy to carry out; it's more about style than performance.

Implementing it only makes sense if it's easy to carry out

That, and if it's worth maintaining + expanding the (already huge) API. I may be wrong, but I'm 99% certain the core team would be -1 on this

-1 on this extension, doesn't provide any new capability and adds to api bloat.

It hurts... I'm joking, I understand :) I gave it a try though!

It's OK :) There are lots of other issues open if you want to contribute, see here for how to get started
