Pandas: DEPR: Remove pandas.np

Created on 17 Dec 2019 · 20 Comments · Source: pandas-dev/pandas

Not sure if it was added intentionally, but it's possible to call numpy with the np attribute of the pandas module:

import pandas
x = pandas.np.array([1, 2, 3])

While this is not documented, I've seen a couple of places suggesting this as a "trick" to avoid importing numpy directly.

I personally find this hacky, and I think it should be removed.

API Design Deprecate good first issue

All 20 comments

There is a chance that removing this will break something, in case it was added on purpose, but I believe it should still be removed. It's ugly, and the issues that might arise are easily fixable.
As far as I can see, all this would entail is a one-line edit to __init__.py. I found no explanation of why np was added, either in the code or in the API reference.
Edit: I did find a colleague who relied on this, so even if the change doesn't break anything in the library (or if we fix it), it will still break backward compatibility for some users.

We remove everything gradually, by first raising warnings.

I think there are other things we may also want to consider removing. I saw a pandas.array that I guess is an alias for numpy.array.

Similar but more minor: it looks like users can also get datetime.datetime through import pandas, which I find odd.

In [1]: import pandas

In [2]: pandas.datetime
Out[2]: datetime.datetime

Fair point about the deprecation warning.

pd.array, however, isn't just an alias. It lets you create arrays of pandas-specific data types.

This works:

pd.array([1,2,3], dtype=pd.Int64Dtype())

<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

This doesn't:

np.array([1,2,3], dtype=pd.Int64Dtype())

TypeError                                 Traceback (most recent call last)
<ipython-input-7-6ae38424b9f7> in <module>
----> 1 np.array([1,2,3], dtype=pd.Int64Dtype())

TypeError: data type not understood

The numpy import is actually explicit: https://github.com/pandas-dev/pandas/blob/37dfcc1acf3b37a1ff5251fee3380a179da1f2ed/pandas/__init__.py#L108 but it is simply a redirect to normal numpy: https://github.com/pandas-dev/pandas/blob/37dfcc1acf3b37a1ff5251fee3380a179da1f2ed/pandas/core/api.py#L3 Still, I have often met users who swear that pandas.np is different from np because it provides a compatibility layer between NumPy and pandas. Removing this alias would also wipe out this myth. As it is just an alias, the breaking change is really easy to resolve.

The datetime import at https://github.com/pandas-dev/pandas/blob/37dfcc1acf3b37a1ff5251fee3380a179da1f2ed/pandas/__init__.py#L42 is likewise only there so that it can be accessed as pandas.datetime. It is not used anywhere in __init__.py.

For Python 3.7+, we can actually deprecate this with the module-level __getattr__ trick (the same one we use for the Panel dummy class right now). So I think we can go through a deprecation cycle instead of directly removing it (for Python 3.6, this is more difficult though).

pd.array, however, isn't just an alias.

Yep, I got confused, it's obviously our own array.

take

Are we wanting to do this for 1.0, or should it wait, or does it not matter?

Would be nice, but I don't think it's important, since it won't be removed until 2.0 I guess.

datetime was also suggested for this treatment. What else doesn't belong in the top-level namespace? Some candidates:

  • __docformat__ (is this a legacy thing or is it actually used by sphinx or something?)
  • _hashtable, _lib, _tslib
  • datetime
  • isnull, notnull (weren't these deprecated a while back?)
  • _np_version_under1p*
  • _version

The private modules don't show up anyhow, so really no big deal on those.

We didn't actually deprecate isnull/notnull.

Can I have a go at datetime or isnull/notnull?

I think it is better to first finalize the open PR: https://github.com/pandas-dev/pandas/pull/30386.

Also, if we want to deprecate isnull/notnull, let's first discuss that in a separate issue, as it is quite a different thing. Here we are discussing shortcuts for external packages.

ok will wait for this PR to be merged - if datetime still requires treatment in this issue then happy to work on that.

I think we want to get rid of pandas.datetime, and I don't think there should be any important conflicts with #30386 if you open the PR in parallel.

ok will do, thanks @datapythonista

Hi guys,

I've got this code:

df['SEBotClass'] = pd.np.where(df.userAgent.str.contains("YandexBot"), "YandexBot",
                   pd.np.where(df.userAgent.str.contains("bingbot"), "BingBot",
                   pd.np.where(df.userAgent.str.contains("DuckDuckBot"), "DuckDuckGo",
                   pd.np.where(df.userAgent.str.contains("Baiduspider"), "Baidu",
                   pd.np.where(df.userAgent.str.contains("Googlebot/2.1"), "GoogleBot", "Else")))))
nan = pd.np.nan

Which now gives this warning message:

FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead

What do I need to do to avoid that warning?

Thanks! :)

Replace pd.np with np in your code above (after doing import numpy as np)
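Applied to the snippet in the question, that replacement might look like the sketch below. The sample user-agent strings are made up for illustration; only the pd.np to np substitution comes from the advice above.

```python
import numpy as np
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own.
df = pd.DataFrame({"userAgent": [
    "Mozilla/5.0 (compatible; YandexBot/3.0)",
    "Mozilla/5.0 (compatible; bingbot/2.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "curl/7.68.0",
]})

# Same nested logic as before, with every pd.np replaced by np.
df["SEBotClass"] = np.where(df.userAgent.str.contains("YandexBot"), "YandexBot",
                   np.where(df.userAgent.str.contains("bingbot"), "BingBot",
                   np.where(df.userAgent.str.contains("DuckDuckBot"), "DuckDuckGo",
                   np.where(df.userAgent.str.contains("Baiduspider"), "Baidu",
                   np.where(df.userAgent.str.contains("Googlebot/2.1"), "GoogleBot", "Else")))))
nan = np.nan
```

The behavior should be unchanged, since pd.np was just a reference to the numpy module.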

Thanks Joris!
