Xarray: test failure with upstream-dev

Created on 17 Oct 2019 · 12 comments · Source: pydata/xarray

https://dev.azure.com/xarray/xarray/_build/results?buildId=1101&view=logs

=================================== FAILURES ===================================
_________________________ test_datetime_reduce[False] __________________________

dask = False

    @arm_xfail
    @pytest.mark.parametrize("dask", [False, True])
    def test_datetime_reduce(dask):
        time = np.array(pd.date_range("15/12/1999", periods=11))
        time[8:11] = np.nan
        da = DataArray(np.linspace(0, 365, num=11), dims="time", coords={"time": time})

        if dask and has_dask:
            chunks = {"time": 5}
            da = da.chunk(chunks)

        actual = da["time"].mean()
>       assert not pd.isnull(actual)
E       AssertionError: assert not True
E        +  where True = <function isna at 0x7f2475449ae8>(<xarray.DataArray 'time' ()>\narray('NaT', dtype='datetime64[ns]'))
E        +    where <function isna at 0x7f2475449ae8> = pd.isnull

xarray/tests/test_duck_array_ops.py:288: AssertionError
__________________________ test_datetime_reduce[True] __________________________

dask = True

    @arm_xfail
    @pytest.mark.parametrize("dask", [False, True])
    def test_datetime_reduce(dask):
        time = np.array(pd.date_range("15/12/1999", periods=11))
        time[8:11] = np.nan
        da = DataArray(np.linspace(0, 365, num=11), dims="time", coords={"time": time})

        if dask and has_dask:
            chunks = {"time": 5}
            da = da.chunk(chunks)

        actual = da["time"].mean()
>       assert not pd.isnull(actual)
E       AssertionError: assert not True
E        +  where True = <function isna at 0x7f2475449ae8>(<xarray.DataArray 'time' ()>\narray('NaT', dtype='datetime64[ns]'))
E        +    where <function isna at 0x7f2475449ae8> = pd.isnull

xarray/tests/test_duck_array_ops.py:288: AssertionError
=============================== warnings summary ===============================

All 12 comments

This has now spread to all tests without pinned dependencies

conda list (py36) before and after the breakage:

< boto3                     1.9.252                    py_0    conda-forge
> boto3                     1.9.253                    py_0    conda-forge
< botocore                  1.12.252                   py_0    conda-forge
> botocore                  1.12.253                   py_0    conda-forge
< expat                     2.2.5             he1b5a44_1003    conda-forge
> expat                     2.2.5             he1b5a44_1004    conda-forge
< libxkbcommon              0.8.4                h516909a_0    conda-forge
> libxkbcommon              0.9.0                hebb1f50_0    conda-forge
< mypy_extensions           0.4.2                    py36_0    conda-forge
> mypy_extensions           0.4.3                    py36_0    conda-forge
< nss                       3.46                 he751ad9_0    conda-forge
> nss                       3.47                 he751ad9_0    conda-forge
< pandas                    0.25.1           py36hb3f55d8_0    conda-forge
> pandas                    0.25.2           py36hb3f55d8_0    conda-forge
< pip                       19.3                     py36_0    conda-forge
> pip                       19.3.1                   py36_0    conda-forge
< pseudonetcdf              3.0.2                      py_0    conda-forge
> pseudonetcdf              3.1.0                      py_0    conda-forge

This looks like this upstream pandas issue: https://github.com/pandas-dev/pandas/issues/29053

It doesn't add up: the upstream ticket mentions an incompatibility with numpy 1.18, but our failing tests are all running with numpy 1.17.

Looking more closely at the failing tests, there are actually two problems.
One is with pandas/numpy git tip, and it only affects the upstream-dev test suite.
The other is with pseudonetcdf-3.1, and it affects py36, py37, and upstream-dev.

Narrowed down:

>>> import numpy as np
>>> a = np.array(['1999-12-15', 'NaT'], dtype='M8[ns]')
>>> np.min(a)

Output:
numpy 1.17: numpy.datetime64('1999-12-15T00:00:00.000000000')
numpy 1.18: numpy.datetime64('NaT')
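Independently of the version-dependent `np.min` behaviour above, the NaT entries can be masked out explicitly with `np.isnat`, which gives the same result on both numpy 1.17 and 1.18 (a quick sketch, not xarray's code):

```python
import numpy as np

a = np.array(['1999-12-15', 'NaT'], dtype='M8[ns]')
# np.isnat detects NaT entries, so the version-dependent ambiguity can
# be sidestepped by reducing over the valid values only:
valid = a[~np.isnat(a)]
print(valid.min())  # 1999-12-15T00:00:00.000000000
```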

triggered by:
https://github.com/pydata/xarray/blob/0f7ab0e909d9b6272f734a0b6fa4318e9522d3a2/xarray/core/duck_array_ops.py#L372

I think we have our own version of the same issue that pandas had.

Same problem:

a = xarray.DataArray(['1999-12-15', 'NaT']).astype('M8[ns]')            
a.min(skipna=False)  # np 1.17: 1999-12-15; np 1.18: NaT
a.min(skipna=True)  # np 1.17: crashes; np 1.18: crashes
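The `skipna=True` crash could in principle be worked around with a hand-rolled NaT-skipping reduction (a hypothetical helper for illustration, not xarray's implementation):

```python
import numpy as np

def nat_safe_min(a):
    # Hypothetical helper: emulate skipna=True for datetime64 arrays,
    # since np.nanmin does not support them.
    mask = np.isnat(a)
    if mask.all():
        # All-NaT input: nothing to reduce, return NaT.
        return np.datetime64('NaT', 'ns')
    return a[~mask].min()

a = np.array(['1999-12-15', 'NaT'], dtype='M8[ns]')
print(nat_safe_min(a))  # 1999-12-15T00:00:00.000000000
```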

Looks like numpy will fix this: https://github.com/numpy/numpy/pull/14841

@dcherian I didn't try running the PR code, but I don't think so?
The PR may mean (needs testing) that nanmin() and nanmax() now work with NaT. However, as highlighted above in https://github.com/pydata/xarray/issues/3409#issuecomment-544299242, xarray invokes min() on an array that contains NaT values; numpy 1.17 ignores them, while 1.18 correctly returns NaT.

Does anybody have the time to test it?

Sigh yes you're right

Unfortunately this does not seem to make np.nanmin and np.nanmax work for datetime arrays (yet), see: https://github.com/numpy/numpy/pull/14841#issuecomment-551824320
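In the meantime, a NaT-skipping min can be emulated on the int64 representation by replacing the NaT sentinel with the maximum int64 before reducing (a sketch of the general trick, similar in spirit to what pandas does internally; not xarray's actual code):

```python
import numpy as np

a = np.array(['1999-12-15', 'NaT'], dtype='M8[ns]')
# datetime64 stores NaT as the minimum int64, so a plain min() over the
# raw integers would pick it. Replacing it with the maximum int64 makes
# min() skip the NaT entries:
i = a.view('i8').copy()
i[np.isnat(a)] = np.iinfo('i8').max
print(i.min().astype('M8[ns]'))  # 1999-12-15T00:00:00.000000000
```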
