Pandas: pd.testing.assert_frame_equal check_like not working as expected

Created on 25 Jul 2018 · 3 Comments · Source: pandas-dev/pandas

Code Sample

import pandas as pd

pd.testing.assert_frame_equal(
    pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}]),
    pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}]),
    check_like=True)

AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (100.0 %)
[left]:  [a, b]
[right]: [b, a]

Problem description

According to the documentation (version 0.23.3), pandas.testing.assert_frame_equal takes a "check_like" parameter that can be set to True to make the function ignore the order of columns and rows:

check_like : bool, default False
If true, ignore the order of rows & columns

This does not work as I expect. When I create two DataFrames from lists of dicts that contain the same rows in a different order, the assertion reports them as different because of the row order.

Expected Output

I would expect the test to pass.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.93-linuxkit-aufs
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.3
pytest: 3.6.2
pip: 9.0.1
setuptools: 33.1.1
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Docs, good first issue

All 3 comments

The order doesn't matter, but each label needs to be paired with the same data in both frames. For example:

In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])

In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])

In [33]: pd.util.testing.assert_frame_equal(a, b)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-33-302d0eeab6d2> in <module>()
----> 1 pd.util.testing.assert_frame_equal(a, b)

~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
   1330                        check_exact=check_exact,
   1331                        check_categorical=check_categorical,
-> 1332                        obj='{obj}.index'.format(obj=obj))
   1333
   1334     # column comparison

~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_index_equal(left, right, exact, check_names, check_less_precise, check_exact, check_categorical, obj)
    858                                      check_less_precise=check_less_precise,
    859                                      check_dtype=exact,
--> 860                                      obj=obj, lobj=left, robj=right)
    861
    862     # metadata comparison

pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()

pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()

~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in raise_assert_detail(obj, message, left, right, diff)
   1033         msg += "\n[diff]: {diff}".format(diff=diff)
   1034
-> 1035     raise AssertionError(msg)
   1036
   1037

AssertionError: DataFrame.index are different

DataFrame.index values are different (100.0 %)
[left]:  Index(['a', 'b'], dtype='object')
[right]: Index(['b', 'a'], dtype='object')

But this passes

In [34]: pd.util.testing.assert_frame_equal(a, b, check_like=True)
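To make the distinction concrete, here is a minimal sketch of what check_like=True does and does not tolerate. The frames and the extra "size" column are made up for illustration, and it uses the public pd.testing namespace rather than pd.util.testing. Rows and columns may appear in any order as long as every index label and column label points at the same data in both frames. When both frames fall back to the default integer index, the labels no longer pair up with the same values, so the comparison fails.

```python
import pandas as pd

# Same label-to-data pairing, but rows and columns are in a different order.
left = pd.DataFrame({"filename": ["a", "b"], "size": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"size": [2, 1], "filename": ["b", "a"]}, index=["y", "x"])

# Passes: the frames are aligned by label before the values are compared.
pd.testing.assert_frame_equal(left, right, check_like=True)

# Default integer index on both frames: label 0 holds "a" in one frame
# and "b" in the other, so the labels no longer match the same data.
plain = pd.DataFrame({"filename": ["a", "b"], "size": [1, 2]})
swapped = pd.DataFrame({"filename": ["b", "a"], "size": [2, 1]})

try:
    pd.testing.assert_frame_equal(plain, swapped, check_like=True)
    label_mismatch_detected = False
except AssertionError:
    label_mismatch_detected = True
```

In other words, check_like compares by label rather than by position; it does not treat the frame as an unordered set of rows.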

@lassebenni could you make a PR updating the docstring of assert_frame_equal to clarify this?

@TomAugspurger

Thank you for the quick response!

I am not very familiar with pandas DataFrames, so I missed the part about explicitly defining an index for the data.

My use case is as follows: I have some code that applies transformations to Spark DataFrames. For testing purposes, I want to compare the expected result to the actual result, so I convert the Spark DataFrames to pandas DataFrames with df.toPandas() and then compare the two with pd.testing.assert_frame_equal(expected, actual, check_like=True).

Assuming the transformations produce rows and columns in a different order than expected, I was hoping that check_like=True would handle the differences without my having to sort the resulting DataFrame by column(s).

But it seems that I will have to sort the DataFrame either way, since the index labels for the values have to match:

In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])

In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])

In short, I hoped that:

a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])
b = create_dataframe_transformation('filename', ['b', 'a'])

pd.testing.assert_frame_equal(a, b, check_like=True)

would pass, without having to do:

a = a.sort_values('filename')
b = b.sort_values('filename')

TLDR: for check_like to work, each value must carry the same index label in both DataFrames.
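For the use case described here, where the rows have no meaningful index, one possible approach is to normalize both frames before comparing them. The helper below is a hypothetical sketch, not part of pandas: it assumes string column names and sortable values, orders the columns, sorts the rows by every column, and resets the index. The reset matters because sort_values keeps the original index labels, so sorting alone would still leave the two frames with mismatched indexes.

```python
import pandas as pd


def assert_frames_equal_ignoring_row_order(left, right):
    """Hypothetical helper: compare two frames as unordered collections of rows."""
    # Fail early with a clear message if the column sets differ.
    assert set(left.columns) == set(right.columns), "column sets differ"
    columns = sorted(left.columns)

    def normalize(df):
        # Fix the column order, sort rows by all columns, and drop the old
        # index labels so that only the row contents are compared.
        return df[columns].sort_values(columns).reset_index(drop=True)

    pd.testing.assert_frame_equal(normalize(left), normalize(right))


expected = pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}])
actual = pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}])

# Passes: both frames contain the same rows, just in a different order.
assert_frames_equal_ignoring_row_order(expected, actual)
```

This trades the label-based alignment of check_like for a purely value-based comparison, which is usually what is wanted when the frames come from df.toPandas() and carry only a default integer index.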
