import pandas as pd

pd.testing.assert_frame_equal(
    pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}]),
    pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}]),
    check_like=True)
AssertionError: DataFrame.iloc[:, 0] are different
DataFrame.iloc[:, 0] values are different (100.0 %)
[left]: [a, b]
[right]: [b, a]
According to the documentation (version 0.23.3), pandas.testing.assert_frame_equal takes a check_like parameter that can be set to True to make the function ignore the order of columns and rows:
check_like : bool, default False
If true, ignore the order of rows & columns
This does not work as I expect. When I create two DataFrames from a list of dicts, the assertion reports them as different because of the order of the rows.
I would expect the test to pass.
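A quick way to see why the assertion fails: neither frame sets an index, so both get the default RangeIndex with labels 0 and 1, and label 0 holds 'a' on the left but 'b' on the right. The short sketch below prints the (label, value) pairs of each frame; the names left, left_pairs, etc. are only for illustration.

```python
import pandas as pd

left = pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}])
right = pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}])

# Both frames get the default RangeIndex, so the labels are 0 and 1 in each.
left_pairs = list(zip(left.index.tolist(), left['filename'].tolist()))
right_pairs = list(zip(right.index.tolist(), right['filename'].tolist()))

print(left_pairs)   # [(0, 'a'), (1, 'b')]
print(right_pairs)  # [(0, 'b'), (1, 'a')]
```

Since the same label carries different data in the two frames, reordering rows by label cannot make them match, which is why check_like=True does not help here.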
pd.show_versions()

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.93-linuxkit-aufs
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.3
pytest: 3.6.2
pip: 9.0.1
setuptools: 33.1.1
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
The order doesn't matter, but the same labels need to be with the same data. e.g.
In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])
In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])
In [33]: pd.util.testing.assert_frame_equal(a, b)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-33-302d0eeab6d2> in <module>()
----> 1 pd.util.testing.assert_frame_equal(a, b)
~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
1330 check_exact=check_exact,
1331 check_categorical=check_categorical,
-> 1332 obj='{obj}.index'.format(obj=obj))
1333
1334 # column comparison
~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_index_equal(left, right, exact, check_names, check_less_precise, check_exact, check_categorical, obj)
858 check_less_precise=check_less_precise,
859 check_dtype=exact,
--> 860 obj=obj, lobj=left, robj=right)
861
862 # metadata comparison
pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()
pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()
~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in raise_assert_detail(obj, message, left, right, diff)
1033 msg += "\n[diff]: {diff}".format(diff=diff)
1034
-> 1035 raise AssertionError(msg)
1036
1037
AssertionError: DataFrame.index are different
DataFrame.index values are different (100.0 %)
[left]: Index(['a', 'b'], dtype='object')
[right]: Index(['b', 'a'], dtype='object')
But this passes
In [34]: pd.util.testing.assert_frame_equal(a, b, check_like=True)
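To make the label-based behavior concrete: check_like=True appears to align one frame to the other's row and column labels before comparing. The sketch below approximates that step with DataFrame.reindex_like; it is an illustration of the idea, not pandas' exact internal code.

```python
import pandas as pd

a = pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}], index=['a', 'b'])
b = pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}], index=['b', 'a'])

# Reorder b's rows and columns to match a's labels; each label keeps its own data.
b_aligned = b.reindex_like(a)
print(b_aligned.index.tolist())        # ['a', 'b']
print(b_aligned['filename'].tolist())  # ['a', 'b']

# Once aligned by label, the frames compare equal without check_like.
pd.testing.assert_frame_equal(a, b_aligned)
```

Alignment only reorders existing label/value pairs, so it succeeds here because label 'a' holds 'a' and label 'b' holds 'b' in both frames.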
@lassebenni could you make a PR updating the docstring of assert_frame_equal to clarify this?
@TomAugspurger
Thank you for the quick response!
I am not very familiar with pandas DataFrames, so I had missed the need to explicitly define an index for the data.
My use case is as follows: I have some code that applies transformations to Spark DataFrames. For testing purposes I want to compare the expected result to the actual result. To do this I convert the Spark DataFrames to pandas DataFrames with df.toPandas() and then compare the two: pd.testing.assert_frame_equal(expected, actual, check_like=True).
Since the transformations may produce rows and columns in a different order than expected, I was hoping that check_like=True would handle the differences without my having to sort the resulting DataFrame by column(s).
But it seems that I will have to sort the DataFrame either way, since the index labels of the values have to match:
In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])
In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])
In short, I hoped that:
a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])
b = create_dataframe_transformation('filename', ['b', 'a'])
pd.testing.assert_frame_equal(a, b, check_like=True)
would pass, without having to do:
a = a.sort_values('filename')
b = b.sort_values('filename')
TL;DR: check_like=True ignores the order of rows and columns, but each index label must still be paired with the same values in both frames.