Pandas: TST: Converting to Pytest Idiom

Created on 13 Apr 2017 · 20Comments · Source: pandas-dev/pandas

Our testing framework is a literal hodgepodge of different 3rd-party framework standards (i.e. nosetest, pytest, unittest) that needs to be cleaned up.

A couple of points are clear:

Use assert instead of self.assert*...
When possible, replace for loops with pytest.mark.parametrize
All new test classes should inherit from object instead of tm.TestCase

However, we also have this massive testing module helper testing.py that is populated with tons of useful comparison methods as well as wrappers around basic assert methods that are used inconsistently throughout the code-base. Thus, the question is whether or not we want to keep using those OR remove them entirely and replace them with basic assert statements.

The advantage with using tm.assert statements is that they come directly with error messages instead of an empty AssertionError, which is what I don't like about raw assert statements that are considered more idiomatic with pytest.

Thus, it's a question of greater clarity (i.e. our error messages will be more useful by default when we get test failures) vs. idiomatic (raw assert with no wrapping is the idiomatic way to go but risks presenting less helpful failure messages).

Another point I wanted to bring up is with regards to the introduction of decorators in #15952 to capture stdout and stderr. pytest has fixtures to capture those, but incorporating it into the code-base is very difficult due to our heavy reliance still on unittest.TestCase, which causes incorporation to fail. The question is: do we eventually want to use the pytest fixture or continue using the decorator?

Thoughts?

From the discussion that follows, here are the functions that should be removed and replaced:

[x] self (tm).assertRaises --> pytest.raises
[x] (NO ACTION) self (tm).assertRaisesRegexp -->
~python
with pytest.raises(cls) as exc_info:
...
exc_info.match(regex)
~
Per discussion in #16089 (comment), we are leaving as is.
[x] self.assert* (examples follow)
- self.assertIs
- self.assertEqual
- ...and many others
[x] self.assert_* (examples follow)
- self.assert_numpy_array_equal
- self.assert_series_equal
- self.assert_frame_equal
- self.assert_index_equal
- ...and many others
[x] tm.assert_equal
[x] tm.assertIs(Not)
[x] tm.assert(Not)In
[x] tm.assertIs(Not)None
[x] tm.assert(Not)IsInstance

Docs Testing

Source

gfyoung

Most helpful comment

I don't think we need to change any of the asserts in util/testing.py. They're already doing the right thing.

I think @gfyoung meant the assert_equal, assertIsInstance, .. wrappers we have in pandas.util.testing. And for those, yes, I think we should clean that up and remove those.

jorisvandenbossche on 13 Apr 2017

👍2

All 20 comments

this is already indicated in #15341 and documented. this is orthogonal to #15962 which is fine.

Thus, the question is whether or not we want to keep using those OR remove them entirely and replace them with basic assert statements.

absolutely NOT. not sure where you got this notion.

sure self.assertTrue, self.assertFalse, self.assertRaises, and self.assertEqual when its a scalar should be replaced, but these are just old nose idioms.

Generally replacing test code should be done in independent PRs/commits.

jreback on 13 Apr 2017

actually another section after http://pandas-docs.github.io/pandas-docs-travis/contributing.html#how-to-use-parametrize would make sense to explain how to write / when to use which assertions, focused on asserting pandas objects (assert_series_equal) and scalars, regexes etc. almost listing what are the 'common' ones would be enough.

jreback on 13 Apr 2017

Thus, the question is whether or not we want to keep using those OR remove them entirely and replace them with basic assert statements

Since all the aseert_*_equal methods use assert condition, message, pytest's assertion rewriting is disabled (4th paragraph here)

which is what I don't like about raw assert statements that are considered more idiomatic with pytest.

Pytest has special comparison reprs for builtin data structues, and you can define your own so this could be solved. But we don't want to. We need all the arguments to assert_*_equal like check_dtypes, that you can't specify with ==. We shouldn't ever have bare asserts that aren't being rewritten by pytest.

I can writeup a summary of the comparisons and assertions for the documentation.

TomAugspurger on 13 Apr 2017

@jreback , @TomAugspurger : So if I'm understanding this properly, we are sticking with raw assert statements when possible because the wrappers will not be properly handled by pytest ?

If that is the case, yes, absolutely, then we should replace using util/testing comparison methods with assert when appropriate in separate PR's (of course, not something like assert_frame_equal but something like assert_equal can then be removed).

gfyoung on 13 Apr 2017

@jreback , @TomAugspurger : So if I'm understanding this properly, we are sticking with raw assert statements when possible because the wrappers will not be properly handled by pytest ?

I think you misunderstood.

For builtins, scalars, etc. we'll recommend assert {'a': 1, 'b': 2} == {'a': 1, 'b': 3} over self.assertDictEqual.

For DataFrames, Series, etc., we'll recommend assert_frame_equal(df1, df2).

I don't think we need to change any of the asserts in util/testing.py. They're already doing the right thing.

TomAugspurger on 13 Apr 2017

@TomAugspurger : No, I do understand in fact. I am referring to chunks like this and something like this.

gfyoung on 13 Apr 2017

👍1

I don't think we need to change any of the asserts in util/testing.py. They're already doing the right thing.

I think @gfyoung meant the assert_equal, assertIsInstance, .. wrappers we have in pandas.util.testing. And for those, yes, I think we should clean that up and remove those.

jorisvandenbossche on 13 Apr 2017

👍2

In xarray we built xarray.test.assert_equal and then delegated from there based on type - rather than specific assert_X_equals, which seem redundant

But probably not worth changing now

max-sixty on 13 Apr 2017

👍1

wrappers we have in pandas.util.testing. And for those, yes, I think we should clean that up and remove those.

@jorisvandenbossche : Yes, that is correct. Okay, that's what I was hoping to get clarification on.

gfyoung on 13 Apr 2017

Also, do we want to use pytest fixtures to capture stdout and stdin instead of method decorators that were introduced in one of my earlier PR's? OR are we okay with that (no one is commenting on this, so it seems like we are okay with this)?

gfyoung on 13 Apr 2017

Regarding the wrappers in util.testing, note that some of them are already hardly / not used (like assertIs, assertIs) so will be easy to clean up (others like assert_equal are used more).

jorisvandenbossche on 13 Apr 2017

yes the only things would change are

assert_equal (this is generally for scalar / lists /dicts)
self.assertEqual
self.assertTrue/False
assertDictEqual
assertIs
assertIsInstance

and use pytest.raises

jreback on 13 Apr 2017

👍1

I would also add the following (static tm. methods):

assertIs
assertIsNot
assertIn
assertNotIn
assertIsNone
assertIsNotNone

gfyoung on 13 Apr 2017

One thing I might also add is that in the spirit of functional, test classes should probably be avoided unless there is a good organization reason for doing so. Otherwise, test functions should be used.

gfyoung on 14 Apr 2017

@gfyoung ideally yes, but they are nice for organization, so I wouldn't necessarily change the structure if its meant for organization (and it almost always is).

jreback on 14 Apr 2017

@jreback : Yes, of course. That's why I wasn't as definitive about this (also, I added documentation in a PR as you know (#15989) to explicitly to address that caveat).

gfyoung on 14 Apr 2017

To everyone here, what is your opinion about pytest.warns vs. tm.assert_produces_warning ? I know that we have decided to not use pytest.raises in place of tm.assertRaisesRegexp (soon to be tm.assert_raises_regex). I am more inclined to use this former over the latter because it also does message checking which our implementation does not do currently. Thoughts?

gfyoung on 26 Apr 2017

@gfyoung BTW, I just saw that the latest pytest actually supports a pytest.warns(..., match=..) to potentially have the behaviour of tm.assert_raises_regex (one of our custom methods we decided to keep because of this limitation in pytest.warns) (https://github.com/pytest-dev/pytest/pull/2227).

jorisvandenbossche on 16 Aug 2017

The PR you pointed to seems to referring to pytest.raises, though this is good to know nevertheless.

gfyoung on 16 Aug 2017

Ah, sorry, yes, that is what I meant :-)
(as otherwise it wouldn't help replacing tm.assert_raises_regex)

But indeed, we also have out own assert_produces_warning, but that is still something else (I think our implementation also change the filter and checks that stacklevel, compared to pytest one)

jorisvandenbossche on 16 Aug 2017

Was this page helpful?

0 / 5 - 0 ratings