Pandas: TST: Converting to Pytest Idiom

Created on 13 Apr 2017  路  20Comments  路  Source: pandas-dev/pandas

Our testing framework is a literal hodgepodge of different 3rd-party framework standards (i.e. nosetest, pytest, unittest) that needs to be cleaned up.

A couple of points are clear:

  • Use assert instead of self.assert*...
  • When possible, replace for loops with pytest.mark.parametrize
  • All new test classes should inherit from object instead of tm.TestCase

However, we also have this massive testing module helper testing.py that is populated with tons of useful comparison methods as well as wrappers around basic assert methods that are used inconsistently throughout the code-base. Thus, the question is whether or not we want to keep using those OR remove them entirely and replace them with basic assert statements.

The advantage with using tm.assert statements is that they come directly with error messages instead of an empty AssertionError, which is what I don't like about raw assert statements that are considered more idiomatic with pytest.

Thus, it's a question of greater clarity (i.e. our error messages will be more useful by default when we get test failures) vs. idiomatic (raw assert with no wrapping is the idiomatic way to go but risks presenting less helpful failure messages).

Another point I wanted to bring up is with regards to the introduction of decorators in #15952 to capture stdout and stderr. pytest has fixtures to capture those, but incorporating it into the code-base is very difficult due to our heavy reliance still on unittest.TestCase, which causes incorporation to fail. The question is: do we eventually want to use the pytest fixture or continue using the decorator?

Thoughts?


From the discussion that follows, here are the functions that should be removed and replaced:

  • [x] self (tm).assertRaises --> pytest.raises
  • [x] (NO ACTION) self (tm).assertRaisesRegexp -->
    ~python
    with pytest.raises(cls) as exc_info:
    ...
    exc_info.match(regex)
    ~

    Per discussion in #16089 (comment), we are leaving as is.
  • [x] self.assert* (examples follow)

    • self.assertIs

    • self.assertEqual

    • ...and many others

  • [x] self.assert_* (examples follow)

    • self.assert_numpy_array_equal

    • self.assert_series_equal

    • self.assert_frame_equal

    • self.assert_index_equal

    • ...and many others

  • [x] tm.assert_equal
  • [x] tm.assertIs(Not)
  • [x] tm.assert(Not)In
  • [x] tm.assertIs(Not)None
  • [x] tm.assert(Not)IsInstance
Docs Testing

Most helpful comment

I don't think we need to change any of the asserts in util/testing.py. They're already doing the right thing.

I think @gfyoung meant the assert_equal, assertIsInstance, .. wrappers we have in pandas.util.testing. And for those, yes, I think we should clean that up and remove those.

All 20 comments

this is already indicated in #15341 and documented. this is orthogonal to #15962 which is fine.

Thus, the question is whether or not we want to keep using those OR remove them entirely and replace them with basic assert statements.

absolutely NOT. not sure where you got this notion.

sure self.assertTrue, self.assertFalse, self.assertRaises, and self.assertEqual when its a scalar should be replaced, but these are just old nose idioms.

Generally replacing test code should be done in independent PRs/commits.

actually another section after http://pandas-docs.github.io/pandas-docs-travis/contributing.html#how-to-use-parametrize would make sense to explain how to write / when to use which assertions, focused on asserting pandas objects (assert_series_equal) and scalars, regexes etc. almost listing what are the 'common' ones would be enough.

Thus, the question is whether or not we want to keep using those OR remove them entirely and replace them with basic assert statements

Since all the aseert_*_equal methods use assert condition, message, pytest's assertion rewriting is disabled (4th paragraph here)

which is what I don't like about raw assert statements that are considered more idiomatic with pytest.

Pytest has special comparison reprs for builtin data structues, and you can define your own so this could be solved. But we don't want to. We need all the arguments to assert_*_equal like check_dtypes, that you can't specify with ==. We shouldn't ever have bare asserts that aren't being rewritten by pytest.

I can writeup a summary of the comparisons and assertions for the documentation.

@jreback , @TomAugspurger : So if I'm understanding this properly, we are sticking with raw assert statements when possible because the wrappers will not be properly handled by pytest ?

If that is the case, yes, absolutely, then we should replace using util/testing comparison methods with assert when appropriate in separate PR's (of course, not something like assert_frame_equal but something like assert_equal can then be removed).

@jreback , @TomAugspurger : So if I'm understanding this properly, we are sticking with raw assert statements when possible because the wrappers will not be properly handled by pytest ?

I think you misunderstood.

For builtins, scalars, etc. we'll recommend assert {'a': 1, 'b': 2} == {'a': 1, 'b': 3} over self.assertDictEqual.

For DataFrames, Series, etc., we'll recommend assert_frame_equal(df1, df2).

I don't think we need to change any of the asserts in util/testing.py. They're already doing the right thing.

@TomAugspurger : No, I do understand in fact. I am referring to chunks like this and something like this.

I don't think we need to change any of the asserts in util/testing.py. They're already doing the right thing.

I think @gfyoung meant the assert_equal, assertIsInstance, .. wrappers we have in pandas.util.testing. And for those, yes, I think we should clean that up and remove those.

In xarray we built xarray.test.assert_equal and then delegated from there based on type - rather than specific assert_X_equals, which seem redundant

But probably not worth changing now

wrappers we have in pandas.util.testing. And for those, yes, I think we should clean that up and remove those.

@jorisvandenbossche : Yes, that is correct. Okay, that's what I was hoping to get clarification on.

Also, do we want to use pytest fixtures to capture stdout and stdin instead of method decorators that were introduced in one of my earlier PR's? OR are we okay with that (no one is commenting on this, so it seems like we are okay with this)?

Regarding the wrappers in util.testing, note that some of them are already hardly / not used (like assertIs, assertIs) so will be easy to clean up (others like assert_equal are used more).

yes the only things would change are

  • assert_equal (this is generally for scalar / lists /dicts)
  • self.assertEqual
  • self.assertTrue/False
  • assertDictEqual
  • assertIs
  • assertIsInstance

and use pytest.raises

I would also add the following (static tm. methods):

  • assertIs
  • assertIsNot
  • assertIn
  • assertNotIn
  • assertIsNone
  • assertIsNotNone

One thing I might also add is that in the spirit of functional, test classes should probably be avoided unless there is a good organization reason for doing so. Otherwise, test functions should be used.

@gfyoung ideally yes, but they are nice for organization, so I wouldn't necessarily change the structure if its meant for organization (and it almost always is).

@jreback : Yes, of course. That's why I wasn't as definitive about this (also, I added documentation in a PR as you know (#15989) to explicitly to address that caveat).

To everyone here, what is your opinion about pytest.warns vs. tm.assert_produces_warning ? I know that we have decided to not use pytest.raises in place of tm.assertRaisesRegexp (soon to be tm.assert_raises_regex). I am more inclined to use this former over the latter because it also does message checking which our implementation does not do currently. Thoughts?

@gfyoung BTW, I just saw that the latest pytest actually supports a pytest.warns(..., match=..) to potentially have the behaviour of tm.assert_raises_regex (one of our custom methods we decided to keep because of this limitation in pytest.warns) (https://github.com/pytest-dev/pytest/pull/2227).

The PR you pointed to seems to referring to pytest.raises, though this is good to know nevertheless.

Ah, sorry, yes, that is what I meant :-)
(as otherwise it wouldn't help replacing tm.assert_raises_regex)

But indeed, we also have out own assert_produces_warning, but that is still something else (I think our implementation also change the filter and checks that stacklevel, compared to pytest one)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nathanielatom picture nathanielatom  路  3Comments

mfmain picture mfmain  路  3Comments

Ashutosh-Srivastav picture Ashutosh-Srivastav  路  3Comments

hiiwave picture hiiwave  路  3Comments

matthiasroder picture matthiasroder  路  3Comments