Here is our winning list of tests taking longer than 2 minutes. Let us figure out a plan for these - remove them (I can run them as a cron job on my workstation weekly), or modify them to run within 2 minutes ideally.
I am taking the liberty to remove the first one, which was being skipped but was added back by me in #806. Addressing these slowest tests will give a respite to all Pyro developers who have been facing build timeouts due to slow tests.
Note that some of this slowdown might be due to https://github.com/uber/pyro/issues/839.
I propose moving most of these tests to Travis cron that I鈥檒l set up to run daily. However, the top five should be fixed anyhow as they are known to time out individually. Also cron jobs will have time limits too.
Before we remove these I'd like to see if our fixes to enumeration will help.
Note that many of these can be sped up by converting data from a list to a tensor. I've done that in many such tests in the past.
Also, I'd prefer doing profiling work to speed up Pyro on these tests, rather than removing or moving or reclassifying the tests. The real problem is that Pyro is slow 馃槈
Also, I'd prefer doing profiling work to speed up Pyro on these tests, rather than removing or moving or reclassifying the tests. The real problem is that Pyro is slow
I鈥檓 all for profiling and making Pyro faster but some of these tests have historically been slow (like the sampling tests). I am not sure if there is an easy fix for all of them (if there is we should certainly fix them), but I don鈥檛 think tests taking longer than 3 minutes should be running on every commit. That costs more dev time cumulatively than any benefits from the protections these tests offer on a per commit basis, and can be safely delegated to Travis cron.
tests/infer/test_sampling.py::ImportanceTest::test_importance_prior
Can be removed, and
tests/infer/test_sampling.py::ImportanceTest::test_importance_guide
Can probably be replaced with some unit tests for the components, or sped up by #825
The real problem is that Pyro is slow
To add to @neerajprad's point, the test_elbo_nonreparameterized tests are inherently statistically slow. We could speed them up with #791 if we're not worried about introducing too much complexity into a single test, but even then it would still take many steps for SVI to converge. I think moving many of those to a cron job is a fine solution.
test_importance_prior test in #841 which does seem to timeout occasionally. I have started a daily cron too, where I will be adding some perf regression tests so that we can detect any major slowdown early. I do not want to add it to regular CI because the detection process will be noisy, and will require some tune-up. If there are other expensive tests that we are comfortable delegating to cron, we should consider adding them too.
test_elbo_nonreparameterizedtests are inherently statistically slow.
I was able to speed up tests/infer/test_enum.py in #828 from >10 minutes to <7 seconds by hand-parallelizing num_particles via iarange. @martinjankowiak do you think can we apply the same technique to the test_elbo_nonreparameterized SVI tests? The idea is that each step of SVI would parallelize over say 100 or 1000 particles to reduce variance at negligible wallclock cost, and thereby reduce the number of inference.step()s by a factor of 10?
i'm definitely for moving some tests to a cron job. the conjugate pyramid models are a good target for example. but i think we shouldn't be too aggressive in doing so quite yet. on the one hand because we need to make sure that the cron job is stable/mature/usable/legible/etc. and also because, as @fritzo points out, we still have room for speed-ups. the reality is that we should do a reasonably systematic test overhaul before we do the 0.2 release. (as for the svi integration tests, yeah, we can probably do something like that with an increased number of particles, but there are diminishing returns and i don't know if we'll gain more than a factor of ~5-10)
we should do a reasonably systematic test overhaul before we do the 0.2 release
I think it would be fine to target this for 0.2.1.
well either way but in any case we should start making progress towards it
Closing this. I will create specific task items as needed - #844, #845 are some candidate items.