With the latest PyTorch master, many parallel enum tests in test_enum are failing due to a mismatch in the gradient computation.
To replicate, check out commit 5fa3aac610ee234338dbc11eb5b6d4a133cb483d on PyTorch master (https://github.com/pytorch/pytorch/pull/5776), build PyTorch, and run these tests:
pytest -v --tb=short tests/infer/test_enum.py
Example of a failing test: test_elbo_iarange_iarange 2-2-None-None-parallel-None.
@fritzo, @eb8680 - I suspected an unexpected interaction between the Dice ELBO change and upstream PyTorch. That turns out not to be the whole story: 11 of our tests fail even before the Dice ELBO change, though there are more failures (79) with it. Could you take a look?
This could be either a Pyro bug or something in upstream PyTorch.
Hey @neerajprad, thank you for your post, I'm looking into this now.
I found the bug, I'll send a patch soon.
Thanks, @cpuhrsch! Curious to see where the bug was.
@neerajprad please see PR https://github.com/pytorch/pytorch/pull/5926
Fixed upstream by pytorch/pytorch#5926 and in Pyro by #917.