Ref. https://github.com/uber/pyro/pull/1582.
There is a perf bug in the log_prob methods of MVN and low-rank MVN on expanded instances (i.e. instances coming from mvn.expand()). @fehiepsi already has a patch for it. I am running our test suite to see if this might also affect other distributions.
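For anyone who wants to check this locally, here is a rough sketch of how the comparison could be timed. It is not the benchmark used in the PR: the sizes are arbitrary and `time_log_prob` is a hypothetical helper written just for this example. It compares an instance built directly with batched parameters against one obtained via `.expand()`.

```python
import timeit

import torch
from torch.distributions import MultivariateNormal


def time_log_prob(dist, value, number=100):
    """Hypothetical helper: mean wall-clock seconds per log_prob call."""
    return timeit.timeit(lambda: dist.log_prob(value), number=number) / number


# Arbitrary sizes chosen for illustration, not taken from the PR.
event_size, batch_size = 10, 1000
loc = torch.zeros(event_size)
cov = torch.eye(event_size)

# Instance created directly with batched parameters.
batched = MultivariateNormal(
    loc.repeat(batch_size, 1),
    covariance_matrix=cov.repeat(batch_size, 1, 1),
)
# Instance obtained by expanding an unbatched distribution.
expanded = MultivariateNormal(loc, covariance_matrix=cov).expand(
    torch.Size([batch_size])
)

value = torch.randn(batch_size, event_size)
t_batched = time_log_prob(batched, value)
t_expanded = time_log_prob(expanded, value)
print(f"batched:  {t_batched * 1e6:.1f} us per call")
print(f"expanded: {t_expanded * 1e6:.1f} us per call")
```

Both instances should return the same log_prob values, so any gap between the two timings would come from how the expanded instance is handled internally. Absolute numbers will vary by machine and PyTorch version.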
With event_shape < 50, PyTorch 0.4 is faster than PyTorch 1.0.
The PyTorch devs did mention that some tensor overhead was added after 0.4, so small models are likely to be slower. https://github.com/pytorch/pytorch/pull/9320 was an attempt to rectify that.
Following your link, I can see there are many interesting stories about the performance of 0.4 vs 1.0. Thanks @neerajprad !
This is already fixed.