I have code that was working quite nicely with 0.8.5. When I upgraded to 0.9.0, the final loss is WORSE (higher!) and fluctuates a bit. The code is exactly the same. This is very concerning.
(Also 0.9.0 forces tensorboard to downgrade from 2.3.0 to 2.2.0 and I can't really see the curves any more, but that is not my main concern.)
Steps to reproduce the behavior:
I have not yet been able to minimize the bug. If that is necessary for you to help with this bug report, please let me know and I will endeavour to help however I can.
0.9.0 should give results just as good as 0.8.5, or we should be able to understand why it doesn't and fix it. We should also improve the docs to explain how to migrate to 0.9.0 if code changes are required.
Can you confirm that you are using torch = 1.6 in both versions?
Yes, in both versions torch = 1.6
Hi
I copied your notebooks to Google Colab and added a pl.seed_everything(100) call to the cell in which you call trainer.fit().
In both versions, I get exactly the same loss value after 10 epochs.
0.8.5:
https://drive.google.com/file/d/1KP8GRmY7fy_b5bRRU-1K1P3Z0N705rEH/view?usp=sharing
0.9.0:
https://drive.google.com/file/d/1K_TL6W-sK_HdHBcncmy5irEMFMU7z_4Q/view?usp=sharing
If your model is sensitive to initialization, of course you will get different results.
When you compare runs you need to set the seed.
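To illustrate why the seed matters, here is a minimal sketch that uses only the Python standard library. The init_weights helper is hypothetical and merely stands in for model initialization. In the notebooks the equivalent step is the pl.seed_everything(100) call, which seeds the Python, NumPy and torch random number generators in one go.

```python
import random


def init_weights(n, seed=None):
    """Hypothetical stand-in for model initialization: draws n random weights."""
    if seed is not None:
        # Plays the role of pl.seed_everything(seed) in this toy example.
        random.seed(seed)
    return [random.gauss(0.0, 1.0) for _ in range(n)]


run_a = init_weights(4, seed=100)
run_b = init_weights(4, seed=100)
run_c = init_weights(4, seed=101)

print("same seed, same weights:", run_a == run_b)
print("different seed, same weights:", run_a == run_c)
```

Two runs only start from the same point when they share a seed, so a difference in final loss between unseeded runs cannot be attributed to the library version alone.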
Also, please be careful with these notebooks. If you run the cell with trainer.fit multiple times, it will not train your model from scratch, it will simply continue because the variables for the model are still in memory from the previous cell.
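The notebook pitfall can be sketched the same way, without any framework. ToyModel and fit below are hypothetical stand-ins for the LightningModule and trainer.fit, not Lightning APIs. They only show that an object kept in memory carries on from its trained state when the training cell is run again.

```python
class ToyModel:
    """Hypothetical stand-in for a model: a single weight, no framework."""

    def __init__(self):
        self.weight = 0.0
        self.steps_trained = 0


def fit(model, steps=10, lr=0.1, target=1.0):
    """Hypothetical stand-in for trainer.fit: gradient descent on (w - target)^2."""
    for _ in range(steps):
        grad = 2.0 * (model.weight - target)
        model.weight -= lr * grad
        model.steps_trained += 1
    return (model.weight - target) ** 2


model = ToyModel()
loss_first = fit(model)   # first execution of the "training cell"
loss_rerun = fit(model)   # cell run again: training continues, it does not restart

fresh = ToyModel()        # model re-created before fitting
loss_fresh = fit(fresh)   # matches the first run

print(loss_first, loss_rerun, loss_fresh)
print("steps on the reused model:", model.steps_trained)
```

Creating the model and the trainer in the same cell as the fit call (or restarting the runtime) avoids comparing a fresh run against a continued one.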
Please confirm as soon as possible that my findings are correct.
Yes haha.
Please set the seed :) it's stated very boldly in the docs.
This is user error, closing.
@awaelchli could you please give me access to your colabs?
Yeah, sorry, the link was not public.