PyMC3: Improve model debugging

Created on 31 Oct 2020 · 14 Comments · Source: pymc-devs/pymc3

From the Priesemann group:
"I don't know whether Viola answered this question in the session, as I wasn't there, but this is something I can answer since I did a large part of the development. We didn't have many difficulties with PyMC3; if anything, we were surprised how simple the development was. However, I missed good debugging tools. It is quite easy to build a model where, for some reason, NaNs occur or the gradient can't be computed. I haven't really found a good way to deal with it. There is a monitor_mode in Theano, but this involves looking through a txt file where the output of each node is printed. It works, but is somewhat cumbersome. I wrote up the different techniques we used here: https://covid19-inference.readthedocs.io/en/latest/doc/debugging.html"

Source

twiecki

👍1

All 14 comments

It would be nice to have a helper function to just debug bad initial energy (with or without jitter, as you mention in those docs). Every time I get this issue I have to google for junpenglao's code snippet in the Discourse FAQ. The snippet just doesn't stick in my head :)

Even better would be if the helper function was called automatically (or at least mentioned) in the error message whenever the bad initial energy occurs.
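For readers who have not seen that kind of snippet, here is a minimal sketch of the idea in plain Python: list every variable's log-probability at the start point and flag the ones that are not finite. The function name `report_bad_logps` and the example numbers are hypothetical. In a real model the values would come from evaluating each variable's logp at the start point, which this sketch does not do.

```python
import math


def report_bad_logps(logps):
    """Print each variable's logp and return the names whose value is NaN or +/-inf."""
    bad = []
    for name, value in logps.items():
        ok = math.isfinite(value)
        flag = "" if ok else "  <-- not finite"
        print(f"{name:<12} {value}{flag}")
        if not ok:
            bad.append(name)
    return bad


# Hypothetical values standing in for each variable's logp at the start point.
example = {"mu": -0.92, "sigma_log__": -0.77, "obs": float("-inf")}
bad_vars = report_bad_logps(example)
```

A variable flagged here is usually the place to start looking, for instance a start value outside the support of its prior or an observation the likelihood cannot produce.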

ricardoV94 on 26 Nov 2020

@ricardoV94 Completely agree, that would be a great contribution. Want to give that a try?

twiecki on 26 Nov 2020

On the other hand, would such a helper function become irrelevant after PR https://github.com/pymc-devs/pymc3/pull/4211? (Just noticed it now)

ricardoV94 on 26 Nov 2020

I think it could perhaps build on that PR to provide more informative output. In general, anything that makes model problems easier to debug, or makes the related error messages more informative, will be huge.

twiecki on 26 Nov 2020

Yeah, maybe the new function check_start_vals in that PR can be tweaked to be called outside of the sample function.

I imagine something like making the start parameter optional (if it's missing we simply retrieve the model.test_point) and adding another argument that switches from the current behavior (raising an exception if something is nan or inf, and being silent otherwise) to a more informative behavior (printing the logp of each RV at the start values without raising exceptions). The second behavior would be the default, since that is the one users would want when calling the function directly.

The helper function would then be accessed by pm.utils.check_start_vals(model), and it would print basically the same output as the snippet we mentioned above. In addition, users can pass a dictionary for the start argument, to test values other than the model.test_point.

Or does it make more sense to write a separate function?
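A rough sketch of the behavior described above, in plain Python rather than PyMC3. It is not the code from the PR: the name `check_start_vals_sketch`, the `logp_fns` mapping of per-variable logp callables, and the `default_point` and `raise_on_bad` arguments are all assumptions made for illustration.

```python
import math


def check_start_vals_sketch(logp_fns, start=None, default_point=None, raise_on_bad=False):
    """Evaluate each variable's logp at `start`, falling back to `default_point`.

    raise_on_bad=True  -> silent unless a value is NaN/inf, then raise.
    raise_on_bad=False -> print every value and never raise (the proposed default).
    """
    point = default_point if start is None else start
    results = {name: fn(point) for name, fn in logp_fns.items()}
    bad = [name for name, value in results.items() if not math.isfinite(value)]
    if raise_on_bad:
        if bad:
            raise ValueError(f"Non-finite logp at start point for: {bad}")
        return results
    for name, value in results.items():
        print(f"{name}: {value}")
    return results


def logp_sigma(point):
    """Toy stand-in: logp of an Exponential(1) variable, -inf outside its support."""
    s = point["sigma"]
    return -s if s > 0 else float("-inf")


toy = {"sigma": logp_sigma}
default = {"sigma": 1.0}  # plays the role of model.test_point
res_default = check_start_vals_sketch(toy, default_point=default)
res_custom = check_start_vals_sketch(toy, start={"sigma": -1.0}, default_point=default)
```

Keeping both behaviors behind one flag means the sampler could keep calling the strict mode while users call the verbose one directly.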

ricardoV94 on 26 Nov 2020

Yes, exactly what I had in mind.

twiecki on 26 Nov 2020

I don't mind giving it a go, but I should wait until that PR is completed, no?

ricardoV94 on 26 Nov 2020

@ricardoV94 That PR will be merged very soon so you could already start from that branch.

twiecki on 26 Nov 2020

Noob question: how can I work on someone else's fork/PR via git?

ricardoV94 on 26 Nov 2020

@ricardoV94 This might not work completely but something along the lines of:

git remote add stephenhogg https://github.com/StephenHogg/pymc3
git pull stephenhogg
git checkout 4116
git checkout -b improve_model_debugging

twiecki on 26 Nov 2020

👍1

Thanks, that helped me figure it out :)

ricardoV94 on 26 Nov 2020

After some digging, I found that I might once again be trying to reinvent the wheel. The model method check_test_point seems to do exactly what I was looking for.

ricardoV94 on 26 Nov 2020

@ricardoV94 This might not work completely but something along the lines of:

git remote add stephenhogg https://github.com/StephenHogg/pymc3
git pull stephenhogg
git checkout 4116
git checkout -b improve_model_debugging

Little "hack" I use to avoid adding a million remotes:

git checkout -b pr/StephenHogg/4116
git fetch https://github.com/StephenHogg/pymc3 4116
git reset --hard FETCH_HEAD

then, to push new commits:

git push https://github.com/StephenHogg/pymc3 HEAD:4116
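The same pattern can be written once and reused for other forks. The sketch below only builds the command lists and prints them. The helper name `pr_branch_commands` and its parameters are hypothetical, and nothing is executed unless you pass each list to `subprocess.run` yourself.

```python
def pr_branch_commands(fork_url, branch, owner):
    """Build the git commands for working on a PR branch without adding a remote."""
    local = f"pr/{owner}/{branch}"  # local branch name, following the convention above
    setup = [
        ["git", "checkout", "-b", local],            # create the local branch
        ["git", "fetch", fork_url, branch],          # fetch the PR branch into FETCH_HEAD
        ["git", "reset", "--hard", "FETCH_HEAD"],    # point the local branch at it
    ]
    push = ["git", "push", fork_url, f"HEAD:{branch}"]  # push new commits back to the fork
    return setup, push


setup, push = pr_branch_commands("https://github.com/StephenHogg/pymc3", "4116", "StephenHogg")
for cmd in setup + [push]:
    print(" ".join(cmd))
```

Note that `git reset --hard` discards uncommitted changes on the new branch, so run the setup steps from a clean working tree.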

MarcoGorelli on 26 Nov 2020

👍2

Neat, didn't know about this one.

twiecki on 27 Nov 2020
