PyMC3: Standardize and Update Notebook Gallery

Created on 12 Jun 2020  ·  9 comments  ·  Source: pymc-devs/pymc3

[BEGINNER-FRIENDLY]
Our notebooks gallery is quite big, so:

  • Many of them use an old style and could be updated to use the ArviZ color style instead (not listed).
  • Many notebooks show FutureWarnings that should be addressed (not listed).
  • Some notebooks fail to run because they use outdated third-party APIs or exotic packages (listed below).
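When re-running a notebook to hunt down the FutureWarnings mentioned above, it can help to collect them explicitly instead of scanning cell output by eye. The following is a minimal sketch using only Python's standard `warnings` module; `collect_future_warnings` and `old_style_call` are hypothetical names for illustration, not part of PyMC3 or ArviZ.

```python
import warnings


def collect_future_warnings(func, *args, **kwargs):
    """Call func and return (result, list of FutureWarning messages it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every FutureWarning, even ones that would normally be shown only once.
        warnings.simplefilter("always", FutureWarning)
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]
    return result, messages


# Hypothetical stand-in for a notebook cell that uses a deprecated API.
def old_style_call():
    warnings.warn("this keyword will be renamed", FutureWarning)
    return 42


result, future_msgs = collect_future_warnings(old_style_call)
print(result, future_msgs)  # 42 ['this keyword will be renamed']
```

Each collected message usually names the deprecated argument or function, which points directly at the line of the notebook that needs updating.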

This issue is here to signal that it would be great if people could take some time to update and re-run the notebooks below with PyMC 3.9, following this style page 🎉
Do it in small batches though, so you don't get bored and can enjoy it 😉 Thanks a lot in advance for your help, and don't hesitate to ask your questions below!
PyMCheers 🖖

Here is an up-to-date list of the most outdated and problematic NBs (those not listed here should still be checked for style and updated accordingly):

Exotic

  • [ ] blackbox_external_likelihood needs Cython
  • [ ] convolutional_vae_keras_advi needs Keras

Other Issues

  • [ ] GLM theano.gof.fg.MissingInputError
  • [x] GLM-poisson-regression KeyError: "['hpd_2.5', 'hpd_97.5'] not in index"
  • [ ] GLM-negative-binomial-regression KeyError: "['hpd_97.5', 'hpd_2.5'] not in index"
  • [ ] GLM-model-selection KeyError: 'var names: "[\'sd_log__\'] are not present" in dataset'
  • [x] GP-MaunaLoa2 ValueError: Units 'M' and 'Y' are no longer supported
  • [x] GP-MaunaLoa ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.
  • [ ] GP-TProcess runs but has way too many divergences; timed out after 14_000 seconds
  • [x] PyMC3_tips_and_heuristic KeyError: Rhat
  • [ ] dependent_density_regression AttributeError: 'DataFrame' object has no attribute 'range'
  • [x] hierarchical_partial_pooling not enough values to unpack (expected 2, got 1)
  • [ ] lda-advi-aevb TypeError: __init__() got an unexpected keyword argument 'n_topics'
  • [x] marginalized_gaussian_mixture_model AttributeError: 'Rectangle' object has no property 'normed'
  • [x] GLM-logistic AttributeError: 'Rectangle' object has no property 'normed'
  • [x] model_averaging FileNotFoundError: File ../data/milk.csv does not exist
  • [x] model_comparison AttributeError: 'ELPDData' object has no attribute 'WAIC'
  • [x] multilevel_modeling More chains (4000) than draws (2) and some plots may be wrong
  • [x] profiling has a shape error
  • [ ] rugby_analytics ValueError: not enough values to unpack (expected 2, got 1)
  • [ ] sampling_callback has a shape error (looks like a threading problem)
  • [ ] survival_analysis cell 11 raises a NotImplementedError in numpy/pandas
  • [x] weibull_aft AttributeError: module 'statsmodels' has no attribute 'datasets'
  • [ ] ODE_with_manual_gradients ValueError: array must not contain infs or NaNs
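Two of the failures above (GLM-poisson-regression and GLM-negative-binomial-regression) are KeyErrors on `hpd_2.5`/`hpd_97.5` columns. ArviZ renamed HPD to HDI, so newer summary tables label the interval columns `hdi_3%`/`hdi_97%` by default. A minimal sketch, assuming the notebook indexes the summary table by these hard-coded column names, is to look the columns up by prefix instead. `interval_columns` is a hypothetical helper, and the one-row table is illustrative only, not real model output.

```python
from typing import Tuple

import pandas as pd


def interval_columns(summary: pd.DataFrame) -> Tuple[str, str]:
    """Return the (lower, upper) interval column names of a summary table,
    whether they use the old ``hpd_`` prefix or the newer ``hdi_`` one."""
    cols = [c for c in summary.columns if str(c).startswith(("hdi_", "hpd_"))]
    if len(cols) != 2:
        raise KeyError(f"expected two HDI/HPD columns, found {cols!r}")

    def percent(col):
        # "hdi_97%" -> 97.0, "hpd_2.5" -> 2.5
        return float(str(col).split("_", 1)[1].rstrip("%"))

    # Sort numerically so the lower bound always comes first.
    lower, upper = sorted(cols, key=percent)
    return lower, upper


# Illustrative table with the newer column naming.
summary = pd.DataFrame(
    {"mean": [0.1], "sd": [1.0], "hdi_3%": [-1.8], "hdi_97%": [2.0]},
    index=["beta"],
)
lo, hi = interval_columns(summary)
print(lo, hi)  # hdi_3% hdi_97%
```

Similarly, the two `'Rectangle' object has no property 'normed'` errors come from matplotlib's `hist`, where `density=True` replaces the removed `normed=True` keyword.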
Labels: beginner friendly, examples, help wanted

All 9 comments

Hi @AlexAndorra
I am willing to update the notebooks dealing with Variational Inference and restyle them according to the guide. I have one question though: do we need to re-run the notebooks on the current state of the PyMC3 codebase, or on the latest released version (3.8)?

Great, thanks for answering so quickly Sayam!
Actually, we're rerunning and restyling all the notebooks prior to releasing 3.9.0 (see https://github.com/pymc-devs/pymc3/pull/3955), so it'd be awesome if you could help us out here with the VI notebooks (or any others you like). Just make sure to tell us which ones you're working on in the comments 🙏


I can work on the rugby, radon (multilevel_modeling) and model_comparison notebooks. I like the first two because after rerunning them I will be able to update the ArviZ examples with the new InferenceData objects (see https://github.com/arviz-devs/arviz/issues/1132). Regarding the third one, I am very familiar with the loo/waic API, so I don't expect to find many issues (nor to spend too much time on it).

Thanks @OriolAbril! Actually, I think @Sayam753 already updated the rugby NB. Now that #3955 is merged, he'll probably open a new PR and it should be merged into master pretty soon 😉
For the radon NB, it should be quick: I updated it a couple of months ago and it was reviewed, so I think it's a false positive. The More chains (4000) than draws (2) warning comes from a known ArviZ issue that we haven't had time to look at yet, and I don't expect many plots to be wrong. But it's always good to have a second pair of eyes!
Finally, I think you're the perfect person for the model_comparison NB 😉
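The More chains (4000) than draws (2) warning discussed above is what ArviZ emits when it reads a NumPy array as `(chain, draw, *shape)` but the array has no chain axis, so 4000 draws end up interpreted as 4000 chains. Assuming that is the cause in the radon NB, a minimal sketch of the usual workaround is to prepend a length-1 chain axis before handing the array to ArviZ. `add_chain_dim` and the zero-filled array are hypothetical, for illustration only.

```python
import numpy as np


def add_chain_dim(samples):
    """Prepend a length-1 chain axis: (draws, *shape) -> (1, draws, *shape),
    matching the (chain, draw, *shape) layout ArviZ assumes for NumPy arrays."""
    return np.asarray(samples)[np.newaxis, ...]


# Hypothetical posterior predictive draws: 4000 draws of 2 observations, no chain axis.
pp_samples = np.zeros((4000, 2))
print(pp_samples.shape, "->", add_chain_dim(pp_samples).shape)  # (4000, 2) -> (1, 4000, 2)
```

Adding the axis only relabels the array's dimensions; no values are copied or changed.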

I will open a PR for the rugby notebook and would be happy to have @OriolAbril review it.

Great! Please ping me in the PR and I'll review :)

Count on me to fix the model comparison and model averaging notebooks.

I am working on the GP notebooks.

I submitted a PR for weibull_aft. It's my first PR, so I would greatly appreciate it if one of you could review it! @AlexAndorra
Thanks.
