Pymc3: DIC method taking hours

Created on 9 Aug 2017  路  17Comments  路  Source: pymc-devs/pymc3

Hello,
I'm using the DIC function in a model with 10'000 iterations and 3 chains. It takes hours (~3) to compute the DIC and I found the problem is at this line https://github.com/pymc-devs/pymc3/blob/3232e160dfc728546c4dcedfc0a195831bcd1cf4/pymc3/stats.py#L119.

I tried to parallelize it, but it seems it doesn't parallelize easily. Can you help me?

All 17 comments

Yeah there are some recent discussion here about how to speed up evaluation of logp function. Wanna try the solution inside and submit a PR?

@junpenglao I try :) thx

ohhh I forgot to fix DIC! (and bpic).
@denadai2 just out of curiosity, why are you using DIC instead of WAIC?

@aloctavodia great question :P I suppose I'm using it because I'm interested on describing the parameters instead of predictions (LOO etc)... Is it wrong in this case?

Btw I tried to "cache" logp but it doesn't copy the function. It is very slow to create the list "cached". I'm not an expert, so help me. Meanwhile I did this mess:

def dic2(trace, model=None):
    model = pm.stats.modelcontext(model)
    cached = [(var, var.logp_elemwise) for var in model.observed_RVs]
    cached2 = [(var, var.logp_elemwise) for var in model.free_RVs]

    d_obs = []
    d_frees = []
    for pt in trace:
        d_obs.append(np.sum([logp(pt) for var, logp in cached]))
        d_frees.append(np.sum([logp(pt) for var, logp in cached2]))

    mean_deviance = -2 * np.mean(np.add(d_obs, d_frees))

    free_rv_means = {rv.name: trace[rv.name].mean(
        axis=0) for rv in model.free_RVs}
    deviance_at_mean = -2 * model.logp(free_rv_means)

    return 2 * mean_deviance - deviance_at_mean

Don't kill me but I'm in rush with some experiments and this is 10x faster :D

@denadai2 actually you just need to replace

mean_deviance = -2 * np.mean([model.logp(pt) for pt in trace])

with

logp=model.logp
mean_deviance = -2 * np.mean([logp(pt) for pt in trace])

yes, but it doesn't work faster.
It seems that logp is different from elementwiset

are u sure? I tested it locally and it runs much faster.

Proof:

%timeit pooled_model.logp(pooled_trace[0])
logp = pooled_model.logp
%timeit logp(pooled_trace[0])
1 loop, best of 3: 1.14 s per loop
1000 loops, best of 3: 421 碌s per loop

Here you would assume cached logp is a lot better right? It is not, because the copy takes a lot of time.

%timeit cached = [(pt, pooled_model.logp) for pt in pooled_trace[-20:]]
1 loop, best of 3: 21.8 s per loop
cached = [(pt, pooled_model.logp) for pt in pooled_trace[-20:]]
%timeit mean_deviance = -2 * np.mean([logp(pt) for pt, logp in cached])
100 loops, best of 3: 7.72 ms per loop

So cached assignment takes all the time

@denadai2 both DIC and WAIC are trying to estimate the same thing, the main difference is that DIC assumes a (multivariate) Gaussian posterior distribution, and WAIC does not make this assumption. @junpenglao is right, the only thing you need to do is to write logp=model.logp and then use logp. Let me know if you are going to do a PR, or if I should do it, and thanks a lot for reporting this issue!

@aloctavodia ah ok I try

DIC/WAIC: ok. I thought WAIC was more related to LOO and predictions. I'll look at the paper again :) thx

@denadai2 You don't need to cache the logp function more than once:

cached = model.logp
%timeit mean_deviance = -2 * np.mean([logp(pt) for pt in trace])

@aseyboldt indeed :P I pushed the fixed version

I am a little bit late to the party :) @denadai2, I just want to add a reference to the Watanabe webpage, just in case you want to read more about WAIC (and loo, dic etc).

@aloctavodia uhm ok! Thanks. I think I'll open a thread in discourse to ask about the failure of WAIC :))

@denadai2 Sorry if I missed something, but which failure of WAIC?

@aloctavodia

DIC 217.868591309
/home/nadai/.local/lib/python3.4/site-packages/pymc3/stats.py:213: UserWarning: For one or more samples the posterior variance of the
        log predictive densities exceeds 0.4. This could be indication of
        WAIC starting to fail see http://arxiv.org/abs/1507.04544 for details

  """)
WAIC WAIC_r(WAIC=158.42521828487168, WAIC_se=2.3092993247226805, p_WAIC=7.2801294)
/home/nadai/.local/lib/python3.4/site-packages/pymc3/stats.py:278: UserWarning: Estimated shape parameter of Pareto distribution is
        greater than 0.7 for one or more samples.
        You should consider using a more robust model, this is
        because importance sampling is less likely to work well if the marginal
        posterior and LOO posterior are very different. This is more likely to
        happen with a non-robust model and highly influential observations.
  happen with a non-robust model and highly influential observations.""")
LOO LOO_r(LOO=164.91826, LOO_se=2.4069428499178245, p_LOO=10.526650309191908)

But, really, I'll ask in the forum so we don't disturb the others here :))

@aloctavodia https://discourse.pymc.io/t/dic-waic-wbic-on-regression-tasks/247 :)) thx

Was this page helpful?
0 / 5 - 0 ratings

Related issues

junpenglao picture junpenglao  路  20Comments

twiecki picture twiecki  路  23Comments

ericmjl picture ericmjl  路  21Comments

springcoil picture springcoil  路  23Comments

marciomacielbastos picture marciomacielbastos  路  28Comments