Pomegranate: Multivariate Gaussian Covariance Type

Created on 26 Feb 2017 · 11 comments · Source: jmschrei/pomegranate

In hmmlearn there is a parameter which controls covariance type:

covariance_type : string, optional

String describing the type of covariance parameters to use. Must be one of

“spherical” — each state uses a single variance value that applies to all features.
“diag” — each state uses a diagonal covariance matrix.
“full” — each state uses a full (i.e. unrestricted) covariance matrix.
“tied” — all states use the same full covariance matrix.
Defaults to “diag”.
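For context, a minimal sketch of that parameter in use with hmmlearn's GaussianHMM (the toy data here is made up):

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    X = np.random.randn(100, 20)          # 100 observations, 20 features
    model = GaussianHMM(n_components=3, covariance_type="diag")
    model.fit(X)                          # one diagonal covariance per state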

I cannot find anything similar in pomegranate's docs or code:

class MultivariateGaussianDistribution(MultivariateDistribution):
    # no doc

All 11 comments

I tried to use IndependentComponentsDistribution over NormalDistribution, but I keep getting errors.

from pomegranate import *

dim = 20
n_component = 10
# fails: these are bare distribution classes, not initialized instances
GeneralMixtureModel(IndependentComponentsDistribution([NormalDistribution] * dim), n_component)

ValueError: must either give initial distributions or constructor

I tried to initialize it differently but keep getting these errors.

Howdy @chananshgong

Sorry for the delay, I've been inundated with work recently. Currently I only support full covariance matrices, though at some point I'd like to support all types. If you want to use an IndependentComponentsDistribution you currently need to specify the initial parameters. However, this won't use BLAS so it's likely going to be much slower.

If I get time I'll look into a good performing solution soon. I've been working on Bayesian network structure learning recently.
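For reference, a minimal sketch of that workaround under the classic pomegranate API, mirroring the failing snippet from the issue (the randomized initial parameters are placeholders):

    import numpy as np
    from pomegranate import (GeneralMixtureModel,
                             IndependentComponentsDistribution,
                             NormalDistribution)

    dim = 20
    n_components = 10

    # pass instantiated distributions with explicit initial parameters,
    # not bare classes; randomized means break the symmetry between
    # otherwise identical mixture components
    components = [
        IndependentComponentsDistribution(
            [NormalDistribution(np.random.randn(), 1.0) for _ in range(dim)])
        for _ in range(n_components)
    ]
    model = GeneralMixtureModel(components)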

I managed to use IndependentComponentsDistribution/NormalDistribution to achieve the "diag" equivalent, building the model like this:

    import numpy as np
    import pomegranate as pgn

    # full_fset: (n_samples, n_features) training data; n_states and
    # n_cmps (mixture components per state) are assumed defined already
    n_features = full_fset.shape[-1]
    means = np.mean(full_fset, axis=0)
    stds = np.std(full_fset, axis=0)

    # random initial (mean, std) values for all Gaussian components;
    # re-seeding from OS entropy makes every training attempt differ
    np.random.seed(None)
    dist_init = np.random.random((n_states, n_cmps, n_features, 2))
    dist_init[..., 0] -= 0.5  # center the means around 0.0
    for feat_i in range(n_features):
        # random initial mean in [mean - 2*std, mean + 2*std)
        dist_init[..., feat_i, 0] *= 4 * stds[feat_i]
        dist_init[..., feat_i, 0] += means[feat_i]
        # random initial std in [0, std / n_cmps)
        dist_init[..., feat_i, 1] *= stds[feat_i] / n_cmps

    # per state: a mixture of diagonal Gaussians when n_cmps > 1,
    # otherwise a single diagonal Gaussian
    dists = tuple(
        pgn.GeneralMixtureModel(list(
            pgn.IndependentComponentsDistribution(tuple(
                pgn.NormalDistribution(*dist_init[state_i, cmp_i, feat_i, :])
                for feat_i in range(n_features)
            ))
            for cmp_i in range(n_cmps)
        ))
        if n_cmps > 1 else
        pgn.IndependentComponentsDistribution(tuple(
            pgn.NormalDistribution(*dist_init[state_i, 0, feat_i, :])
            for feat_i in range(n_features)
        ))
        for state_i in range(n_states)
    )

    # random transition matrix, normalized so each row sums to 1
    trans_mat = np.random.random((n_states, n_states))
    trans_mat /= trans_mat.sum(axis=1, keepdims=True)
    starts = np.ones(n_states) / n_states  # uniform start probabilities
    # (taken from inside a class method, hence self)
    self.hmm = pgn.HiddenMarkovModel.from_matrix(trans_mat, dists, starts)

Hope it provides some clues, and you folks can help review my usage ;)

Ultimately, what I need to do is make Model.from_samples used throughout the package, and make IndependentComponentsDistribution.from_samples appropriately initialize all of the distributions which are passed in.
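For illustration, a sketch of what that could look like, assuming a from_samples that accepts distribution classes via a distributions argument (this is the interface later pomegranate releases adopted; it was not available at the time of this comment):

    import numpy as np
    from pomegranate import (IndependentComponentsDistribution,
                             NormalDistribution)

    X = np.random.randn(500, 20)  # toy data: 500 samples, 20 features

    # one independent NormalDistribution fit per feature, i.e. the
    # "diag" behaviour, initialized directly from the data
    d = IndependentComponentsDistribution.from_samples(
        X, distributions=[NormalDistribution] * X.shape[1])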

Thanks for the example @complyue

@complyue I understand that this may be very slow...

@lxkain my approach is to randomize each training attempt to arrive at some surprising (or not) model parameters. I don't get your meaning of "slow"; would you elaborate?

I was only referring to the manner in which a diagonal-covariance MGD can be constructed, via IndependentComponentsDistribution using NormalDistributions, which is slow according to Jacob.

Yeah. Not only does it currently not use BLAS, it also handles each example individually. A bunch of people have asked for this; it should be higher on my priority queue...
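For what it's worth, under the full-covariance-only constraint mentioned earlier, a diagonal start can still be expressed through the BLAS-backed full-covariance path by zeroing the off-diagonal entries. A minimal sketch (note that fitting will still update the off-diagonals, so this is a diagonal initialization, not a diagonal constraint):

    import numpy as np
    from pomegranate import MultivariateGaussianDistribution

    # a diagonal start expressed through the full-covariance path:
    # off-diagonal entries are zero at initialization only
    means = np.zeros(20)
    variances = np.ones(20)
    d = MultivariateGaussianDistribution(means, np.diag(variances))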

The IndependentComponentsDistribution approach should be much faster as of a month or two ago. Explicitly having options built-in is still on my queue.

This is great news, thank you! Question: what do you mean by the explicitly built-in options?

I mean that I'd like for you to be able to specify covariance_type=... in MultivariateGaussianDistribution when you call fit or from_samples.
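In other words, something like this hypothetical call (the covariance_type keyword is the proposal under discussion and does not exist at the time of writing):

    import numpy as np
    from pomegranate import MultivariateGaussianDistribution

    X = np.random.randn(500, 20)

    # hypothetical: 'covariance_type' is the keyword proposed in this
    # thread, not an existing parameter
    d = MultivariateGaussianDistribution.from_samples(
        X, covariance_type="diag")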
