Pomegranate: Multivariate Gaussian Covariance Type

Created on 26 Feb 2017 · 11 comments · Source: jmschrei/pomegranate

In hmmlearn there is a parameter which controls covariance type:

covariance_type : string, optional

String describing the type of covariance parameters to use. Must be one of

“spherical” — each state uses a single variance value that applies to all features.
“diag” — each state uses a diagonal covariance matrix.
“full” — each state uses a full (i.e. unrestricted) covariance matrix.
“tied” — all states use the same full covariance matrix.
Defaults to “diag”.
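For context, a minimal sketch of that parameter in use with hmmlearn's GaussianHMM (the toy data here is made up):

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    X = np.random.randn(100, 20)          # 100 observations, 20 features
    model = GaussianHMM(n_components=3, covariance_type="diag")
    model.fit(X)                          # one diagonal covariance per state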

I cannot find anything similar in pomegranate's docs or code:

class MultivariateGaussianDistribution(MultivariateDistribution):
    # no doc

All 11 comments

I tried to use IndependentComponentsDistribution over NormalDistribution, but I keep getting errors.

from pomegranate import *

dim = 20
n_component = 10
# fails: these are bare distribution classes, not initialized instances
GeneralMixtureModel(IndependentComponentsDistribution([NormalDistribution] * dim), n_component)

ValueError: must either give initial distributions or constructor

I tried to initialize it differently but keep getting these errors.

Howdy @chananshgong

Sorry for the delay, I've been inundated with work recently. Currently I only support full covariance matrices, though at some point I'd like to support all types. If you want to use an IndependentComponentsDistribution you currently need to specify the initial parameters. However, this won't use BLAS so it's likely going to be much slower.

If I get time I'll look into a good performing solution soon. I've been working on Bayesian network structure learning recently.
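For reference, a minimal sketch of that workaround under the classic pomegranate API, mirroring the failing snippet from the issue (the randomized initial parameters are placeholders):

    import numpy as np
    from pomegranate import (GeneralMixtureModel,
                             IndependentComponentsDistribution,
                             NormalDistribution)

    dim = 20
    n_components = 10

    # pass instantiated distributions with explicit initial parameters,
    # not bare classes; randomized means break the symmetry between
    # otherwise identical mixture components
    components = [
        IndependentComponentsDistribution(
            [NormalDistribution(np.random.randn(), 1.0) for _ in range(dim)])
        for _ in range(n_components)
    ]
    model = GeneralMixtureModel(components)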

I managed to use IndependentComponentsDistribution/NormalDistribution to achieve the "diag" equivalent, building the model like this:

    import numpy as np
    import pomegranate as pgn

    # full_fset: (n_samples, n_features) training data; n_states and
    # n_cmps (mixture components per state) are assumed defined already
    n_features = full_fset.shape[-1]
    means = np.mean(full_fset, axis=0)
    stds = np.std(full_fset, axis=0)

    # random initial (mean, std) values for all Gaussian components;
    # re-seeding from OS entropy makes every training attempt differ
    np.random.seed(None)
    dist_init = np.random.random((n_states, n_cmps, n_features, 2))
    dist_init[..., 0] -= 0.5  # center the means around 0.0
    for feat_i in range(n_features):
        # random initial mean in [mean - 2*std, mean + 2*std)
        dist_init[..., feat_i, 0] *= 4 * stds[feat_i]
        dist_init[..., feat_i, 0] += means[feat_i]
        # random initial std in [0, std / n_cmps)
        dist_init[..., feat_i, 1] *= stds[feat_i] / n_cmps

    # per state: a mixture of diagonal Gaussians when n_cmps > 1,
    # otherwise a single diagonal Gaussian
    dists = tuple(
        pgn.GeneralMixtureModel(list(
            pgn.IndependentComponentsDistribution(tuple(
                pgn.NormalDistribution(*dist_init[state_i, cmp_i, feat_i, :])
                for feat_i in range(n_features)
            ))
            for cmp_i in range(n_cmps)
        ))
        if n_cmps > 1 else
        pgn.IndependentComponentsDistribution(tuple(
            pgn.NormalDistribution(*dist_init[state_i, 0, feat_i, :])
            for feat_i in range(n_features)
        ))
        for state_i in range(n_states)
    )

    # random transition matrix, normalized so each row sums to 1
    trans_mat = np.random.random((n_states, n_states))
    trans_mat /= trans_mat.sum(axis=1, keepdims=True)
    starts = np.ones(n_states) / n_states  # uniform start probabilities
    # (taken from inside a class method, hence self)
    self.hmm = pgn.HiddenMarkovModel.from_matrix(trans_mat, dists, starts)

Hope it provides some clues, and you folks can help review my usage ;)

Ultimately, what I need to do is make Model.from_samples used throughout the package, and make IndependentComponentsDistribution.from_samples appropriately initialize all of the distributions which are passed in.
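For illustration, a sketch of what that could look like, assuming a from_samples that accepts distribution classes via a distributions argument (this is the interface later pomegranate releases adopted; it was not available at the time of this comment):

    import numpy as np
    from pomegranate import (IndependentComponentsDistribution,
                             NormalDistribution)

    X = np.random.randn(500, 20)  # toy data: 500 samples, 20 features

    # one independent NormalDistribution fit per feature, i.e. the
    # "diag" behaviour, initialized directly from the data
    d = IndependentComponentsDistribution.from_samples(
        X, distributions=[NormalDistribution] * X.shape[1])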

Thanks for the example @complyue

@complyue I understand that this may be very slow...

@lxkain my approach is to randomize each training attempt to arrive at some surprising (or not) model parameters. I don't get your meaning of "slow"; would you elaborate?

I was only referring to the manner in which a diagonal-covariance MGD can be constructed, via IndependentComponentsDistribution using NormalDistributions, which is slow according to Jacob.

Yeah. Not only does it currently not use BLAS, it also handles each example individually. A bunch of people have asked for this; it should be higher on my priority queue...
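For what it's worth, under the full-covariance-only constraint mentioned earlier, a diagonal start can still be expressed through the BLAS-backed full-covariance path by zeroing the off-diagonal entries. A minimal sketch (note that fitting will still update the off-diagonals, so this is a diagonal initialization, not a diagonal constraint):

    import numpy as np
    from pomegranate import MultivariateGaussianDistribution

    # a diagonal start expressed through the full-covariance path:
    # off-diagonal entries are zero at initialization only
    means = np.zeros(20)
    variances = np.ones(20)
    d = MultivariateGaussianDistribution(means, np.diag(variances))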

The IndependentComponentsDistribution approach should be much faster as of a month or two ago. Explicitly having options built-in is still on my queue.

This is great news, thank you! Question: what do you mean by the explicitly built-in options?

I mean that I'd like for you to be able to specify covariance_type=... in MultivariateGaussianDistribution when you call fit or from_samples.
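In other words, something like this hypothetical call (the covariance_type keyword is the proposal under discussion and does not exist at the time of writing):

    import numpy as np
    from pomegranate import MultivariateGaussianDistribution

    X = np.random.randn(500, 20)

    # hypothetical: 'covariance_type' is the keyword proposed in this
    # thread, not an existing parameter
    d = MultivariateGaussianDistribution.from_samples(
        X, covariance_type="diag")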
