Pyro: incorrect log_prob dimension

Created on 25 Sep 2018 · 10 Comments · Source: pyro-ppl/pyro

Inspired by the pyro tutorial on Bayesian regression: http://pyro.ai/examples/bayesian_regression.html

I wanted to try something on more complex data. I had this problem in mind:

Input(X) will be 10-dimensional features
Output(y) will be 5-dimensional features (obtained linearly from input, y=XW+b+error)

I am using the following function to generate the dataset (and this seems to be working fine):

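import numpy as np
import torch
import torch.nn as nn

import pyro
from pyro.distributions import Normal
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

N = 500  # size of toy data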
indim = 10
outdim = 5

def build_linear_dataset(N, indim, outdim, noise_std=0.01):
    X = np.random.rand(N, indim)
    # w = 3
    w = 3 * np.ones((indim, outdim))
    # b = 1
    b = 1*np.ones((N, outdim))
    y = np.matmul(X, w)
    print(y.shape)
    y = y + b + np.random.normal(0, noise_std, size=(N, outdim))
    print(y.shape)
    y = y.reshape(N, outdim)
    X, y = torch.tensor(X).type(torch.Tensor), torch.tensor(y).type(torch.Tensor)
    data = torch.cat((X, y), 1)
    return data

To do the regression I am using the following model (don't mind the conv1d; I want to use conv1d for experimentation, but the priors will only be on the linear layer):

class RegressionModel(nn.Module):
    def __init__(self, indim, outdim):
        # p = number of features
        super(RegressionModel, self).__init__()
        self.conv1 = nn.Conv1d(1, 1, kernel_size=3, stride=1, padding=1)
        self.linear = nn.Linear(indim, outdim)

    def forward(self, x):
        y = self.conv1(x)
        y = self.linear(y)
        print('fwd:', y.size())
        return y

regression_model = RegressionModel(10, 5)

The model function is:

def model(data):
    # Create zero-mean normal priors (scale 10) over the parameters
    loc, scale = torch.zeros(outdim, indim), 10 * torch.ones(outdim, indim)
    bias_loc, bias_scale = torch.zeros(outdim), 10 * torch.ones(outdim)
    w_prior = Normal(loc, scale).independent(1)
    b_prior = Normal(bias_loc, bias_scale).independent(1)
    priors = {'linear.weight': w_prior, 'linear.bias': b_prior}
    # lift module parameters to random variables sampled from the priors
    lifted_module = pyro.random_module("module", regression_model, priors)
    # sample a regressor (which also samples w and b)
    lifted_reg_model = lifted_module()
    with pyro.iarange("map", N):
        x_data = data[:, :-5].view(N,1,-1)
        y_data = data[:, -5:].view(N, 1, -1)
        # run the regressor forward conditioned on data
        prediction_mean = lifted_reg_model(x_data).squeeze(-1)
        # condition on the observed data
        pyro.sample("obs",
                    Normal(prediction_mean, 0.1*torch.ones(data.size(0), 1, outdim)).independent(1),
                    obs=y_data)

The guide function is:

softplus = torch.nn.Softplus()

def guide(data):
    # define our variational parameters
    w_loc = torch.randn(outdim, indim)
    # note that we initialize our scales to be pretty narrow
    w_log_sig = torch.tensor(-3.0 * torch.ones(outdim, indim) + 0.05 * torch.randn(outdim, indim))
    b_loc = torch.randn(outdim)
    b_log_sig = torch.tensor(-3.0 * torch.ones(outdim) + 0.05 * torch.randn(outdim))
    # register learnable params in the param store
    mw_param = pyro.param("guide_mean_weight", w_loc)
    sw_param = softplus(pyro.param("guide_log_scale_weight", w_log_sig))
    mb_param = pyro.param("guide_mean_bias", b_loc)
    sb_param = softplus(pyro.param("guide_log_scale_bias", b_log_sig))
    # guide distributions for w and b
    w_dist = Normal(mw_param, sw_param).independent(1)
    b_dist = Normal(mb_param, sb_param).independent(1)
    dists = {'linear.weight': w_dist, 'linear.bias': b_dist}
    # overload the parameters in the module with random samples
    # from the guide distributions
    lifted_module = pyro.random_module("module", regression_model, dists)
    # sample a regressor (which also samples w and b)
    return lifted_module()

Both functions are almost the same as presented in the tutorial; I only changed the distribution dimensions to fit this problem. However, when running the model using

optim = Adam({"lr": 0.05})
svi = SVI(model, guide, optim, loss=Trace_ELBO())

num_iterations = 1000 if not smoke_test else 2
def main():
    pyro.clear_param_store()
    data = build_linear_dataset(N, 10, 5)
    for j in range(num_iterations):
        # calculate the loss and take a gradient step
        loss = svi.step(data)
        if j % 100 == 0:
            print("[iteration %04d] loss: %.4f" % (j + 1, loss / float(N)))

if __name__ == '__main__':
    main()

I get the following error:

ValueError: at site "module$$$linear.weight", invalid log_prob shape
  Expected [], actual [5]
  Try one of the following fixes:
  - enclose the batched tensor in a with iarange(...): context
  - .independent(...) the distribution being sampled
  - .permute() data dimensions

I am unable to post on forum, can't login or sign up.

Any clues? Any help will be appreciated.

Thanks

udion

👍1

Most helpful comment

I am unable to post on forum, can't login or sign up.

Are you trying to log in via GitHub? Feel free to create another issue regarding the Discourse login problem you are facing, specifying how you are trying to log in. Some other users have been facing login issues with GitHub, specifically.

In Pyro, any additional batch dimensions (extra dims in log_prob) at a sample site must either be declared by an enclosing iarange context or be marked as independent via .independent. For more details, refer to the tutorial on tensor shapes.

I think the issue is that when you call random_module, it samples from the prior, which returns a sample of size 5 x 10. One of those dimensions is accounted for by the .independent(1), but that leaves a stray batch dimension of size 5. One way to get this working (though you may need some other fixes as well) would be to designate both batch dimensions in your prior as independent via:

    w_prior = Normal(loc, scale).independent(2)
    b_prior = Normal(bias_loc, bias_scale).independent(1) # this was wrongly set to 2 earlier.

Let me know if that works.
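
To see the stray dimension concretely, here is a minimal sketch using plain torch.distributions. It assumes that Pyro's .independent(k) behaves like torch.distributions.Independent(base, k), which is how Pyro's wrappers around PyTorch distributions are understood to work; the names base, one, two and shapes are made up for this illustration.

```python
import torch
from torch.distributions import Independent, Normal

outdim, indim = 5, 10
base = Normal(torch.zeros(outdim, indim), 10 * torch.ones(outdim, indim))

# Assumed equivalent of .independent(1): only the last dim becomes an event dim.
one = Independent(base, 1)
# Assumed equivalent of .independent(2): both dims become event dims.
two = Independent(base, 2)

w = base.sample()  # shape (5, 10) regardless of how dims are declared


def describe(dist):
    # (batch_shape, event_shape, log_prob shape) as plain tuples
    return (tuple(dist.batch_shape),
            tuple(dist.event_shape),
            tuple(dist.log_prob(w).shape))


shapes = {
    "base": describe(base),
    "independent(1)": describe(one),
    "independent(2)": describe(two),
}
for name, (batch, event, log_prob) in shapes.items():
    print(name, "batch:", batch, "event:", event, "log_prob:", log_prob)
# base            batch: (5, 10)  event: ()       log_prob: (5, 10)
# independent(1)  batch: (5,)     event: (10,)    log_prob: (5,)
# independent(2)  batch: ()       event: (5, 10)  log_prob: ()
```

The log_prob shape (5,) in the independent(1) row matches the "Expected [], actual [5]" message from the error, and only independent(2) yields the scalar log_prob expected at that site.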

neerajprad on 25 Sep 2018

👍3

All 10 comments

@neerajprad

Yes, I was trying to log in through GitHub. I also tried creating an account using email; it shows that a confirmation email was sent, but I do not receive any confirmation email (not even in spam).

I had tried .independent(2) before creating the issue; nevertheless, I tried again. I am not sure I understand it properly, but keeping .independent(1) made sense to me because the weights and biases sampled from the prior were of the appropriate shape (i.e. the parameter shapes of the linear layer in the model). I am kind of stuck with this. Can you point to some resources that demo using Pyro to sample from a high-dimensional distribution like in my example? (The tutorial uses it to sample scalar weights, and it's not clear what to do for a higher-dimensional matrix.)

Thanks

udion on 26 Sep 2018

@neerajprad @jpchen

If it helps, here is a link to a notebook: https://github.com/udion/DeepUncer/blob/master/bayes_net1.ipynb

edit:
@neerajprad
Please note the last cell: judging by the print statements, the pyro.sample(obs=....) line in the model function seems to be causing the problem. Everything executes fine before that.

udion on 26 Sep 2018

@neerajprad

The linear layer has weight (5x10) and bias (5), so that is what I have passed as priors. Sampling seems to be going fine, because prediction_mean is of the right shape (as can be seen in the output of the last cell).

udion on 26 Sep 2018

sampling seems to be going fine,

Sampling is unaffected by whether a sample site's batch dims are designated as independent (or enclosed in an iarange). What changes is the log_prob shape, and how these shapes are broadcast internally to support various use cases.

There are a few things happening there, so here is the updated set of changes that you will need to make:

Model side changes:

    w_prior = Normal(loc, scale).independent(2)
    b_prior = Normal(bias_loc, bias_scale).independent(1)  # wrongly written as 2 earlier, this only has 1 batch dim.
    ...
        pyro.sample("obs",
                    Normal(prediction_mean, 0.1 * torch.ones(data.size(0), 1, outdim)).independent(2), # note last 2 dims are independent.
                    obs=y_data)

You also don't need to .squeeze(-1) the prediction_mean since it doesn't have a trailing singleton dim.

Likewise in the guide:

    w_dist = Normal(mw_param, sw_param).independent(2)
    b_dist = Normal(mb_param, sb_param).independent(1)

All of this may seem somewhat arbitrary, but will make sense if you print out the batch shapes and ensure that they are being accounted for, as mentioned above.
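
As a rough way to print out the shapes for the observation site, the sketch below rebuilds the forward pass with freshly initialised layers and random inputs (not the notebook's data) and again uses torch.distributions.Independent as an assumed stand-in for Pyro's .independent(k). The names conv, linear, obs1 and obs2 are illustrative only.

```python
import torch
import torch.nn as nn
from torch.distributions import Independent, Normal

N, indim, outdim = 500, 10, 5
conv = nn.Conv1d(1, 1, kernel_size=3, stride=1, padding=1)
linear = nn.Linear(indim, outdim)

x_data = torch.rand(N, 1, indim)   # same layout as data[:, :-5].view(N, 1, -1)
y_data = torch.rand(N, 1, outdim)  # same layout as data[:, -5:].view(N, 1, -1)

with torch.no_grad():
    prediction_mean = linear(conv(x_data))  # shape (N, 1, outdim)

# The last dim has size 5, not 1, so squeeze(-1) leaves the shape unchanged.
squeezed = prediction_mean.squeeze(-1)

scale = 0.1 * torch.ones(N, 1, outdim)
obs1 = Independent(Normal(prediction_mean, scale), 1)  # like .independent(1)
obs2 = Independent(Normal(prediction_mean, scale), 2)  # like .independent(2)

print(tuple(prediction_mean.shape), tuple(squeezed.shape))  # (500, 1, 5) (500, 1, 5)
print(tuple(obs1.batch_shape))  # (500, 1): one dim more than the single iarange covers
print(tuple(obs2.batch_shape))  # (500,): exactly the iarange dimension of size N
print(tuple(obs2.log_prob(y_data).shape))  # (500,)
```

With two dims marked independent, the only batch dim left is the one of size N, which is the one the enclosing iarange("map", N) is meant to account for.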

neerajprad on 26 Sep 2018

❤1

@neerajprad
Thank you so much for your efforts!

I wanted to extend it a little more myself, so I am putting priors on the convolution weights and bias too. I am not sure I understand what is actually happening. So I changed the model a little:
https://github.com/udion/DeepUncer/blob/master/bayes_net1.ipynb

This is a very noob way to look at things...
but after printing batch_shape and event_shape, this is what I inferred: in order to make things work, .independent() should be used such that batch_shape is [] for the priors on both the weights and the bias.

I'm almost sure that this is wrong. Let me know if you've a better explanation.

Thanks again!

udion on 27 Sep 2018

@udion that is correct: all priors in your random module should have the same .batch_shape, and in your case that batch shape should be empty.

fritzo on 27 Sep 2018

❤1

@fritzo excuse me for the basic questions,

Is it obvious that in my case .batch_shape should be []?

How do I decide what my .batch_shape should look like?

Thanks

udion on 27 Sep 2018

In Pyro, .batch_shape is used specifically for handling independent events via iarange. When you read a Pyro model, you should be able to determine the .batch_shape of any sample statement simply by looking at which iarange contexts enclose that sample statement: each context allows one dimension to be batched. In your case the line

lifted_reg_model = lifted_module()

creates pyro.sample statements under the hood, and you can see that they should all have empty .batch_shape since that lifted_module() call is not inside any iarange contexts.
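
The counting rule can be written down as a tiny helper. This is only a sketch: event_dims_needed is a hypothetical function, not part of Pyro, and it assumes the simplified case where any batch dims sit on the left of the parameter shape and match the enclosing iarange sizes exactly.

```python
def event_dims_needed(param_shape, enclosing_iarange_sizes=()):
    """Hypothetical helper: how many trailing dims to pass to .independent().

    Every dim that is not covered by an enclosing iarange has to be
    declared as an event dim, so the answer is simply the difference
    in the number of dims.
    """
    param_shape = tuple(param_shape)
    batch = tuple(enclosing_iarange_sizes)
    if param_shape[:len(batch)] != batch:
        raise ValueError("leading dims must match the enclosing iarange sizes")
    return len(param_shape) - len(batch)


# lifted_module() is called outside any iarange, so nothing is batched:
print(event_dims_needed((5, 10)))    # linear.weight -> 2
print(event_dims_needed((5,)))       # linear.bias   -> 1
print(event_dims_needed((1, 1, 3)))  # conv1.weight  -> 3
print(event_dims_needed((1,)))       # conv1.bias    -> 1

# With one enclosing iarange of size 3, one leading dim stays a batch dim:
print(event_dims_needed((3, 5, 10), (3,)))  # -> 2
```

The first four calls correspond to the parameter shapes of the Conv1d(1, 1, kernel_size=3) and Linear(10, 5) layers in the model from this issue.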

If all your module parameters had a common batch shape on the left (e.g. a batch of similar networks, say one per city), you could decrease the .independent() arguments in each of your priors and enclose the lifted_module() call in an iarange context:

with pyro.iarange('cities', len(cities)):
    per_city_module = per_city_lifted_module()

Note that you're free to use MultivariateNormal(...) priors rather than "diagonal" Normal(...).independent(...) priors to model correlations. However you probably wouldn't model correlations across batch items (e.g. cities), hence iarange might be appropriate.
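
Both points can be checked on shapes alone. The sketch below uses plain torch.distributions (Independent standing in for .independent, as an assumption about how Pyro wraps it) and a made-up num_cities = 3; it does not build a lifted module.

```python
import torch
from torch.distributions import Independent, MultivariateNormal, Normal

num_cities, outdim, indim = 3, 5, 10

# One weight matrix per city: the leading dim is left as a batch dim,
# to be accounted for by an enclosing iarange of size num_cities.
per_city_w = Independent(
    Normal(torch.zeros(num_cities, outdim, indim), 10.0), 2)
print(tuple(per_city_w.batch_shape), tuple(per_city_w.event_shape))
# (3,) (5, 10)

# A MultivariateNormal prior on the bias has an event dim built in,
# so no .independent() call is needed to get an empty batch shape.
mvn_b = MultivariateNormal(torch.zeros(outdim),
                           covariance_matrix=100.0 * torch.eye(outdim))
diag_b = Independent(Normal(torch.zeros(outdim), 10.0 * torch.ones(outdim)), 1)
print(tuple(mvn_b.batch_shape), tuple(mvn_b.event_shape))  # () (5,)

# With a diagonal covariance the two priors assign the same density;
# off-diagonal entries are what would introduce correlations.
b = torch.zeros(outdim)
same = torch.allclose(mvn_b.log_prob(b), diag_b.log_prob(b), atol=1e-4)
print(same)  # True
```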

fritzo on 27 Sep 2018

❤1

@neerajprad @fritzo @jpchen

Thanks a ton!!

udion on 27 Sep 2018
