Inspired by the Pyro tutorial on Bayesian regression (http://pyro.ai/examples/bayesian_regression.html), I wanted to try something on more complex data. I had this problem in mind:
Input (X): 10-dimensional features
Output (y): 5-dimensional features, obtained linearly from the input (y = XW + b + error)
I am using the following function to generate the dataset (this seems to be working fine):
import numpy as np
import torch
N = 500  # size of toy data
indim = 10
outdim = 5
def build_linear_dataset(N, indim, outdim, noise_std=0.01):
    X = np.random.rand(N, indim)
    # every weight is 3
    w = 3 * np.ones((indim, outdim))
    # every bias is 1
    b = 1 * np.ones((N, outdim))
    y = np.matmul(X, w)
    print(y.shape)
    y = y + b + np.random.normal(0, noise_std, size=(N, outdim))
    print(y.shape)
    y = y.reshape(N, outdim)
    X, y = torch.tensor(X).type(torch.Tensor), torch.tensor(y).type(torch.Tensor)
    # concatenate features and targets column-wise: shape (N, indim + outdim)
    data = torch.cat((X, y), 1)
    return data
To do the regression I am using the following model (don't mind the conv1d; I want to use conv1d for experimentation, but the prior will only be on the linear layer):
import torch.nn as nn
class RegressionModel(nn.Module):
    def __init__(self, indim, outdim):
        # indim = number of input features, outdim = number of outputs
        super(RegressionModel, self).__init__()
        self.conv1 = nn.Conv1d(1, 1, kernel_size=3, stride=1, padding=1)
        self.linear = nn.Linear(indim, outdim)
    def forward(self, x):
        y = self.conv1(x)
        y = self.linear(y)
        print('fwd:', y.size())
        return y
regression_model = RegressionModel(10, 5)
The model function is:
import pyro
from pyro.distributions import Normal
def model(data):
    # Create wide normal priors (loc 0, scale 10) over the parameters
    loc, scale = torch.zeros(outdim, indim), 10 * torch.ones(outdim, indim)
    bias_loc, bias_scale = torch.zeros(outdim), 10 * torch.ones(outdim)
    w_prior = Normal(loc, scale).independent(1)
    b_prior = Normal(bias_loc, bias_scale).independent(1)
    priors = {'linear.weight': w_prior, 'linear.bias': b_prior}
    # lift module parameters to random variables sampled from the priors
    lifted_module = pyro.random_module("module", regression_model, priors)
    # sample a regressor (which also samples w and b)
    lifted_reg_model = lifted_module()
    with pyro.iarange("map", N):
        x_data = data[:, :-5].view(N, 1, -1)
        y_data = data[:, -5:].view(N, 1, -1)
        # run the regressor forward conditioned on data
        prediction_mean = lifted_reg_model(x_data).squeeze(-1)
        # condition on the observed data
        pyro.sample("obs",
                    Normal(prediction_mean, 0.1 * torch.ones(data.size(0), 1, outdim)).independent(1),
                    obs=y_data)
The guide function is:
softplus = torch.nn.Softplus()
def guide(data):
    # define our variational parameters
    w_loc = torch.randn(outdim, indim)
    # note that we initialize our scales to be pretty narrow
    w_log_sig = torch.tensor(-3.0 * torch.ones(outdim, indim) + 0.05 * torch.randn(outdim, indim))
    b_loc = torch.randn(outdim)
    b_log_sig = torch.tensor(-3.0 * torch.ones(outdim) + 0.05 * torch.randn(outdim))
    # register learnable params in the param store
    mw_param = pyro.param("guide_mean_weight", w_loc)
    sw_param = softplus(pyro.param("guide_log_scale_weight", w_log_sig))
    mb_param = pyro.param("guide_mean_bias", b_loc)
    sb_param = softplus(pyro.param("guide_log_scale_bias", b_log_sig))
    # guide distributions for w and b
    w_dist = Normal(mw_param, sw_param).independent(1)
    b_dist = Normal(mb_param, sb_param).independent(1)
    dists = {'linear.weight': w_dist, 'linear.bias': b_dist}
    # overload the parameters in the module with random samples
    # from the guide distributions
    lifted_module = pyro.random_module("module", regression_model, dists)
    # sample a regressor (which also samples w and b)
    return lifted_module()
Both functions are almost the same as those presented in the tutorial; I only changed the distribution dimensions to fit this problem. However, when running the model using
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
optim = Adam({"lr": 0.05})
svi = SVI(model, guide, optim, loss=Trace_ELBO())
# smoke_test: boolean flag defined as in the tutorial
num_iterations = 1000 if not smoke_test else 2
def main():
    pyro.clear_param_store()
    data = build_linear_dataset(N, 10, 5)
    for j in range(num_iterations):
        # calculate the loss and take a gradient step
        loss = svi.step(data)
        if j % 100 == 0:
            print("[iteration %04d] loss: %.4f" % (j + 1, loss / float(N)))
if __name__ == '__main__':
    main()
I get the following error:
ValueError: at site "module$$$linear.weight", invalid log_prob shape
Expected [], actual [5]
Try one of the following fixes:
- enclose the batched tensor in a with iarange(...): context
- .independent(...) the distribution being sampled
- .permute() data dimensions
I am unable to post on forum, can't login or sign up.
Any clues? Any help will be appreciated.
Thanks
I am unable to post on forum, can't login or sign up.
Are you trying to log in via GitHub? Feel free to create another issue regarding the Discourse login issue you are facing, specifying how you are trying to log in. Some other users have been facing login issues with GitHub, specifically.
In Pyro, any additional batch dimensions (extra dims in log_prob) at a sample site must either be declared by an enclosing iarange context or be marked as independent via .independent. For more details, refer to the tutorial on tensor shapes. I think the issue is that when you call random_module, it samples from the prior, which returns a sample of size 5 x 10. Only the rightmost dimension (size 10) is accounted for by the .independent(1), so there is a stray batch dimension of size 5. I think one way to get this working (though you may need some other fixes additionally) would be to designate both batch dimensions in your prior as independent via:
w_prior = Normal(loc, scale).independent(2)
b_prior = Normal(bias_loc, bias_scale).independent(1) # this was wrongly set to 2 earlier.
Let me know if that works.
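To make the shape bookkeeping concrete, here is a minimal sketch in plain PyTorch. It assumes that `torch.distributions.Independent(base, n)` behaves like Pyro's `.independent(n)` as far as shapes go; the variable names mirror the model in the question but are otherwise illustrative.

```python
import torch
from torch.distributions import Independent, Normal

outdim, indim = 5, 10
loc, scale = torch.zeros(outdim, indim), 10 * torch.ones(outdim, indim)

base = Normal(loc, scale)            # no dims declared independent
one_dim = Independent(base, 1)       # rightmost dim (size 10) becomes an event dim
two_dims = Independent(base, 2)      # both dims become event dims

w = base.sample()                    # a 5 x 10 weight sample

for name, dist in [("base", base), ("independent(1)", one_dim), ("independent(2)", two_dims)]:
    print(name,
          "batch_shape:", tuple(dist.batch_shape),
          "event_shape:", tuple(dist.event_shape),
          "log_prob shape:", tuple(dist.log_prob(w).shape))
```

With one reinterpreted dimension the log_prob still has shape (5,), which matches the "Expected [], actual [5]" message; with two it is a scalar.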
@neerajprad
Yes, I was trying to log in through GitHub. I also tried creating an account using email; it says a confirmation email was sent, but I do not receive any confirmation email (not even in spam).
I had tried .independent(2) before creating the issue; nevertheless, I tried again. I am not sure I understand it properly, but keeping .independent(1) made sense to me because the weights and biases sampled from the prior were of the appropriate shape (i.e. the parameter shapes of the linear layer in the model). I am kind of stuck with this. Can you point to some resources that demo the use of Pyro to sample from a high-dimensional distribution like my example? (The tutorial uses it to sample scalar weights, and it's not clear what to do in the case of a higher-dimensional matrix.)
Thanks
@neerajprad @jpchen
if it helps, here is a link to a notebook https://github.com/udion/DeepUncer/blob/master/bayes_net1.ipynb
edit:
@neerajprad
Please note the last cell. Looking at the print statements, it seems the pyro.sample(obs=...) line in the model function is creating the problem; everything executes fine before that.
@neerajprad
The linear layer has weight (5x10) and bias (5), so that is what I have passed as the prior. Sampling seems to be going fine, because prediction_mean is of the right shape (as can be seen in the output of the last cell).
sampling seems to be going fine,
Sampling is unaffected by whether a sample site's batch dims are designated as independent (or are inside an iarange). What changes is the log_prob shape, and how these shapes are broadcast internally to support various use cases.
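A small sketch of that point, again using plain PyTorch's `Independent` as an assumed stand-in for Pyro's `.independent(...)`: with the same seed the wrapped and unwrapped distributions draw the same sample, and the wrapped log_prob is just the elementwise log_prob summed over the reinterpreted dimensions.

```python
import torch
from torch.distributions import Independent, Normal

loc, scale = torch.zeros(5, 10), 10 * torch.ones(5, 10)
base = Normal(loc, scale)
wrapped = Independent(base, 2)

# Same seed, same draw: wrapping does not change how values are sampled.
torch.manual_seed(0)
w_base = base.sample()
torch.manual_seed(0)
w_wrapped = wrapped.sample()
same_sample = torch.equal(w_base, w_wrapped)

# Only the log_prob shape differs: elementwise versus summed over event dims.
elementwise = base.log_prob(w_base)   # shape (5, 10)
joint = wrapped.log_prob(w_base)      # shape ()
print(same_sample, tuple(elementwise.shape), tuple(joint.shape))
```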
There are a few things happening there, so here is the updated set of changes that you will need to make:
Model side changes:
w_prior = Normal(loc, scale).independent(2)
b_prior = Normal(bias_loc, bias_scale).independent(1) # wrongly written as 2 earlier, this only has 1 batch dim.
...
pyro.sample("obs",
            Normal(prediction_mean, 0.1 * torch.ones(data.size(0), 1, outdim)).independent(2),  # note last 2 dims are independent.
            obs=y_data)
You also don't need to .squeeze(-1) the prediction_mean since it doesn't have a trailing singleton dim.
Likewise in the guide:
w_dist = Normal(mw_param, sw_param).independent(2)
b_dist = Normal(mb_param, sb_param).independent(1)
All of this may seem somewhat arbitrary, but will make sense if you print out the batch shapes and ensure that they are being accounted for, as mentioned above.
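One way to do that printing for the "obs" site is sketched below. It is an illustration in plain PyTorch under the assumption that `Independent` matches Pyro's `.independent(...)`; `describe` is a hypothetical helper, the zeros and ones are stand-ins for the network output and targets, and N is kept small for readability.

```python
import torch
from torch.distributions import Independent, Normal

N, outdim = 4, 5                              # the thread uses N = 500
prediction_mean = torch.zeros(N, 1, outdim)   # stand-in for the network output
y_data = torch.ones(N, 1, outdim)             # stand-in for the targets
obs_scale = 0.1 * torch.ones(N, 1, outdim)

def describe(dist, value):
    """Return (batch_shape, event_shape, log_prob shape) as plain tuples."""
    return (tuple(dist.batch_shape),
            tuple(dist.event_shape),
            tuple(dist.log_prob(value).shape))

obs_1 = describe(Independent(Normal(prediction_mean, obs_scale), 1), y_data)
obs_2 = describe(Independent(Normal(prediction_mean, obs_scale), 2), y_data)
print("independent(1):", obs_1)   # batch_shape (N, 1): a leftover singleton dim
print("independent(2):", obs_2)   # batch_shape (N,): one dim for the iarange
```

With two reinterpreted dimensions the only remaining batch dimension has size N, which is the one the enclosing iarange of size N accounts for.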
@neerajprad
Thank you so much for your efforts!
I wanted to extend it a little more myself, so I am putting priors on convolution weights and bias too. I am not sure if I understand what is actually happening. So I changed model a little:
https://github.com/udion/DeepUncer/blob/master/bayes_net1.ipynb
This is a very noob way to look at things...
but after printing batch_shape and event_shape, this is what I inferred: in order to make things work, .independent() should be used such that batch_shape is [] for the priors on the weights as well as the biases.
I'm almost sure that this is wrong. Let me know if you've a better explanation.
Thanks again!
@udion that is correct: all priors in your random module should have the same .batch_shape, and in your case that batch shape should be empty.
@fritzo excuse me for the basic questions,
is it obvious that in my case .batch_shape should be []?
How to decide what my .batch_shape should look like?
Thanks
In Pyro, .batch_shape is used specifically for handling independent events via iarange. When you read a Pyro model, you should be able to determine the .batch_shape of any sample statement simply by looking at which iarange contexts enclose that sample statement: each context allows one dimension to be batched. In your case the line
lifted_reg_model = lifted_module()
creates pyro.sample statements under the hood, and you can see that they should all have empty .batch_shape since that lifted_module() call is not inside any iarange contexts.
If all your module parameters had a common batch shape on the left (e.g. a batch of similar networks, say one per city), you could decrease the .independent() shifts in each of your priors and enclose the lifted_module() call in an iarange context:
with pyro.iarange('cities', len(cities)):
    per_city_module = per_city_lifted_module()
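As a rough sketch of what such per-city priors could look like shape-wise (plain PyTorch only, with `Independent` assumed to match Pyro's `.independent(...)`; `n_cities` is a hypothetical value, and the snippet does not build the lifted module itself):

```python
import torch
from torch.distributions import Independent, Normal

n_cities, outdim, indim = 3, 5, 10   # n_cities is a made-up example value

# One weight matrix and one bias vector per city: the city dim sits on the left.
w_prior = Independent(
    Normal(torch.zeros(n_cities, outdim, indim), 10 * torch.ones(n_cities, outdim, indim)),
    2,  # only the (outdim, indim) dims are events; the city dim stays a batch dim
)
b_prior = Independent(
    Normal(torch.zeros(n_cities, outdim), 10 * torch.ones(n_cities, outdim)),
    1,  # only the outdim dim is an event
)

print(tuple(w_prior.batch_shape), tuple(w_prior.event_shape))
print(tuple(b_prior.batch_shape), tuple(b_prior.event_shape))
```

Both priors end up with the same batch shape, (n_cities,), which is the single dimension an enclosing iarange over cities would account for.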
Note that you're free to use MultivariateNormal(...) priors rather than "diagonal" Normal(...).independent(...) priors to model correlations. However you probably wouldn't model correlations across batch items (e.g. cities), hence iarange might be appropriate.
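For the bias prior, the relationship between the two options can be sketched as follows. This uses `torch.distributions` directly rather than Pyro's wrappers, and the off-diagonal covariance value is an arbitrary choice for illustration.

```python
import torch
from torch.distributions import Independent, MultivariateNormal, Normal

outdim = 5
loc, scale = torch.zeros(outdim), 10 * torch.ones(outdim)

# "Diagonal" prior: independent Normal components grouped into one event.
diag_prior = Independent(Normal(loc, scale), 1)

# The same prior written as a MultivariateNormal with a diagonal covariance.
mvn_diag = MultivariateNormal(loc, covariance_matrix=torch.diag(scale ** 2))

# A correlated prior: same variances, plus an arbitrary positive covariance of 50.
cov = 50.0 * torch.eye(outdim) + 50.0 * torch.ones(outdim, outdim)
mvn_corr = MultivariateNormal(loc, covariance_matrix=cov)

b = diag_prior.sample()
print(tuple(diag_prior.event_shape), tuple(mvn_diag.event_shape), tuple(mvn_corr.event_shape))
print(diag_prior.log_prob(b), mvn_diag.log_prob(b), mvn_corr.log_prob(b))
```

All three have an empty batch shape and an event shape of (outdim,), so any of them leaves no stray batch dimension; only the correlated one assigns a different density.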
@neerajprad @fritzo @jpchen
Thanks a ton!!