Pytorch_geometric: How to redo VGAE by using Pytorch_geometric

Created on 26 Mar 2019 · 8Comments · Source: rusty1s/pytorch_geometric

❓ Questions & Help

Hi, I tried to redo the VGAE in "Variational Graph Auto-Encoders"https://arxiv.org/abs/1611.07308
I found the class VGAE in the code torch_geometric/nn/models/autoencoder.py
and I followed the example tutorial of building GAE with pytorch_geometric, and the code is like following:

import os.path as osp
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv, GAE,VGAE
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

dataset = 'Cora'
path = osp.join(osp.dirname(osp.realpath(__file__)), '..', 'data', dataset)
dataset = Planetoid(path, dataset, T.NormalizeFeatures())
data = dataset[0]

the dataset has been normalization

label=data.y.data.numpy() #to get the label of each node,then draw them

class Encoder(torch.nn.Module):
def __init__(self, in_channels, out_channels):
super(Encoder, self).__init__()
self.out_channels = out_channels
self.conv1 = GCNConv(in_channels,2 * out_channels)
self.conv2_mu = GCNConv(2 * out_channels, out_channels)
self.conv2_sig = GCNConv(2 * out_channels, out_channels)

def forward(self, x, edge_index):
    x = F.relu(self.conv1(x, edge_index))
    x_mu = self.conv2_mu(x,edge_index)
    x_sig = self.conv2_sig(x,edge_index)

    return x_mu,x_sig

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = VGAE(Encoder(dataset.num_features, out_channels=16)).to(device)
data.train_mask = data.val_mask = data.test_mask = data.y = None
data = model.split_edges(data)
x, edge_index = data.x.to(device), data.edge_index.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
model.train()
optimizer.zero_grad()
mu,logvar= model.encode(x, edge_index)
re_graph = model.sample(mu,logvar)
loss = model.recon_loss(mu, logvar,
data.train_pos_edge_index,
data.train_neg_adj_mask) + model.kl_loss(mu,logvar)

loss.backward()
optimizer.step()
return re_graph

def test(pos_edge_index, neg_edge_index):
model.eval()
with torch.no_grad():
mu,logvar = model.encode(x, edge_index)
re_graph = model.sample(mu,logvar)
return model.test(re_graph, pos_edge_index, neg_edge_index)

for epoch in range(1, 201):
if epoch == 200:
latent=train()
else:
train()
auc, ap = test(data.val_pos_edge_index, data.val_neg_edge_index)
print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, auc, ap))

auc, ap = test(data.test_pos_edge_index, data.test_neg_edge_index)
print('Test AUC: {:.4f}, Test AP: {:.4f}'.format(auc, ap))

but the final result couldn't reach the results in the paper.(Actually I just got AUC:0.789 and AP 0.786)
and I puzzled is there any small mistakes in the loss function??
for the KL loss, I think it may suppose to be:
def kl_loss(self, mu, logvar):
return -0.5/2708 * torch.mean(torch.sum(1 + 2 * logvar - mu.pow(2) - logvar.exp().pow(2),1)),
which the 2708 is the number of nodes in Cora
but it didn't work either.
:(
So can you use the class VGAE to redo the work and tell me?
I think may be there are some questions in auto encoder.py, not the main function.
Thank you very much!!
Hope for your reply!

bug

Source

AllenWu18

Most helpful comment

Hey sorry about that. It was a hasty PR on my end!

jlevy44 on 26 Mar 2019

🚀2 👀1 😕1

All 8 comments

We are currently working on VGAE and ARGA. I am actually not quite sure why it is not working at the moment. I will keep you up-to-date.

rusty1s on 26 Mar 2019

OK Thank you for your quick reply.
Hope to see the VGAE soon.
If you finish redo the VGAE, can you let me know about it??
Thank you very much!
:)

| |
allenwu18
邮箱：[email protected]
|

Signature is customized by Netease Mail Master

On 03/27/2019 01:25, Matthias Fey wrote:

We are currently working on VGAE and ARGA. I am actually not quite sure why it is not working at the moment. I will keep you up-to-date.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

AllenWu18 on 26 Mar 2019

👍1

rusty1s on 26 Mar 2019

Hey sorry about that. It was a hasty PR on my end!

jlevy44 on 26 Mar 2019

🚀2 👀1 😕1

I'm not quite sure if this is actually a bug. I checked the implementation in ga-pytorch and its KLD loss computation (similar to the official one):

KLD = -0.5 / n_nodes * torch.mean(torch.sum(
        1 + 2 * logvar - mu.pow(2) - logvar.exp().pow(2), 1))

I'm not quite sure why the KLD is divided by n_nodes. The mean already averages the KLD in the node dimension! With 0.5 / n_nodes being 0.00018 on Cora, the KLD has zero regularization effect. When removing n_nodes, training stops working. Any idea?

rusty1s on 26 Mar 2019

I checked the author's and other ones' Pytorch codes, and I guess there may be these reasons：
1_ in PyG we do not use the dropout in the first GCN
2_ I think when get the latent embedding by add(mu,logvar),we should do a innerproduction to reconstruct the graph, and then use the re_graph to compute the loss, not the latent embedding directly
3_ maybe some errors in our loss function:
3.1 I found the other code use F.binary_cross_entropy_with_logits() to compute the reconstruction loss, but I just don't know how to implement this function in our recon_loss
3.2 there may be some error in our KL_loss. When I removing n_nodes , the ROC and AP both rise, but still lower than the paper results, even lower than GAE.
I hope the VGAE can be redo in the PyG, for PyG is really fast than other codes :)
Thank you! I hope to get your reply and good news soon.

AllenWu18 on 27 Mar 2019

Oh I think the second point is wrong. should use the latent embedding to compute the loss.
My mistake

AllenWu18 on 27 Mar 2019

Thanks for your detailed comment. I updated both GAE and VGAE and results should match or exceed those reported in the paper now. Especially increasing the hidden size yields significant improvements. Surprisingly, GAE is far better in performance as VGAE (see here for a possible explanation).

In addition, I slightly modified the official code. I use negative sampling for the reconstruction loss (instead of taking all negative edges into account) for faster computation. This additionally enables GPU computation on PubMed. I further choose to not rescale the KL loss by 1 / num_nodes. IMO, this is a bug in the official code.

If you have any questions/concerns/suggestions, feel free to reopen this issue.

rusty1s on 27 Mar 2019

Was this page helpful?

0 / 5 - 0 ratings