Pytorch_geometric: How to reconstruct edges from autoencoder?

Created on 7 May 2019  ·  5 Comments  ·  Source: rusty1s/pytorch_geometric

Hello,

I am working with the decoder from the autoencoder example and wonder how to reconstruct a graph from the decoder output.
If I understand correctly, the decoder computes edge probabilities, and I get them as shown below.

tensor([0.6745, 0.6745, 0.6551, 0.6551, 0.6286, 0.6286, 0.6391, 0.6391, 0.5424,
        0.5424, 0.5542, 0.5542, 0.5694, 0.5694, 0.5375, 0.5375, 0.4936, 0.4936,
        0.6254, 0.6254, 0.5815, 0.5815, 0.6029, 0.6029, 0.5186, 0.5186, 0.4716,
        0.4716, 0.5158, 0.5158, 0.5058, 0.5058, 0.5641, 0.5641, 0.4994, 0.4994,
        0.5030, 0.5030, 0.6071, 0.6071, 0.6064, 0.6064, 0.5250, 0.5250, 0.5184,
        0.5184, 0.5531, 0.5531, 0.5415, 0.5415, 0.5445, 0.5445, 0.5138, 0.5138,
        0.5075, 0.5075, 0.4968, 0.4968, 0.5199, 0.5199, 0.4946, 0.4946, 0.5524,
        0.5524, 0.5587, 0.5587, 0.5585, 0.5585, 0.5088, 0.5088, 0.4806, 0.4806,
        0.5119, 0.5119, 0.5122, 0.5122, 0.5117, 0.5117, 0.5116, 0.5116, 0.5283,
        0.5283, 0.5211, 0.5211, 0.5121, 0.5121, 0.5273, 0.5273, 0.5119, 0.5119,
        0.5117, 0.5117, 0.4990, 0.4990, 0.4986, 0.4986, 0.5036, 0.5036, 0.5067,
        0.5067, 0.4918, 0.4918, 0.4983, 0.4983, 0.5210, 0.5210, 0.5012, 0.5012,
        0.5017, 0.5017, 0.5477, 0.5477, 0.5475, 0.5475, 0.4924, 0.4924, 0.5084,
        0.5084, 0.5098, 0.5098, 0.5256, 0.5256, 0.5719, 0.5719, 0.5012, 0.5012,
        0.5010, 0.5010, 0.5020, 0.5020, 0.5064, 0.5064, 0.5063, 0.5063, 0.5221,
        0.5221, 0.6704, 0.6704, 0.5566, 0.5566, 0.6233, 0.6233, 0.5059, 0.5059,
        0.5069, 0.5069, 0.5085, 0.5085, 0.5048, 0.5048, 0.5051, 0.5051, 0.5887,
        0.5887, 0.6524, 0.6524, 0.5295, 0.5295, 0.5474, 0.5474, 0.5241, 0.5241,
        0.5059, 0.5059, 0.5568, 0.5568, 0.5497, 0.5497, 0.5727, 0.5727, 0.5397,
        0.5397, 0.5805, 0.5805, 0.5577, 0.5577, 0.5569, 0.5569, 0.6282, 0.6282,
        0.6124, 0.6124, 0.6134, 0.6134, 0.5117, 0.5117, 0.4991, 0.4991, 0.5032,
        0.5032, 0.6867, 0.6867, 0.6140, 0.6140, 0.6222, 0.6222, 0.6541, 0.6541,
        0.6641, 0.6641, 0.7468, 0.7468, 0.7686, 0.7686, 0.6775, 0.6775, 0.7056,
        0.7056, 0.7104, 0.7104, 0.5137, 0.5137], grad_fn=<SigmoidBackward>)

Its length is 106, but the original data that I encoded has 212 edges and 117 nodes:

Data(edge_index=[2, 212], x=[117, 1])

How do I know which edge each probability represents?

All 5 comments

Hi, please post a small but complete example so that I can help you more precisely. In general, the decoder needs to implement a forward method and an optional forward_all method, like here. The forward call outputs the probability of each edge given by edge_index, whereas the forward_all call outputs a dense probability matrix over all node pairs.
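To make the difference between the two calls concrete, here is a minimal sketch in plain PyTorch. It assumes an inner-product decoder (the sigmoid of the dot product of two node embeddings); the function names decode_edges and decode_all, the toy embeddings, and the 0.5 threshold are all made up for illustration and are not the library's API.

```python
import torch


def decode_edges(z, edge_index):
    """One probability per column of edge_index: sigmoid(<z_src, z_dst>)."""
    src, dst = edge_index
    return torch.sigmoid((z[src] * z[dst]).sum(dim=1))


def decode_all(z):
    """Dense N x N matrix with a probability for every node pair."""
    return torch.sigmoid(z @ z.t())


torch.manual_seed(0)
z = torch.randn(4, 3)  # 4 nodes with 3-dim embeddings (made-up values)
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 3]])

probs = decode_edges(z, edge_index)
adj = decode_all(z)

# probs[i] belongs to the edge (edge_index[0, i], edge_index[1, i]).
for (s, d), p in zip(edge_index.t().tolist(), probs.tolist()):
    print(f"edge ({s}, {d}): p = {p:.4f}")

# One possible way to read edges off the dense matrix: keep the upper
# triangle (no self-pairs, each pair once) and apply an assumed 0.5 cutoff.
pred_edges = (torch.triu(adj, diagonal=1) > 0.5).nonzero().t()
```

Under these assumptions, the i-th probability always corresponds to the i-th column of whatever edge_index was passed to the decoder, so the output length follows that tensor rather than the original graph.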

Thank you for the reply, @rusty1s.
I just loaded my custom dataset of 476 JSON files into the autoencoder example code.
The loaded data looks like this:

{"x": [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], [85], [86], [87], [88], [89], [90], [91], [92], [93], [94], [95], [96], [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [107], [108], [109], [110], [111], [112], [113], [114], [115], [116]], "edge_index": [[2, 3, 3, 4, 3, 5, 4, 8, 4, 10, 5, 6, 5, 16, 6, 22, 6, 114, 8, 9, 9, 17, 9, 18, 10, 20, 10, 61, 10, 21, 11, 12, 11, 13, 12, 66, 12, 78, 13, 14, 13, 15, 16, 24, 16, 25, 17, 26, 17, 28, 18, 19, 19, 29, 19, 30, 20, 93, 21, 33, 21, 35, 22, 23, 23, 36, 23, 37, 24, 38, 24, 40, 25, 43, 25, 41, 25, 44, 25, 45, 26, 27, 27, 46, 27, 47, 28, 51, 29, 52, 29, 53, 30, 54, 30, 92, 32, 58, 32, 64, 33, 34, 34, 68, 34, 79, 35, 76, 35, 80, 36, 11, 37, 11, 38, 39, 39, 82, 39, 87, 40, 115, 40, 83, 47, 48, 48, 49, 48, 50, 49, 25, 50, 81, 54, 55, 55, 94, 55, 98, 55, 107, 56, 57, 56, 60, 58, 59, 59, 62, 59, 63, 61, 104, 61, 106, 64, 65, 65, 102, 66, 70, 66, 74, 70, 71, 71, 72, 71, 103, 79, 86, 79, 88, 83, 85, 83, 84, 88, 89, 89, 90, 89, 91, 92, 56, 93, 31, 93, 32, 94, 96, 94, 100, 96, 97, 104, 105, 105, 107, 105, 109, 106, 111, 106, 113, 109, 110, 111, 112, 114, 61], [3, 2, 4, 3, 5, 3, 8, 4, 10, 4, 6, 5, 16, 5, 22, 6, 114, 6, 9, 8, 17, 9, 18, 9, 20, 10, 61, 10, 21, 10, 12, 11, 13, 11, 66, 12, 78, 12, 14, 13, 15, 13, 24, 16, 25, 16, 26, 17, 28, 17, 19, 18, 29, 19, 30, 19, 93, 20, 33, 21, 35, 21, 23, 22, 36, 23, 37, 23, 38, 24, 40, 24, 43, 25, 41, 25, 44, 25, 45, 25, 27, 26, 46, 27, 47, 27, 51, 28, 52, 29, 53, 29, 54, 30, 92, 30, 58, 32, 64, 32, 34, 33, 68, 34, 79, 34, 76, 
35, 80, 35, 11, 36, 11, 37, 39, 38, 82, 39, 87, 39, 115, 40, 83, 40, 48, 47, 49, 48, 50, 48, 25, 49, 81, 50, 55, 54, 94, 55, 98, 55, 107, 55, 57, 56, 60, 56, 59, 58, 62, 59, 63, 59, 104, 61, 106, 61, 65, 64, 102, 65, 70, 66, 74, 66, 71, 70, 72, 71, 103, 71, 86, 79, 88, 79, 85, 83, 84, 83, 89, 88, 90, 89, 91, 89, 56, 92, 31, 93, 32, 93, 96, 94, 100, 94, 97, 96, 105, 104, 107, 105, 109, 105, 111, 106, 113, 106, 110, 109, 112, 111, 61, 114]]}
import sys
import os
from pathlib import Path
import json
import torch
import argparse
from torch_geometric.data import Data,InMemoryDataset,DataLoader
import torch.nn.functional as F
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv, GAE, VGAE


def datalist(pathTorch):
    data_list = []
    filename_list = []
    for entry in sorted(os.scandir(pathTorch), key=lambda x: (x.is_dir(), x.name)):
        if entry.name.split('.')[0].isdigit():
            with open(entry,'rt') as jsonfile:
                jsons = json.load(jsonfile)
                x = torch.tensor(jsons['x'], dtype=torch.float)
                edge_index = torch.tensor(jsons['edge_index'],dtype=torch.long)
                data = Data(x=x, edge_index=edge_index)# print(entry.name.split('.')[0],data)
                data_list.append(data)
                filename_list.append(entry.name)
    return data_list,filename_list

path = './datasetTorch/basic/'
data_list,filename_list = datalist(path)
dataSel = data_list[0]


loader = DataLoader(data_list,batch_size = len(data_list),shuffle=False)
for batch in loader:
    # print(batch.num_features)
    # print(batch.num_graphs)
    pass


data = Data(x=dataSel.x, edge_index=dataSel.edge_index)

parser = argparse.ArgumentParser()
parser.add_argument('--model', type=str, default='GAE')
args = parser.parse_args()
assert args.model in ['GAE', 'VGAE']
kwargs = {'GAE': GAE, 'VGAE': VGAE}


class Encoder(torch.nn.Module):
    # def __init__(self, in_channels, out_channels):
    def __init__(self, in_channels, out_channels):
        super(Encoder, self).__init__()
        self.conv1 = GCNConv(in_channels, 2 * out_channels, cached=True)
        if args.model in ['GAE']:
            self.conv2 = GCNConv(2 * out_channels, out_channels, cached=True)
        elif args.model in ['VGAE']:
            self.conv_mu = GCNConv(2 * out_channels, out_channels, cached=True)
            self.conv_logvar = GCNConv(
                2 * out_channels, out_channels, cached=True)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        if args.model in ['GAE']:
            return self.conv2(x, edge_index)
        elif args.model in ['VGAE']:
            return self.conv_mu(x, edge_index), self.conv_logvar(x, edge_index)

# print(data.edge_index)

channels = 16
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = kwargs[args.model](Encoder(batch.num_features, channels)).to(device)
data.train_mask = data.val_mask = data.test_mask = data.y = None
data = model.split_edges(data)
x, edge_index = data.x.to(device), data.edge_index.to(device)###
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, edge_index)
    loss = model.recon_loss(z, data.train_pos_edge_index)
    if args.model in ['VGAE']:
        loss = loss + 0.001 * model.kl_loss()
    loss.backward()
    optimizer.step()


def test(pos_edge_index, neg_edge_index):
    model.eval()
    with torch.no_grad():
        z = model.encode(x, edge_index)
    return model.test(z, pos_edge_index, neg_edge_index)


for epoch in range(1, 101):
    train()
    auc, ap = test(data.val_pos_edge_index, data.val_neg_edge_index)
    # print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, auc, ap))


z = model.encode(x,edge_index)
value = model.decode(z, edge_index)
value_list= value.tolist()

and len(value.tolist()) was 106, while the data was Data(edge_index=[2, 212], x=[117, 1]).

Ah I see, so you are basically just using the autoencoder example with custom data. Note that the split_edges function modifies edge_index (and removes contrary edges, i.e. the reverse direction of each undirected edge), so the shapes look okay to me (212 // 2 = 106).
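The halving can be illustrated with a small sketch. It assumes that "removing contrary edges" means keeping only one direction of each undirected edge (here, the columns with source < target); this is an illustration of the idea, not the actual split_edges implementation. The toy edge_index reuses the first six columns of the data posted above.

```python
import torch

# Each undirected edge is stored twice, once per direction.
edge_index = torch.tensor([[2, 3, 3, 4, 3, 5],
                           [3, 2, 4, 3, 5, 3]])

row, col = edge_index
mask = row < col                    # keep only the (smaller, larger) direction
unique_edges = edge_index[:, mask]

print(edge_index.size(1), "->", unique_edges.size(1))
print(unique_edges.tolist())
```

Applied to a graph with 212 directed entries that form 106 symmetric pairs, the same filter would leave 106 columns, which matches the length reported in the question.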

In addition, I highly suggest using the latest autoencoder example (with PyG from master). The old autoencoder had a bug where test and val edges were not removed in the encoder.

Thank you so much! I will try the latest one!
