Pytorch_geometric: GEDDataset num_classes request leads to AttributeError

Created on 24 Apr 2020 · 7 Comments · Source: rusty1s/pytorch_geometric

🐛 Bug

When loading a GEDDataset and querying num_classes as described in the tutorial (dataset.num_classes), the error "AttributeError: 'NoneType' object has no attribute 'dim'" is thrown.

To Reproduce

Steps to reproduce the behavior:

Load the IMDBMulti dataset with GEDDataset.
Query dataset.num_classes.

Code for reproducing error:

from torch_geometric.datasets import GEDDataset
dataset = GEDDataset(root='/datasets/IMDBMulti', name="IMDBMulti")
print(dataset.num_classes)

Expected behavior

I would have expected a return of the number of classes: 3.

Environment

OS: Ubuntu 18.04 in Docker
Python version: 3.6.5
PyTorch version: 1.4.0
CUDA/cuDNN version: 10.1.243 / 7.6.5.32
GCC version: 7.5.0
Any other relevant information: Our Dockerfile is based on the Dockerfile you provide.

Additional context

I noticed that in the examples, the ground truth is saved in the key 'y'.

data = dataset[0]
data.keys()

would return something like ['edge_index', 'y']

The GEDDataset returns ['edge_index', 'i'].

Also tested with the ALKANE dataset, same problem.

Source

TNO-Knowledge-Based-Systems

Most helpful comment

The i refers to the index of the graph in the dataset, so that you can access the GED between two pairs of graphs via dataset.ged[data1.i, data2.i]. I think we should mention this in the documentation.

rusty1s on 29 Apr 2020

👍2
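The lookup described in this comment can be sketched roughly as follows. This is a minimal illustration, not the library's code: PairMatrix, toy_dataset, g0, and g2 are hypothetical stand-ins with made-up distances, and plain integers are assumed for i so that the example runs without downloading anything. Only the expression dataset.ged[data1.i, data2.i] comes from the thread.

```python
from types import SimpleNamespace


class PairMatrix:
    """Hypothetical stand-in for a square GED matrix that accepts [row, col] indexing."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        row, col = key
        return self.rows[int(row)][int(col)]


def ged_between(dataset, data_a, data_b):
    """Look up the precomputed GED of two graphs through their `i` attributes."""
    return dataset.ged[data_a.i, data_b.i]


# Toy values for illustration only; these are not real GEDDataset distances.
toy_dataset = SimpleNamespace(ged=PairMatrix([[0, 2, 5], [2, 0, 3], [5, 3, 0]]))
g0 = SimpleNamespace(i=0)  # stand-in for dataset[0]
g2 = SimpleNamespace(i=2)  # stand-in for dataset[2]

print(ged_between(toy_dataset, g0, g2))
```

With a real GEDDataset, the same helper would presumably be called with two graphs taken from the dataset, for example ged_between(dataset, dataset[0], dataset[2]).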

All 7 comments

The GEDDataset does not have any node labels (y=None), so a call to num_classes is expected to crash. Instead, the GEDDataset contains the GED between all pairs of graphs in the dataset.

rusty1s on 26 Apr 2020

👍1
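For code that has to handle both labeled and label-free datasets, one possible guard is to check y on the first graph before asking for num_classes. The sketch below is only an assumption about how such a check could look: num_classes_or_none and ToyDataset are hypothetical names, and ToyDataset merely imitates the reported failure so that the example runs on its own.

```python
from types import SimpleNamespace


def num_classes_or_none(dataset):
    """Return dataset.num_classes, or None when the first graph carries no labels."""
    first = dataset[0]
    if getattr(first, "y", None) is None:
        return None
    return dataset.num_classes


class ToyDataset:
    """Hypothetical stand-in that imitates the reported num_classes failure."""

    def __init__(self, graphs):
        self.graphs = graphs

    def __getitem__(self, idx):
        return self.graphs[idx]

    @property
    def num_classes(self):
        labels = [g.y for g in self.graphs]
        if any(label is None for label in labels):
            # Imitates the crash from the report; the real error message differs.
            raise AttributeError("labels are missing")
        return max(labels) + 1


unlabeled = ToyDataset([SimpleNamespace(y=None, i=0), SimpleNamespace(y=None, i=1)])
labeled = ToyDataset([SimpleNamespace(y=0), SimpleNamespace(y=2), SimpleNamespace(y=1)])

print(num_classes_or_none(unlabeled))
print(num_classes_or_none(labeled))
```

The guard only inspects the first graph, which assumes that a dataset is either fully labeled or not labeled at all.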

Thank you for your reply.
What is stored in 'i', when it is not the node label?

IMDB-Binary, which is available in TUDataset, seems to have node labels. In that dataset it is possible to query num_classes. Why is this different for IMDB-Multi?

TNO-Knowledge-Based-Systems on 29 Apr 2020

rusty1s on 29 Apr 2020

👍2

Is there a way we can infer the graph labels from the GED measures?

andac-demir on 26 Aug 2020

There are no graph labels available for this dataset. This dataset should only be used if you are working on accelerating GED computation with machine learning.

rusty1s on 26 Aug 2020

👍1

SimGNN, the paper referenced in the dataset documentation, seems to have labels for the AIDS dataset.

AIDS dataset: Each node is labeled with one of 29 types, as illustrated in Fig. 4a.

LINUX data: We randomly select 1000 graphs of equal or less than 10 nodes each. The nodes are unlabeled.

IMDB data: To test the scalability and efficiency of our proposed approach, we use the full dataset without any selection. The nodes are unlabeled.

How can the AIDS dataset labels be accessed?