pytorch_geometric: querying GEDDataset num_classes raises AttributeError

Created on 24 Apr 2020 · 7 comments · Source: rusty1s/pytorch_geometric

🐛 Bug

When loading a GEDDataset and querying num_classes as described in the tutorial (dataset.num_classes), an AttributeError: 'NoneType' object has no attribute 'dim' is thrown.

To Reproduce

Steps to reproduce the behavior:

  1. Load the IMDBMulti dataset with GEDDataset.
  2. Query num_classes.

Code for reproducing error:

from torch_geometric.datasets import GEDDataset
dataset = GEDDataset(root='/datasets/IMDBMulti', name="IMDBMulti")
print(dataset.num_classes)

Expected behavior

I expected it to return the number of classes: 3.

Environment

  • OS: Ubuntu 18.04 in Docker
  • Python version: 3.6.5
  • PyTorch version: 1.4.0
  • CUDA/cuDNN version: 10.1.243 / 7.6.5.32
  • GCC version: 7.5.0
  • Any other relevant information: Our Dockerfile is based on the Dockerfile you provide.

Additional context

I noticed that in the examples, the ground truth is saved in the key 'y'.

data = dataset[0]
data.keys()

would return something like ['edge_index', 'y']

The GEDDataset returns
['edge_index', 'i'].

I also tested the ALKANE dataset and got the same problem.


All 7 comments

GEDDataset does not have any node labels (y=None), so a call to num_classes is expected to crash. Instead, GEDDataset contains the graph edit distance (GED) between all pairs of graphs in the dataset.
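To see why the call fails, here is a simplified, hypothetical stand-in for how a num_classes property can infer the class count from y: the maximum label plus one for a 1-D label vector, or the number of columns for a 2-D label matrix. It is not PyG's source; `infer_num_classes` is an illustrative name, and plain lists stand in for tensors. When y is None there is nothing to count, which is the situation GEDDataset is in.

```python
from typing import Optional, Sequence


def infer_num_classes(y: Optional[Sequence]) -> int:
    """Hypothetical stand-in for a num_classes property (not PyG's code).

    1-D integer labels -> max label + 1
    2-D label matrix   -> number of columns
    """
    if y is None:
        # GEDDataset-style data: no labels stored, so the class count is undefined.
        # Real code that calls y.dim() here fails with the AttributeError above.
        raise ValueError("dataset has no labels (y is None); num_classes is undefined")
    if len(y) == 0:
        return 0
    first = y[0]
    if isinstance(first, (list, tuple)):
        # 2-D labels: one column per class
        return len(first)
    # 1-D labels: classes are assumed to be 0..max
    return max(y) + 1


print(infer_num_classes([0, 2, 1]))  # 3
```

Checking for missing labels up front turns the opaque AttributeError into an explicit error message.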

Thank you for your reply.
What is stored in 'i', if not the node label?

IMDB-Binary, which is available in TUDataset, seems to have node labels. In that dataset it is possible to query num_classes. Why is this different for IMDB-Multi?

The i refers to the index of the graph in the dataset, so that you can access the GED between two graphs via dataset.ged[data1.i, data2.i]. I think we should mention this in the documentation.
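The lookup described above can be sketched without the library. This is a minimal illustration under stated assumptions: `Graph` and `pair_ged` are hypothetical names, the 3x3 matrix is toy data rather than real GED values, and nested lists are indexed `ged[i][j]` where the real `dataset.ged` tensor is indexed `dataset.ged[data1.i, data2.i]`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Graph:
    # Hypothetical minimal graph record: `i` is the graph's index in the dataset.
    i: int
    edge_index: List[List[int]]  # [[source nodes...], [target nodes...]]


def pair_ged(ged: List[List[float]], g1: Graph, g2: Graph) -> float:
    """Look up the GED between two graphs via their stored dataset indices."""
    return ged[g1.i][g2.i]


# Toy symmetric GED matrix (made-up values, zero on the diagonal).
ged = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 3.0],
    [5.0, 3.0, 0.0],
]
a = Graph(i=0, edge_index=[[0, 1], [1, 0]])
c = Graph(i=2, edge_index=[[0], [1]])
print(pair_ged(ged, a, c))  # 5.0
```

Storing the index on each graph keeps the pairwise matrix in one place instead of duplicating distances inside every graph object.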

Is there a way to infer graph labels from the GED values?

There are no graph labels available for this dataset. This dataset should be only used in case you are working on accelerating GED computation with machine learning.
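For that use case, a model is usually trained on graph pairs with a GED-derived regression target rather than on class labels. The sketch below assumes the normalization described in the SimGNN paper, nGED = GED / ((|V_i| + |V_j|) / 2) followed by similarity = exp(-nGED); `build_pairs` is a hypothetical helper and the inputs are toy values, not data from GEDDataset.

```python
import math
from itertools import combinations
from typing import List, Tuple


def build_pairs(ged: List[List[float]], num_nodes: List[int]) -> List[Tuple[int, int, float]]:
    """Turn a pairwise GED matrix into (i, j, similarity) regression targets.

    Assumes SimGNN-style normalization: nGED = GED / ((|V_i| + |V_j|) / 2),
    then similarity = exp(-nGED), so GED 0 maps to 1.0 and large GEDs approach 0.
    """
    pairs = []
    for i, j in combinations(range(len(num_nodes)), 2):
        nged = ged[i][j] / (0.5 * (num_nodes[i] + num_nodes[j]))
        pairs.append((i, j, math.exp(-nged)))
    return pairs


# Two toy graphs with 4 nodes each and a GED of 2 -> nGED = 0.5.
print(build_pairs([[0.0, 2.0], [2.0, 0.0]], [4, 4]))
```

Normalizing by graph size keeps targets comparable across pairs of small and large graphs.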

SimGNN, the paper referenced in the dataset documentation, seems to have labels for the AIDS dataset:

AIDS dataset: Each node is labeled with one of 29 types, as illustrated in Fig. 4a.

Linux data: We randomly select 1000 graphs of equal or less than 10 nodes each. The nodes are unlabeled.

IMDB data: To test the scalability and efficiency of our proposed approach, we use the full dataset without any selection. The nodes are unlabeled.

How can I access the AIDS dataset node labels?

You can access them via data.x.
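If data.x encodes each node's type as a one-hot row (an assumption suggested by the "one of 29 types" description, not confirmed in this thread), the integer label of each node is the position of the active entry. With PyTorch tensors that is `data.x.argmax(dim=1)`; the hypothetical helper below shows the same idea with plain lists.

```python
from typing import List, Sequence


def node_labels_from_one_hot(x: Sequence[Sequence[float]]) -> List[int]:
    """Recover an integer type label per node from a one-hot feature matrix.

    Assumes each row of x has exactly one 1.0 marking the node's type.
    """
    labels = []
    for row in x:
        # Index of the largest entry = the active one-hot position.
        labels.append(max(range(len(row)), key=row.__getitem__))
    return labels


x = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(node_labels_from_one_hot(x))  # [1, 0, 2]
```

These are node-level labels; they still do not give graph-level classes, so num_classes remains undefined for GEDDataset.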
