When loading a GEDDataset and querying num_classes as described in the tutorial (dataset.num_classes), an "AttributeError: 'NoneType' object has no attribute 'dim'" error is thrown.
Steps to reproduce the behavior:
Code for reproducing error:
from torch_geometric.datasets import GEDDataset
dataset = GEDDataset(root='/datasets/IMDBMulti', name="IMDBMulti")
print(dataset.num_classes)
I would have expected a return of the number of classes: 3.
I noticed that in the examples, the ground truth is saved in the key 'y'.
data = dataset[0]
data.keys()
would return something like ['edge_index', 'y']
The GEDDataset returns
['edge_index', 'i'].
Also tested with the ALKANE dataset, same problem.
The GEDDataset does not have any node labels (y=None), so a call to num_classes is expected to crash. Instead, the GEDDataset contains the graph edit distance (GED) between all pairs of graphs in the dataset.
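To avoid the crash when it is not known in advance whether a dataset carries labels, one option is to check y on a sample before asking for num_classes. The sketch below is only an illustration: safe_num_classes is a hypothetical helper (not part of torch_geometric), and the SimpleNamespace objects are stand-ins for a real dataset and its samples, so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def safe_num_classes(dataset, sample):
    """Return dataset.num_classes only if the sample stores labels in `y`.

    Hypothetical helper, not a torch_geometric function.
    """
    if getattr(sample, "y", None) is None:
        # No labels stored (as with GEDDataset), so skip the num_classes call.
        return None
    return dataset.num_classes


# Stand-in objects (assumptions for illustration, not real PyG data).
unlabeled = SimpleNamespace(edge_index=[[0, 1], [1, 0]], i=0, y=None)
labeled = SimpleNamespace(edge_index=[[0, 1], [1, 0]], y=2)
fake_dataset = SimpleNamespace(num_classes=3)

print(safe_num_classes(fake_dataset, unlabeled))  # None
print(safe_num_classes(fake_dataset, labeled))    # 3
```

With a real dataset, the call would presumably look like safe_num_classes(dataset, dataset[0]).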
Thank you for your reply.
What is stored in 'i', when it is not the node label?
IMDB-Binary, which is available in TUDataset, seems to have node labels. In that dataset it is possible to query num_classes. Why is this different for IMDB-Multi?
The i refers to the index of the graph in the dataset, so that you can access the GED between a pair of graphs via dataset.ged[data1.i, data2.i]. I think we should mention this in the documentation.
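The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: ged_between is a hypothetical helper, toy_ged is a made-up 3x3 matrix standing in for dataset.ged, and the SimpleNamespace objects stand in for graphs that carry an index i. It uses chained indexing so it works on plain nested lists; on a tensor, dataset.ged[data1.i, data2.i] from the reply is the equivalent form.

```python
from types import SimpleNamespace


def ged_between(ged, data1, data2):
    """Look up the stored GED for two graphs via their dataset index `i`.

    Hypothetical helper, not a torch_geometric function.
    """
    return ged[int(data1.i)][int(data2.i)]


# Made-up symmetric GED matrix for three graphs (illustrative values only).
toy_ged = [
    [0.0, 2.0, 3.0],
    [2.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]

# Stand-ins for dataset samples; real samples expose `i` in the same way.
graphs = [SimpleNamespace(i=k) for k in range(3)]

print(ged_between(toy_ged, graphs[0], graphs[2]))  # 3.0
print(ged_between(toy_ged, graphs[1], graphs[1]))  # 0.0
```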
Is there a way we can infer the graph labels from ged measures?
There are no graph labels available for this dataset. This dataset should be only used in case you are working on accelerating GED computation with machine learning.
SimGNN, the paper referred to in the dataset documentation, seems to have labels for the AIDS dataset:
AIDS dataset: Each node is labeled with one of 29 types, as illustrated in Fig. 4a.
Linux dataset: We randomly select 1000 graphs of equal or less than 10 nodes each. The nodes are unlabeled.
IMDB dataset: To test the scalability and efficiency of our proposed approach, we use the full dataset without any selection. The nodes are unlabeled.
How can I access the AIDS dataset labels?
You can access them via data.x.