Pytorch_geometric: Some questions about Data

Created on 28 Aug 2020 · 17 Comments · Source: rusty1s/pytorch_geometric

❓ Questions & Help

Hi, I have some quick questions about Data() / graphs:

Should I reindex the node ids in edge_index so that they start from 0, or can they be arbitrary numbers? If a node id never appears in edge_index, should I still give that node an entry in x?
Is it a good idea to keep the node id in x, or is there a better way to get the node id? I need the node id to look up the initial embedding or to pass it to other parts of the model.

johnny12150

All 17 comments

Node indices should always range from 0 to num_nodes - 1, and an edge (i, j) should align with the node indices in x: That is, x[i] should give you the features of the source node, while x[j] should give you the features of the destination node.

Edit: For getting the node id, you can also pass in an additional attribute to data, e.g.:

data.n_id = torch.arange(num_nodes)

Hope this clarifies your issues :)
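
If the raw graph uses arbitrary node ids, one possible way to bring them into the 0 to num_nodes - 1 range is sketched below. This is only an illustration and not something proposed in the thread: `raw_edges` and its ids are made up, and the relabeling relies on `torch.unique(..., return_inverse=True)`. Nodes that never appear in an edge are invisible to this approach and would have to be added to the id list separately.

```python
import torch

# Hypothetical raw edges with arbitrary node ids (made up for illustration).
raw_edges = torch.tensor([[100, 100, 205],
                          [205, 317, 317]])

# orig_id: sorted distinct ids; edge_index: same shape as raw_edges,
# with every id replaced by its position in orig_id.
orig_id, edge_index = torch.unique(raw_edges, return_inverse=True)
num_nodes = orig_id.numel()

# Row i of x would now belong to the node whose original id is orig_id[i].
n_id = torch.arange(num_nodes)

print(edge_index.tolist())  # [[0, 0, 1], [1, 2, 2]]
print(orig_id.tolist())     # [100, 205, 317]
```

Keeping `orig_id` around (for example next to `n_id`) makes it possible to map back to the original ids later, e.g. for an embedding lookup.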

rusty1s on 28 Aug 2020

Thanks for the quick reply!
So the num_nodes of Data is calculated from the unique indices within edge_index?

johnny12150 on 28 Aug 2020

If not explicitly set, num_nodes will be calculated via x.size(0). In general, you cannot rely on num_nodes == edge_index.max() + 1 because of isolated nodes.
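
A small sketch of why the edge-based guess can undercount, using plain tensors. The graph and the feature size are made up for illustration; nodes 4 and 5 exist in x but have no edges.

```python
import torch

# Made-up example: 6 nodes, but only nodes 0..3 appear in an edge.
edge_index = torch.tensor([[0, 1, 1],
                           [1, 2, 3]])
x = torch.zeros(6, 4)  # one row of 4 features per node

guess_from_edges = int(edge_index.max()) + 1
num_nodes_from_x = x.size(0)

print(guess_from_edges, num_nodes_from_x)  # 4 6
```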

rusty1s on 28 Aug 2020

https://github.com/rusty1s/pytorch_geometric/issues/1391#issue-649149066

So if I want to build a Data() from the above example edge_index, I should reorder the edge_index?

johnny12150 on 28 Aug 2020

What do you mean by re-ordering? As far as I can see, there's no reason to modify it in the first place.

rusty1s on 28 Aug 2020

https://github.com/rusty1s/pytorch_geometric/issues/1580#issuecomment-682341772

Oh! I misunderstood what was meant here. I thought it meant that I need to sort edge_index.
For example, if I have a graph with [[0, 1], [1, 4], [1, 5], [1, 6], [0, 2], [2, 7], [0, 3], [3, 8], [3, 9], [3, 10], [3, 11]], then I would need to reorder it to [[0, 1], [0, 2], [0, 3], [1, 4], [1, 5], [1, 6], [2, 7], [3, 8], [3, 9], [3, 10], [3, 11]].

For the above example (original order), I have 12 nodes, so do I have to give x a length of 12?
If my x is torch.arange(12), does this mean that node 0 gets x=0, node 1 gets x=1, node 4 gets x=2, and so on?

johnny12150 on 28 Aug 2020

(1) edge_index does not need to be sorted.
(2) Yes, exactly :)

rusty1s on 28 Aug 2020

Got it!

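import numpy as np
import torch
from torch_geometric.data import NeighborSampler

# Edge list as [source, target] pairs, transposed to shape [2, num_edges].
l = np.array([[0, 1], [1, 4], [1, 5], [1, 6], [0, 2], [2, 7], [0, 3], [3, 8], [3, 9], [3, 10], [3, 11]]).transpose()
# Swap the two rows so that every edge is reversed.
r = np.array([l[1, :], l[0, :]])
edge_index = torch.from_numpy(r)

# Sample up to 2 neighbors per node in each of 2 hops, one seed node per batch.
train_loader = NeighborSampler(edge_index,
                               sizes=[2, 2], batch_size=1)

for batch_size, n_id, adjs in train_loader:
    print("###")
    for edge_ind, e_id, size in adjs:
        print(e_id)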
For the above code, is there a more detailed description of the batch_size and n_id returned by NeighborSampler,
and of the edge_ind, e_id, size tuples contained in adjs?

johnny12150 on 28 Aug 2020

https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html?highlight=Neighbor#torch_geometric.data.NeighborSampler

Sorry for the bother, I have found the documentation about it.

An item returned by NeighborSampler holds the current batch_size, the IDs n_id of all nodes involved in the computation, and a list of bipartite graph objects via the tuple (edge_index, e_id, size), where edge_index represents the bipartite edges between source and target nodes, e_id denotes the IDs of original edges in the full graph, and size holds the shape of the bipartite graph.
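
To make the relation between n_id, e_id and the bipartite edge_index more concrete, here is a hand-written sketch in plain PyTorch. The values of `n_id`, `local_edge_index`, `e_id` and `size` are assumptions chosen to mimic what one sampled hop around node 3 might look like; they were not produced by running NeighborSampler. The sketch also assumes that the bipartite edge_index refers to positions in n_id rather than to global node ids.

```python
import torch

# Full graph from the snippet above, stored as child -> parent edges.
edge_index = torch.tensor([[1, 4, 5, 6, 2, 7, 3, 8, 9, 10, 11],
                           [0, 1, 1, 1, 0, 2, 0, 3, 3, 3, 3]])

# Hand-written stand-in for one sampled hop around target node 3
# (assumed values, NOT the output of a real NeighborSampler run).
n_id = torch.tensor([3, 8, 9])            # global ids, target node first
local_edge_index = torch.tensor([[1, 2],  # sources, as positions in n_id
                                 [0, 0]]) # targets, as positions in n_id
e_id = torch.tensor([7, 8])               # columns of the full edge_index
size = (3, 1)                             # (num source nodes, num target nodes)

# Both routes recover the same edges in global node ids.
global_edges = n_id[local_edge_index]
print(global_edges.tolist())         # [[8, 9], [3, 3]]
print(edge_index[:, e_id].tolist())  # [[8, 9], [3, 3]]
```

Under these assumptions, features for the sampled nodes would be gathered as x[n_id].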

johnny12150 on 28 Aug 2020

If I have isolated nodes in the graph, how should I assign their features in x,
since they won't appear in edge_index?
Or do I only need to assign features to the nodes that aren't isolated and set num_nodes manually?

johnny12150 on 31 Aug 2020

Isolated nodes are still nodes, so they should have the same semantic node features as all the remaining nodes. If they aren't important for your computation graph, feel free to set them to zero.

rusty1s on 31 Aug 2020

Say I have the graph from the example above, [[0, 1], [1, 4], [1, 5], [1, 6], [0, 2], [2, 7], [0, 3], [3, 8], [3, 9], [3, 10], [3, 11]].
If I have an isolated node with id 12, how should I assign its feature in x?
And what if the isolated node is neither the first index (0) nor the last one (max_node)?

johnny12150 on 31 Aug 2020

You can still add node features for isolated nodes, i.e. to add node 12 as an isolated node, you simply use a node feature matrix of shape [13, num_features] where x[12] corresponds to the features of node 12.
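
A minimal sketch of this setup in plain PyTorch, using the edge list from the question. The feature size and the placeholder values for node 12 are arbitrary choices for illustration, and the `torch.bincount` check is just one possible way to spot isolated nodes.

```python
import torch

num_features = 4  # arbitrary feature size chosen for illustration
edge_index = torch.tensor([[0, 1, 1, 1, 0, 2, 0, 3, 3, 3, 3],
                           [1, 4, 5, 6, 2, 7, 3, 8, 9, 10, 11]])

# 13 rows: nodes 0..11 from the edge list plus the isolated node 12.
x = torch.zeros(13, num_features)
x[12] = torch.ones(num_features)  # placeholder features for node 12

# A node is isolated if it never occurs as an endpoint of any edge.
degree = torch.bincount(edge_index.flatten(), minlength=x.size(0))
isolated = (degree == 0).nonzero().flatten()
print(isolated.tolist())  # [12]
```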

rusty1s on 31 Aug 2020

If 4 is the isolated node, with edges [[0, 1], [1, 5], [1, 6], [0, 2], [2, 7], [0, 3], [3, 8], [3, 9], [3, 10], [3, 11]],
and x is torch.arange(12), the previous discussion suggests that I will have x=0 for node 0, x=1 for node 1, x=2 for node 5, x=3 for node 6, x=4 for node 2, and so on.
In that case, x[4] won't be the feature of the isolated node 4?

johnny12150 on 31 Aug 2020

x[4] denotes the feature of the isolated node 4. That is true for all nodes, i.e.: node i holds its features in x[i].
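
The rule "node i holds its features in x[i]" can be checked with a few lines of plain PyTorch. This is a sketch using the edge list from the question, where node 4 has no edges; indexing x with edge_index is simply one way to look up the features of both endpoints of every edge.

```python
import torch

# Edge list from the question above; node 4 has no edges.
edges = [[0, 1], [1, 5], [1, 6], [0, 2], [2, 7],
         [0, 3], [3, 8], [3, 9], [3, 10], [3, 11]]
edge_index = torch.tensor(edges).t().contiguous()  # shape [2, 10]

x = torch.arange(12)  # the feature of node i is simply i

# Indexing x with edge_index looks up features by node id, so the order
# in which the edges are listed has no effect on which feature a node gets.
src_feat, dst_feat = x[edge_index]
print(src_feat[1].item(), dst_feat[1].item())  # 1 5  (edge [1, 5])
print(x[4].item())                             # 4, the isolated node
```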

rusty1s on 31 Aug 2020

So when I have x = torch.arange(12) as features, it means that node i in edge_index has feature x[i]?
Edge [1, 5] would get x[1] and x[5] as the features for node 1 and node 5.

johnny12150 on 31 Aug 2020

Yes :)

rusty1s on 31 Aug 2020
