# PyG 2.0 Released
PyG (PyTorch Geometric) has been moved from the personal account rusty1s to its own organization account pyg-team to emphasize the ongoing collaboration between TU Dortmund University, Stanford University and many great external contributors.
With this, we are releasing PyG 2.0, a new major release that brings sophisticated heterogeneous graph support, GraphGym and many other exciting features to PyG.
## Heterogeneous Graph Support
We finally provide full heterogeneous graph support in PyG 2.0. See here for the accompanying tutorial.
### Highlights
- **Heterogeneous Graph Storage:** Heterogeneous graphs can now be stored in their own dedicated `data.HeteroData` class (thanks to @yaoyaowd):

  ```python
  from torch_geometric.data import HeteroData

  data = HeteroData()

  # Create two node types "paper" and "author", each holding a single feature matrix:
  data['paper'].x = torch.randn(num_papers, num_paper_features)
  data['author'].x = torch.randn(num_authors, num_authors_features)

  # Create an edge type ("paper", "written_by", "author") holding its graph connectivity:
  data['paper', 'written_by', 'author'].edge_index = ...  # [2, num_edges]
  ```

  `data.HeteroData` behaves similar to a regular homogeneous `data.Data` object:

  ```python
  print(data['paper'].num_nodes)
  print(data['paper', 'written_by', 'author'].num_edges)

  data = data.to('cuda')
  ```

- **Heterogeneous Mini-Batch Loading:** Heterogeneous graphs can be converted into mini-batches, both for datasets of many small graphs and for a single giant graph, via the `loader.DataLoader` and `loader.NeighborLoader` loaders, respectively. These loaders can now handle both homogeneous and heterogeneous graphs:

  ```python
  from torch_geometric.loader import DataLoader, NeighborLoader

  # Mini-batching for datasets of many small heterogeneous graphs:
  loader = DataLoader(heterogeneous_graph_dataset, batch_size=32, shuffle=True)

  # Neighbor sampling on a single giant heterogeneous graph:
  loader = NeighborLoader(heterogeneous_graph, num_neighbors=[30, 30], batch_size=128,
                          input_nodes=('paper', data['paper'].train_mask), shuffle=True)
  ```

- **Heterogeneous Graph Neural Networks:** Heterogeneous GNNs can now easily be created from homogeneous ones via `nn.to_hetero` and `nn.to_hetero_with_bases`. These functions take an existing GNN model and duplicate its message functions to account for different node and edge types:

  ```python
  from torch_geometric.nn import SAGEConv, to_hetero

  class GNN(torch.nn.Module):
      def __init__(self, hidden_channels, out_channels):
          super().__init__()
          self.conv1 = SAGEConv((-1, -1), hidden_channels)
          self.conv2 = SAGEConv((-1, -1), out_channels)

      def forward(self, x, edge_index):
          x = self.conv1(x, edge_index).relu()
          x = self.conv2(x, edge_index)
          return x

  model = GNN(hidden_channels=64, out_channels=dataset.num_classes)
  model = to_hetero(model, data.metadata(), aggr='sum')
  ```
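Once converted, the model operates on dictionaries of node features and edge indices, keyed by node type and edge type, respectively. A minimal usage sketch, assuming `data` is the `HeteroData` object from above:

```python
# Forward pass with per-type feature and connectivity dictionaries;
# the output is itself a dictionary holding one tensor per node type:
out = model(data.x_dict, data.edge_index_dict)
```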
### Additional Features
- A heterogeneous graph tutorial describing all newly released features (thanks to @mrjel)
- A variety of heterogeneous GNN examples
- Support for lazy initialization of GNN operators by passing `-1` to the `in_channels` argument (implemented via `nn.dense.Linear`). This avoids the need to calculate and keep track of input tensor sizes, simplifying the creation of heterogeneous graph models with varying feature dimensionalities across different node and edge types. Lazy initialization is supported for all existing PyG operators (thanks to @yaoyaowd):

  ```python
  from torch_geometric.nn import GATConv

  conv = GATConv(-1, 64)

  # We can initialize the model's parameters by calling it once:
  conv(x, edge_index)
  ```

- `nn.conv.HeteroConv`: A generic wrapper for computing graph convolution on heterogeneous graphs (thanks to @RexYing); see the sketch after this list
- `nn.conv.HGTConv`: The heterogeneous graph transformer operator from the "Heterogeneous Graph Transformer" paper
- `loader.HGTLoader`: The heterogeneous graph sampler from the "Heterogeneous Graph Transformer" paper for learning on large-scale heterogeneous graphs (thanks to @chantat)
- Support for heterogeneous graph transformations in `transforms.AddSelfLoops`, `transforms.ToSparseTensor`, `transforms.NormalizeFeatures` and `transforms.ToUndirected`
- New heterogeneous graph datasets: `datasets.OGB_MAG`, `datasets.IMDB`, `datasets.DBLP` and `datasets.LastFM`
- Support for converting heterogeneous graphs to "typed" homogeneous ones via `data.HeteroData.to_homogeneous` (thanks to @yzhao062)
- A tutorial on creating a `data.HeteroData` object from raw `*.csv` files (thanks to @yaoyaowd and @mrjel)
- An example to scale heterogeneous graph models via PyTorch Lightning
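As referenced above, a minimal sketch of `nn.conv.HeteroConv`: it wraps one convolution per edge type and aggregates outputs that share the same destination node type. The edge types and channel sizes here are illustrative:

```python
from torch_geometric.nn import GCNConv, HeteroConv, SAGEConv

# One operator per (illustrative) edge type; `-1` triggers lazy initialization:
conv = HeteroConv({
    ('paper', 'cites', 'paper'): GCNConv(-1, 64),
    ('author', 'writes', 'paper'): SAGEConv((-1, -1), 64),
}, aggr='sum')  # outputs arriving at the same node type are summed

# Returns a dictionary holding one output tensor per destination node type:
out_dict = conv(data.x_dict, data.edge_index_dict)
```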
## Managing Experiments with GraphGym
GraphGym is now officially supported in PyG 2.0 via `torch_geometric.graphgym`.
See here for the accompanying tutorial.
Overall, GraphGym is a platform for designing and evaluating Graph Neural Networks from configuration files via a highly modularized pipeline (thanks to @JiaxuanYou):
- GraphGym is the perfect place to start learning about standardized GNN implementation and evaluation
- GraphGym provides a simple interface to try out thousands of GNN architectures in parallel to find the best design for your specific task
- GraphGym lets you easily run hyper-parameter searches and visualize which design choices perform better
## Breaking Changes
- The `datasets.AMiner` dataset now returns a `data.HeteroData` object. See here for our updated `MetaPath2Vec` example on `AMiner`.
- `transforms.AddTrainValTestMask` has been replaced in favour of `transforms.RandomNodeSplit`
- Since the storage layout of `data.Data` significantly changed in order to support heterogeneous graphs, datasets that were already processed need to be re-processed by deleting the `root/processed` folder.
- `data.Data.__cat_dim__` and `data.Data.__inc__` now expect additional input arguments:

  ```python
  def __cat_dim__(self, key, value, *args, **kwargs):
      pass

  def __inc__(self, key, value, *args, **kwargs):
      pass
  ```

  In case you modified `__cat_dim__` or `__inc__` functionality in a customized `data.Data` object, please make sure to apply the above changes.
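For example, a customized `data.Data` object can adopt the new signatures by forwarding the extra arguments to the base class. A minimal sketch; `MyData` and its `pair_index` attribute are purely illustrative:

```python
from torch_geometric.data import Data

class MyData(Data):
    # Old signature: `def __inc__(self, key, value)`; the new one
    # accepts (and should forward) additional arguments:
    def __inc__(self, key, value, *args, **kwargs):
        if key == 'pair_index':
            # Offset the illustrative 'pair_index' attribute by the
            # number of nodes when collating graphs into a mini-batch:
            return self.num_nodes
        return super().__inc__(key, value, *args, **kwargs)

    def __cat_dim__(self, key, value, *args, **kwargs):
        if key == 'pair_index':
            return -1  # concatenate along the last dimension
        return super().__cat_dim__(key, value, *args, **kwargs)
```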
## Deprecations
- `nn.conv.PointConv` is deprecated in favour of `nn.conv.PointNetConv` (thanks to @lelouedec and @QuanticDisaster)
- `utils.train_test_split_edges` is deprecated in favour of the new `transforms.RandomLinkSplit` transform; see the sketch after this list
- All data loaders were moved from `torch_geometric.data` to `torch_geometric.loader`, e.g.:

  ```python
  from torch_geometric.loader import DataLoader
  ```

- `loader.NeighborSampler` is deprecated in favour of `loader.NeighborLoader` in order to simplify the application of neighbor sampling and to support it on both homogeneous and heterogeneous graphs
- `Data.contains_isolated_nodes` and `Data.contains_self_loops` are deprecated in favour of `Data.has_isolated_nodes` and `Data.has_self_loops`, respectively
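As referenced above, a minimal migration sketch for `transforms.RandomLinkSplit`; the dataset choice and split ratios are illustrative:

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import RandomLinkSplit

data = Planetoid(root='/tmp/Cora', name='Cora')[0]

# Replaces the deprecated `utils.train_test_split_edges`, returning one
# `Data` object per split with its own edge-level labels and masks:
transform = RandomLinkSplit(num_val=0.05, num_test=0.1)
train_data, val_data, test_data = transform(data)
```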
## Additional Features
- `torch-scatter` and `torch-sparse` now support half-precision computation via `torch.half`, bringing half-precision support to PyG
- Added a GNN cheatsheet to the documentation, which lets you more easily choose a GNN operator for your specific needs
- Added the `transforms.RandomLinkSplit` transform to easily perform a random edge-level split (thanks to @RexXing)
- Added the `torch_geometric.profile` package, which provides a variety of utility functions for benchmarking runtimes and memory consumption of GNN models (thanks to @yzhao062)
- `nn.conv.MessagePassing` now supports hooks for its `propagate`, `message`, `aggregate` and `update` functions, e.g. via `nn.conv.MessagePassing.register_propagate_forward_hook`; see the sketch after this list
- Added the `nn.conv.GeneralConv` operator that can handle most GNN use-cases (e.g., with or without edge features) and provides enough design options to be tuned (e.g., attention, skip-connections) (thanks to @JiaxuanYou)
- Added the `nn.models.RECT_L` model for learning with completely-imbalanced labels (thanks to @Fizyhsp)
- Added the Pathfinder Discovery Network convolutional operator `nn.conv.PDNConv` (thanks to @benedekrozemberczki)
- Added basic GNN model support as part of the `nn.models` package, e.g., `nn.models.GCN`, `nn.models.GraphSAGE`, `nn.models.GAT` and `nn.models.GIN`. Pre-defined models support customizing hidden feature dimensionality, number of layers, activation, normalization and jumping knowledge (thanks to @PabloAMC)
- Added the `datasets.MD17` datasets (thanks to @M-R-Schaefer)
- Added a link-prediction example for `nn.conv.RGCNConv` (thanks to @moritzblum)
- Added an example of `nn.pool.MemPooling` (thanks to @wsad1)
- Added a `return_attention_weights` argument to `nn.conv.TransformerConv` (thanks to @wsad1)
- Batch support for `utils.homophily` (thanks to @wsad1)
- Added a `batch_size` argument to `utils.to_dense_batch` (thanks to @jimmiebtlr)
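As referenced above, a minimal sketch of registering a forward hook on `propagate`; the operator, feature sizes and hook body are illustrative, and the hook is assumed to follow the usual PyTorch `(module, inputs, output)` convention:

```python
import torch
from torch_geometric.nn import GCNConv

conv = GCNConv(16, 32)
x = torch.randn(10, 16)                      # 10 nodes with 16 features each
edge_index = torch.tensor([[0, 1], [1, 2]])  # two edges: 0->1 and 1->2

# The hook fires after every call to `propagate` and receives its output:
def hook(module, inputs, output):
    print(f'propagate returned a tensor of shape {list(output.shape)}')

handle = conv.register_propagate_forward_hook(hook)
conv(x, edge_index)
handle.remove()  # detach the hook once it is no longer needed
```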
## Minor Changes
- Heavily improved loading times of `import torch_geometric`
- `nn.Sequential` is now fully jittable
- `nn.conv.LEConv` is now fully jittable (thanks to @lucagrementieri)
- `nn.conv.GENConv` can now make use of `"add"`, `"mean"` or `"max"` aggregations (thanks to @riskiem)
- Attributes of type `torch.nn.utils.rnn.PackedSequence` are now correctly handled by `data.Data` and `data.HeteroData` (thanks to @WuliangHuang)
- Added support for `data.record_stream()` in order to allow for data prefetching (thanks to @FarzanT)
- Added a `max_num_neighbors` attribute to `nn.models.SchNet` and `nn.models.DimeNet` (thanks to @nec4)
- `nn.conv.MessagePassing` is now jittable in case `message`, `aggregate` and `update` return multiple arguments (thanks to @PhilippThoelke)
- `utils.from_networkx` now supports grouping of node-level and edge-level features (thanks to @PabloAMC); see the sketch after this list
- Transforms now inherit from `transforms.BaseTransform` to ease type checking (thanks to @CCInc)
- Added support for the deletion of data attributes via `del data[key]` (thanks to @Linux-cpp-lisp)
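As referenced above, a minimal sketch of the `utils.from_networkx` grouping support; the attribute names are illustrative, and the `group_node_attrs`/`group_edge_attrs` arguments are assumed to accept lists of attribute names:

```python
import networkx as nx
from torch_geometric.utils import from_networkx

G = nx.Graph()
G.add_node(0, x1=1.0, x2=2.0)
G.add_node(1, x1=3.0, x2=4.0)
G.add_edge(0, 1, weight=0.5)

# Group the scalar node attributes 'x1' and 'x2' into `data.x`,
# and the edge attribute 'weight' into `data.edge_attr`:
data = from_networkx(G, group_node_attrs=['x1', 'x2'], group_edge_attrs=['weight'])
```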
## Bugfixes
- The `transforms.LinearTransformation` transform now correctly transposes the input matrix before applying the transformation (thanks to @beneisner)
- Fixed a bug in `benchmark/kernel` that prevented the application of `DiffPool` on the `IMDB-BINARY` dataset (thanks to @dongZheX)
- Feature dimensionalities of `datasets.WikipediaNetwork` now match the officially reported ones in case `geom_gcn_preprocess=True` (thanks to @ZhuYun97 and @GitEventhandler)
- Fixed a bug in the `datasets.DynamicFAUST` dataset in which `data.num_nodes` was undefined (thanks to @koustav123)
- Fixed a bug in which `nn.models.GNNExplainer` could not handle GNN operators that add self-loops to the graph in case self-loops were already present (thanks to @tw200464tw and @NithyaBhasker)
- `nn.norm.LayerNorm` no longer produces NaN gradients (thanks to @fbragman)
- Fixed a bug in which it was not possible to customize `networkx` drawing arguments in `nn.models.GNNExplainer.visualize_subgraph()` (thanks to @jvansan)
- `transforms.RemoveIsolatedNodes` now correctly removes isolated nodes in case `data.num_nodes` is explicitly set (thanks to @blakechi)