
Releases: pyg-team/pyg-lib

pyg-lib 0.4.0: PyTorch 2.2 support, distributed sampling, sparse softmax, edge-level temporal sampling

07 Feb 13:09

pyg-lib==0.4.0 brings PyTorch 2.2 support, distributed neighbor sampling, accelerated softmax operations, and edge-level temporal sampling support to PyG πŸŽ‰πŸŽ‰πŸŽ‰

Highlights

PyTorch 2.2 Support

pyg-lib==0.4.0 is fully compatible with PyTorch 2.2 (#294). To install for PyTorch 2.2, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu118, or cu121.
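For example, to install the wheels for CUDA 11.8:

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+cu118.html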

The following combinations are supported:

| PyTorch 2.2 | cpu | cu118 | cu121 |
|-------------|-----|-------|-------|
| Linux       | βœ…  | βœ…    | βœ…    |
| macOS       | βœ…  |       |       |

Older PyTorch versions like PyTorch 1.12, 1.13, 2.0.0 and 2.1.0 are still supported, and can be installed as described in our README.md.

Distributed Sampling

pyg-lib==0.4.0 integrates all the low-level code for performing distributed neighbor sampling as part of torch_geometric.distributed in PyG 2.5 (#246, #252, #253, #254).

Sparse Softmax Implementation

pyg-lib==0.4.0 supports a fast sparse softmax_csr implementation that operates on the CSR input representation (#264, #282):

import torch
from pyg_lib.ops import softmax_csr

src = torch.randn(4, 4)
ptr = torch.tensor([0, 4])  # A single segment spanning all four rows.
out = softmax_csr(src, ptr)
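As a quick sanity check, each segment defined by ptr matches a plain softmax over the corresponding slice (a minimal sketch, assuming segments are formed along the first dimension as in the snippet above):

import torch
from pyg_lib.ops import softmax_csr

src = torch.randn(8, 4)
ptr = torch.tensor([0, 3, 8])  # Two segments: rows [0, 3) and [3, 8).
out = softmax_csr(src, ptr)

# Each segment is normalized independently:
assert torch.allclose(out[0:3], torch.softmax(src[0:3], dim=0))
assert torch.allclose(out[3:8], torch.softmax(src[3:8], dim=0))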

Edge-level Temporal Sampling

pyg-lib==0.4.0 brings edge-level temporal sampling support to PyG (#280). In particular, neighbor_sample and hetero_neighbor_sample now support an edge_time attribute, which ensures that only edges with a timestamp earlier than or equal to the corresponding seed_time are sampled.
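Conceptually, the edge-level constraint corresponds to the following mask (a plain PyTorch sketch with hypothetical tensors, not the actual pyg-lib internals):

import torch

# Hypothetical timestamps for five candidate edges of a single seed node:
edge_time = torch.tensor([1, 4, 2, 9, 5])
seed_time = torch.tensor(4)  # Timestamp of the seed node.

# Only edges no later than the seed time are eligible for sampling:
eligible = edge_time <= seed_time  # -> [True, True, True, False, False]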

Additional Features

Bugfixes

New Contributors

Full Changelog: 0.3.0...0.4.0

pyg-lib 0.3.1: Bugfixes

10 Nov 06:14

pyg-lib==0.3.1 includes a variety of bugfixes and improvements.

Bug Fixes

  • Fixed an issue introduced in pyg-lib==0.3.0 in which the replace=False option was not correctly respected during neighbor_sample (#275)
  • Fixed support for older GLIBC versions (#276)

Improvements

Full Changelog: 0.3.0...0.3.1

pyg-lib 0.3.0: PyTorch 2.1 support, METIS partitioning, neighbor sampler improvements

11 Oct 12:04

pyg-lib==0.3.0 brings PyTorch 2.1 support, METIS partitioning and further neighbor sampling improvements to PyG πŸŽ‰πŸŽ‰πŸŽ‰

Highlights

PyTorch 2.1 Support

pyg-lib==0.3.0 is fully compatible with PyTorch 2.1 (#256). To install for PyTorch 2.1, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu118, or cu121.

The following combinations are supported:

| PyTorch 2.1 | cpu | cu118 | cu121 |
|-------------|-----|-------|-------|
| Linux       | βœ…  | βœ…    | βœ…    |
| macOS       | βœ…  |       |       |

Older PyTorch versions like PyTorch 1.12, 1.13 and 2.0.0 are still supported, and can be installed as described in our README.md. PyTorch 1.11 support has been dropped.

METIS Partitioning

pyg-lib==0.3.0 enables METIS partitioning by introducing pyg_lib.partition (#229):

from pyg_lib.partition import metis

# rowptr and col hold the input graph in CSR format:
cluster = metis(rowptr, col, num_partitions)  # One partition ID per node.
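A self-contained toy example (assuming your pyg-lib build ships with METIS support; the tensor values below are illustrative):

import torch
from pyg_lib.partition import metis

# A small undirected ring graph 0-1-2-3-0 in CSR format:
rowptr = torch.tensor([0, 2, 4, 6, 8])
col = torch.tensor([1, 3, 0, 2, 1, 3, 0, 2])

cluster = metis(rowptr, col, num_partitions=2)
assert cluster.numel() == 4  # One partition ID in [0, 2) per node.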

Neighbor Sampling Improvements

pyg-lib==0.3.0 brings various improvements to our neighbor sampling routine.

Additional Features

  • Added dispatch for XPU device in index_sort (#243)
  • Updated cutlass version for speed boosts in segment_matmul and grouped_matmul (#235)

Bugfixes

  • Fixed vector-based mapping issue in Mapping (#244)
  • Fixed performance issues reported by Coverity Tool (#240)
  • Fixed TorchScript support in grouped_matmul (#220)

New Contributors

Full Changelog: 0.2.0...0.3.0

pyg-lib 0.2.0: PyTorch 2.0 support, sampled operations, and further accelerations

22 Mar 13:54

pyg-lib==0.2.0 brings PyTorch 2.0 support, sampled operations and further accelerations to PyG πŸŽ‰πŸŽ‰πŸŽ‰

Highlights

PyTorch 2.0 Support

pyg-lib==0.2.0 is fully compatible with PyTorch 2.0. To install for PyTorch 2.0, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu117, or cu118.

The following combinations are supported:

| PyTorch 2.0 | cpu | cu117 | cu118 |
|-------------|-----|-------|-------|
| Linux       | βœ…  | βœ…    | βœ…    |
| macOS       | βœ…  |       |       |

Older PyTorch versions like PyTorch 1.11, 1.12 and 1.13 are still supported, and can be installed as described in our README.md.

Sampled Operations

We added support for sampled_op implementations (#156, #159, #160), which implement the scheme

out = left_tensor[left_index] (op) right_tensor[right_index]

efficiently without materializing intermediate representations:

from pyg_lib.ops import sampled_add

edge_index = ...
row, col = edge_index

# Replace ...
out = x[row] + x[col]

# ... with
out = sampled_add(left=x, right=x, left_index=row, right_index=col)

Supported operations are sampled_add, sampled_sub, sampled_mul and sampled_div.
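As a self-contained check with random data (tensor shapes here are illustrative):

import torch
from pyg_lib.ops import sampled_add

x = torch.randn(10, 16)
row = torch.randint(0, 10, (32,))
col = torch.randint(0, 10, (32,))

# Matches the gather-then-add reference, without the intermediate gathers:
out = sampled_add(left=x, right=x, left_index=row, right_index=col)
assert torch.allclose(out, x[row] + x[col])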

Further Accelerations

  • index_sort implements a (way) faster alternative to sorting one-dimensional indices compared to torch.sort() (#181, #192). This heavily increases dataset loading times in PyG:

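A minimal usage sketch (we assume index_sort returns the sorted values together with the permutation, mirroring the torch.sort() interface; max_value is an optional upper-bound hint):

import torch
from pyg_lib.ops import index_sort

idx = torch.randint(0, 100, (1000,))
out, perm = index_sort(idx, max_value=100)

assert torch.equal(out, idx[perm])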

Breaking Changes

Full Changelog

Added
  • Added PyTorch 2.0 support (#214)
  • neighbor_sample routines now also return information about the number of sampled nodes/edges per layer (#197)
  • Added index_sort implementation (#181, #192)
  • Added triton>=2.0 support (#171)
  • Added bias term to grouped_matmul and segment_matmul (#161)
  • Added sampled_op implementation (#156, #159, #160)
Changed
  • Sample the nodes with the same timestamp as seed nodes (#187)
  • Added write-csv (saves benchmark results as csv file) and libraries (determines which libraries will be used in benchmark) parameters (#167)
  • Enable benchmarking of neighbor sampler on temporal graphs (#165)
  • Improved [segment|grouped]_matmul CPU implementation via at::matmul_out and MKL BLAS gemm_batch (#146, #172)

Full commit list: 0.1.0...0.2.0

pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration

30 Nov 07:55

We are proud to release pyg-lib==0.1.0, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG πŸŽ‰πŸŽ‰πŸŽ‰

Extensive documentation is available online. Once pyg-lib is installed, it is automatically picked up by PyG, e.g., during neighborhood sampling or heterogeneous GNN execution, and accelerates its computation.

Installation

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

where

  • ${TORCH} should be replaced by either 1.11.0, 1.12.0 or 1.13.0
  • ${CUDA} should be replaced by either cpu, cu102, cu113, cu115, cu116 or cu117

The following combinations are supported:

| PyTorch 1.13 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|--------------|-----|-------|-------|-------|-------|-------|
| Linux        | βœ…  |       |       |       | βœ…    | βœ…    |
| Windows      |     |       |       |       |       |       |
| macOS        | βœ…  |       |       |       |       |       |

| PyTorch 1.12 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|--------------|-----|-------|-------|-------|-------|-------|
| Linux        | βœ…  | βœ…    | βœ…    |       | βœ…    |       |
| Windows      |     |       |       |       |       |       |
| macOS        | βœ…  |       |       |       |       |       |

| PyTorch 1.11 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|--------------|-----|-------|-------|-------|-------|-------|
| Linux        | βœ…  | βœ…    | βœ…    | βœ…    |       |       |
| Windows      |     |       |       |       |       |       |
| macOS        | βœ…  |       |       |       |       |       |

Highlights

pyg_lib.sampler: Optimized homogeneous and heterogeneous neighborhood sampling

pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the neighbor sampling techniques previously used in PyG: for example, it pre-allocates random numbers, uses vector-based mapping for smaller node types, leverages a faster hashmap implementation, etc. Overall, it achieves speed-ups of about 10x-15x:


pyg_lib.sampler.neighbor_sample(
    rowptr: Tensor,
    col: Tensor,
    seed: Tensor,
    num_neighbors: List[int],
    time: Optional[Tensor] = None,
    seed_time: Optional[Tensor] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

and

pyg_lib.sampler.hetero_neighbor_sample(
    rowptr_dict: Dict[EdgeType, Tensor],
    col_dict: Dict[EdgeType, Tensor],
    seed_dict: Dict[NodeType, Tensor],
    num_neighbors_dict: Dict[EdgeType, List[int]],
    time_dict: Optional[Dict[NodeType, Tensor]] = None,
    seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample recursively sample neighbors from all node indices in seed in the graph given by (rowptr, col). Both also support temporal sampling via the time argument, in which case only nodes that fulfill the temporal constraints indicated by seed_time will be sampled.
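For example, on a toy CSR graph (the returned tuple contains the sampled rows/columns, node IDs, edge IDs, and bookkeeping information; see the documentation for the exact layout):

import torch
from pyg_lib.sampler import neighbor_sample

# A tiny directed graph in CSR format: 0β†’{1,2}, 1β†’{2}, 2β†’{0}.
rowptr = torch.tensor([0, 2, 3, 4])
col = torch.tensor([1, 2, 2, 0])
seed = torch.tensor([0])

# Sample up to two neighbors per node for two hops:
out = neighbor_sample(rowptr, col, seed, num_neighbors=[2, 2])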

pyg_lib.ops: Heterogeneous GNN acceleration

pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and efficient even for sparse edge types or a large number of different node types:


segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor

pyg_lib.ops.segment_matmul performs dense-dense matrix multiplication according to segments along the first dimension of inputs as given by ptr.

import torch
import pyg_lib

inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])  # Two segments: rows [0, 5) and [5, 8).
other = torch.randn(2, 16, 32)  # One weight matrix per segment.

out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
assert torch.allclose(out[0:5], inputs[0:5] @ other[0])
assert torch.allclose(out[5:8], inputs[5:8] @ other[1])
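The changelog below also mentions the related grouped_matmul, which applies the same idea to an explicit list of matrices of varying shapes; a minimal sketch, assuming its list-based interface:

import torch
from pyg_lib.ops import grouped_matmul

inputs = [torch.randn(5, 16), torch.randn(3, 32)]
others = [torch.randn(16, 32), torch.randn(32, 64)]

# One independent matmul per pair, executed as a single grouped kernel:
outs = grouped_matmul(inputs, others)
assert outs[0].size() == (5, 32)
assert outs[1].size() == (3, 64)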

Full Changelog

Added
  • Added PyTorch 1.13 support (#145)
  • Added native PyTorch support for grouped_matmul (#137)
  • Added fused_scatter_reduce operation for multiple reductions (#141, #142)
  • Added triton dependency (#133, #134)
  • Enable pytest testing (#132)
  • Added C++-based autograd and TorchScript support for segment_matmul (#120, #122)
  • Allow overriding time for seed nodes via seed_time in neighbor_sample (#118)
  • Added [segment|grouped]_matmul CPU implementation (#111)
  • Added temporal_strategy option to neighbor_sample (#114)
  • Added benchmarking tool (Google Benchmark) along with pyg::sampler::Mapper benchmark example (#101)
  • Added CSC mode to pyg::sampler::neighbor_sample and pyg::sampler::hetero_neighbor_sample (#95, #96)
  • Speed up pyg::sampler::neighbor_sample via IndexTracker implementation (#84)
  • Added pyg::sampler::hetero_neighbor_sample implementation (#90, #92, #94, #97, #98, #99, #102, #110)
  • Added pyg::utils::to_vector implementation (#88)
  • Added support for PyTorch 1.12 (#57, #58)
  • Added grouped_matmul and segment_matmul CUDA implementations via cutlass (#51, #56, #61, #64, #69, #73, #123)
  • Added pyg::sampler::neighbor_sample implementation (#54, #76, #77, #78, #80, #81), #85, #86, #87, #89)
  • Added pyg::sampler::Mapper utility for mapping global to local node indices (#45, #83)
  • Added benchmark script (#45, #79, #82, #91, #93, #106)
  • Added download script for benchmark data (#44)
  • Added biased sampling utils (#38)
  • Added CHANGELOG.md (#39)
  • Added pyg.subgraph() (#31)
  • Added nightly builds ([#28](https://github.com...