
Releases: pyg-team/pyg-lib

pyg-lib 0.4.0: PyTorch 2.2 support, distributed sampling, sparse softmax, edge-level temporal sampling

07 Feb 13:09

pyg-lib==0.4.0 brings PyTorch 2.2 support, distributed neighbor sampling, accelerated softmax operations, and edge-level temporal sampling support to PyG πŸŽ‰πŸŽ‰πŸŽ‰

Highlights

PyTorch 2.2 Support

pyg-lib==0.4.0 is fully compatible with PyTorch 2.2 (#294). To install for PyTorch 2.2, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu118, or cu121.
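For example, to install the wheels for CUDA 11.8:

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+cu118.html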

The following combinations are supported:

| PyTorch 2.2 | cpu | cu118 | cu121 |
|-------------|-----|-------|-------|
| Linux       | βœ…  | βœ…    | βœ…    |
| macOS       | βœ…  |       |       |

Older PyTorch versions like PyTorch 1.12, 1.13, 2.0.0 and 2.1.0 are still supported, and can be installed as described in our README.md.

Distributed Sampling

pyg-lib==0.4.0 integrates all the low-level code for performing distributed neighbor sampling as part of torch_geometric.distributed in PyG 2.5 (#246, #252, #253, #254).

Sparse Softmax Implementation

pyg-lib==0.4.0 supports a fast sparse softmax_csr implementation that operates on the CSR input representation (#264, #282):

import torch
from pyg_lib.ops import softmax_csr

src = torch.randn(4, 4)
ptr = torch.tensor([0, 4])  # A single segment spanning all four rows.
out = softmax_csr(src, ptr)
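As a quick sanity check, each segment defined by ptr matches a plain softmax over the corresponding slice (a minimal sketch, assuming segments are formed along the first dimension as in the snippet above):

import torch
from pyg_lib.ops import softmax_csr

src = torch.randn(8, 4)
ptr = torch.tensor([0, 3, 8])  # Two segments: rows [0, 3) and [3, 8).
out = softmax_csr(src, ptr)

# Each segment is normalized independently:
assert torch.allclose(out[0:3], torch.softmax(src[0:3], dim=0))
assert torch.allclose(out[3:8], torch.softmax(src[3:8], dim=0))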

Edge-level Temporal Sampling

pyg-lib==0.4.0 brings edge-level temporal sampling support to PyG (#280). In particular, neighbor_sample and hetero_neighbor_sample now support an edge_time attribute, which ensures that only edges with a timestamp earlier than or equal to the corresponding seed_time are sampled.
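Conceptually, the edge-level constraint corresponds to the following mask (a plain PyTorch sketch with hypothetical tensors, not the actual pyg-lib internals):

import torch

# Hypothetical timestamps for five candidate edges of a single seed node:
edge_time = torch.tensor([1, 4, 2, 9, 5])
seed_time = torch.tensor(4)  # Timestamp of the seed node.

# Only edges no later than the seed time are eligible for sampling:
eligible = edge_time <= seed_time  # -> [True, True, True, False, False]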

Additional Features

Bugfixes

New Contributors

Full Changelog: 0.3.0...0.4.0

pyg-lib 0.3.1: Bugfixes

10 Nov 06:14

pyg-lib==0.3.1 includes a variety of bugfixes and improvements.

Bug Fixes

  • Fixed an issue introduced in pyg-lib==0.3.0 in which the replace=False option was not correctly respected during neighbor_sample (#275)
  • Fixed support for older GLIBC versions (#276)

Improvements

Full Changelog: 0.3.0...0.3.1

pyg-lib 0.3.0: PyTorch 2.1 support, METIS partitioning, neighbor sampler improvements

11 Oct 12:04

pyg-lib==0.3.0 brings PyTorch 2.1 support, METIS partitioning and further neighbor sampling improvements to PyG πŸŽ‰πŸŽ‰πŸŽ‰

Highlights

PyTorch 2.1 Support

pyg-lib==0.3.0 is fully compatible with PyTorch 2.1 (#256). To install for PyTorch 2.1, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu118, or cu121.

The following combinations are supported:

| PyTorch 2.1 | cpu | cu118 | cu121 |
|-------------|-----|-------|-------|
| Linux       | βœ…  | βœ…    | βœ…    |
| macOS       | βœ…  |       |       |

Older PyTorch versions like PyTorch 1.12, 1.13 and 2.0.0 are still supported, and can be installed as described in our README.md. PyTorch 1.11 support has been dropped.

METIS Partitioning

pyg-lib==0.3.0 enables METIS partitioning by introducing pyg_lib.partition (#229):

from pyg_lib.partition import metis

# rowptr and col hold the input graph in CSR format:
cluster = metis(rowptr, col, num_partitions)  # One partition ID per node.
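A self-contained toy example (assuming your pyg-lib build ships with METIS support; the tensor values below are illustrative):

import torch
from pyg_lib.partition import metis

# A small undirected ring graph 0-1-2-3-0 in CSR format:
rowptr = torch.tensor([0, 2, 4, 6, 8])
col = torch.tensor([1, 3, 0, 2, 1, 3, 0, 2])

cluster = metis(rowptr, col, num_partitions=2)
assert cluster.numel() == 4  # One partition ID in [0, 2) per node.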

Neighbor Sampling Improvements

pyg-lib==0.3.0 brings various improvements to our neighbor sampling routine.

Additional Features

  • Added dispatch for XPU device in index_sort (#243)
  • Updated cutlass version for speed boosts in segment_matmul and grouped_matmul (#235)

Bugfixes

  • Fixed vector-based mapping issue in Mapping (#244)
  • Fixed performance issues reported by Coverity Tool (#240)
  • Fixed TorchScript support in grouped_matmul (#220)

New Contributors

Full Changelog: 0.2.0...0.3.0

pyg-lib 0.2.0: PyTorch 2.0 support, sampled operations, and further accelerations

22 Mar 13:54

pyg-lib==0.2.0 brings PyTorch 2.0 support, sampled operations and further accelerations to PyG πŸŽ‰πŸŽ‰πŸŽ‰

Highlights

PyTorch 2.0 Support

pyg-lib==0.2.0 is fully compatible with PyTorch 2.0. To install for PyTorch 2.0, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu117, or cu118.

The following combinations are supported:

| PyTorch 2.0 | cpu | cu117 | cu118 |
|-------------|-----|-------|-------|
| Linux       | βœ…  | βœ…    | βœ…    |
| macOS       | βœ…  |       |       |

Older PyTorch versions like PyTorch 1.11, 1.12 and 1.13 are still supported, and can be installed as described in our README.md.

Sampled Operations

We added support for sampled_op implementations (#156, #159, #160), which implement the scheme

out = left_tensor[left_index] (op) right_tensor[right_index]

efficiently without materializing intermediate representations:

from pyg_lib.ops import sampled_add

edge_index = ...
row, col = edge_index

# Replace ...
out = x[row] + x[col]

# ... with
out = sampled_add(left=x, right=x, left_index=row, right_index=col)

Supported operations are sampled_add, sampled_sub, sampled_mul and sampled_div.
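As a self-contained check with random data (tensor shapes here are illustrative):

import torch
from pyg_lib.ops import sampled_add

x = torch.randn(10, 16)
row = torch.randint(0, 10, (32,))
col = torch.randint(0, 10, (32,))

# Matches the gather-then-add reference, without the intermediate gathers:
out = sampled_add(left=x, right=x, left_index=row, right_index=col)
assert torch.allclose(out, x[row] + x[col])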

Further Accelerations

  • index_sort implements a (way) faster alternative to sorting one-dimensional indices compared to torch.sort() (#181, #192). This heavily increases dataset loading times in PyG:

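A minimal usage sketch (we assume index_sort returns the sorted values together with the permutation, mirroring the torch.sort() interface; max_value is an optional upper-bound hint):

import torch
from pyg_lib.ops import index_sort

idx = torch.randint(0, 100, (1000,))
out, perm = index_sort(idx, max_value=100)

assert torch.equal(out, idx[perm])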

Breaking Changes

Full Changelog

Added
  • Added PyTorch 2.0 support (#214)
  • neighbor_sample routines now also return information about the number of sampled nodes/edges per layer (#197)
  • Added index_sort implementation (#181, #192)
  • Added triton>=2.0 support (#171)
  • Added bias term to grouped_matmul and segment_matmul (#161)
  • Added sampled_op implementation (#156, #159, #160)
Changed
  • Sample the nodes with the same timestamp as seed nodes (#187)
  • Added write-csv (saves benchmark results as csv file) and libraries (determines which libraries will be used in benchmark) parameters (#167)
  • Enable benchmarking of neighbor sampler on temporal graphs (#165)
  • Improved [segment|grouped]_matmul CPU implementation via at::matmul_out and MKL BLAS gemm_batch (#146, #172)

Full commit list: 0.1.0...0.2.0

pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration

30 Nov 07:55

We are proud to release pyg-lib==0.1.0, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG πŸŽ‰πŸŽ‰πŸŽ‰

Extensive documentation is available online. Once pyg-lib is installed, it is automatically picked up by PyG, e.g., during neighborhood sampling or heterogeneous GNN execution, and accelerates its computation.

Installation

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

where

  • ${TORCH} should be replaced by either 1.11.0, 1.12.0 or 1.13.0
  • ${CUDA} should be replaced by either cpu, cu102, cu113, cu115, cu116 or cu117

The following combinations are supported:

| PyTorch 1.13 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|--------------|-----|-------|-------|-------|-------|-------|
| Linux        | βœ…  |       |       |       | βœ…    | βœ…    |
| Windows      |     |       |       |       |       |       |
| macOS        | βœ…  |       |       |       |       |       |

| PyTorch 1.12 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|--------------|-----|-------|-------|-------|-------|-------|
| Linux        | βœ…  | βœ…    | βœ…    |       | βœ…    |       |
| Windows      |     |       |       |       |       |       |
| macOS        | βœ…  |       |       |       |       |       |

| PyTorch 1.11 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|--------------|-----|-------|-------|-------|-------|-------|
| Linux        | βœ…  | βœ…    | βœ…    | βœ…    |       |       |
| Windows      |     |       |       |       |       |       |
| macOS        | βœ…  |       |       |       |       |       |

Highlights

pyg_lib.sampler: Optimized homogeneous and heterogeneous neighborhood sampling

pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the neighbor sampling techniques previously used in PyG: for example, it pre-allocates random numbers, uses vector-based mapping for smaller node types, leverages a faster hashmap implementation, etc. Overall, it achieves speed-ups of about 10x-15x:


pyg_lib.sampler.neighbor_sample(
    rowptr: Tensor,
    col: Tensor,
    seed: Tensor,
    num_neighbors: List[int],
    time: Optional[Tensor] = None,
    seed_time: Optional[Tensor] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

and

pyg_lib.sampler.hetero_neighbor_sample(
    rowptr_dict: Dict[EdgeType, Tensor],
    col_dict: Dict[EdgeType, Tensor],
    seed_dict: Dict[NodeType, Tensor],
    num_neighbors_dict: Dict[EdgeType, List[int]],
    time_dict: Optional[Dict[NodeType, Tensor]] = None,
    seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample recursively sample neighbors from all node indices in seed in the graph given by (rowptr, col). Both also support temporal sampling via the time argument, in which case only nodes that fulfill the temporal constraints indicated by seed_time will be sampled.
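For example, on a toy CSR graph (the returned tuple contains the sampled rows/columns, node IDs, edge IDs, and bookkeeping information; see the documentation for the exact layout):

import torch
from pyg_lib.sampler import neighbor_sample

# A tiny directed graph in CSR format: 0β†’{1,2}, 1β†’{2}, 2β†’{0}.
rowptr = torch.tensor([0, 2, 3, 4])
col = torch.tensor([1, 2, 2, 0])
seed = torch.tensor([0])

# Sample up to two neighbors per node for two hops:
out = neighbor_sample(rowptr, col, seed, num_neighbors=[2, 2])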

pyg_lib.ops: Heterogeneous GNN acceleration

pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and efficient even for sparse edge types or a large number of different node types:


segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor

pyg_lib.ops.segment_matmul performs dense-dense matrix multiplication according to segments along the first dimension of inputs as given by ptr.

import torch
import pyg_lib

inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])  # Two segments: rows [0, 5) and [5, 8).
other = torch.randn(2, 16, 32)  # One weight matrix per segment.

out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
assert torch.allclose(out[0:5], inputs[0:5] @ other[0])
assert torch.allclose(out[5:8], inputs[5:8] @ other[1])
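The changelog below also mentions the related grouped_matmul, which applies the same idea to an explicit list of matrices of varying shapes; a minimal sketch, assuming its list-based interface:

import torch
from pyg_lib.ops import grouped_matmul

inputs = [torch.randn(5, 16), torch.randn(3, 32)]
others = [torch.randn(16, 32), torch.randn(32, 64)]

# One independent matmul per pair, executed as a single grouped kernel:
outs = grouped_matmul(inputs, others)
assert outs[0].size() == (5, 32)
assert outs[1].size() == (3, 64)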

Full Changelog

Added
  • Added PyTorch 1.13 support (#145)
  • Added native PyTorch support for grouped_matmul (#137)
  • Added fused_scatter_reduce operation for multiple reductions (#141, #142)
  • Added triton dependency (#133, #134)
  • Enable pytest testing (#132)
  • Added C++-based autograd and TorchScript support for segment_matmul (#120, #122)
  • Allow overriding time for seed nodes via seed_time in neighbor_sample (#118)
  • Added [segment|grouped]_matmul CPU implementation (#111)
  • Added temporal_strategy option to neighbor_sample (#114)
  • Added benchmarking tool (Google Benchmark) along with pyg::sampler::Mapper benchmark example (#101)
  • Added CSC mode to pyg::sampler::neighbor_sample and pyg::sampler::hetero_neighbor_sample (#95, #96)
  • Speed up pyg::sampler::neighbor_sample via IndexTracker implementation (#84)
  • Added pyg::sampler::hetero_neighbor_sample implementation (#90, #92, #94, #97, #98, #99, #102, #110)
  • Added pyg::utils::to_vector implementation (#88)
  • Added support for PyTorch 1.12 (#57, #58)
  • Added grouped_matmul and segment_matmul CUDA implementations via cutlass (#51, #56, #61, #64, #69, #73, #123)
  • Added pyg::sampler::neighbor_sample implementation (#54, #76, #77, #78, #80, #81), #85, #86, #87, #89)
  • Added pyg::sampler::Mapper utility for mapping global to local node indices (#45, #83)
  • Added benchmark script (#45, #79, #82, #91, #93, #106)
  • Added download script for benchmark data (#44)
  • Added biased sampling utils (#38)
  • Added CHANGELOG.md (#39)
  • Added pyg.subgraph() (#31)
  • Added nightly builds ([#28](https://github.com...