
Releases: pyg-team/pytorch-frame

PyTorch Frame 0.2.2

04 Mar 20:37
56f687d

This release introduces the image_embedded stype to handle image columns, fixes bugs in MultiNestedTensor indexing, and improves the efficiency of missing-value imputation and categorical column encoders.

Added

  • Avoided for-loop in EmbeddingEncoder (#366)
  • Added image_embedded and one tabular image dataset (#344)
  • Added benchmarking suite for encoders (#360)
  • Added dataframe text benchmark script (#354, #367)
  • Added DataFrameTextBenchmark dataset (#349)
  • Added support for empty TensorFrame (#339)

Changed

  • Changed the workflow of Encoder's na_forward method, resulting in a performance boost (#364)
  • Removed ReLU applied in FCResidualBlock (#368)

Fixed

  • Fixed bug in empty MultiNestedTensor handling (#369)
  • Fixed the split of DataFrameTextBenchmark (#358)
  • Fixed empty MultiNestedTensor col indexing (#355)

PyTorch Frame 0.2.1

17 Jan 01:15
f746974

This release makes the following fixes and extensions to 0.2.0.

Added

  • Added support for more stypes in LinearModelEncoder (#325)
  • Added stype_encoder_dict to some models (#319)
  • Added HuggingFaceDatasetDict (#287)

Changed

  • Supported decoder embedding model in examples/transformers_text.py (#333)
  • Removed implicit clones in StypeEncoder (#286)

Fixed

  • Fixed TimestampEncoder not applying CyclicEncoder to cyclic features (#311)
  • Fixed NaN masking in multicategorical stype (#307)

PyTorch Frame 0.2.0

15 Dec 23:20
3f1a695

We are excited to announce the second release of PyTorch Frame 🐶

PyTorch Frame 0.2.0 is the culmination of work from many contributors, both within and outside Kumo, comprising over 120 commits of features and bug fixes since torch-frame==0.1.0.

PyTorch Frame is featured in the Relational Deep Learning paper and used as the encoding layer for PyG.

Kumo is also hiring interns working on cool deep learning projects. If you are interested, feel free to apply through this link.

If you have any questions or would like to contribute to PyTorch Frame, feel free to reach out in our Slack channel.

Highlights

Support for multicategorical, timestamp, text_tokenized and embedding stypes

We have added support for four more semantic types, giving you more flexibility in encoding raw data. To understand how to specify semantic types for your data, take a look at the tutorial. We also added many new StypeEncoders for the new semantic types.
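
As a minimal sketch (column names and values are made up for illustration, and the top-level stype aliases such as torch_frame.multicategorical are assumed to be available as in recent versions), specifying per-column semantic types when constructing a Dataset looks roughly like this:

```python
import pandas as pd
import torch_frame
from torch_frame.data import Dataset

# Toy DataFrame mixing several column types (made-up data for illustration).
df = pd.DataFrame({
    "age": [25, 41, 33],
    "genres": [["action", "drama"], ["comedy"], ["drama"]],
    "signup_time": pd.to_datetime(["2021-01-01", "2021-06-15", "2022-03-10"]),
    "label": [0, 1, 0],
})

# Map each column to a semantic type, including the newly added stypes.
dataset = Dataset(
    df,
    col_to_stype={
        "age": torch_frame.numerical,
        "genres": torch_frame.multicategorical,
        "signup_time": torch_frame.timestamp,
        "label": torch_frame.categorical,
    },
    target_col="label",
)
dataset.materialize()  # Builds the underlying TensorFrame.
```

The materialized dataset exposes a TensorFrame whose features are grouped by stype.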

Integration with Large Language Models

We now support two types of integration with LLMs: embedding and fine-tuning.

You can use any embeddings generated by LLMs with PyTorch Frame, either by feeding the embeddings directly as raw data of the embedding stype, or by using text as raw data of the text_embedded stype and specifying a text_embedder for each column. Here is an example of how you can use PyTorch Frame with text embeddings generated by OpenAI, Cohere, VoyageAI, and HuggingFace transformers.
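
A hedged sketch of the text_embedded path is shown below; the dummy embedder is a stand-in for a real OpenAI/Cohere/VoyageAI/HuggingFace call, and passing a single TextEmbedderConfig for all text columns is assumed to be accepted (a per-column dict is also possible, see the breaking changes below):

```python
import pandas as pd
import torch
import torch_frame
from torch_frame.config.text_embedder import TextEmbedderConfig
from torch_frame.data import Dataset

# Stand-in embedder: maps a list of sentences to a [num_sentences, dim] tensor.
# A real implementation would call an LLM embedding API or a HuggingFace model.
def dummy_text_embedder(sentences: list[str]) -> torch.Tensor:
    return torch.stack([torch.full((16,), float(len(s))) for s in sentences])

df = pd.DataFrame({
    "review": ["great product", "not what I expected", "would buy again"],
    "label": [1, 0, 1],
})

dataset = Dataset(
    df,
    col_to_stype={
        "review": torch_frame.text_embedded,
        "label": torch_frame.categorical,
    },
    target_col="label",
    col_to_text_embedder_cfg=TextEmbedderConfig(
        text_embedder=dummy_text_embedder,
        batch_size=32,
    ),
)
dataset.materialize()  # Text columns are embedded once, during materialization.
```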

text_tokenized enables users to fine-tune large language models on text columns, alongside other types of raw tabular data, for any downstream task. In this example, we fine-tune distilbert-base-uncased both fully and with LoRA.
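
On the LoRA side, a minimal sketch using the HuggingFace transformers and peft packages is shown below; the target module names q_lin and v_lin are specific to DistilBERT's attention layers, and the hyper-parameters are illustrative rather than those used in the example script:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

# Wrap the text model with low-rank adapters so that only a small fraction of
# parameters is trained; the wrapped model then encodes text_tokenized columns.
lora_config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor for the update
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Only the adapter weights are trainable.
```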

More Benchmarks

We added more benchmark results to the benchmark section. LightGBM is now included in the list of GBDTs that we compare against the deep learning models. We also ran initial experiments on various LLMs.

Breaking Changes

  • text_tokenized_cfg and text_embedder_cfg are renamed to col_to_text_tokenized_cfg and col_to_text_embedder_cfg, respectively (#257). This allows users to specify different embedders and tokenizers for different text columns; see the sketch after this list.
  • Trompt now outputs 2-dimensional embeddings in forward.
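
For example, a sketch of the per-column form (column names and embedders are made up; each text column gets its own config):

```python
import torch
from torch_frame.config.text_embedder import TextEmbedderConfig

# Minimal stand-in embedders; real code would call two different models.
def title_embedder(sentences: list[str]) -> torch.Tensor:
    return torch.zeros(len(sentences), 8)

def body_embedder(sentences: list[str]) -> torch.Tensor:
    return torch.zeros(len(sentences), 32)

# The renamed argument accepts a dict, so each text column can use its own
# embedder, embedding dimension, and batch size.
col_to_text_embedder_cfg = {
    "title": TextEmbedderConfig(text_embedder=title_embedder, batch_size=64),
    "description": TextEmbedderConfig(text_embedder=body_embedder, batch_size=16),
}
```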

Features

  • We now support the following new encoders: LinearEmbeddingEncoder for the embedding stype, TimestampEncoder for the timestamp stype, and MultiCategoricalEmbeddingEncoder for the multicategorical stype.

  • LightGBM is added to the GBDTs module.

  • Auto-inference of stypes from raw DataFrame columns is supported through the infer_df_stype function (see the sketch below). However, the correctness of the inference is not guaranteed, and we suggest you double-check the result.
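
A short sketch of the auto-inference (assuming infer_df_stype is importable from torch_frame.utils, as in recent versions; the DataFrame is made up):

```python
import pandas as pd
from torch_frame.utils import infer_df_stype

df = pd.DataFrame({
    "age": [25, 41, 33],
    "city": ["Paris", "Tokyo", "Paris"],
    "joined": pd.to_datetime(["2021-01-01", "2021-06-15", "2022-03-10"]),
})

# Guess a semantic type for each column from its dtype and values.
col_to_stype = infer_df_stype(df)
print(col_to_stype)
# Expected to map, e.g., "age" to numerical, "city" to categorical, and
# "joined" to timestamp; double-check before passing to Dataset(df, col_to_stype).
```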

Bugfixes

We fixed the in_channels calculation of ResNet (#220) and improved the overall user experience when handling dirty data (#171, #234, #264).

Full Changelog

Full Changelog: 0.1.0...0.2.0

PyTorch Frame 0.1.0

23 Oct 23:09
90a54f4

We are excited to announce the initial release of PyTorch Frame 🎉🎉🎉

PyTorch Frame is a deep learning extension for PyTorch, designed for heterogeneous tabular data with different column types, including numerical, categorical, time, text, and images.

To get started, please refer to our documentation and tutorials.
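
As a minimal getting-started sketch (using the Titanic dataset shipped with torch_frame.datasets; the root path below is arbitrary):

```python
import torch_frame
from torch_frame.datasets import Titanic

# Download and materialize a small built-in tabular dataset.
dataset = Titanic(root="/tmp/titanic")
dataset.materialize()

tf = dataset.tensor_frame  # features grouped by semantic type
print(tf.col_names_dict)   # which columns are categorical vs. numerical
print(tf.feat_dict[torch_frame.categorical].shape)
```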

Highlights

Models, datasets and examples

In our initial release, we introduce 6 models, 9 feature encoders, 5 table convolution layers, 3 decoders, and 14 datasets.

Benchmarks

With our initial set of models and datasets under torch_frame.nn and torch_frame.datasets, we benchmarked their performance on binary classification and regression tasks. Rows denote model names and columns denote dataset indices. Each cell reports the mean and standard deviation of the model's performance, as well as the total time spent, including Optuna-based hyper-parameter search and final model training.
