Merge pull request #1687 from microsoft/staging
Staging to main: New Release 1.1.0
miguelgfierro committed Mar 31, 2022
2 parents 6987858 + 48b70d5 commit d4181cf
Showing 11 changed files with 233 additions and 51 deletions.
1 change: 1 addition & 0 deletions .readthedocs.yaml
@@ -6,6 +6,7 @@ build:
- cmake

# Explicitly set the version of Python and its requirements
# Listing `all` under extra_requirements is equivalent to: pip install .[all]
python:
version: "3.7"
install:
9 changes: 9 additions & 0 deletions NEWS.md
@@ -1,5 +1,14 @@
# What's New

## Update April 1, 2022

We have a new release [Recommenders 1.1.0](https://github.com/microsoft/recommenders/releases/tag/1.1.0)!
We have introduced two new algorithms, SASRec and SSEPT, which are based on transformers.
In addition, we have enabled support for Python 3.8 and 3.9.
We have also made improvements to the SARPlus algorithm, including support for Azure Synapse and Spark 3.2.
There are also bug fixes and improvements to NCF, RBM, LightGBM, LightFM, Scikit-Surprise, the stratified splitter, and the Dockerfile,
as well as an upgrade to scikit-learn 1.0.2.

## Update January 13, 2022

We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
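For readers who want to try the new transformer models announced above, here is a rough usage sketch that loosely follows the SASRec quick-start notebook in this repo; the file name, hyperparameter values, and some argument names below are illustrative assumptions, not verbatim from this release:

```python
# Sketch only: mirrors the SASRec quick-start notebook's API as of 1.1.0;
# the input file name and hyperparameter values are illustrative assumptions.
from recommenders.models.sasrec.model import SASREC
from recommenders.models.sasrec.sampler import WarpSampler
from recommenders.models.sasrec.util import SASRecDataSet

# Tab-separated user-item interaction file (hypothetical path).
data = SASRecDataSet(filename="ratings.txt", col_sep="\t")
data.split()  # per-user split into train / validation / test

model = SASREC(
    item_num=data.itemnum,
    seq_max_len=50,        # max length of a user's interaction sequence
    num_blocks=2,          # number of transformer blocks
    embedding_dim=100,
    attention_dim=100,
    attention_num_heads=1,
    dropout_rate=0.2,
    conv_dims=[100, 100],
    l2_reg=1e-6,
    num_neg_test=100,      # negatives sampled per positive at test time
)
sampler = WarpSampler(
    data.user_train, data.usernum, data.itemnum,
    batch_size=128, maxlen=50, n_workers=3,
)
model.train(data, sampler, num_epochs=5, batch_size=128, lr=0.001, val_epoch=5)
```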
11 changes: 8 additions & 3 deletions README.md
@@ -2,9 +2,14 @@

[![Documentation Status](https://readthedocs.org/projects/microsoft-recommenders/badge/?version=latest)](https://microsoft-recommenders.readthedocs.io/en/latest/?badge=latest)

## What's New (January 13, 2022)

We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
## What's New (April 1, 2022)

We have a new release [Recommenders 1.1.0](https://github.com/microsoft/recommenders/releases/tag/1.1.0)!
We have introduced two new algorithms, SASRec and SSEPT, which are based on transformers.
In addition, we have enabled support for Python 3.8 and 3.9.
We have also made improvements to the SARPlus algorithm, including support for Azure Synapse and Spark 3.2.
There are also bug fixes and improvements to NCF, RBM, LightGBM, LightFM, Scikit-Surprise, the stratified splitter, and the Dockerfile,
as well as an upgrade to scikit-learn 1.0.2.

Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!

19 changes: 19 additions & 0 deletions docs/source/models.rst
@@ -213,6 +213,25 @@ SAR
.. automodule:: recommenders.models.sar.sar_singlenode
:members:

SASRec
******************************

.. automodule:: recommenders.models.sasrec.model
:members:

.. automodule:: recommenders.models.sasrec.sampler
:members:

.. automodule:: recommenders.models.sasrec.util
:members:


SSE-PT
******************************

.. automodule:: recommenders.models.sasrec.ssept
:members:


Surprise
******************************
4 changes: 2 additions & 2 deletions examples/00_quick_start/sequential_recsys_amazondataset.ipynb
@@ -144,9 +144,9 @@
"\n",
"Only the SLi_Rec model is time-aware. For the other models, you can pad a placeholder timestamp in the data files to fill out the format; the models will ignore these columns.\n",
"\n",
"We use Softmax to the loss function. In training and evalution stage, we group 1 positive instance with num_ngs negative instances. Pair-wise ranking can be regarded as a special case of Softmax ranking, where num_ngs is set to 1. \n",
"We use Softmax in the loss function. In the training and evaluation stages, we group 1 positive instance with `num_ngs` negative instances. Pair-wise ranking can be regarded as a special case of Softmax ranking, where `num_ngs` is set to 1. \n",
"\n",
"More specifically, for training and evalation, you need to organize the data file such that each one positive instance is followd by num_ngs negative instances. Our program will take 1+num_ngs lines as a unit for Softmax calculation. num_ngs is a parameter you need to pass to the `prepare_hparams`, `fit` and `run_eval` function. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training, where a recommended number is 4. `valid_num_ngs` and `num_ngs` in `fit` and `run_eval` denote the number in evalution. In evaluation, the model calculates metrics among the 1+num_ngs instances. For the `predict` function, since we only need to calcuate a socre for each individual instance, there is no need for num_ngs setting. More details and examples will be provided in the following sections.\n",
"More specifically, for training and evaluation, you need to organize the data file such that each positive instance is followed by `num_ngs` negative instances. Our program takes `1+num_ngs` lines as a unit for the Softmax calculation. `num_ngs` is a parameter you need to pass to the `prepare_hparams`, `fit`, and `run_eval` functions. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training, where a recommended number is 4. `valid_num_ngs` and `num_ngs` in `fit` and `run_eval` denote the number in evaluation. In evaluation, the model calculates metrics among the `1+num_ngs` instances. For the `predict` function, since we only need to calculate a score for each individual instance, there is no need to set `num_ngs`. More details and examples are provided in the following sections.\n",
"\n",
"For the training stage, if you don't want to prepare negative instances, you can just provide positive instances and set the parameters `need_sample=True, train_num_ngs=train_num_ngs` for the function `prepare_hparams`; our model will dynamically sample `train_num_ngs` instances as negative samples in each mini-batch.\n",
"\n",
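To make the `1+num_ngs` grouping above concrete, here is a minimal numpy sketch (not part of the notebook) of the softmax loss computed over one such unit:

```python
import numpy as np

num_ngs = 4
# One softmax unit: the score of 1 positive instance followed by
# the scores of num_ngs negative instances.
scores = np.array([2.1, 0.3, -0.5, 1.0, 0.2])  # length 1 + num_ngs

# Softmax over the unit; the loss is the negative log-probability
# assigned to the positive instance (index 0).
probs = np.exp(scores - scores.max())
probs /= probs.sum()
loss = -np.log(probs[0])
print(loss)  # with num_ngs=1 this reduces to pair-wise ranking
```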
2 changes: 1 addition & 1 deletion recommenders/__init__.py
@@ -2,7 +2,7 @@
# Licensed under the MIT License.

__title__ = "Microsoft Recommenders"
__version__ = "1.0.0"
__version__ = "1.1.0"
__author__ = "RecoDev Team at Microsoft"
__license__ = "MIT"
__copyright__ = "Copyright 2018-present Microsoft Corporation"
12 changes: 7 additions & 5 deletions recommenders/evaluation/spark_evaluation.py
@@ -1,6 +1,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import numpy as np
try:
from pyspark.mllib.evaluation import RegressionMetrics, RankingMetrics
from pyspark.sql import Window, DataFrame
@@ -99,13 +100,13 @@ def __init__(
raise ValueError("Schema of rating_pred not valid. Missing Prediction Col")

self.rating_true = self.rating_true.select(
col(self.col_user).cast("double"),
col(self.col_item).cast("double"),
col(self.col_user),
col(self.col_item),
col(self.col_rating).cast("double").alias("label"),
)
self.rating_pred = self.rating_pred.select(
col(self.col_user).cast("double"),
col(self.col_item).cast("double"),
col(self.col_user),
col(self.col_item),
col(self.col_prediction).cast("double").alias("prediction"),
)
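The change above stops coercing the user and item columns to double, presumably so that non-numeric IDs (e.g. string keys) survive evaluation; only the rating and prediction columns still need to be numeric. A minimal sketch, assuming the evaluator's default column names `userID`, `itemID`, `rating`, and `prediction`:

```python
from pyspark.sql import SparkSession
from recommenders.evaluation.spark_evaluation import SparkRatingEvaluation

spark = SparkSession.builder.getOrCreate()
# String user/item IDs would have turned into nulls under the old
# cast("double"); after this change they pass through unchanged.
true_df = spark.createDataFrame(
    [("u1", "i1", 5.0), ("u1", "i2", 3.0)], ["userID", "itemID", "rating"]
)
pred_df = spark.createDataFrame(
    [("u1", "i1", 4.5), ("u1", "i2", 2.5)], ["userID", "itemID", "prediction"]
)
evaluator = SparkRatingEvaluation(true_df, pred_df)
print(evaluator.rmse())
```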

@@ -158,7 +159,8 @@ def exp_var(self):
0
]
var2 = self.y_pred_true.selectExpr("variance(label)").collect()[0][0]
return 1 - var1 / var2
# np.divide is more tolerant of var2 being zero than plain division
return 1 - np.divide(var1, var2)


class SparkRankingEvaluation:
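Why `np.divide` in `exp_var` above: plain Python division raises `ZeroDivisionError` when the true-label variance is zero, whereas numpy emits a warning and returns `inf`, so the metric degrades to `-inf` instead of crashing. A quick illustration:

```python
import numpy as np

var1, var2 = 2.5, 0.0
# 1 - var1 / var2 would raise ZeroDivisionError here.
# np.divide only emits a RuntimeWarning and yields inf,
# so the explained variance becomes -inf instead of crashing.
print(1 - np.divide(var1, var2))  # -inf
```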
