Merge pull request #1687 from microsoft/staging
Staging to main: New Release 1.1.0
miguelgfierro committed Mar 31, 2022
2 parents 6987858 + 48b70d5 commit d4181cf
Showing 11 changed files with 233 additions and 51 deletions.
1 change: 1 addition & 0 deletions .readthedocs.yaml
@@ -6,6 +6,7 @@ build:
- cmake

# Explicitly set the version of Python and its requirements
# Listing `all` under extra_requirements is equivalent to: pip install .[all]
python:
version: "3.7"
install:
9 changes: 9 additions & 0 deletions NEWS.md
@@ -1,5 +1,14 @@
# What's New

## Update April 1, 2022

We have a new release [Recommenders 1.1.0](https://github.com/microsoft/recommenders/releases/tag/1.1.0)!
We have introduced two new algorithms, SASRec and SSEPT, which are based on transformers.
In addition, we have enabled support for Python 3.8 and 3.9.
We have also made improvements to the SARPlus algorithm, including support for Azure Synapse and Spark 3.2.
There are also bug fixes and improvements to NCF, RBM, LightGBM, LightFM, Scikit-Surprise, the stratified splitter, and the Dockerfile,
as well as an upgrade to scikit-learn 1.0.2.

## Update January 13, 2022

We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
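For readers who want to try the new transformer models announced above, here is a rough usage sketch that loosely follows the SASRec quick-start notebook in this repo; the file name, hyperparameter values, and some argument names below are illustrative assumptions, not verbatim from this release:

```python
# Sketch only: mirrors the SASRec quick-start notebook's API as of 1.1.0;
# the input file name and hyperparameter values are illustrative assumptions.
from recommenders.models.sasrec.model import SASREC
from recommenders.models.sasrec.sampler import WarpSampler
from recommenders.models.sasrec.util import SASRecDataSet

# Tab-separated user-item interaction file (hypothetical path).
data = SASRecDataSet(filename="ratings.txt", col_sep="\t")
data.split()  # per-user split into train / validation / test

model = SASREC(
    item_num=data.itemnum,
    seq_max_len=50,        # max length of a user's interaction sequence
    num_blocks=2,          # number of transformer blocks
    embedding_dim=100,
    attention_dim=100,
    attention_num_heads=1,
    dropout_rate=0.2,
    conv_dims=[100, 100],
    l2_reg=1e-6,
    num_neg_test=100,      # negatives sampled per positive at test time
)
sampler = WarpSampler(
    data.user_train, data.usernum, data.itemnum,
    batch_size=128, maxlen=50, n_workers=3,
)
model.train(data, sampler, num_epochs=5, batch_size=128, lr=0.001, val_epoch=5)
```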
11 changes: 8 additions & 3 deletions README.md
@@ -2,9 +2,14 @@

[![Documentation Status](https://readthedocs.org/projects/microsoft-recommenders/badge/?version=latest)](https://microsoft-recommenders.readthedocs.io/en/latest/?badge=latest)

## What's New (January 13, 2022)

We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
## What's New (April 1, 2022)

We have a new release [Recommenders 1.1.0](https://github.com/microsoft/recommenders/releases/tag/1.1.0)!
We have introduced two new algorithms, SASRec and SSEPT, which are based on transformers.
In addition, we have enabled support for Python 3.8 and 3.9.
We have also made improvements to the SARPlus algorithm, including support for Azure Synapse and Spark 3.2.
There are also bug fixes and improvements to NCF, RBM, LightGBM, LightFM, Scikit-Surprise, the stratified splitter, and the Dockerfile,
as well as an upgrade to scikit-learn 1.0.2.

Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!

19 changes: 19 additions & 0 deletions docs/source/models.rst
@@ -213,6 +213,25 @@ SAR
.. automodule:: recommenders.models.sar.sar_singlenode
:members:

SASRec
******************************

.. automodule:: recommenders.models.sasrec.model
:members:

.. automodule:: recommenders.models.sasrec.sampler
:members:

.. automodule:: recommenders.models.sasrec.util
:members:


SSE-PT
******************************

.. automodule:: recommenders.models.sasrec.ssept
:members:


Surprise
******************************
4 changes: 2 additions & 2 deletions examples/00_quick_start/sequential_recsys_amazondataset.ipynb
@@ -144,9 +144,9 @@
"\n",
"Only the SLi_Rec model is time-aware. For the other models, you can pad a placeholder timestamp in the data files to fill out the format; the models will ignore these columns.\n",
"\n",
"We use Softmax to the loss function. In training and evalution stage, we group 1 positive instance with num_ngs negative instances. Pair-wise ranking can be regarded as a special case of Softmax ranking, where num_ngs is set to 1. \n",
"We use Softmax in the loss function. In the training and evaluation stages, we group 1 positive instance with `num_ngs` negative instances. Pair-wise ranking can be regarded as a special case of Softmax ranking, where `num_ngs` is set to 1. \n",
"\n",
"More specifically, for training and evalation, you need to organize the data file such that each one positive instance is followd by num_ngs negative instances. Our program will take 1+num_ngs lines as a unit for Softmax calculation. num_ngs is a parameter you need to pass to the `prepare_hparams`, `fit` and `run_eval` function. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training, where a recommended number is 4. `valid_num_ngs` and `num_ngs` in `fit` and `run_eval` denote the number in evalution. In evaluation, the model calculates metrics among the 1+num_ngs instances. For the `predict` function, since we only need to calcuate a socre for each individual instance, there is no need for num_ngs setting. More details and examples will be provided in the following sections.\n",
"More specifically, for training and evaluation, you need to organize the data file such that each positive instance is followed by `num_ngs` negative instances. Our program takes `1+num_ngs` lines as a unit for the Softmax calculation. `num_ngs` is a parameter you need to pass to the `prepare_hparams`, `fit`, and `run_eval` functions. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training, where a recommended number is 4. `valid_num_ngs` and `num_ngs` in `fit` and `run_eval` denote the number in evaluation. In evaluation, the model calculates metrics among the `1+num_ngs` instances. For the `predict` function, since we only need to calculate a score for each individual instance, there is no need to set `num_ngs`. More details and examples are provided in the following sections.\n",
"\n",
"For the training stage, if you don't want to prepare negative instances, you can just provide positive instances and set the parameters `need_sample=True, train_num_ngs=train_num_ngs` for the function `prepare_hparams`; our model will dynamically sample `train_num_ngs` instances as negative samples in each mini-batch.\n",
"\n",
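To make the `1+num_ngs` grouping above concrete, here is a minimal numpy sketch (not part of the notebook) of the softmax loss computed over one such unit:

```python
import numpy as np

num_ngs = 4
# One softmax unit: the score of 1 positive instance followed by
# the scores of num_ngs negative instances.
scores = np.array([2.1, 0.3, -0.5, 1.0, 0.2])  # length 1 + num_ngs

# Softmax over the unit; the loss is the negative log-probability
# assigned to the positive instance (index 0).
probs = np.exp(scores - scores.max())
probs /= probs.sum()
loss = -np.log(probs[0])
print(loss)  # with num_ngs=1 this reduces to pair-wise ranking
```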
2 changes: 1 addition & 1 deletion recommenders/__init__.py
@@ -2,7 +2,7 @@
# Licensed under the MIT License.

__title__ = "Microsoft Recommenders"
__version__ = "1.0.0"
__version__ = "1.1.0"
__author__ = "RecoDev Team at Microsoft"
__license__ = "MIT"
__copyright__ = "Copyright 2018-present Microsoft Corporation"
12 changes: 7 additions & 5 deletions recommenders/evaluation/spark_evaluation.py
@@ -1,6 +1,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import numpy as np
try:
from pyspark.mllib.evaluation import RegressionMetrics, RankingMetrics
from pyspark.sql import Window, DataFrame
@@ -99,13 +100,13 @@ def __init__(
raise ValueError("Schema of rating_pred not valid. Missing Prediction Col")

self.rating_true = self.rating_true.select(
col(self.col_user).cast("double"),
col(self.col_item).cast("double"),
col(self.col_user),
col(self.col_item),
col(self.col_rating).cast("double").alias("label"),
)
self.rating_pred = self.rating_pred.select(
col(self.col_user).cast("double"),
col(self.col_item).cast("double"),
col(self.col_user),
col(self.col_item),
col(self.col_prediction).cast("double").alias("prediction"),
)
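The change above stops coercing the user and item columns to double, presumably so that non-numeric IDs (e.g. string keys) survive evaluation; only the rating and prediction columns still need to be numeric. A minimal sketch, assuming the evaluator's default column names `userID`, `itemID`, `rating`, and `prediction`:

```python
from pyspark.sql import SparkSession
from recommenders.evaluation.spark_evaluation import SparkRatingEvaluation

spark = SparkSession.builder.getOrCreate()
# String user/item IDs would have turned into nulls under the old
# cast("double"); after this change they pass through unchanged.
true_df = spark.createDataFrame(
    [("u1", "i1", 5.0), ("u1", "i2", 3.0)], ["userID", "itemID", "rating"]
)
pred_df = spark.createDataFrame(
    [("u1", "i1", 4.5), ("u1", "i2", 2.5)], ["userID", "itemID", "prediction"]
)
evaluator = SparkRatingEvaluation(true_df, pred_df)
print(evaluator.rmse())
```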

@@ -158,7 +159,8 @@ def exp_var(self):
0
]
var2 = self.y_pred_true.selectExpr("variance(label)").collect()[0][0]
return 1 - var1 / var2
# np.divide is more tolerant of var2 being zero than plain division
return 1 - np.divide(var1, var2)


class SparkRankingEvaluation:
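Why `np.divide` in `exp_var` above: plain Python division raises `ZeroDivisionError` when the true-label variance is zero, whereas numpy emits a warning and returns `inf`, so the metric degrades to `-inf` instead of crashing. A quick illustration:

```python
import numpy as np

var1, var2 = 2.5, 0.0
# 1 - var1 / var2 would raise ZeroDivisionError here.
# np.divide only emits a RuntimeWarning and yields inf,
# so the explained variance becomes -inf instead of crashing.
print(1 - np.divide(var1, var2))  # -inf
```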
