Factorization machines #172

martincousi · 2018-04-24T18:14:09Z

Here is a basic factorization machine algorithm that takes into account only the user and item ids. It is equivalent to SVD when using degree=2. I have implemented this algorithm with the tffm library as well as with the polylearn library for testing purposes. I found tffm preferable given the range of options it offers. However, to be used with GridSearchCV and RandomizedSearchCV, it requires a special value for the session_config argument (see doc).
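To make the SVD equivalence concrete, here is a minimal NumPy sketch (not the tffm or polylearn code in this PR). It assumes a standard second-order factorization machine and a one-hot encoding of the user and item ids; the names `fm_predict` and `one_hot_pair` and all sizes are hypothetical and chosen for illustration only.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM: w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j."""
    linear = w0 + w @ x
    # Pairwise term via the usual identity:
    # 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2]
    xv = V.T @ x
    pairwise = 0.5 * np.sum(xv ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise


# Illustrative sizes and random parameters (assumptions, not PR defaults).
rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 3
w0 = 3.5                                     # plays the role of the global mean
w = rng.normal(size=n_users + n_items)       # user and item biases
V = rng.normal(size=(n_users + n_items, k))  # user and item factors


def one_hot_pair(u, i):
    """Feature vector with a 1 for user u and a 1 for item i."""
    x = np.zeros(n_users + n_items)
    x[u] = 1.0
    x[n_users + i] = 1.0
    return x


u, i = 2, 1
fm = fm_predict(one_hot_pair(u, i), w0, w, V)
# Biased matrix factorization: mu + b_u + b_i + <p_u, q_i>
svd = w0 + w[u] + w[n_users + i] + V[u] @ V[n_users + i]
assert np.isclose(fm, svd)
```

With only two non-zero features per sample, the pairwise sum collapses to a single dot product between the user and item factor vectors, which is why the prediction matches the biased SVD form.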

It is not yet clear to me what good default values for the algorithm would be, i.e. values that work in most settings. Currently, both implementations appear to be slow, although I would have thought that using tensorflow would be fast...

This PR also contains tests for the feature option to Dataset, Trainset, etc.

I am planning to construct more elaborate factorization machine algorithms. The tests for the factorization machine algorithms will follow.


martincousi · 2018-04-25T20:14:09Z

I have added three new factorization machine algos. There are many more possible, but most of them can be accomplished by using the features. Additional ones could also be conceived once the library supports context (user-item pair features such as timestamp, location, etc.).

I would like these algos to be modular such that you can turn on/off implicit information, features, etc. I guess the best way would be to create the sparse lists in FMAlgo and turn on/off the different components in the children. What do you think? Also, should there be many FM objects or only one with multiple options?
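As a rough illustration of the "one object with multiple options" variant, the sketch below builds the sparse (indices, values) representation of one sample and switches components on or off. Everything here is hypothetical: `FMOptions`, `build_row`, the feature layout, and the 1/|rated| weighting of implicit items are assumptions made for the example, not the FMAlgo code in this PR.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FMOptions:
    """Hypothetical switches for optional feature blocks."""
    implicit: bool = False
    user_features: bool = False


def build_row(
    u: int,
    i: int,
    n_users: int,
    n_items: int,
    opts: FMOptions,
    rated: Sequence[int] = (),
    u_feats: Sequence[float] = (),
) -> Tuple[List[int], List[float]]:
    """Return sparse (indices, values) for one (user, item) sample."""
    # The one-hot user and item blocks are always present.
    indices = [u, n_users + i]
    values = [1.0, 1.0]
    offset = n_users + n_items

    if opts.implicit:
        # One column per item the user has rated, weighted by 1/|rated|
        # (an assumed normalization).
        for j in rated:
            indices.append(offset + j)
            values.append(1.0 / len(rated))
        offset += n_items

    if opts.user_features:
        # Numeric user features; zeros are skipped to keep the row sparse.
        for f, v in enumerate(u_feats):
            if v != 0:
                indices.append(offset + f)
                values.append(float(v))

    return indices, values


basic = build_row(1, 2, 3, 4, FMOptions())
full = build_row(
    1, 2, 3, 4,
    FMOptions(implicit=True, user_features=True),
    rated=[0, 2],
    u_feats=[0.0, 5.0],
)
```

In this layout, a child class per variant would simply fix the option values, while a single class would expose them as constructor arguments; the row-building logic is the same in both designs.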

By the way, the special value for session_config is not needed to do parallelization, at least not on my system.

NicolasHug · 2018-04-27T07:58:27Z

Thanks a lot,

Once again I really appreciate the efforts with the docs and the tests.

I'm definitely interested in adding FM into surprise! This is a lot of code for me to digest though ^^ and I don't have tons of free time ATM (should be easier in the following months), so I just wanted to make sure you know that the review process may take a while.

> should there be many FM objects or only one with multiple options?

I personally like it when there's a single uniform interface to deal with, but it should still be easy to use. For example, if a single class ends up with lots of mutually incompatible parameters, it's probably best to split it into different classes. I'll leave it to your judgment to decide what's best here.

Are you actually using the FM algos you implemented? If so, with what dataset? I'd like to play around with them to get a feel for how to use them; that would make understanding all the code (especially the feature part) a lot easier for me.

Thanks!

martincousi added 30 commits March 26, 2018 16:45

added asym_rmse and asym_mae

8a3532f

Merge pull request #1 from NicolasHug/master

de2cd0c

dumping is now done with pickle 'highest protocol'

Merge pull request #2 from martincousi/asymetric-measures

6d18af6

added asym_rmse and asym_mae

disable print in AlgoBase.compute_baselines()

3f6b1d0

Cancel printing of computation of similarities

daab1ba

Cancel printing of similiraty computation

05ef072

add load_features_df() method

902246f

modified construct_trainset() and load_features_df()

fb64e98

modified Trainset.__init__()

13f3a28

corrected bugs in print statement

900c0c0

use user_features_nb to test if initialized

68ccfca

revert back changes to accuracy.py

f7fa4d8

revert back changes to AlgoBase

c6591ae

Update .gitignore

e31e857

Update .gitignore

7d67963

fixed python 2 compatibility

73bea50

construction of Lasso.fit()

4063da8

modified predict and estimate methods

34dd04b

include features in testset and prediction objects

d275f84

update matrix factorization estimate method

14d1248

adapt estimate methods for all prediction algorithms

a2b87c4

add sklearn arguments to Lasso

3c5f7e6

single underscore for dummy variable

7b82e78

update documentation for Lasso and change filename

bf335c2

correct conflict with master

e34a5f9

add interaction terms in Lasso

4081244

add interaction terms to Lasso.estimate

d3dd0dd

correct conflicts with master

47ff477

correct verbose conflicts in knns

1279424

add add_interactions to self in Lasso

62ccd84

martincousi added 20 commits April 5, 2018 17:37

Revert "Revert "Features dataset""

e3de208

Merge pull request #5 from martincousi/revert-4-revert-3-features-dat…

c52d707

…aset Revert "Revert "Features dataset""

add lasso

4fabe29

Merge pull request #6 from martincousi/lasso

e7adc87

Lasso prediction algorithm

Merge pull request #7 from NicolasHug/master

4fc4242

[GSF] Syncing Fork

add factorization_machines.py

885aab2

add FMAlgo and FMBasic

69c281f

solve bugs

b8132e5

add indication of features in Prediction

b06a05d

add FMBasic based on polylearn

458a419

Remove features from Prediction

d8cea2f

changed tests

4337b24

correct tests

a0765a2

correct tests

1b6ee31

correct FMBasicPL

bef04b9

add tests for datasets with features

c998024

add test for missing user or item features

531b536

Add doc to FMBasic and FMBasicPL

991f30f

Add FMImplicit and FMExplicit

9550933

Add FMFeatures

e1cea2c

martincousi added 8 commits May 28, 2019 10:19

Added incomplete implementation of FM (without sample_weights)

037bf10

Implemented FM with Pytorch

011105c

Removed FM deprecated implementations

c808cde

Added dev set functionnality to FM

efe2b85

Update linear.py

1053a60

Merge remote-tracking branch 'origin/lasso' into factorization-machines

e65e68a

Change Baseline_only default verbose value

cc09a6d

Updated tests

723afd7
