Improve similarities performance #448

Open · wants to merge 1 commit into master
Conversation

mihaiblidaru

I'm testing multiple recommendation algorithms on a very large dataset (6M ratings, 69k users, 17k items), and I noticed that for algorithms that compute similarities the fit time is very high.

I found some redundant operations that can be eliminated to speed up the similarity computation.

In similarities.pyx, when computing auxiliary values such as the freq and prods arrays, y_ratings is not sorted, so we fully iterate over it twice, resulting in N² operations for each y_ratings in yr:

    for y, y_ratings in yr.items():
        for xi, ri in y_ratings:
            for xj, rj in y_ratings:  # every ordered pair, including xi == xj
                freq[xi, xj] += 1     # number of common ys for the pair (xi, xj)
                prods[xi, xj] += ri * rj
                sqi[xi, xj] += ri**2
                sqj[xi, xj] += rj**2

We can take advantage of the fact that all auxiliary matrices are symmetric (M[x][y] == M[y][x]), so filling the half above the diagonal is enough to compute the similarities.
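For illustration, the final similarity pass can then read only the upper triangle and mirror each value into the lower half. A minimal cosine-style sketch, assuming freq, prods, sqi and sqj are the numpy accumulators from the snippet above and n_x is the number of items (an illustration, not the exact library code):

    import numpy as np

    sim = np.zeros((n_x, n_x))
    for xi in range(n_x):
        sim[xi, xi] = 1  # an item is fully similar to itself
        for xj in range(xi + 1, n_x):
            if freq[xi, xj] == 0:
                sim[xi, xj] = 0  # no common ys, no similarity information
            else:
                sim[xi, xj] = prods[xi, xj] / np.sqrt(sqi[xi, xj] * sqj[xi, xj])
            sim[xj, xi] = sim[xi, xj]  # mirror: the matrix is symmetric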

My change sorts the y_ratings lists before filling the auxiliary arrays and restricts the second loop over y_ratings so that only the upper half of each auxiliary matrix is filled. This reduces the number of operations for each element in yr from N² to N(N-1)/2:

    sorted_yr = {y: sorted(y_ratings, key=lambda x: x[0]) for y, y_ratings in yr.items()}

    for y, y_ratings in sorted_yr.items():
        for i, (xi, ri) in enumerate(y_ratings):
            for xj, rj in y_ratings[i + 1:]:  # only pairs with xi < xj
                freq[xi, xj] += 1
                prods[xi, xj] += ri * rj
                sqi[xi, xj] += ri**2
                sqj[xi, xj] += rj**2
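As a quick sanity check, a small self-contained script (my hypothetical example, not part of the patch) confirms that both loops produce the same values above the diagonal, here for the prods accumulator only:

    import numpy as np

    # Tiny hypothetical ratings: yr maps y -> list of (x, rating) pairs.
    yr = {0: [(0, 3.0), (2, 1.0), (1, 4.0)],
          1: [(1, 2.0), (0, 5.0)]}
    n_x = 3

    full = np.zeros((n_x, n_x))
    half = np.zeros((n_x, n_x))

    # Original: every ordered pair, N**2 updates per y.
    for y_ratings in yr.values():
        for xi, ri in y_ratings:
            for xj, rj in y_ratings:
                full[xi, xj] += ri * rj

    # Patched: sort by inner id, fill xi < xj only, N*(N-1)/2 updates per y.
    for y_ratings in yr.values():
        s = sorted(y_ratings, key=lambda x: x[0])
        for i, (xi, ri) in enumerate(s):
            for xj, rj in s[i + 1:]:
                half[xi, xj] += ri * rj

    # Entries strictly above the diagonal must agree.
    assert np.allclose(np.triu(full, 1), np.triu(half, 1))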

The sorting might reduce performance for small datasets, but for larger datasets like movielens-1m the performance gains are very noticeable.

Here are some performance tests I ran on my laptop (Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz). The times in the tables below are the time it took to train a KNNBasic algorithm (just the .fit method call), averaged over 50 runs.
Dataset: movielens-1m

| Similarity | Before (seconds) | After (seconds) | After/Before (%) |
|---|---|---|---|
| cosine | 32.58 | 13.78 | 42.29 % |
| msd | 29.94 | 11.14 | 37.20 % |
| pearson | 38.21 | 16.98 | 44.43 % |
| pearson_baseline | 38.26 | 17.86 | 46.68 % |

Dataset: movielens-100k

| Similarity | Before (seconds) | After (seconds) | After/Before (%) |
|---|---|---|---|
| cosine | 0.46 | 0.20 | 43.47 % |
| msd | 0.28 | 0.15 | 53.57 % |
| pearson | 0.58 | 0.27 | 46.55 % |
| pearson_baseline | 0.66 | 0.42 | 63.63 % |

Of course, all tests pass successfully.
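For anyone who wants to reproduce these numbers, a harness along these lines (my sketch, not the exact script behind the tables) times only the .fit() call:

    import time

    from surprise import Dataset, KNNBasic

    data = Dataset.load_builtin('ml-100k')  # or 'ml-1m' for the larger run
    trainset = data.build_full_trainset()

    times = []
    for _ in range(5):  # the tables above average over 50 runs
        algo = KNNBasic(sim_options={'name': 'cosine'})
        start = time.perf_counter()
        algo.fit(trainset)  # only the training step is timed
        times.append(time.perf_counter() - start)

    print('average fit time: %.2f s' % (sum(times) / len(times)))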

… computation

For each processed user or item, the number of operations is reduced from N² to N(N-1)/2, roughly half.

The sorting might reduce performance for small datasets, but for larger datasets like movielens-1m the performance gains are very noticeable.
@mihaiblidaru (Author)

I just discovered that there are official benchmarks. I ran them using GitHub workflows, which was a terrible idea because the VMs they run on are very slow.

There's not much improvement when benchmarking on the ml-100k dataset:
Before

| Movielens 100k | RMSE | MAE | Time |
|---|---|---|---|
| k-NN | 0.98 | 0.774 | 0:00:17 |
| Centered k-NN | 0.951 | 0.749 | 0:00:18 |
| k-NN Baseline | 0.931 | 0.733 | 0:00:21 |

After

| Movielens 100k | RMSE | MAE | Time |
|---|---|---|---|
| k-NN | 0.98 | 0.774 | 0:00:16 |
| Centered k-NN | 0.951 | 0.749 | 0:00:17 |
| k-NN Baseline | 0.931 | 0.733 | 0:00:19 |

With the ml-1m dataset my implementation is slower for some unknown reason, which made me doubt the consistency of the GitHub VMs' performance.

Before

| Movielens 1M | RMSE | MAE | Time |
|---|---|---|---|
| k-NN | 0.923 | 0.727 | 0:11:51 |
| Centered k-NN | 0.929 | 0.738 | 0:12:03 |
| k-NN Baseline | 0.895 | 0.706 | 0:11:36 |

After

| Movielens 1M | RMSE | MAE | Time |
|---|---|---|---|
| k-NN | 0.923 | 0.727 | 0:13:31 |
| Centered k-NN | 0.929 | 0.738 | 0:13:28 |
| k-NN Baseline | 0.895 | 0.706 | 0:13:40 |

I ran the benchmarks again on a c5.2xlarge AWS EC2 instance, and there the results make more sense.
Before

| Movielens 1M | RMSE | MAE | Time |
|---|---|---|---|
| k-NN | 0.923 | 0.727 | 0:09:22 |

After

| Movielens 1M | RMSE | MAE | Time |
|---|---|---|---|
| k-NN | 0.923 | 0.727 | 0:07:45 |

My guess is that the improvement is lower when running the full benchmark because most of the time is spent testing the algorithm and generating predictions rather than training it.
