Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

BPR results not reproducible using multiple threads #710

Open
Bernhard-Steindl opened this issue Mar 7, 2024 · 0 comments
Open

BPR results not reproducible using multiple threads #710

Bernhard-Steindl opened this issue Mar 7, 2024 · 0 comments

Comments

@Bernhard-Steindl
Copy link

Bernhard-Steindl commented Mar 7, 2024

Hello,

Thank you for sharing this great library with us! 馃槉
I am using implicit version 0.7.2 and I think I found a bug.

If I create two instances of BPR using the same arguments, with a fixed random_state and when using more than one thread via num_threads, I do get different user and item factors after fitting.
But, when I use just one thread this issue does not occur.
It seems that BPR produces non-deterministic results when using multiple threads.

I suspect that there may be some bug using the RNGVector class in bpr.pyx (c++ RNG object).
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L43
A Numpy Random Generator is seeded using the supplied random_state argument (check_random_state in utils.py).
https://github.com/benfred/implicit/blob/v0.7.2/implicit/utils.py#L83
Each thread is supposed to get a RNGVector instance using a different random seed by using the Random Generator.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L185-L187

But, this issue does not occur for me using the LMF matrix factorization model, though it also uses a c++ RNG object.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/lmf.pyx#L40

There is a difference between BPR and LMF regarding the generation of random numbers with the RNGVector class.

Maybe this is the reason for the non-deterministic behaviour of BPR?

Best regards,
Bernhard

Minimal example of creating 2 BPR instances using a fixed seed and multiple threads.
The resulting user and item factors are different after fitting the models. Assertion errors occur.

import implicit
from threadpoolctl import threadpool_limits
from scipy.sparse import csr_matrix
import numpy as np

print('implicit version', implicit.__version__, '\n')

raw = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
]

def get_bpr_model():
    return implicit.bpr.BayesianPersonalizedRanking(factors=3,
                                       learning_rate=1e-3,
                                       regularization=1e-3,
                                       use_gpu=False,
                                       verify_negative_samples=True,
                                       iterations=5,
                                       num_threads=2,          # Assertions fail if num_threads > 1
                                       random_state=42)

with threadpool_limits(limits=1):

    model_A = get_bpr_model()
    model_A.fit(csr_matrix(raw), show_progress=False)
    print(model_A.item_factors)
    print(model_A.user_factors)
    print(model_A.user_factors.dot(model_A.item_factors.T))

    model_B = get_bpr_model()
    model_B.fit(csr_matrix(raw), show_progress=False)
    print(model_B.item_factors)
    print(model_B.user_factors)
    print(model_B.user_factors.dot(model_B.item_factors.T))


assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-2)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-2)

assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-3)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-3)

assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-4)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-4)

assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-5)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-5)

Output:

implicit version 0.7.2 

[[-0.13629021  0.09140649  0.05145105 -0.01928077]
 [-0.02253029  0.11990372 -0.13830613  0.06782728]
 [-0.10003278 -0.13561453  0.00902637  0.15518866]
 [ 0.07921306  0.08666488  0.07246038  0.09480929]
 [ 0.00444893 -0.1239751   0.11326855 -0.01699392]
 [-0.00050317 -0.04274316 -0.10560265  0.14299433]]
[[ 0.09415838  0.0487572  -0.03222072  1.        ]
 [ 0.01535796 -0.01886933 -0.01671766  1.        ]
 [-0.13623102  0.01816958  0.12942049  1.        ]
 [ 0.11905595  0.10971054 -0.07423817  1.        ]
 [-0.1114961   0.08572128  0.06691711  1.        ]
 [-0.14401852  0.15679215 -0.01908839  1.        ]
 [ 0.05942317  0.09276801  0.08664715  1.        ]]
[[-0.0293147   0.07600836  0.13886672  0.10415867 -0.02626929  0.14426552]
 [-0.02395883  0.06753092  0.15606043  0.09317917 -0.01647985  0.14555857]
 [ 0.00760582  0.05517557  0.16752037  0.09497053 -0.00519331  0.1286191 ]
 [-0.02929831  0.08856721  0.12773071  0.1083688  -0.03847447  0.14608479]
 [ 0.00719349  0.07136258  0.1553209   0.0982552  -0.02053766  0.13231981]
 [ 0.01369725  0.09251206  0.14815965  0.09560636 -0.03923509  0.13838078]
 [-0.01444188  0.06562787  0.13744582  0.11383459 -0.01841608  0.12984906]]
[[-0.13642174  0.09134811  0.05149103 -0.01928074]
 [-0.02247692  0.11985338 -0.1383458   0.06830399]
 [-0.10003294 -0.1356145   0.00902635  0.15518874]
 [ 0.07921314  0.08666489  0.07246038  0.09480958]
 [ 0.00444896 -0.12397511  0.11326852 -0.01699418]
 [-0.00050322 -0.04274315 -0.10560266  0.14299443]]
[[ 0.0941582   0.04875714 -0.03222046  1.        ]
 [ 0.01535813 -0.01886933 -0.01671774  1.        ]
 [-0.1362314   0.01816951  0.12942068  1.        ]
 [ 0.11917121  0.10970803 -0.07422689  1.        ]
 [-0.11149608  0.08572138  0.06691711  1.        ]
 [-0.14401817  0.15679215 -0.01908856  1.        ]
 [ 0.05942311  0.092768    0.08664716  1.        ]]
[[-0.02933116  0.07648887  0.1388668   0.10415898 -0.02626951  0.14426559]
 [-0.02396042  0.06801005  0.15606049  0.09317947 -0.01648012  0.14555869]
 [ 0.00762793  0.05563892  0.16752052  0.0949708  -0.00519355  0.1286192 ]
 [-0.02933869  0.08904324  0.12771969  0.10837884 -0.03847263  0.14608376]
 [ 0.00720586  0.07182638  0.15532097  0.0982555  -0.02053794  0.13231991]
 [ 0.01370624  0.09297396  0.14815971  0.09560666 -0.03923537  0.13838091]
 [-0.01445162  0.06609963  0.13744588  0.11383489 -0.01841634  0.12984917]]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[15], line 50
     47 assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-3)
     48 assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-3)
---> 50 assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-4)
     51 assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-4)
     53 assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-5)

AssertionError: 

I am using a win64 architecture and this conda environment:

name: my-conda-env
channels:
  - conda-forge
  - nodefaults
dependencies:
  - implicit=0.7.2
  - libblas=3.9.0=21_win64_mkl
  - mkl=2024.0.0=h66d3029_49657
  - numpy=1.24.4
  - pandas=2.0.3
  - pip=24.0
  - python=3.8.18
  - scipy=1.10.1
  - pip:
      - ipykernel==6.29.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant