Question about leave_k_out function #571

Open
Deemjan opened this issue May 14, 2022 · 2 comments

Deemjan commented May 14, 2022

I noticed something odd when using this function to split my data into train and test sets.
My distribution of users by the number of ratings they have given looks something like this:

| Number of ratings given | Number of users |
| --- | --- |
| 1 | 6000 |
| 2 | 3000 |
| 3 | 200 |
| 4 | 30 |

The documentation states that users with more than K ratings have one of their ratings put into the test set and the others into the train set.
So when I called the function with K = 1, I expected 3230 records in the test set (3000 + 200 + 30), but got only 230.

So my question is: shouldn't this line

candidate_mask = counts > K + 1

look like this

candidate_mask = counts >= K + 1 

or this

candidate_mask = counts > K

instead?
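To make the difference concrete, here is a minimal sketch that applies the three masks to the distribution above. It only rebuilds a per-user `counts` array with NumPy. It does not call the library, and the helper name `n_candidates` is made up for illustration.

```python
import numpy as np

K = 1

# Per-user rating counts rebuilt from the table in this issue:
# 6000 users with 1 rating, 3000 with 2, 200 with 3, 30 with 4.
counts = np.repeat([1, 2, 3, 4], [6000, 3000, 200, 30])


def n_candidates(mask):
    """Number of users selected by a boolean candidate mask (illustrative helper)."""
    return int(mask.sum())


current = n_candidates(counts > K + 1)       # the line as it is now
proposed_ge = n_candidates(counts >= K + 1)  # first suggested alternative
proposed_gt = n_candidates(counts > K)       # second suggested alternative

print(current, proposed_ge, proposed_gt)  # 230 3230 3230
```

For integer counts the two suggested alternatives select the same users, so they differ only in spelling.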

My guess is that it was done this way to prevent a user with 2 ratings from ending up with only 1 rating in the train set, because, if I understand correctly, users with 1 rating are useless for training. Could you please confirm?
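A small sketch of that guess, assuming each selected user has exactly K ratings held out for testing (my assumption about the split, not something confirmed here). The helper `min_train_ratings` is hypothetical and only does the arithmetic on the counts from this issue.

```python
import numpy as np

K = 1

# Same per-user rating counts as in the distribution reported above.
counts = np.repeat([1, 2, 3, 4], [6000, 3000, 200, 30])


def min_train_ratings(mask, k=K):
    """Fewest ratings left in train among selected users,
    assuming k ratings per selected user are moved to test."""
    return int((counts[mask] - k).min())


strict = min_train_ratings(counts > K + 1)  # current condition
loose = min_train_ratings(counts > K)       # suggested condition

print(strict, loose)  # 2 1
```

Under this assumption, the current condition guarantees every held-out user keeps at least 2 ratings in train, while the looser one allows users with a single training rating.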

@ita9naiwa
Collaborator

Yes, it looks like a bug, and it must be fixed.

@ita9naiwa
Collaborator

I'm sorry, it's intended.
