Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ImplicitCF failing if not using stratified split #2024

Open
daviddavo opened this issue Oct 18, 2023 · 4 comments
Open

[BUG] ImplicitCF failing if not using stratified split #2024

daviddavo opened this issue Oct 18, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@daviddavo
Copy link
Collaborator

Description

ImplicitCF raises an IndexError if the user appears in the test dataset but not on the training dataset.

How do we replicate the issue?

Split a dataset using a method like TimeSeriesSplit or python_chrono_split. I.e: len(ImplicitCF.interact_status) < len(ImplicitCF.user_idx)

Expected behavior (i.e. solution)

Raisign a meaningful error if the dataset needs to be stratified, or assuming that if the user is not on the ImplicitCF.interact_status table, it should have the empty set of items.

Other Comments

Meanwhile, I solved it by using:

data.interact_status = data.interact_status.reindex(data.user_idx['userID_idx'])
data.interact_status['userID'] = data.interact_status.index
data.interact_status['itemID_interacted'] = data.interact_status['itemID_interacted'].fillna("").apply(set)

This will create a the remaining "empty" users

Or just deleting items in test that don't appear in train

@daviddavo daviddavo added the bug Something isn't working label Oct 18, 2023
@loomlike
Copy link
Collaborator

loomlike commented Nov 2, 2023

@cclinet @Qcactus Can you check this? If you cannot find a time to handle this, let me know.

@cclinet
Copy link
Contributor

cclinet commented Nov 3, 2023

I'm not certain under what circumstances 'user in test' should be used in 'interact_status.' Based on my understanding, 'interact_status' should only be applied to the training dataset. Could you provide me with more information on this?

@daviddavo
Copy link
Collaborator Author

Basically, to test how a system would work in real-life cold-start conditions.

@cclinet
Copy link
Contributor

cclinet commented Nov 8, 2023

I've become a bit unclear about the logic here, but I don't believe it's necessary for 'interact_status' to include users from
the test set.
@loomlike Do you have time to further investigate this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants