Labeling function to get a good mix of positive/negative labels. #2577

Open
coderaing opened this issue Jun 18, 2023 · 0 comments

coderaing commented Jun 18, 2023

I was wondering what the "label times" would look like for a prediction problem like "Likelihood of making a purchase in the next 7 days". The data at hand is the customer data and all previous transactions (orders).

This guide (https://compose.alteryx.com/en/stable/examples/predict_next_purchase.html) is similar, but with one big difference: it focuses on whether a customer will buy a "particular" product, not any product. That difference is why the guide gets both positive and negative labels.

If I use the same labeling function (with a slight modification to return True if a purchase happened in that slice), I will get all positive labels, because the original data only contains records where the user did purchase something!
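To make the problem concrete, here is a minimal plain-Python sketch (not Compose itself) of what happens when windows are kept only if they contain at least one order. The names `orders`, `purchased`, and `label_non_empty_windows` are hypothetical, and the fixed 7-day slicing is a simplification of how a label search walks through the data.

```python
from datetime import date, timedelta

# Hypothetical order dates for a single customer.
orders = [date(2023, 1, 2), date(2023, 1, 20), date(2023, 3, 5)]


def purchased(window_orders):
    """Labeling function: True if any order falls in the window."""
    return len(window_orders) > 0


def label_non_empty_windows(order_dates, window_days=7):
    """Slice the timeline into fixed windows, skipping windows with no orders."""
    labels = {}
    start, end = min(order_dates), max(order_dates)
    while start <= end:
        stop = start + timedelta(days=window_days)
        in_window = [d for d in order_dates if start <= d < stop]
        if in_window:  # empty windows are dropped, so a negative can never appear
            labels[start] = purchased(in_window)
        start = stop
    return labels


labels = label_non_empty_windows(orders)
print(labels)
```

Every window that survives contains an order by construction, so the labeling function can only ever return True.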

One solution I can think of is passing drop_empty=False to lm.search, so that I get all slices even when no purchase happened in a slice. If this is the right approach, I have two related questions:

  1. Will this dataset be biased towards negative labels, since there may be long periods when a customer made no purchase?
  2. Since we are dealing with a 7-day window, the order_date might fall on any day of the slice. Is this OK?
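
Both questions can be explored with a small plain-Python sketch, again not Compose itself. It assumes labels are anchored on cutoff times rather than on order dates: each cutoff gets True if any order lands in the following 7 days, wherever in that window it falls, and cutoffs with no orders are kept as False. The function `label_cutoffs`, its parameters, and the sample dates are hypothetical; the 2-day step is only meant to mirror the planned inference cadence.

```python
from datetime import date, timedelta


def label_cutoffs(order_dates, first_cutoff, last_cutoff, horizon_days=7, step_days=2):
    """Return {cutoff: bool}; True if any order falls in [cutoff, cutoff + horizon).

    Cutoffs whose window contains no order are kept and labeled False.
    """
    labels = {}
    horizon = timedelta(days=horizon_days)
    cutoff = first_cutoff
    while cutoff <= last_cutoff:
        labels[cutoff] = any(cutoff <= d < cutoff + horizon for d in order_dates)
        cutoff += timedelta(days=step_days)
    return labels


# Hypothetical order dates for a single customer.
orders = [date(2023, 1, 2), date(2023, 1, 20), date(2023, 3, 5)]

labels = label_cutoffs(orders, date(2023, 1, 1), date(2023, 3, 1))
positive_rate = sum(labels.values()) / len(labels)
print(f"{len(labels)} labels, positive rate {positive_rate:.2f}")
```

With sparse purchases the negatives do dominate, but the positive rate is measurable, so the imbalance can be handled at training time (for example with class weights or by downsampling negatives) instead of being hidden by dropping empty windows.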

Inference will run every 2-3 days, and the results will be handed over to the marketing team.

Sorry if this seems like a very trivial problem; somehow I was not able to find enough resources online on how to handle this.
