
CascadeBandit-UCB

This repository is a Python implementation of the Cascading Bandits paper.

Main problem

Ranking and presenting web pages by their attractiveness to a user is an important problem. The search engine has to 'learn' user behavior in order to present the most attractive items from a set of web pages, where attractiveness can usually be inferred from the user's historical click data. The paper presents a novel way of modeling this interaction in an online setting, based on the Cascade Model: the user scans the recommended list from top to bottom and clicks on the first item they find attractive, so items below the click are never examined.

A simpler setup, in which the search engine recommends only the single most attractive item, could easily be modeled as a stochastic bandit problem where each page/item has an unknown attraction probability, and solved with classic stochastic algorithms like UCB1. The main difference here is the combinatorial aspect of the problem: we need to recommend K items out of L. A related approach models each recommended position as a separate, independent bandit; this type of modeling is called Ranked Bandits, and we will discuss the differences between Ranked Bandits and Cascading Bandits.
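To make the interaction concrete, here is a minimal sketch of the cascade model as a simulation, assuming Bernoulli attraction weights w(e). The names and weights are illustrative, not the repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def cascade_feedback(ranked_items, w):
    """Cascade model: the user scans the list top-down and clicks the first
    attractive item; items below the click are never examined.
    Returns the clicked position, or None if nothing was clicked."""
    for position, item in enumerate(ranked_items):
        if rng.random() < w[item]:  # item attracts the user -> click and stop
            return position
    return None

# Example: L = 8 items with (unknown to the learner) attraction probabilities,
# of which we recommend K = 2.
w = np.array([0.35, 0.35, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2])
click = cascade_feedback([0, 5], w)
```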

Single run

To test the results presented in the paper, I implemented the CascadeUCB algorithm described above in Algorithm 1.3. The UCB computation is based on CascadeUCB1. The implementation details can be found in Appendix A. Running the code with L = 8, K = 2, Δ = 0.15, p = 0.2 for n = 10^5 rounds, we clearly see convergence as the cumulative regret stabilizes.

100%|██████████| 99999/99999 [00:08<00:00, 12166.25it/s]
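For concreteness, here is a compact sketch of CascadeUCB1 under my reading of the paper: the UCB index is ŵ_e + sqrt(1.5 · log t / T_e), the K items with the largest indices are recommended, and only the items down to the first click are used to update the estimates. Variable names and the initialization are illustrative, not the repo's exact code:

```python
import numpy as np

def cascade_ucb1(w, K, n, seed=0):
    """Sketch of CascadeUCB1; w holds the true attraction probabilities."""
    rng = np.random.default_rng(seed)
    L = len(w)
    pulls = np.ones(L)                          # T_e: times item e was observed
    w_hat = (rng.random(L) < w).astype(float)   # one initial observation per item
    opt = 1.0 - np.prod(1.0 - np.sort(w)[::-1][:K])  # value of the optimal list
    regret = np.zeros(n)
    for t in range(1, n + 1):
        ucb = w_hat + np.sqrt(1.5 * np.log(t) / pulls)  # UCB index with radius sqrt(1.5 log t / T_e)
        A = np.argsort(ucb)[::-1][:K]                   # recommend the K largest UCBs
        clicks = rng.random(K) < w[A]                   # cascade: user clicks first attractive item
        C = int(np.argmax(clicks)) if clicks.any() else K - 1
        observed = A[:C + 1]                            # only items down to the click are examined
        rewards = np.zeros(C + 1)
        rewards[-1] = 1.0 if clicks.any() else 0.0      # the last observed item is the clicked one
        w_hat[observed] = (pulls[observed] * w_hat[observed] + rewards) / (pulls[observed] + 1)
        pulls[observed] += 1
        regret[t - 1] = opt - (1.0 - np.prod(1.0 - w[A]))  # expected per-round regret
    return np.cumsum(regret)

# Single-run setup from the text: L = 8, K = 2, Delta = 0.15, p = 0.2,
# assuming, as in the paper's experiments, optimal items have weight p + Delta.
w = np.full(8, 0.2)
w[:2] += 0.15
cum_regret = cascade_ucb1(w, K=2, n=100_000)
```

Only the observed prefix of the list is updated because, in the cascade model, the user gives no information about items below the click.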

Analysis

(figures: cumulative regret of the single run over rounds)

Cumulative item selections

Plotting the number of times each item is selected, by round, with a logarithmic y-axis (figure 2), we clearly see that the algorithm stops selecting sub-optimal items (items 2 to 8).

(figure 2: cumulative number of selections per item, log-scale y-axis)
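A sketch of how such a plot can be produced, assuming the per-round recommended lists have been recorded (the selections argument below is a hypothetical format, not part of the repo's API):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_cumulative_selections(selections, L):
    """selections: list of recommended lists A_t, one per round (hypothetical)."""
    counts = np.zeros((len(selections), L))
    for t, A in enumerate(selections):
        if t > 0:
            counts[t] = counts[t - 1]   # carry cumulative counts forward
        counts[t, list(A)] += 1         # increment the items recommended at round t
    for e in range(L):
        plt.plot(counts[:, e], label=f"item {e + 1}")
    plt.yscale("log")                   # log-scale y-axis, as in figure 2
    plt.xlabel("round")
    plt.ylabel("cumulative selections")
    plt.legend()
    plt.show()
```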

Experiment with multiple runs

We do in fact recover the upper bound on the regret from equation 4. The regret increases linearly in L − K: doubling the number of items L roughly doubles the regret, while increasing K with a constant number of items decreases it. The regret also decreases as Δ increases, confirming the 1/Δ factor in the upper bound.
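For reference, the gap-dependent upper bound for CascadeUCB1 is of the form below (constants as I read them in the paper); with a uniform gap Δ_{e,K} = Δ for every suboptimal item e, it makes the (L − K)/Δ scaling explicit:

```latex
R(n) \;\le\; \sum_{e=K+1}^{L} \frac{12}{\Delta_{e,K}} \log n \;+\; \frac{\pi^2}{3} L
     \;=\; (L-K)\,\frac{12}{\Delta}\,\log n \;+\; \frac{\pi^2}{3} L
```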

L= 16 , K= 2,delta =0.15: 100%|██████████| 5/5 [00:00<00:00, 57.53it/s]
L= 16 , K= 4,delta =0.15: 100%|██████████| 5/5 [00:00<00:00, 60.38it/s]
L= 16 , K= 4,delta =0.075: 100%|██████████| 5/5 [00:00<00:00, 58.38it/s]
L= 8 , K= 2,delta =0.075: 100%|██████████| 5/5 [00:00<00:00, 76.36it/s]
 L   K   delta   mean regret   std
16   2   0.150   24.096475     2.540551
16   4   0.150   29.094590     4.761885
16   4   0.075   20.042619     1.931034
 8   2   0.075   15.488652     1.809575
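A sketch of how this table can be reproduced, reusing the cascade_ucb1 sketch from above: for each (L, K, Δ) configuration, run 5 independent seeds and report the mean and standard deviation of the final cumulative regret (p = 0.2 throughout, as in the single run):

```python
import numpy as np

def run_experiment(L, K, delta, p=0.2, n=100_000, runs=5):
    """Mean/std of the final cumulative regret over `runs` independent seeds."""
    w = np.concatenate([np.full(K, p + delta), np.full(L - K, p)])
    finals = [cascade_ucb1(w, K, n, seed=s)[-1] for s in range(runs)]
    return np.mean(finals), np.std(finals)

for L, K, delta in [(16, 2, 0.15), (16, 4, 0.15), (16, 4, 0.075), (8, 2, 0.075)]:
    mean, std = run_experiment(L, K, delta)
    print(f"L={L:2d}  K={K}  delta={delta:5.3f}  mean regret={mean:6.2f}  std={std:5.2f}")
```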
