The code was developed with Python 3.5 or above and requires the following packages:
- numpy 1.12.1 or above
- pandas 0.19.2 or above
- xgboost 0.6 or above
- Clone the repository: git clone https://github.com/goldentom42/kaggle_expedia_2016.git
- Download the train and test data files from the Expedia competition page to the input folder.
- From the main directory, run python bayesian_approach.py --mode=sub --name=full to generate a file in the submission folder.
bayesian_approach.py supports several options:
- -b : this option will split the train.csv file into training and validation sets
- --mode=val : this option trains on the training set and issues statistics after building recommendations on the validation set.
- --mode=sub : this option trains on the original train.csv file and builds recommendations for the test.csv file. It generates a submission file in the submission folder. You can submit it on Kaggle to check your leaderboard (LB) position.
- --keys=[comma separated list of fields] : with this option you can test different settings and see how the recommendations behave
- --name=[name of submission file] : with this you can specify the name of the generated file
- -w[weighting strategy number] : supports 0, 1 and 2. Sets the weight assigned to each training sample
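
To make the option syntax concrete, here is a minimal sketch of how options shaped like the ones above could be parsed with Python's standard argparse module. It is not taken from bayesian_approach.py: the function name build_parser, the destination names build_split and weighting, and the default values are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(prog="bayesian_approach.py")
    # -b is a plain flag: present or absent
    parser.add_argument("-b", action="store_true", dest="build_split",
                        help="split train.csv into training and validation sets")
    parser.add_argument("--mode", choices=["val", "sub"],
                        help="val: validate, sub: build a submission file")
    # --keys takes a comma-separated list and is turned into a Python list
    parser.add_argument("--keys", type=lambda s: s.split(","), default=[],
                        help="comma-separated list of fields")
    parser.add_argument("--name", default=None,
                        help="name of the generated file")
    # argparse accepts the attached form -w1 as well as -w 1
    # (the default of 0 is an assumption, not documented behaviour)
    parser.add_argument("-w", type=int, choices=[0, 1, 2], default=0,
                        dest="weighting", help="weighting strategy number")
    return parser


args = build_parser().parse_args(
    ["--mode=sub", "--keys=srch_destination_id,hotel_market,is_package",
     "--name=dest_mkt_pack", "-w1"]
)
print(args.mode, args.keys, args.name, args.weighting)
```

The real script may parse its arguments differently (for example with getopt), so treat this only as a guide to what each option carries.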
Examples:
- python bayesian_approach.py --mode=val --keys=user_location_city,orig_destination_distance --name=leak
- python bayesian_approach.py --mode=sub --keys=srch_destination_id,hotel_market,is_package --name=dest_mkt_pack -w1
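
When comparing key settings with --mode=val, one statistic worth tracking is MAP@5 (mean average precision over the top 5 recommendations), the metric used to score the competition. The sketch below is an illustration only and is not taken from bayesian_approach.py, whose reported statistics may differ. It assumes each validation row has exactly one correct label; the function names and the sample data are made up.

```python
def average_precision_at_5(true_label, predicted):
    """AP@5 for one row with a single correct label: 1/rank if found in the top 5, else 0."""
    for rank, label in enumerate(predicted[:5], start=1):
        if label == true_label:
            return 1.0 / rank
    return 0.0


def map_at_5(true_labels, predictions):
    """Mean of the per-row AP@5 scores."""
    scores = [average_precision_at_5(t, p)
              for t, p in zip(true_labels, predictions)]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up labels and ranked recommendations, for illustration only
truth = [3, 7, 1]
recs = [[3, 5, 9, 2, 0], [4, 7, 8, 1, 6], [2, 4, 6, 8, 0]]
print(map_at_5(truth, recs))  # (1 + 1/2 + 0) / 3 = 0.5
```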