Skip to content

Identify and eventually generate fake Yelp Reviews

Notifications You must be signed in to change notification settings

N-Demir/Yelp-reviews

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Yelp-reviews

A CS229 final project by Nikita Demir, Jose Giron, and Jonathan Gomes Selman

There is a clear need with the rise of e-commerce to reliably detect fake reviews and misinformation online. We set out to do this using a variety of simple and complex machine learning algorithms and compared/contrasted them.

We used a Yelp review dataset provided originally by Yelp but now hosted by [insert here] that contained 60,000 real and fake labeled Yelp restaurant reviews. We found surprisingly that the task of classification was extremely hard and that simply using word content was not a distinguishing enough feature. We then sought out to use behavorial features gotten from an accompanying metadata dataset for every review; however, these also were not very distinguishing.

Given the difficulty of classification, we then attempted to find out if we could somehow generate convincing fake restaurant reviews. Using a separate dataset of around 3 million restaurant reviews we trained a sequence to sequence model based on the OpenNMT framework to take an input context (review rating, type of food, location, etc.) and output a convincing review. Are results in this case were actually quite impressive and encouraging.

Future work on the matter should lead to even more fruitful results.

Our paper can be found as FinalReport.pdf

About

Identify and eventually generate fake Yelp Reviews

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages