Bag of Words Meets Bags of Popcorn

Please commit any changes so we can merge easily

TODO

(notes from the meeting with Jesse)

Feature detectors:
- N-grams (Dennis), done
- No-grams (Dennis)
- Word2Vec
- Regex (Panni)

Classifiers:
- Vowpal Wabbit
- Semi-supervised learning (Csaba: in progress)

About n-gram:

The n-gram feature combines 1-grams (= BoW), 2-grams, ..., n-grams for feature creation. I have now set the minimum document frequency to 1/10000, which improved on the plain BoW by about 2%. Still, it might be beneficial to include many more features when 2- and 3-grams are involved. Does anyone have an idea for a good classifier that can handle a lot of features?
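To make the combination of 1-grams up to n-grams and the document-frequency cutoff concrete, here is a minimal standard-library sketch. It is not the code in features.py; the function names (`ngrams`, `build_vocabulary`, `vectorize`), the whitespace tokenization, and the toy reviews are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n_max):
    """Return all 1-grams up to n_max-grams of a token list, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(documents, n_max=2, min_df=1e-4):
    """Keep n-grams that occur in at least a fraction min_df of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # set() so that each n-gram is counted once per document
        doc_freq.update(set(ngrams(doc.lower().split(), n_max)))
    threshold = min_df * len(documents)
    return sorted(g for g, df in doc_freq.items() if df >= threshold)


def vectorize(doc, vocabulary, n_max=2):
    """Turn one document into a count vector over the vocabulary."""
    counts = Counter(ngrams(doc.lower().split(), n_max))
    return [counts[g] for g in vocabulary]


# Toy data (hypothetical); min_df is set high so the cutoff is visible.
reviews = ["a great movie", "not a great movie", "a boring movie"]
vocab = build_vocabulary(reviews, n_max=2, min_df=0.5)
features = [vectorize(r, vocab) for r in reviews]
```

With a cutoff as low as 1/10000 the vocabulary grows quickly once 2- and 3-grams are included, so a sparse representation would probably be preferable to the dense lists used here.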

About no-grams idea:

I opened another branch for no-grams because I had to remove a lot of stopwords. Its predictive power is no better than random. I think this is because, on average, a no-gram appears in only every third review.

Creating 2-grams of the form no + adjective, and then:
- if adj = positive --> the 2-gram is a negative 2-gram
- if adj = negative --> the 2-gram is a positive 2-gram

And then simply count the number of positive vs. negative 2-grams.
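The counting rule can be sketched as follows. This is only an illustration of the idea, not the code on the no-grams branch: the function name `count_no_grams`, the tiny word sets, and the example review are hypothetical, and using positive-words.txt and negative-words.txt as the real lexicons is an assumption.

```python
def count_no_grams(tokens, positive_words, negative_words, negation="no"):
    """Count (positive, negative) 2-grams of the form negation + adjective."""
    positive = negative = 0
    for first, second in zip(tokens, tokens[1:]):
        if first != negation:
            continue
        if second in positive_words:
            negative += 1  # "no" + positive adjective counts as a negative 2-gram
        elif second in negative_words:
            positive += 1  # "no" + negative adjective counts as a positive 2-gram
    return positive, negative


# Hypothetical stand-ins for the sentiment word lists.
POSITIVE = {"good", "great", "fun"}
NEGATIVE = {"bad", "boring", "dull"}

review = "no good plot but no boring scenes and no dull moments"
pos, neg = count_no_grams(review.split(), POSITIVE, NEGATIVE)
```

A review that contains no such 2-gram yields (0, 0) and so gives no signal at all, which fits the observation that a no-gram appears in only about every third review.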

Error to fix:

Error fixed thanks to Dennis. There's another error, which I (Csaba) will solve tomorrow night. Until then, do not use the SemiSupervised class.

Folders and files:

- data
- BagOfCentroids.csv
- Bag_of_Words_model.csv
- README.md
- Word2Vec_AverageVectors.csv
- evaluation.py
- features.py
- main.py
- models.py
- negative-words.txt
- positive-words.txt
- preprocessing.py
- sampleSubmission.csv
- w2vec_popcorn.ipynb

pgombar/kaggle-popcorn
