Swiss Happy Maps

Happiness of Swiss cantons based on social media sentiment.

Summary of Work

We classified the sentiment of 6.5 million Instagram posts using fastText. Our notebook covers data preparation, feature analysis, word clouds, hyperparameter selection, learning curves, cross-validation, and a confusion matrix to explain the model and measure its performance.
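fastText's supervised mode reads plain-text training files where each line starts with a label prefixed by `__label__`, followed by the text. The sketch below shows one way posts could be written in that format. It is not the notebook's code: the label names (`positive`, `negative`) and the cleaning step (lowercasing, dropping URLs, collapsing whitespace) are assumptions for illustration.

```python
import re

# Assumed label set; the notebook's actual labels may differ.
LABELS = {"positive", "negative"}


def normalize(text: str) -> str:
    """Assumed cleaning step: lowercase, drop URLs, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return " ".join(text.split())


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as a fastText supervised training line."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return f"__label__{label} {normalize(text)}"


def write_training_file(examples, path):
    """Write (label, text) pairs to a UTF-8 file, one example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in examples:
            f.write(to_fasttext_line(label, text) + "\n")
```

A file written this way can then be passed to the `fasttext` Python package, e.g. `fasttext.train_supervised(input="train.txt")`, where the file name is only an example.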

We provide an interactive web application that visualizes the happiness of Switzerland's cantons across gender, time of day, and time of year.
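The breakdowns above need a per-canton happiness score for each group. As a minimal sketch, assuming the score is the share of posts classified positive and that each post record carries hypothetical `canton`, `timestamp`, `gender`, and `label` fields, the aggregation could look like this. The time-of-day buckets are also an assumption, not the project's definition.

```python
from collections import defaultdict
from datetime import datetime


def time_of_day(ts: datetime) -> str:
    """Hypothetical bucketing of a post timestamp by hour."""
    h = ts.hour
    if 6 <= h < 12:
        return "morning"
    if 12 <= h < 18:
        return "afternoon"
    if 18 <= h < 24:
        return "evening"
    return "night"


def happiness_by(posts, key):
    """Return {(canton, group): share of positive posts}.

    `key` maps a post to its group, e.g. gender, time-of-day bucket, or month.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for p in posts:
        k = (p["canton"], key(p))
        total[k] += 1
        if p["label"] == "positive":
            positive[k] += 1
    return {k: positive[k] / total[k] for k in total}
```

Passing `lambda p: p["gender"]`, `lambda p: time_of_day(p["timestamp"])`, or `lambda p: p["timestamp"].month` as `key` yields the three kinds of breakdown.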

Our data was stored on the Hadoop cluster provided by Prof. Catasta. Data acquisition and processing were done with PySpark, and our PySpark applications are in the scripts folder. The extensive data wrangling done for the visualization is in the data_wrangling folder.

Challenges Faced

None of our team members had any prior knowledge of distributed computing or experience running jobs on a Hadoop cluster. It took time and effort before we successfully submitted a simple Spark application to the cluster.
None of our team members had experience building interactive visualizations or web applications. We learned to use the Leaflet package in R and built our first web application using Shiny in RStudio.

Tools

Spark for distributed data processing
fastText for state-of-the-art word representation learning and text classification
Leaflet for interactive choropleth maps
Shiny for interactive web app

Repository Structure

scripts: Contains the scripts we ran on the cluster, mostly to save the data as Parquet files for further analysis locally.
data_wrangling: Contains the notebooks used to clean the Twitter data downloaded through the scripts and reshape it into a form suited to visualization.
machine_learning: Contains the code and details for the machine learning part. The notebook covers feature analysis, word clouds, learning curves, cross-validation, and a confusion matrix to explain the model and measure its performance.
interactive_visualization: Contains the data and the R code for creating the Leaflet map, plus the code for the Shiny application in R that generates the interactive web application.
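The cross-validation and confusion matrix mentioned for the machine_learning folder can be illustrated in a few lines. This is a plain-Python sketch, not the notebook's implementation: the shuffled round-robin fold assignment, the fixed seed, and the label names are assumptions chosen for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy(m):
    """Fraction of examples on the diagonal (correct predictions)."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(sum(row) for row in m)
```

Reading the matrix row by row shows which true classes the model confuses, which a single accuracy number hides.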

Further details are given in the README of each folder.

Acknowledgement

This repository was developed by Kirtan Padh, Luis Medina, and Tina Fang from November 2016 to February 2017 for the course Applied Data Analysis at EPFL, Switzerland.
