This research aims to extract the general sentiment of visitors in Athens by exploiting tourism content based on online platforms

dimitramav/thesis-sentiment-analysis-in-tourism

Preface

The main objective of this thesis is to capture the overall sentiment of visitors to Athens by analysing tourism content shared on online platforms. The work covers the study of real tourism data, the extraction of sentiment and insights from it and, finally, the visualization of the results. The research proceeds in three steps: data collection (reviews and posts), data preprocessing and transformation, and the application of algorithms (i.e. topic modeling and sentiment analysis) to the refined data. Several tools and algorithms are combined so that their outputs can be intersected and the results come closer to reality. Finally, the outcome is visualised with diagrams and maps. The graphs answer a series of questions and depict the satisfaction of visitors in Athens.
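One possible reading of "intersecting the output" is to keep only the items on which two independent sentiment labelers agree. The sketch below illustrates that idea with the standard library only. The function name `intersect_labels`, the review ids and the label values are hypothetical and are not taken from the thesis code.

```python
from collections import Counter


def intersect_labels(labels_a, labels_b):
    """Return the items that received the same label from both labelers.

    Both arguments are dicts mapping an item id to a sentiment label.
    Items missing from either dict, or labeled differently, are dropped.
    """
    agreed = {}
    for item_id, label in labels_a.items():
        if labels_b.get(item_id) == label:
            agreed[item_id] = label
    return agreed


# Hypothetical outputs of a lexicon-based tool and a trained classifier.
lexicon_labels = {"r1": "positive", "r2": "negative", "r3": "positive"}
classifier_labels = {"r1": "positive", "r2": "positive", "r3": "positive"}

agreed = intersect_labels(lexicon_labels, classifier_labels)
summary = Counter(agreed.values())

print(agreed)
print(summary)
```

Dropping the disagreements trades coverage for precision: fewer reviews are labeled, but each remaining label is supported by both methods.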

Dev environment and Tools

The project is implemented in Python 3.7.3 and relies on Python libraries throughout. Google Places, python-twitter and foursquare are wrappers that provide a pure Python interface to the corresponding APIs. During the preprocessing stage, the langdetect, ast, re, stop_words and nltk libraries eliminate non-English and redundant elements. As far as algorithms are concerned, topic modeling uses gensim modules, while sentiment analysis combines nltk.sentiment.vader (a ready-made sentiment analysis tool) with sklearn (a library used for custom vectorizers and classifiers). An offline reverse geocoding library (reverse_geocoder) and a service that uses third-party geocoders (geopy) convert coordinates to districts. The third-party geocoder in this case is Nominatim, a geocoder for open source OpenStreetMap data. The visualization is also produced with Python libraries: a word cloud generator (word_cloud), a data visualization library (seaborn), a tool that creates JS Leaflet maps (folium) and a 2D plotting library (matplotlib) form the diagrams and the plots. Besides these essential components, several helper functions and modules were used. The csv, json and pickle libraries manage the generated files, while pandas, collections and operator handle the data structures. Python system libraries (sys, os, time) complete the project's environment.
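As a rough illustration of the preprocessing stage, the following sketch removes redundant elements from short texts with `re` and counts the remaining terms with `collections` and `operator`. It is a minimal example under stated assumptions, not the thesis implementation: the small `STOP_WORDS` set stands in for the lists provided by stop_words or nltk, the regular expressions and the `preprocess` function are hypothetical, and language detection is left out.

```python
import re
from collections import Counter
from operator import itemgetter

# Tiny illustrative stop-word set (assumption); a real run would load a
# full English list from a stop-word library instead.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "of", "in", "to", "it"}

URL_RE = re.compile(r"https?://\S+")   # web links
MENTION_RE = re.compile(r"@\w+")       # user mentions
TOKEN_RE = re.compile(r"[a-z]+")       # runs of lowercase ASCII letters


def preprocess(text):
    """Lowercase the text, strip links and mentions, drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    return [token for token in tokens if token not in STOP_WORDS]


# Made-up sample texts, for illustration only.
reviews = [
    "The Acropolis was amazing! https://example.com @friend",
    "Amazing view of the Acropolis, and the food is great.",
]

counts = Counter()
for review in reviews:
    counts.update(preprocess(review))

# Terms ordered from most to least frequent.
top_terms = sorted(counts.items(), key=itemgetter(1), reverse=True)
print(top_terms)
```

The resulting token lists are the kind of input that topic modeling and vectorizer-based classifiers typically expect, which is why cleaning happens before either algorithm runs.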

The current thesis was conducted at the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens from May 2019 to February 2020 and was supervised by Assistant Professor Maria Roussou and Instructional Lab. Personnel Ms. Athanasia Kolovou.
