Overview

This repository contains my Spark work building a recommender system for movies and users in the MovieLens dataset. It was done as part of the college module 'Data Analysis at Speed and Scale'.

A description of the original MovieLens dataset is available at http://files.grouplens.org/datasets/movielens/ml-latest-small-README.html. For the recommender system project, however, I used a cleaned version of this dataset.

I had already analysed this dataset in Pig and Hive (see https://github.com/Crone1/Pig_and_Hive_MovieLens_Analysis), so I already had code to clean the data. I reused that code for this project and uploaded the cleaned data to the HDFS cluster. The cleaning was done in Pig and involved separating columns and joining the data into one table.
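To give a feel for what "separating columns and joining into one table" can look like, here is a minimal plain-Python sketch. It is not the Pig script from this repository. The column names follow the public MovieLens files (movies.csv and ratings.csv), but the sample rows are made up, and the choice of which columns to split (year out of the title, genres into a list) is an assumption for illustration only. The helper names split_title, clean_movies and join_ratings are hypothetical.

```python
import csv
import io

# Made-up sample rows in the layout of MovieLens movies.csv and ratings.csv.
MOVIES_CSV = """movieId,title,genres
1,Example Movie A (1999),Adventure|Comedy
2,Example Movie B (2004),Drama
"""

RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,2,3.5,964982931
2,1,5.0,835355493
"""


def split_title(raw_title):
    """Split 'Name (1999)' into ('Name', 1999); year is None if absent."""
    raw_title = raw_title.strip()
    if raw_title.endswith(")") and "(" in raw_title:
        name, _, year = raw_title.rpartition("(")
        year = year[:-1]  # drop the closing parenthesis
        if year.isdigit():
            return name.strip(), int(year)
    return raw_title, None


def clean_movies(movies_file):
    """Read movies and separate the packed columns (assumed cleaning step)."""
    movies = {}
    for row in csv.DictReader(movies_file):
        title, year = split_title(row["title"])
        movies[row["movieId"]] = {
            "title": title,
            "year": year,
            "genres": row["genres"].split("|"),
        }
    return movies


def join_ratings(ratings_file, movies):
    """Inner-join ratings with the cleaned movies on movieId."""
    joined = []
    for row in csv.DictReader(ratings_file):
        movie = movies.get(row["movieId"])
        if movie is None:
            continue  # skip ratings for unknown movies
        joined.append({
            "userId": int(row["userId"]),
            "movieId": int(row["movieId"]),
            "rating": float(row["rating"]),
            **movie,
        })
    return joined


movies = clean_movies(io.StringIO(MOVIES_CSV))
table = join_ratings(io.StringIO(RATINGS_CSV), movies)
for record in table:
    print(record)
```

The result is one flat table with a row per rating, which is the shape a recommender typically consumes (user, item, rating) alongside the movie attributes.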

While both the Pig and Spark work were done on the Google Cloud Platform, the Spark coding was done through a Jupyter notebook web interface for the hosted cluster.

Running the code

In order to run this code, you will need to:

Set up your own Google Cloud Platform cluster,
Run the code in the Pig_movieLens_cleaning.pig file,
Create the Jupyter notebook with the code in this repository,
Run the recommender system in that notebook.
