Sparkify is a music streaming startup with impressive user base growth (their marketing team must be doing one hell of a job), and it is looking to move its processes and data to the cloud. Its data resides in S3: a directory of JSON logs of user activity on the app, and a directory of JSON metadata on the songs in the app.
I am tasked with applying my engineering skills to build solutions that serve data to business users, making their jobs a lot easier.
Initially, I built a data warehouse for their analytics team, but as their user base grew, their data needed to be moved into a data lake.
This folder contains my implementation of a data warehouse solution for Sparkify.
The warehouse was hosted on AWS Redshift.
This folder contains my implementation of a data lake solution for Sparkify.
This folder contains my implementation of a data pipeline with Apache Airflow for Sparkify.