Project Sparkify

Sparkify is a music streaming startup with impressive user base growth (their marketing team must be doing one hell of a job), and it is looking to move its processes and data to the cloud. The data resides in S3: one directory holds JSON logs of user activity on the app, and another holds JSON metadata on the songs in the app.
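As a rough illustration of working with this kind of data, here is a minimal sketch that parses newline-delimited JSON activity records and counts song plays per user. The sample records, the field names (`userId`, `page`, `song`), the `NextSong` value, and the helper functions are all assumptions made for the example; they are not taken from the Sparkify dataset, and the real logs may be laid out differently.

```python
import json
from collections import Counter

# Hypothetical sample of newline-delimited JSON activity logs.
# Field names and values are assumptions for illustration only.
SAMPLE_LOG = """\
{"userId": "1", "page": "NextSong", "song": "Song A"}
{"userId": "2", "page": "Home", "song": null}
{"userId": "1", "page": "NextSong", "song": "Song B"}
"""


def parse_log_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_song_plays(records):
    """Count records per user whose (assumed) page field marks a song play."""
    return Counter(r["userId"] for r in records if r.get("page") == "NextSong")


records = parse_log_lines(SAMPLE_LOG)
plays = count_song_plays(records)
print(plays)
```

In practice the records would be read from the S3 directories rather than from an inline string; the inline sample just keeps the sketch self-contained.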

I am tasked with applying my engineering skills to build solutions that serve data to business users, making their jobs a lot easier.

Initially, I built a data warehouse for their analytics team, but as their user base grew, the data needed to be moved into a data lake.

Data Warehouse

This folder contains my implementation of a data warehouse solution for Sparkify.

The warehouse was hosted on Amazon Redshift.

Data Lake

This folder contains my implementation of a data lake solution for Sparkify.

Pipeline

This folder contains my implementation of a data pipeline with Apache Airflow for Sparkify.
