
Fantasy Premier League data ingestion and analysis ⚽

Overview

The core premise of this project is to showcase what I have learned whilst taking part in the Data Talks Club Data Engineering course. I will be utilising multiple tools to create an effective pipeline that ingests the sourced FPL data and transforms it into a finalised visual dashboard, which you can view here!


What is Fantasy Premier League

Fantasy Premier League is an online game that casts you in the role of a fantasy manager of Premier League players. You must pick a squad of 15 players from the current Premier League season, who score points for your team based on their performances for their clubs in PL matches.

Problem description


The project aims to extract multiple years of FPL data for analysis so that we can take a deeper look at the individual stats of players and teams across the 2016 to 2023 seasons.
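As background on the raw data: one widely used source is the official FPL API's bootstrap-static endpoint (whether this project pulls from it or from a pre-scraped dataset is an assumption here). A minimal sketch of fetching current-season player stats:

```python
# Minimal sketch: pull current-season player stats from the official FPL API.
# Field choices below are illustrative; this may not match the project's
# actual extraction code.
import requests

URL = "https://fantasy.premierleague.com/api/bootstrap-static/"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
players = resp.json()["elements"]  # one dict per player

# A few of the fields the analysis cares about (goals, assists, price in £m)
for p in players[:5]:
    print(p["web_name"], p["goals_scored"], p["assists"], p["now_cost"] / 10)
```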

Key insights to be extracted:

  • Who are the most in-form goal scorers
  • Who are the most in-form assisters
  • What players influence their teams the most
  • What players have the highest points
  • Who are the most expensive players
  • How many goals are scored per season

Technologies

I will use the technologies below to help with the creation of the project:

  • Cloud: GCP
    • Data lake: GCS
    • Data warehouse: BigQuery
  • Terraform: Infrastructure as Code (IaC), used to create the GCP project configuration without going through the cloud GUI
  • Workflow orchestration: Prefect (Docker)
  • Transforming data: dbt
  • Data visualisation: SAS Visual Analytics

Architecture visualised:

Dashboard examples

The dashboard gives the user a high-level analysis of both players and teams across several seasons of the Premier League. You can view the dashboard here

Home page for visualisation:

(dashboard screenshot)

Overview analysis of all seasons:

(dashboard screenshot)

Individual team analysis:

(dashboard screenshot)

How to run the project

  1. Clone the repo and install the necessary packages
pip install -r requirements.txt
  2. Next, set up your Google Cloud environment
export GOOGLE_APPLICATION_CREDENTIALS=<path_to_your_credentials>.json
gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS
gcloud auth application-default login
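As an optional sanity check (not part of the original steps), you can confirm the credentials work by listing your buckets from Python:

```python
# Optional sanity check: confirm the service account can reach GCS.
# Assumes GOOGLE_APPLICATION_CREDENTIALS is exported as above.
from google.cloud import storage

client = storage.Client()
for bucket in client.list_buckets():
    print(bucket.name)
```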
  3. Set up the project's infrastructure using Terraform
  • If you do not have Terraform installed you can download it here and add it to your PATH
  • Once downloaded, run the following commands:
cd terraform/
terraform init
terraform plan -var="project=<your-gcp-project-id>"
terraform apply -var="project=<your-gcp-project-id>"
  4. Run the Python code in the Prefect folder
  • After installing the required Python packages, Prefect should already be installed
  • You can start the Prefect server and access the UI with the command below:
prefect orion start
  • Access the UI at: http://127.0.0.1:4200/
  • You will then want to swap out the blocks so that they are registered to your credentials for GCS and BigQuery. This can be done in the Blocks options (see the sketch at the end of this step)
  • You can keep the blocks under the same names used in the code or change them; if you do change them, make sure to update the code to reference the new block names
  • Go back to the terminal and run:
cd flows/
python etl_gcs_player.py
  • The data will then be stored both in your GCS bucket and in BigQuery
  • If you want to run the process in Docker, run the commands below:
cd Prefect/
docker image build -t <docker-username>/fantasy:fpl .
docker image push <docker-username>/fantasy:fpl
  • docker_deploy.py will load the flows into Prefect's deployment area so that they can then be run directly from your container:
cd flows/
python docker_deploy.py
  • Start the agent to listen for flow runs:
prefect agent start
  • Run the containerized flow from the CLI:
prefect deployment run etl-parent-flow/docker_player_flow --param yr=[16,17,18,19,20,21,22] --param yrs=[17,18,19,20,21,22,23]
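For reference, here is a minimal sketch of registering the GCS blocks from Python rather than through the UI. The block names and credentials path are placeholders; align them with whatever names etl_gcs_player.py expects:

```python
# Sketch: register Prefect GCP blocks programmatically instead of via the UI.
# Block names and the credentials path are assumptions; match the names that
# the flow code loads.
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import GcsBucket

creds = GcpCredentials(service_account_file="<path_to_your_credentials>.json")
creds.save("fpl-gcp-creds", overwrite=True)

bucket = GcsBucket(bucket="<your-gcs-bucket>", gcp_credentials=creds)
bucket.save("fpl-gcs-bucket", overwrite=True)
```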
  5. Running the dbt flow
  • Create a dbt account and log in using dbt Cloud here
  • Once logged in, clone the repo for use
  • In the CLI at the bottom, run the following command:
dbt run
  • This will run all the models and create our final dataset, "final_players"
  • final_players will then be placed within the schema chosen when setting up the project in dbt
  6. How the lineage should look once run: (lineage graph)

  7. Visualisation choices

  • You can now take the final_players dataset and use it within Looker or another data visualisation tool like SAS VA, which I used.
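If your visualisation tool prefers a flat file, one option (an illustration, not a step from the project itself) is to export final_players from BigQuery to GCS as a CSV:

```python
# Sketch: export the final_players table to a CSV in GCS for use in a
# visualisation tool. Project, dataset and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

table_ref = "your-gcp-project.fpl.final_players"
destination = "gs://<your-gcs-bucket>/exports/final_players.csv"

extract_job = client.extract_table(table_ref, destination)
extract_job.result()  # wait for the export job to finish
print(f"Exported {table_ref} to {destination}")
```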
