GitHub - seokyim8/Steam_data_pipeline: A data pipeline that provides web-scraped information from Steam

Creator: Seok Yim (Noah)

Do you want to be the PIONEER of soon-to-POP-OFF games? Then you're gonna like this...

Title: Steam Data Pipeline

Project Summary:

A data pipeline that regularly scrapes, cleans, stores, and publishes data for newly released games on Steam. Data visualization is handled by a publicly accessible Apache Superset dashboard.
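To make the "cleans" stage concrete, here is a minimal sketch of how one scraped row might be normalized before it is stored. The field names (`title`, `price`, `release_date`), the `$19.99` price format, and the `Mar 3, 2024` date format are assumptions for illustration and are not taken from the repository.

```python
from datetime import datetime
from typing import Optional


def parse_price(raw: str) -> Optional[float]:
    """Turn a price string such as "$19.99" into a float (assumed format)."""
    text = raw.strip()
    if not text:
        return None
    if text.lower().startswith("free"):
        return 0.0  # e.g. "Free To Play"
    try:
        return float(text.lstrip("$").replace(",", ""))
    except ValueError:
        return None  # unrecognized price text


def parse_release_date(raw: str) -> Optional[str]:
    """Turn a date such as "Mar 3, 2024" into ISO format (assumed formats)."""
    for fmt in ("%b %d, %Y", "%d %b, %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # e.g. "Coming soon"


def clean_record(raw: dict) -> dict:
    """Normalize one scraped row; the keys here are hypothetical."""
    return {
        "title": raw.get("title", "").strip(),
        "price_usd": parse_price(raw.get("price", "")),
        "release_date": parse_release_date(raw.get("release_date", "")),
    }
```

Values that cannot be parsed become `None` rather than raising, so a single odd row would not stop a scheduled run.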

*** Preview ***

Website link:
http://18.212.126.33:8080/superset/dashboard/1/?standalone=3&show_filters=1

Authentication for anonymous users (anyone can view the dashboard with these credentials):
ID: public
Password: public

Description:

I frequently saw websites and projects with Steam-related data for popular (top 100) games, but never one focused primarily on new releases on Steam. So I decided to make one myself.

Technologies Used:

Python, MySQL, AWS (EC2, RDS), Docker, Scrapy, Apache Superset, Selenium

Steps Taken:

Created a Scrapy project that scrapes data from the official Steam website (https://store.steampowered.com/search/?sort_by=Released_DESC&supportedlang=english).
Added Selenium to deal with infinite scrolling.
Created a Python scheduler with APScheduler along with Python asyncio.
Launched an EC2 instance to keep the program running and an RDS instance to host the MySQL database.
Created a Docker image that installs the Python dependencies along with the Chrome browser.
On EC2, started the containerized project alongside the containerized Apache Superset image.
Made the dashboard publicly available.
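The infinite-scrolling step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this repository: the function name `scroll_to_load`, its parameters, and the stopping rules are hypothetical. It relies only on `execute_script`, which Selenium's WebDriver provides, so any object with that method can be passed in.

```python
import time
from typing import Callable, Protocol


class ScriptDriver(Protocol):
    """The one WebDriver method this sketch needs."""

    def execute_script(self, script: str, *args): ...


def scroll_to_load(
    driver: ScriptDriver,
    count_rows: Callable[[], int],
    target_rows: int = 1000,
    pause_seconds: float = 1.5,
    max_scrolls: int = 200,
) -> int:
    """Scroll to the bottom of the page until enough rows are loaded.

    Stops when `target_rows` is reached, when the page height stops
    growing (no more results were appended), or after `max_scrolls`
    attempts. Returns the number of rows counted at the end.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        if count_rows() >= target_rows:
            break
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give the page time to fetch the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended
        last_height = new_height
    return count_rows()
```

With a real Selenium driver, `count_rows` could be something like a lambda that returns `len(driver.find_elements(By.CSS_SELECTOR, "a.search_result_row"))`. That selector is an assumption about the Steam search page markup and should be checked against the live page.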

Final Product:

- A dashboard/BI tool that updates every day at 7:30 am EST (with a couple of extra updates during the day) with 1,000 entries from Steam.
- Contains visualizations that help viewers understand the latest trends in games.
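As a rough illustration of the daily 7:30 am update, the sketch below computes the next run time and sleeps until then using only the standard library. It is a stand-in for the APScheduler setup used in the project, not a copy of it. It assumes "EST" means a fixed UTC-5 offset with no daylight saving adjustment, and the names `next_daily_run`, `seconds_until_next_run`, and `run_daily` are hypothetical.

```python
import asyncio
from datetime import datetime, time, timedelta, timezone
from typing import Awaitable, Callable

# Assumption: "EST" is read literally as UTC-5, ignoring daylight saving.
EST = timezone(timedelta(hours=-5), "EST")


def next_daily_run(now: datetime, at: time = time(7, 30)) -> datetime:
    """Return the next EST occurrence of `at` strictly after `now`.

    `now` must be timezone-aware.
    """
    local_now = now.astimezone(EST)
    candidate = datetime.combine(local_now.date(), at, tzinfo=EST)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate


def seconds_until_next_run(now: datetime) -> float:
    """Seconds from `now` until the next scheduled run."""
    return (next_daily_run(now) - now).total_seconds()


async def run_daily(job: Callable[[], Awaitable[None]]) -> None:
    """Run `job` once a day at 7:30 am EST, forever."""
    while True:
        await asyncio.sleep(seconds_until_next_run(datetime.now(timezone.utc)))
        await job()
```

The extra updates during the day are not covered here, since their times are not stated above.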
