arjayjean/comixology_new_releases

ComiXology's Weekly Featured New Releases ETL

🧰 Languages and Tools


Purpose:

  • New comic books are released every week on ComiXology, Amazon's cloud-based digital distribution platform for comics. As a comic book enthusiast, I wanted to build an automated ETL process on AWS services that extracts the weekly featured new releases, transforms the data with Python, and loads it into a CSV file that is then stored in Amazon S3.

⚙ ETL Process: AWS Diagram
  1. Every Tuesday at 9am, a cron schedule in Amazon EventBridge Scheduler invokes an AWS Lambda function. The function runs a Python ETL process that collects ComiXology's weekly featured new releases.

  2. The ETL process starts by extracting the data with BeautifulSoup.

  3. Once the data has been extracted, it is cleaned and formatted so that it can be loaded into a CSV file.

  4. After the cleaning, a CSV file is created and the data is loaded into it, which ends the ETL process.

  5. Once the ETL process is complete, the newly created CSV file is stored in Amazon S3 using Boto3, the AWS SDK for Python. This is the end of the first Lambda function.

  6. Storing the file triggers a second Lambda function, which is configured to run whenever an object is created in the designated S3 bucket.

  7. Once triggered, the Python script in the second Lambda function publishes a message to an SNS topic I created. My email address is subscribed to that topic, so the message arrives in my inbox.
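
The steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the page URL, CSS selectors, column names, bucket name, and topic ARN are all placeholder assumptions, and the two handlers are shown in one file only for brevity. The schedule in step 1 could be expressed as something like `cron(0 9 ? * TUE *)`, with the time zone left to whatever the schedule is configured for.

```python
import csv
import io
import urllib.parse
import urllib.request

# All values below are hypothetical placeholders, not taken from the repository.
PAGE_URL = "https://example.com/new-releases"
BUCKET = "my-new-releases-bucket"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-topic"
FIELDS = ["title", "price"]  # assumed columns


def extract(html):
    """Step 2: pull raw title/price strings out of the page."""
    # Third-party import kept local so the pure helpers below run without it.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.release"):  # assumed selector
        title = item.select_one(".title")   # assumed selector
        price = item.select_one(".price")   # assumed selector
        rows.append({"title": title.get_text() if title else "",
                     "price": price.get_text() if price else ""})
    return rows


def clean(rows):
    """Step 3: collapse whitespace, drop untitled rows, parse prices."""
    cleaned = []
    for row in rows:
        title = " ".join(row["title"].split())
        if not title:
            continue
        try:
            price = float(row["price"].strip().lstrip("$"))
        except ValueError:
            price = None  # written as an empty CSV cell
        cleaned.append({"title": title, "price": price})
    return cleaned


def to_csv(rows):
    """Step 4: serialize the cleaned rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def etl_handler(event, context):
    """First Lambda (steps 1-5): scrape, clean, write CSV, store in S3."""
    import boto3
    with urllib.request.urlopen(PAGE_URL) as response:
        html = response.read().decode("utf-8")
    body = to_csv(clean(extract(html)))
    boto3.client("s3").put_object(
        Bucket=BUCKET, Key="new_releases.csv", Body=body.encode("utf-8"))


def build_message(event):
    """Turn an S3 object-created event into a notification string."""
    record = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return f"New file stored: s3://{record['bucket']['name']}/{key}"


def notify_handler(event, context):
    """Second Lambda (steps 6-7): publish to the SNS topic."""
    import boto3
    boto3.client("sns").publish(
        TopicArn=TOPIC_ARN,
        Subject="New releases file ready",
        Message=build_message(event))
```

Keeping `clean`, `to_csv`, and `build_message` free of AWS calls means they can be checked locally before anything is deployed to Lambda.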