farbodahm/stateful-stream-processing

stateful-stream-processing

My university proposal on Stateful Stream Processing

Problem Description

TODO

Architecture

Architecture TODO

Architecture - AWS Deployment

ArchitectureAWS TODO

Architecture - Dev Env

ArchitectureDevEnv TODO

Development Environment

To run the project in the development environment, make sure Docker and Docker Compose are installed. Then start the Kafka and database infrastructure:

docker-compose -f infra.docker-compose.yml up

Wait a few minutes until the logs show that Kafka is running normally, then start the producer and consumer applications:

docker-compose -f app.docker-compose.yml up
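Instead of watching the logs by hand, the wait can be roughly automated. The following is a minimal sketch, not part of this repository: it polls a TCP port until a connection succeeds. The helper name `wait_for_port` is hypothetical, and the host and port in the usage comment (`localhost:9092`) are assumptions; check `infra.docker-compose.yml` for the port actually published. An open port only shows that the broker is accepting connections, not that it is fully initialised.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0,
                  interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example usage (assumed address, adjust to the compose file):
# if wait_for_port("localhost", 9092):
#     print("Kafka port is reachable, the applications can be started")
```

If the function returns `False`, the infrastructure most likely needs more time or the assumed port is wrong.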

Once the applications are up, any change to their code, including the producer, consumer, or processor, requires rebuilding the Docker images:

docker-compose -f app.docker-compose.yml up --build

To tear everything down, run:

docker-compose -f infra.docker-compose.yml down
docker-compose -f app.docker-compose.yml down

NOTE: This also deletes all of the Kafka topics and the database. Be careful if you want to keep the data they contain.

Interesting Information from Twitter model

[Figure: Architecture]

There are many interesting pieces of information that can be extracted from these tables depending on the specific requirements of your project or analysis. Here are a few examples:

  • User Activity: From the 'Tweets' and 'Comments' tables, we can derive how active each user is. We can calculate metrics like the average number of tweets per user, most active users, or the number of comments per tweet. We can also analyze user activity over time to identify trends or periods of high activity.

  • Most Popular Tweets: By using the 'TweetLikes' table, we can identify the most popular tweets based on the number of likes. We can also see which users get the most likes on their tweets, giving us an idea of influential users within the platform.

  • Social Network Analysis: From the 'UserFollow' table, we can perform social network analysis. This could involve determining who the most influential users are (those with the most followers), identifying clusters of users that tend to follow each other, or even using this data to suggest new users to follow based on a user's existing followees.

  • User Engagement: By combining information from 'Tweets', 'Comments', and 'TweetLikes', we can derive how engaging each user is. For instance, users who have a higher ratio of likes and comments per tweet could be considered more engaging.

  • Popular Topics: By analyzing the text of tweets from the 'Tweets' table, we could identify trending topics or commonly mentioned phrases or keywords.

  • User Interaction: From the 'Comments' table, we can analyze user interactions. For instance, we could identify pairs of users who often interact with each other or discover the types of tweets that generate the most discussion.

  • Demographics: If the 'Users' table includes demographic information (like location, age, or gender), we could analyze Twitter use and behavior by demographic group.
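A few of the metrics above can be sketched in plain Python over in-memory rows. This is only an illustration under stated assumptions: the table names come from the model, but every column name (`id`, `user_id`, `tweet_id`, `follower_id`, `followee_id`), the sample rows, and the helpers `count_by` and `engagement_per_user` are hypothetical and not taken from the project code.

```python
from collections import Counter

# Hypothetical sample rows; all column names are assumptions.
tweets = [
    {"id": 1, "user_id": "alice"},
    {"id": 2, "user_id": "alice"},
    {"id": 3, "user_id": "bob"},
]
tweet_likes = [
    {"tweet_id": 1, "user_id": "bob"},
    {"tweet_id": 1, "user_id": "carol"},
    {"tweet_id": 3, "user_id": "alice"},
]
comments = [
    {"tweet_id": 1, "user_id": "bob"},
    {"tweet_id": 3, "user_id": "carol"},
    {"tweet_id": 3, "user_id": "alice"},
]
user_follow = [
    {"follower_id": "bob", "followee_id": "alice"},
    {"follower_id": "carol", "followee_id": "alice"},
    {"follower_id": "alice", "followee_id": "bob"},
]


def count_by(rows, key):
    """Count rows grouped by the value stored under `key`."""
    return Counter(row[key] for row in rows)


def engagement_per_user(tweets, tweet_likes, comments):
    """Average number of likes plus comments received per tweet, by author."""
    author_of = {t["id"]: t["user_id"] for t in tweets}
    tweets_per_user = count_by(tweets, "user_id")
    received = Counter()
    for row in list(tweet_likes) + list(comments):
        received[author_of[row["tweet_id"]]] += 1
    return {user: received[user] / n for user, n in tweets_per_user.items()}


tweets_per_user = count_by(tweets, "user_id")        # User Activity
likes_per_tweet = count_by(tweet_likes, "tweet_id")  # Most Popular Tweets
followers = count_by(user_follow, "followee_id")     # Social Network Analysis
engagement = engagement_per_user(tweets, tweet_likes, comments)  # User Engagement

print(tweets_per_user.most_common(1))  # most active user
print(likes_per_tweet.most_common(1))  # most liked tweet
print(followers.most_common(1))        # most followed user
print(engagement)                      # interactions per tweet, by author
```

Because each metric is a running count keyed by user or tweet, the same counters could in principle be updated one event at a time in a stream processor rather than recomputed over whole tables.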