gstamatakis/bigdataprojects

Projects on various BigData platforms

Projects

This repository hosts the following projects. More details can be found in each project's individual README.

Clicking one of the following links takes you directly to that project's module.

Skyline operator implemented in HadoopMR.

Distributed Bloom Filter and Count-Min sketches in Apache Storm.

Scheduling workloads in Spark, Flink, Apex and GPUs based on various metrics.

Calculating the Jaccard Index of terms and categories using a Per-Split SemiJoin algorithm in HadoopMR.
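For readers unfamiliar with the Jaccard index mentioned in the last project, here is a minimal single-machine sketch of the measure itself. It is not the Per-Split SemiJoin implementation from this repository. The class name `JaccardSketch`, the idea of representing a term and a category as sets of document IDs, and the choice to return 0.0 for two empty sets are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not code from this repository.
class JaccardSketch {

    // Jaccard index: size of the intersection divided by size of the union.
    // Returning 0.0 when both sets are empty is an assumption of this sketch.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Assumed representation: IDs of documents containing a term,
        // and IDs of documents belonging to a category.
        Set<String> docsWithTerm = Set.of("d1", "d2", "d3");
        Set<String> docsInCategory = Set.of("d2", "d3", "d4");
        // 2 shared documents out of 4 distinct ones
        System.out.println(jaccard(docsWithTerm, docsInCategory));
    }
}
```

In a MapReduce setting the two sets are typically too large to hold in memory on one node, which is why a distributed join strategy is needed to compute the same ratio.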

Used frameworks

Links redirect to each framework's download page.

Apache Spark

Apache Storm

Apache Flink

Apache Hadoop

Apache Hive

Apache Kafka

Apache NiFi

Elasticsearch (entire ELK stack)

Docker

The docker folder in the root directory contains docker-compose.yml files for some of the frameworks used in these projects. Docker is especially useful when complex networking is involved or rapid prototyping is necessary.

Structure

Inside each module there may be more submodules, usually one for each implementation (e.g. Spark, Hadoop, ...).

Building

This repository uses Maven 3 to build its submodules. To build all of the submodules, run the following from the root of this repo:

mvn clean package

Inside each submodule there will be a target directory with the module's uberjar.

To build just a single artifact (e.g. the Hadoop implementation of the skyline), run:

mvn clean package -pl :hadoopSkyline
