This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

[ET4310] Supercomputing for Big Data - Practical Assignments


joined/ET4310-SupercomputingForBigData


[ET4310] Supercomputing for Big Data

This repository contains my personal solutions to the three practical assignments of the ET4310 "Supercomputing for Big Data" course at TU Delft, taught in Q1 of the 2016/17 academic year.

Each of the three Lab folders contains the code and the report for one assignment.

First assignment

Part 1

Data exploration on Wikipedia page view statistics
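
As a rough illustration of this kind of exploration, the sketch below aggregates page views per project code in plain Python. It assumes the whitespace-separated record layout of the public Wikimedia pagecounts dumps (project code, page title, view count, response size); the function name `summarize_pageviews` and the sample records are hypothetical, and the actual solution in the Lab folder may be structured quite differently.

```python
from collections import defaultdict


def summarize_pageviews(lines):
    """Return {project: (total_views, top_title, top_views)}.

    Assumes each record is 'project title views size' (an assumption
    about the input format, not taken from the assignment text).
    """
    totals = defaultdict(int)
    top_page = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed records
        project, title, views, _size = parts
        views = int(views)
        totals[project] += views
        # Remember the single most-viewed page for each project.
        if project not in top_page or views > top_page[project][1]:
            top_page[project] = (title, views)
    return {p: (totals[p], *top_page[p]) for p in totals}


# Made-up sample records, for illustration only.
sample = [
    "en Main_Page 120 4096",
    "en Kevin_Bacon 30 2048",
    "nl Hoofdpagina 45 1024",
    "broken line",
]
stats = summarize_pageviews(sample)
print(stats)
```

The same per-key aggregation maps naturally onto a distributed reduce-by-key step, which is presumably how it would be expressed on a cluster.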

Part 2

Using Spark Streaming to collect tweets and compute statistics

Part 3

Exploration of the output of part 2 to compute more statistics

Second assignment

Using the IMDb dumps to compute the degrees of separation from Kevin Bacon to a specified actor
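
Degrees of separation can be computed with a breadth-first search over a co-star graph. The following is a minimal single-machine sketch of that idea, not the repository's implementation: the helper names `build_costar_graph` and `degrees_of_separation`, the movie titles, and the placeholder actors are all hypothetical, and a real run over the full dumps would need a distributed formulation.

```python
from collections import deque


def build_costar_graph(casts):
    """Map each actor to the set of actors they share a movie with."""
    graph = {}
    for cast in casts.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(a for a in cast if a != actor)
    return graph


def degrees_of_separation(graph, source, target):
    """Breadth-first search; returns None when no connection exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        actor, dist = queue.popleft()
        for other in graph.get(actor, ()):
            if other in seen:
                continue
            if other == target:
                return dist + 1
            seen.add(other)
            queue.append((other, dist + 1))
    return None


# Toy data with placeholder names (not taken from any real dump).
casts = {
    "Movie 1": ["Kevin Bacon", "Actor A"],
    "Movie 2": ["Actor A", "Actor B"],
    "Movie 3": ["Actor C"],
}
graph = build_costar_graph(casts)
print(degrees_of_separation(graph, "Kevin Bacon", "Actor B"))
```

Because breadth-first search visits actors in order of increasing distance, the first time the target is reached gives the smallest number of hops.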

Third assignment

Cluster-based Apache Spark implementation of the GATK DNA analysis pipeline

Part 1

Interleaving DNA reads from FASTQ files
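
Interleaving paired-end reads means alternating whole FASTQ records (four lines each) from the two input files, so that each read is immediately followed by its mate. The sketch below shows the idea on in-memory lists; the function names and the sample reads are hypothetical, and it assumes both inputs hold the same number of records in matching order.

```python
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as lists of four lines: header, bases, '+', qualities."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(reads1, reads2):
    """Alternate records from two paired inputs: r1[0], r2[0], r1[1], r2[1], ..."""
    out = []
    # Assumes equal record counts; zip would silently stop at the shorter input.
    for first, second in zip(fastq_records(reads1), fastq_records(reads2)):
        out.extend(first)
        out.extend(second)
    return out


# Made-up reads, for illustration only.
reads1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
reads2 = ["@r1/2", "TGCA", "+", "IIII", "@r2/2", "CCGG", "+", "IIII"]
interleaved = interleave(reads1, reads2)
print("\n".join(interleaved))
```

For real input, the two lists would be replaced by open file handles with trailing newlines stripped.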

Part 2

DNA analysis pipeline implementation

Part 3

Interacting with HDFS

License and Copyright

See LICENSE.
