☁ Batch processing Word-Letter Count application with a custom k8s scheduler

matchy-at-snu/distributed-system-project

M1522.006300 Distributed Systems

This is the course project repository of Group 17 for M1522.006300 Distributed Systems.

Project Description

The goal of this project is to deploy and manage a prototype cloud cluster running batch-processing WordLetterCount applications. There are two WordLetterCount applications, implemented in different ways: one uses the Spark API; the other uses the WordCount API and a self-designed resource scheduler.
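As a rough illustration of the kind of computation a WordLetterCount job performs, here is a minimal single-machine sketch in plain Python. It assumes that the task is to count how often each word and each letter occurs in the input; the exact output format is defined in the project specification and may differ. The function name `word_letter_count` and the tokenization rule (lowercased runs of ASCII letters) are hypothetical and are not taken from this repository.

```python
import re
from collections import Counter


def word_letter_count(text):
    """Return (word_counts, letter_counts) for the given text.

    Assumption: a "word" is a run of ASCII letters, compared
    case-insensitively. The real applications may tokenize differently.
    """
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)
    # Count every letter that appears inside the extracted words.
    letter_counts = Counter(ch for word in words for ch in word)
    return word_counts, letter_counts


if __name__ == "__main__":
    words, letters = word_letter_count("The cat and the hat")
    print(words.most_common(3))
    print(letters.most_common(3))
```

The Spark-based and scheduler-based applications would distribute this kind of counting across the cluster; the sketch only shows the per-input logic.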

Developer Tutorials

Refer to the docs folder for useful guides. The project specification is given in Specification.md.

Refer to GCP guide for a detailed tutorial on how to configure, access and use your GCP clusters.

Our project ID is peaceful-fact-294309. You can use the web-based GCP Console dashboard to view our cluster, VMs, and Pods.

To-Dos

  • Deploy Google Dataproc on GKE (ref: Dataproc on Google Kubernetes Engine)
  • Install WordCount locally to test
  • Test WordCount on GKE
    • Deploy Hadoop on GKE
    • Tweak Hadoop deployment, integration with GCS