vittoriopolverino/mapreduce-wordcount

MapReduce Word Count

A naive Python implementation (no distributed computing) that mimics the MapReduce paradigm to help understand it.


🧐 About

MapReduce is a programming model for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The "MapReduce System" is usually composed of three functions (or steps):

  • Map: The map function, also referred to as the map task, processes a single key/value input pair and produces a set of intermediate key/value pairs.
  • Shuffle: The shuffle function transfers data from the mappers to the reducers. It is mandatory: the reducers cannot start until shuffling has finished, because its output is the input of the reduce tasks.
  • Reduce: The reduce function, also referred to as the reduce task, consists of taking all key/value pairs produced in the map phase that share the same intermediate key and producing zero, one, or more data items.
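
The three steps above can be sketched in a few lines of single-process Python. This is only an illustration of the paradigm, not necessarily how this repository implements it: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the tokenization rule are assumptions made for the example.

```python
import re
from collections import defaultdict


def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word in one input."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    return [(word, 1) for word in re.findall(r"[a-z']+", document.lower())]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all intermediate values that share the same key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of strings."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

In a real cluster each call to map_phase and reduce_phase could run on a different machine; here they run one after another, which keeps the data flow easy to follow.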

🏁 Getting Started

Use the Pipfile to install the packages into a virtualenv (the second command also installs the development packages):

pipenv install
pipenv install --dev

💻 Usage

Run the MapReduce example:

pipenv run wordcount

🐛 Test

Run the unit and integration tests:

pipenv run test
