
An exercise in using Hadoop's Streaming API to determine the word count of Wikipedia articles.


andrejanesic/Hadoop-Beginner-Exercise-Streaming-Word-Count


Hadoop Beginner Exercise: Streaming Word Count

Hadoop beginner exercise: using the Streaming API to determine the word count of selected Wikipedia articles.
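With Hadoop Streaming, the map and reduce steps are ordinary programs. Hadoop pipes input lines to the mapper's stdin, sorts the mapper's tab-separated key/value output by key, and pipes the sorted lines to the reducer's stdin. The following is a minimal Python sketch of a word count written in that style. It is not the code from main.ipynb. The tokenization regex, the function names (mapper, reducer, simulate) and the "map"/"reduce" command-line switch are hypothetical choices made for illustration.

```python
#!/usr/bin/env python3
# Illustrative Hadoop Streaming word count sketch (not the repository's code).
import re
import sys
from itertools import groupby

# Hypothetical tokenization rule: lowercase letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def mapper(lines):
    """Emit 'word<TAB>1' for every word in every input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word. Input must be sorted by word (Hadoop's shuffle does this)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate(lines):
    """Run map -> sort (a local stand-in for the shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "python3 wordcount.py map" or "python3 wordcount.py reduce"
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for out in stage(sys.stdin):
        print(out)
```

For example, simulate(["the cat", "The dog"]) gives ["cat\t1", "dog\t1", "the\t2"]. On a cluster, the same script would be passed to the Hadoop Streaming jar through its -mapper and -reducer options (for example -mapper "python3 wordcount.py map"). The jar's location depends on the Hadoop installation.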

Running

The solution is available as a Jupyter notebook.

Open up the main.ipynb notebook file to view the solution, along with the pre-computed results.

Alternatively, you can re-run the solution on your own machine:

  1. Run make, which uses the Makefile to set up the environment with Docker
  2. Open localhost:8888 in your browser
  3. Open main.ipynb in the Jupyter notebook
  4. Run all the cells

Author

