An introductory lab for PySpark MapReduce framework made for CSSE434

lamdav/SparkMapReduceLab

Spark MapReduce Lab

Description:

This introductory lab uses PySpark to run rudimentary MapReduce jobs. It assumes prior knowledge of Python and of the MapReduce concept, and that Spark is already running on a Hadoop cluster; installation details are omitted.
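
As a refresher on the MapReduce concept assumed above, here is a minimal sketch of a word count in plain Python. It is not taken from the lab and does not use Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values with an associative function (here, addition)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, not part of the lab's data.
sample_lines = ["spark makes mapreduce short", "mapreduce on spark"]
counts = word_count(sample_lines)
print(sorted(counts.items()))
```

In PySpark the same shape is usually written with the RDD methods `flatMap`, `map`, and `reduceByKey`, and Spark performs the shuffle itself. The lab's own tasks may structure the job differently.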

This was written for the class CSSE434 as a part of our research project.

Instructions:

To work through this lab, first clone the repo. On the command line, run:

  $ git clone https://github.com/lamdaV/SparkMapReduceLab.git

Once the project has been cloned, read through the introduction and work through its example. Then attempt the wordCountTask and friendsListTask.

What's Next?

If you would like to learn more about Spark and what it is capable of, try the Spark Machine Learning Lab.
