
Hadoop map-reduce to derive some statistics from Yelp Dataset


vinayak0792/MapReduce


Simple Map-Reduce Jobs to perform analysis on the Yelp dataset, such as:

1. List the unique categories of businesses located in “Palo Alto”.

2. Find the ten highest-rated businesses by average rating, listed with
the top-rated business first. The 4th column of the review.csv file
holds the rating.

3. List the business_id, full address, and categories of the top 10
businesses by average rating. (Reduce-side join and job chaining technique)

4. List the 'user id' and 'rating' of users who reviewed businesses
located in Stanford. (In-memory join technique)
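
As an illustration of job 2, here is a minimal plain-Java sketch of the map, shuffle, and reduce steps behind a "top ten by average rating" query. It is not the code from this repository and does not use the Hadoop API. The class name `TopRatedSketch`, the `::` field delimiter, and the position of `business_id` in the 3rd column are assumptions made for the example. Only the rating in the 4th column comes from the description above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: simulates map -> shuffle -> reduce in memory, without Hadoop.
class TopRatedSketch {
    // Assumed delimiter and layout: review_id::user_id::business_id::stars
    static final String DELIM = "::";

    /** Map step: emit (business_id, rating) for one review line, or null if malformed. */
    static Map.Entry<String, Double> map(String line) {
        String[] f = line.split(DELIM);
        if (f.length < 4) return null;
        try {
            return new AbstractMap.SimpleImmutableEntry<String, Double>(f[2], Double.valueOf(f[3]));
        } catch (NumberFormatException e) {
            return null; // skip rows whose rating is not numeric
        }
    }

    /** Reduce step: average all ratings collected for one business. */
    static double reduce(List<Double> ratings) {
        double sum = 0;
        for (double r : ratings) sum += r;
        return sum / ratings.size();
    }

    /** Runs map, groups values by key (the shuffle), reduces, and keeps the n best averages. */
    static List<Map.Entry<String, Double>> topN(List<String> lines, int n) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> kv = map(line);
            if (kv != null) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        List<Map.Entry<String, Double>> averages = new ArrayList<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            averages.add(new AbstractMap.SimpleImmutableEntry<String, Double>(
                    e.getKey(), reduce(e.getValue())));
        }
        // Highest average first; ties are broken by business_id so the order is deterministic.
        averages.sort((a, b) -> {
            int byAvg = Double.compare(b.getValue(), a.getValue());
            return byAvg != 0 ? byAvg : a.getKey().compareTo(b.getKey());
        });
        return averages.subList(0, Math.min(n, averages.size()));
    }
}
```

Calling `TopRatedSketch.topN(lines, 10)` on the lines of review.csv would return up to ten `(business_id, average)` pairs, best first. In an actual Hadoop job the grouping is done by the framework between the mapper and the reducer, and the final top-ten selection needs all averages in one place, for example in a single reducer.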

Run the following commands to execute the jar files:

1 : hadoop jar yelp1.jar yelpq1 /business.csv /output1.1
2 : hadoop jar yelp2.jar yelpq2 /review.csv /output2.1
3 : hadoop jar yelp3.jar yelpq2 /review.csv /output3.2 /business.csv /output3.3
4 : hadoop jar yelp4.jar yelp4 /review.csv /business.csv /output4.1
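
Command 4 takes both review.csv and business.csv because job 4 joins the two files in memory. The plain-Java sketch below shows the general idea only: the smaller file is loaded into a set, then each review is checked against it in a single pass. It is not the repository's code and does not use the Hadoop API. The class name `InMemoryJoinSketch`, the `::` delimiter, and both column layouts are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an in-memory (map-side) join, without Hadoop.
class InMemoryJoinSketch {
    static final String DELIM = "::"; // assumed field delimiter

    /** Build the in-memory side: ids of businesses whose address mentions the place. */
    static Set<String> loadBusinessIds(List<String> businessLines, String place) {
        Set<String> ids = new HashSet<>();
        for (String line : businessLines) {
            // Assumed layout: business_id::full_address::categories
            String[] f = line.split(DELIM);
            if (f.length >= 2 && f[1].contains(place)) {
                ids.add(f[0]);
            }
        }
        return ids;
    }

    /** Map-only pass over reviews: keep "user_id<TAB>rating" when the business is in the set. */
    static List<String> join(List<String> reviewLines, Set<String> ids) {
        List<String> out = new ArrayList<>();
        for (String line : reviewLines) {
            // Assumed layout: review_id::user_id::business_id::stars
            String[] f = line.split(DELIM);
            if (f.length >= 4 && ids.contains(f[2])) {
                out.add(f[1] + "\t" + f[3]);
            }
        }
        return out;
    }
}
```

This approach avoids a reduce phase, but it only works when the filtered business data fits in each mapper's memory. In a Hadoop job, a lookup set like this is typically built once per mapper in its `setup` method rather than once per record.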



