vinayak0792/MapReduce: Hadoop MapReduce jobs that derive statistics from the Yelp dataset

Files in the repository:

Readme
yelp1.jar
yelp2.jar
yelp3.jar
yelp3.java
yelp4.jar
yelp4.java
yelpq1.java
yelpq2.java

Simple Map-Reduce Jobs to perform analysis on the Yelp dataset, such as:

1. List the unique categories of businesses located in “Palo Alto”.

2. Find the top ten businesses by average rating, listed from the
highest-rated down. The 4th column in the review.csv file holds the
rating.

3. List the business_id, full address and categories of the top 10
businesses by average rating. (Reduce-side join and job chaining technique)

4. List the 'user id' and 'rating' of users who reviewed businesses
located in Stanford. (In-memory join technique)
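
The logic behind job 2 can be sketched in plain Java without any Hadoop dependencies. This is only an illustration of the group-then-average-then-rank idea, not the code in yelpq2.java. The "::" delimiter, the position of business_id (3rd column) and the class and method names are assumptions; the README only confirms that the rating is the 4th column of review.csv.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: average rating per business, then the top n.
class TopRatedSketch {
    static final String DELIM = "::";   // assumption: field delimiter
    static final int BUSINESS_COL = 2;  // assumption: business_id is the 3rd column
    static final int RATING_COL = 3;    // from the README: rating is the 4th column

    static List<Map.Entry<String, Double>> topByAverage(List<String> reviewLines, int n) {
        // "Map" and "shuffle": accumulate {sum, count} per business_id.
        Map<String, double[]> sumAndCount = new HashMap<>();
        for (String line : reviewLines) {
            String[] f = line.split(DELIM);
            if (f.length <= RATING_COL) continue;  // skip malformed rows
            double rating;
            try {
                rating = Double.parseDouble(f[RATING_COL].trim());
            } catch (NumberFormatException e) {
                continue;                          // skip non-numeric ratings
            }
            double[] acc = sumAndCount.computeIfAbsent(f[BUSINESS_COL], k -> new double[2]);
            acc[0] += rating;
            acc[1] += 1;
        }

        // "Reduce": turn each {sum, count} pair into an average.
        List<Map.Entry<String, Double>> averages = new ArrayList<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            Double avg = e.getValue()[0] / e.getValue()[1];
            averages.add(new AbstractMap.SimpleEntry<>(e.getKey(), avg));
        }

        // Highest average first; ties broken by business_id for a stable order.
        averages.sort((a, b) -> {
            int byRating = Double.compare(b.getValue(), a.getValue());
            return byRating != 0 ? byRating : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(averages.subList(0, Math.min(n, averages.size())));
    }

    public static void main(String[] args) {
        List<String> reviews = Arrays.asList(
                "r1::u1::b1::4.0", "r2::u2::b1::2.0", "r3::u3::b2::5.0");
        for (Map.Entry<String, Double> e : topByAverage(reviews, 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In an actual Hadoop job the grouping would be done by the shuffle phase, and the final ranking would need a single reducer or a second chained job so that all averages are compared in one place.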

Run the jar files with the following commands:

1 : hadoop jar yelp1.jar yelpq1 /business.csv /output1.1
2 : hadoop jar yelp2.jar yelpq2 /review.csv /output2.1
3 : hadoop jar yelp3.jar yelpq2 /review.csv /output3.2 /business.csv /output3.3
4 : hadoop jar yelp4.jar yelp4 /review.csv /business.csv /output4.1
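
Command 4 takes both review.csv and business.csv as inputs, which matches the in-memory join described for job 4: the smaller dataset is loaded into memory and each review is checked against it. A minimal plain-Java sketch of that idea follows. It is not the code in yelp4.java; the "::" delimiter, the column positions (business_id and full address as the 1st and 2nd columns of business.csv, user id and business_id as the 2nd and 3rd columns of review.csv) and all class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an in-memory (map-side) join.
class InMemoryJoinSketch {
    static final String DELIM = "::"; // assumption: field delimiter

    // Build the lookup set: ids of businesses whose address mentions the city.
    // Assumed business.csv layout: business_id, full address, categories.
    static Set<String> businessIdsIn(List<String> businessLines, String city) {
        Set<String> ids = new HashSet<>();
        for (String line : businessLines) {
            String[] f = line.split(DELIM);
            if (f.length >= 2 && f[1].contains(city)) {
                ids.add(f[0]);
            }
        }
        return ids;
    }

    // For each review, emit "user_id<TAB>rating" when its business is in the set.
    // Assumed review.csv layout: review_id, user_id, business_id, rating.
    static List<String> join(List<String> reviewLines, Set<String> ids) {
        List<String> out = new ArrayList<>();
        for (String line : reviewLines) {
            String[] f = line.split(DELIM);
            if (f.length >= 4 && ids.contains(f[2])) {
                out.add(f[1] + "\t" + f[3]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> businesses = Arrays.asList(
                "b1::12 Main St, Stanford, CA::Cafe",
                "b2::1 Elm St, Palo Alto, CA::Bar");
        List<String> reviews = Arrays.asList(
                "r1::u1::b1::4.0", "r2::u2::b2::3.0", "r3::u3::b1::5.0");
        Set<String> stanford = businessIdsIn(businesses, "Stanford");
        join(reviews, stanford).forEach(System.out::println);
    }
}
```

In a Hadoop job the lookup set would typically be built once per mapper (for example in the mapper's setup method) so that no reduce phase is needed for the join.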
