
BigData-ReviewsAnalysis

Background

In the era of information explosion, analysing big data on a local machine can take hours before it yields worthwhile trends or strategies. This project demonstrates how to use AWS services and Google Colab to process big data.

ETL-AWS-COLAB

  1. Create an AWS account
  2. Connect Google Colab with an .ipynb notebook
  3. In AWS RDS, create a database
  4. Follow Reference 1 to set up a connection between local PostgreSQL and RDS
  5. Extract the datasets listed at https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt
  6. Use Spark to run the ETL process and clean the data
  7. Use the Spark .write.jdbc method to load the data into PostgreSQL (a sketch follows this list)
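
Below is a minimal PySpark sketch of steps 5-7. It is not the notebooks' exact code: it assumes pyspark is installed in the Colab runtime (pip install pyspark), that a PostgreSQL JDBC driver jar has been downloaded to the runtime, and that the dataset file name, RDS endpoint, database name, and credentials shown are placeholders.

from pyspark import SparkFiles
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ReviewsETL")
    # Placeholder path to a downloaded PostgreSQL JDBC driver jar.
    .config("spark.jars", "/content/postgresql-42.2.16.jar")
    .getOrCreate()
)

# Step 5: extract one dataset from the index in Reference 2.
url = "https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Kitchen_v1_00.tsv.gz"
spark.sparkContext.addFile(url)
df = spark.read.csv(
    SparkFiles.get("amazon_reviews_us_Kitchen_v1_00.tsv.gz"),
    sep="\t", header=True, inferSchema=True,
)

# Step 6: clean -- drop duplicate reviews and rows with missing values,
# keeping the columns for the review_info table.
review_info = (
    df.dropDuplicates(["review_id"])
      .na.drop()
      .select("review_id", "customer_id", "product_id",
              "product_parent", "review_date")
)

# Step 7: load the cleaned data into the RDS PostgreSQL instance.
# Endpoint, database name, and credentials are placeholders.
jdbc_url = "jdbc:postgresql://<rds-endpoint>:5432/<database>"
config = {
    "user": "<username>",
    "password": "<password>",
    "driver": "org.postgresql.Driver",
}
review_info.write.jdbc(url=jdbc_url, table="review_info",
                       mode="append", properties=config)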

Content:

Project
├── Image
│   ├── kitchen_customers.png
│   ├── kitchen_products.png
│   ├── kitchen_review_info.png
│   ├── kitchen_vine_info.png
│   ├── tools_customers.png
│   ├── tools_products.png
│   ├── tools_review_info.png
│   └── tools_vine_info.png
├── README.md
├── requirements.txt
├── reviews_us_Kitchen.ipynb
└── reviews_us_Tools.ipynb

Prerequisites

  1. A Colab account - Colab Notebooks
  2. An AWS account - S3 and RDS services
    Remember to closely monitor any AWS resources you choose to use. It is crucial to clean up and stop or shut down all AWS resources when you are finished, to avoid accruing additional costs.
  3. S3 bucket permission setting (replace <bucket-name> with your bucket's name):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "getobject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<bucket-name>/*"
        }
    ]
}
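
The same policy can also be applied programmatically rather than through the S3 console. A minimal boto3 sketch, assuming credentials that are allowed to manage the bucket; the bucket name is a placeholder:

import json
import boto3

# Placeholder bucket name -- replace with your own bucket.
bucket = "<bucket-name>"

# The same public-read policy as above, with the Resource ARN
# built from the bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "getobject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

# Attach the policy to the bucket.
s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))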

References

  1. Create MySQL Database with Amazon RDS: https://medium.com/geekculture/create-mysql-database-with-amazon-rds-4a6581e8dfaa
  2. Amazon Customer Reviews dataset index: https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt