Skip to content

A sample code to load one million sales order records (json.gz) into bigquery and executed a select SQL command with groupby and aggregation method.

License

Notifications You must be signed in to change notification settings

michael-hll/BigQueryStudy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

BigQuery Study

A sample code to load one million sales order records into bigquery and executed a select SQL command with groupby and aggregation method.

It's a maven java command project:

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <version>0.8.1-beta</version>
</dependency>
<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20160810</version>
</dependency>

In order to use Google's BigQuery java library, you need follow the steps from below link first: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-java

It also uses the BigQuerySnippets sample code from Github: https://github.com/GoogleCloudPlatform/google-cloud-java/blob/master/google-cloud-examples/src/main/java/com/google/cloud/examples/bigquery/snippets/BigQuerySnippets.java
But I enhanced it to meet my requirement.

The table schema with json format looks like below:

{"company_id":"1","user_firstname":"michael","user_lastname":"han","product_price":"5000.50","product_name":"iPhone6","sales_order_number":"001","sales_order_date":"1486283602455","user_id":"1","sales_order_id":"1","product_id":"1","company_name":"QAD","customer_name":"HP","customer_id":"1","status":"open"}

Execution & Output

After you import it into your Eclipse project, run the java main method, you can see the following output in the console:
The important thing is it used about 2 seconds to execute the SQL command with 1 million rows. The BigQuery official said it could execute SQL query command fast even with 1 trillion records.

Step 1 start: Generate json data file at: Sun Feb 19 13:23:02 CST 2017
A json file with records[1000000] :/Users/Janine/Documents/workspace/BigQueryStudy/SalesOrder_1000000.json
Step 1 end: Generate json data file at: Sun Feb 19 13:23:14 CST 2017

Step 2 start: Compress json data file at: Sun Feb 19 13:23:14 CST 2017
A json.gz file was created: /Users/Janine/Documents/workspace/BigQueryStudy/SalesOrder_1000000.json.gz
Step 2 end: Compress json data file end at: Sun Feb 19 13:23:16 CST 2017

Step 3 start: Get BigQuery instance at: Sun Feb 19 13:23:16 CST 2017
Step 3 end: Get BigQuery instance at: Sun Feb 19 13:23:16 CST 2017

Step 4 start: Create dataset and table in google bigquery at: Sun Feb 19 13:23:16 CST 2017
DataSet: sales_order_ds, table: sales_order were created!
Step 4 end: Create dataset and table in google bigquery at: Sun Feb 19 13:23:27 CST 2017

Step 5 start: Loading json.gz file into google bigquery at: Sun Feb 19 13:23:27 CST 2017
A json.gz file was loaded into bigquery: /Users/Janine/Documents/workspace/BigQueryStudy/SalesOrder_1000000.json.gz
Step 5 end: Loading json.gz file into google bigquery at: Sun Feb 19 13:24:22 CST 2017

Step 6 start: Executing sql query command at: Sun Feb 19 13:24:22 CST 2017
A query command :
SELECT product_name, sum(product_price) total_amount FROM [bigquery-study-157507:sales_order_ds.sales_order] group by product_name order by total_amount desc LIMIT 10
was executed! Executing result lists below:
product 999, 9.99E7, 
product 998, 9.98E7, 
product 997, 9.97E7, 
product 996, 9.96E7, 
product 995, 9.95E7, 
product 994, 9.94E7, 
product 993, 9.93E7, 
product 992, 9.92E7, 
product 991, 9.91E7, 
product 990, 9.9E7, 
Step 6 end: Executing sql query command at: Sun Feb 19 13:24:24 CST 2017

Celebrating to yourself!!!

About

A sample code to load one million sales order records (json.gz) into bigquery and executed a select SQL command with groupby and aggregation method.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages