EQWorks/ws-data-spark
Work Sample for Data Aspect, PySpark Variant

What is this for?

Environment setup

If you already have a functioning Apache Spark configuration, you can use your own. For convenience, a docker-compose.yml based on the jupyter/pyspark-notebook image is provided. You will need Docker and Docker Compose configured on your computer; see the Docker Desktop documentation for details.

You can run docker-compose up and follow the prompt to open the Jupyter Notebook UI (the URL looks like http://127.0.0.1:8888/?token=<SOME_TOKEN>).
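
The provided docker-compose.yml is not reproduced on this page, but a minimal compose file consistent with the description above could look like the sketch below. The service name is an assumption; the image's default user home is /home/jovyan, so mounting ./data there makes it available as ~/data/ inside the container.

# Minimal sketch of a compose file for this setup -- not the provided one
services:
  pyspark:                          # hypothetical service name
    image: jupyter/pyspark-notebook
    ports:
      - "8888:8888"                 # Jupyter's default port
    volumes:
      - ./data:/home/jovyan/data    # exposes data/ as ~/data/ in the container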

The given data/ directory is mounted as a Docker volume at ~/data/ inside the container for easy access:

import os
from pyspark.sql import SparkSession

# Start a Spark session running locally
spark = SparkSession.builder.master('local').getOrCreate()

# Read the sample CSV: first row is the header, column types are inferred
df = spark.read.options(
    header='True',
    inferSchema='True',
    delimiter=',',
).csv(os.path.expanduser('~/data/DataSample.csv'))
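
A quick sanity check confirms the load and the inferred schema; this sketch makes no assumptions about the column names in DataSample.csv:

# Inspect the inferred schema and preview a few rows
df.printSchema()
df.show(5, truncate=False)

# Reads are lazy; count() forces a full pass over the file
print(df.count())

Note that inferSchema='True' costs an extra pass over the data to determine column types; for larger inputs, supplying an explicit schema is faster.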


Submission

Please host your solution as one or more Notebooks (.ipynb) in a public git repository, and reply with the repository link to the email thread through which you initially received this work sample.
