This project runs on a Spark cluster backed by the Hadoop Distributed File System (HDFS). It also transforms data in RDF format.
To get started, clone this project and build it with Maven. Then copy the jar file into the $SPARK_HOME folder and run the spark-submit command.
You must have both an HDFS cluster and a Spark cluster installed for this to work. Instructions for building the clusters are available at the following link: HDFS and Spark Cluster. Note that this guide is in Greek.
The code is written in Java.
Clone and build the project; Maven must already be installed. Then copy the two configuration files, config.properties and run.properties, to the following path:
/usr/lib/spark/conf/appConf/
Use run.properties to specify which operation to run, and config.properties to specify the paths and folders where the data is stored. Comments in both files guide you through the settings, but they are also in Greek.
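To check that both files are in place and readable, they can be loaded with the standard java.util.Properties class. The following is a minimal sketch, not the project's own loading code: the class name AppConfLoader is hypothetical, and it assumes the files are UTF-8 encoded (the Greek comments would otherwise fail to decode). It prints only the keys it finds, since the actual key names are defined by the files themselves.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the project.
class AppConfLoader {

    // Configuration directory named in this README.
    static final Path CONF_DIR = Paths.get("/usr/lib/spark/conf/appConf");

    /** Reads one .properties file from the given directory (assumes UTF-8). */
    static Properties load(Path dir, String fileName) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(dir.resolve(fileName), StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // An optional first argument overrides the default directory.
        Path dir = args.length > 0 ? Paths.get(args[0]) : CONF_DIR;
        Properties run = load(dir, "run.properties");
        Properties config = load(dir, "config.properties");
        System.out.println("run.properties keys: " + run.stringPropertyNames());
        System.out.println("config.properties keys: " + config.stringPropertyNames());
    }
}
```

If either file is missing from the directory, the sketch fails with an IOException that names the missing path.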
After the build, copy the jar file into the $SPARK_HOME folder. Our $SPARK_HOME folder is:
/usr/lib/spark/
If you followed our instructions for the cluster, you should have the same $SPARK_HOME folder.
After that, you can run the spark-submit command:
./spark-submit --master spark://83.212.100.21:7077 --class rdf.RDFReading sparkExerciseFinal10.jar
In the --master parameter, specify the URL of your Spark master node, in the form spark://<master-ip>:7077. You can find it in the Spark master web UI, which is served on the master's IP at port 8080.
In the --class parameter, put rdf.RDFReading, which is the name of the main class in our project.
The last parameter is the name of the jar file that you copied into the $SPARK_HOME folder.
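The way the three parts fit together can be sketched in Java as a small helper that assembles the argument list. This is only an illustration: the class SparkSubmitCommand and its build method are hypothetical names, and port 7077 is taken from the command shown above. The sketch prints the command rather than launching it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the project.
class SparkSubmitCommand {

    /** Assembles the spark-submit arguments from the three values described above. */
    static List<String> build(String masterIp, String mainClass, String jarName) {
        return Arrays.asList(
                "./spark-submit",
                "--master", "spark://" + masterIp + ":7077", // master URL as shown in the web UI
                "--class", mainClass,                        // fully qualified main class
                jarName);                                    // jar copied into $SPARK_HOME
    }

    public static void main(String[] args) {
        List<String> cmd = build("83.212.100.21", "rdf.RDFReading", "sparkExerciseFinal10.jar");
        // Prints the command only; it does not start a Spark job.
        System.out.println(String.join(" ", cmd));
    }
}
```

Replace the IP with that of your own master node; the class and jar names stay the same unless you rename the build output.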
- Transform your RDF dataset into Vertical Partitioning format and optionally save the output to HDFS.
- Transform CSV files into Parquet format.
- Run Basic Graph Pattern queries on CSV files.
- Run Basic Graph Pattern queries on Vertical Partitioning files.
- Run Basic Graph Pattern queries on Parquet files.
- Run join queries on two tables in CSV files.
- Run join queries on two tables in Vertical Partitioning files.
- Run join queries on two tables in Parquet files.
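
For readers unfamiliar with vertical partitioning, the idea behind the features above can be sketched with plain Java collections. This is not the project's Spark code: the class, method names, and sample data are all hypothetical. Triples are grouped by predicate so that each predicate gets its own two-column (subject, object) table; a single triple pattern then reads one table, and a two-pattern query that shares a subject becomes a join of two tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of the project.
class VerticalPartitioningSketch {

    /** One RDF statement: subject, predicate, object. */
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Groups triples by predicate; each table holds {subject, object} rows. */
    static Map<String, List<String[]>> partition(List<Triple> triples) {
        Map<String, List<String[]>> tables = new LinkedHashMap<>();
        for (Triple t : triples) {
            tables.computeIfAbsent(t.predicate, k -> new ArrayList<>())
                  .add(new String[] {t.subject, t.object});
        }
        return tables;
    }

    /** Hash join of two predicate tables on subject; rows are {subject, leftObject, rightObject}. */
    static List<String[]> joinOnSubject(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : right) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String[]> out = new ArrayList<>();
        for (String[] row : left) {
            for (String rightObject : index.getOrDefault(row[0], Collections.emptyList())) {
                out.add(new String[] {row[0], row[1], rightObject});
            }
        }
        return out;
    }

    /** Made-up sample data for the demonstration. */
    static List<Triple> sampleTriples() {
        return Arrays.asList(
                new Triple("ex:alice", "ex:name", "Alice"),
                new Triple("ex:alice", "ex:worksAt", "ex:acme"),
                new Triple("ex:bob", "ex:name", "Bob"),
                new Triple("ex:bob", "ex:worksAt", "ex:initech"),
                new Triple("ex:carol", "ex:name", "Carol"));
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> tables = partition(sampleTriples());
        // Pattern "?s ex:name ?o": scan a single predicate table.
        for (String[] row : tables.get("ex:name")) {
            System.out.println(row[0] + " ex:name " + row[1]);
        }
        // Patterns "?s ex:name ?n . ?s ex:worksAt ?w": join two tables on subject.
        for (String[] row : joinOnSubject(tables.get("ex:name"), tables.get("ex:worksAt"))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

In this sample, ex:carol has a name but no ex:worksAt triple, so she appears in the single-table scan but not in the join result.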
- Tsotzolas George
- Kleftakis Spiros
See also the list of contributors who participated in this project.
Maven - Dependency Management