spark-drools

Spark integration with Drools for ETL purposes

  1. This project shows how to use a Spark job to fire Drools rules.
  2. Simple credit-approval logic: if the applicant's credit score is greater than 600, the rule fires and the loan is approved (a minimal sketch of this evaluation appears after this list).
  3. Approval.drl (Drools rule file) holds the business logic.
  4. A successful output.log has been checked in.
  5. The project can be imported as a Maven project and run in any IDE.
  6. The idea is based on this video: https://www.redhat.com/en/about/videos/red-hat-consulting-decision-manager-and-apache-spark-integration.
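
To make the flow concrete, here is a minimal, hypothetical sketch of evaluating one applicant against Approval.drl with the kie-api. The Applicant field names are assumptions for illustration; this is not the code checked into this repository.

    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    public class SingleApplicantCheck {

        // Hypothetical fact class: Approval.drl is assumed to match on creditScore > 600
        // and call setApproved(true).
        public static class Applicant implements java.io.Serializable {
            private final int creditScore;
            private boolean approved;

            public Applicant(int creditScore) { this.creditScore = creditScore; }
            public int getCreditScore() { return creditScore; }
            public boolean isApproved() { return approved; }
            public void setApproved(boolean approved) { this.approved = approved; }
        }

        public static void main(String[] args) {
            KieServices ks = KieServices.Factory.get();
            // Loads the rules (e.g. Approval.drl) from the classpath via kmodule.xml.
            KieContainer container = ks.getKieClasspathContainer();
            KieSession session = container.newKieSession();

            Applicant applicant = new Applicant(720);
            session.insert(applicant);   // insert the fact
            session.fireAllRules();      // the approval rule fires because 720 > 600
            session.dispose();

            System.out.println("Approved: " + applicant.isApproved());
        }
    }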

Deployment in OpenShift - Oshinko Cluster

  1. oc new-project demo

  2. oc create -f https://raw.githubusercontent.com/Pkrish15/spark-drools/master/resources.yaml

  3. oc new-app oshinko-webui

  4. oc get routes

  5. oc new-app --template oshinko-java-spark-build-dc \
    -p APPLICATION_NAME=spark-drools \
    -p APP_MAIN_CLASS=com.redhat.gpte.App \
    -p GIT_URI=https://github.com/Pkrish15/spark-drools \
    -p APP_FILE=spark-drools.jar

  6. In the Spark cluster logs you will find the output "Number of Applicant Approved:5".

  7. Please note that the cluster terminates immediately once the job completes. You can observe the logs in the OpenShift WebUI console.

Why do you need Apache Spark for this use case? Couldn't it just be a simple Drools application or RHDM?

  1. Of course, we could use the Red Hat Decision Manager (RHDM) UI to upload the rules, and any UI framework could be used to display the results.

  2. Apache Spark performs sequences of operations: ingest the data -> filter the data -> tag the data (business logic such as identifying loan defaulters) -> model it as useful business data. Writing the tagging logic by hand would require a huge amount of Java code (rule logic) to tag, write, and cleanse the data for different transactions and scenarios, which is not necessary.

  3. Instead, you can use Red Hat Decision Manager, which completely separates the business logic from the application scope and makes a clean abstraction between business rules and application logic. Developers can easily maintain these rules in Red Hat Decision Manager.

  4. In addition, Spark provides significant performance benefits by processing huge amounts of data in parallel, including near-real-time transactions.

  5. Whenever the Red Hat Decision Manager admin pushes updated rules, the Spark job can apply them and make the results available as clean data to a UI.

How can it be performed?

  1. Please check the code; it is simple and self-explanatory. A rough sketch of the job is shown below.
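
As a rough outline of what such a Spark job can look like (hypothetical applicant scores, the Applicant fact class sketched earlier, and Approval.drl on the classpath; this is not the actual com.redhat.gpte.App source):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieSession;

    public class App {
        public static void main(String[] args) {
            // The master URL is supplied by spark-submit / the Oshinko cluster.
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("spark-drools"));

            // Hypothetical input: one credit score per applicant.
            List<Integer> creditScores = Arrays.asList(550, 610, 700, 480, 650, 720, 800, 590);
            JavaRDD<Integer> scores = sc.parallelize(creditScores);   // SparkContext parallelization

            // Fire the Drools rules on the executors for every applicant.
            // (A real job would build the session once per partition, or broadcast the rules
            // as described below, rather than rebuilding it per record.)
            long approvedCount = scores.filter(score -> {
                KieSession session = KieServices.Factory.get()
                        .getKieClasspathContainer()
                        .newKieSession();
                // Applicant is the hypothetical fact class from the earlier sketch (same package assumed).
                SingleApplicantCheck.Applicant applicant = new SingleApplicantCheck.Applicant(score);
                session.insert(applicant);
                session.fireAllRules();        // Approval.drl approves when creditScore > 600
                session.dispose();
                return applicant.isApproved();
            }).count();

            System.out.println("Number of Applicant Approved:" + approvedCount);
            sc.stop();
        }
    }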

What are the important concepts used in Apache Spark?

  1. Broadcast variables - an advanced Spark concept.
  2. SparkContext parallelization - a beginner-level concept.

Broadcast Variables in Spark

  1. Where "m" is the rules object, shared/broadcast across the worker nodes (see the sketch below).
     [figure: broadcast variable "m" distributed to the worker nodes]
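
A small, assumption-laden sketch of that idea: the rules text ("m") is read once on the driver and broadcast, and each executor only reads the broadcast value. The file path and scores are made up, and the Drools compilation step is only indicated in comments.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;

    public class BroadcastRulesSketch {
        public static void main(String[] args) throws Exception {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("broadcast-rules"));

            // "m" in the diagram: the rules, read once on the driver...
            String drlText = new String(Files.readAllBytes(Paths.get("src/main/resources/Approval.drl")));

            // ...and broadcast once per executor instead of shipping it with every task closure.
            Broadcast<String> rules = sc.broadcast(drlText);

            long approved = sc.parallelize(Arrays.asList(550, 610, 700, 480, 650))
                    .filter(score -> {
                        String drl = rules.value();   // read-only access on the executor
                        // A real job would compile the DRL (or a broadcast KieBase) into a
                        // KieSession here; this sketch only repeats the score > 600 check.
                        return score > 600;
                    })
                    .count();

            System.out.println("Number of Applicant Approved:" + approved);
            sc.stop();
        }
    }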

When to use Broadcast Variables in Spark

  1. Before running each task on the available executors, Spark computes the task's closure: the variables and methods that must be visible for the executor to perform its computations on the RDD.

  2. If you have a huge array that is accessed from Spark closures, for example some reference data, this array will be shipped to the executors with every task closure.

  3. For example, if you have a 10-node cluster with 100 partitions (10 partitions per node), this array will be shipped at least 100 times (10 times to each node).

  4. If you use a broadcast variable instead, it will be distributed once per node using an efficient peer-to-peer protocol (see the sketch after this list).
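
A sketch of the contrast, with made-up reference data and partition counts:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;

    public class ClosureVsBroadcast {
        public static void main(String[] args) {
            // Local master, just so the illustration can run standalone.
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("closure-vs-broadcast").setMaster("local[*]"));

            // Hypothetical large reference data built on the driver.
            List<Integer> referenceData = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                referenceData.add(i);
            }

            JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 100);

            // (1) Captured directly in the closure: the whole list is serialized into every
            //     task, so with 100 partitions it is shipped about 100 times.
            long viaClosure = rdd.filter(x -> referenceData.contains(x)).count();

            // (2) Broadcast: shipped once per executor and reused by all of its tasks.
            Broadcast<List<Integer>> refBc = sc.broadcast(referenceData);
            long viaBroadcast = rdd.filter(x -> refBc.value().contains(x)).count();

            System.out.println(viaClosure + " == " + viaBroadcast);   // both print 5
            sc.stop();
        }
    }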

Important Point When Using Broadcast Variables

  1. Once we have broadcast a value to the nodes, we should not modify it; this keeps every node holding exactly the same copy of the data. Otherwise a modified value might later be sent to some node, giving unexpected results (a short illustration follows).
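
A short, hypothetical illustration of the pitfall (the mutation below is exactly what should be avoided):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;

    public class BroadcastMutationPitfall {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("broadcast-pitfall").setMaster("local[*]"));

            List<String> ruleSet = new ArrayList<>(Arrays.asList("approve if creditScore > 600"));
            Broadcast<List<String>> rules = sc.broadcast(ruleSet);

            sc.parallelize(Arrays.asList(1, 2, 3)).foreach(x -> {
                // DON'T do this: it only changes the copy held by this executor's JVM.
                // Other executors (and tasks scheduled later elsewhere) may still see the
                // original list, so the nodes no longer hold exactly the same data.
                rules.value().add("extra rule added by task " + x);
            });

            // Treat broadcast values as read-only; if the rules change, broadcast a new variable.
            sc.stop();
        }
    }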
