Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

markush81/fastdata-cluster

Fast Data Cluster

Warning: Because this repo is based on VirtualBox, which isn't available for Apple Silicon based Macs, I have deprecated it.

2023: There are test builds of VirtualBox for Apple Silicon, but so far they are not stable enough.

Content

If you need a local cluster providing Kafka, Cassandra and Spark, you're in the right place.

Prerequisites

  • Vagrant (tested with 2.2.14)
  • VirtualBox (tested with 6.1.18)
  • Ansible (tested with 2.10.5)
  • The VMs take approximately 18 GB of RAM in total, so your host should have more than that.
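Before the first `vagrant up`, it can help to confirm that the three tools are on your PATH. The following is only a minimal sketch; the helper name `check_tool` is made up for this example and is not part of the repo.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# (check_tool is an illustrative helper, not something this repo ships.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

missing=0
for tool in vagrant VBoxManage ansible; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing of 3 required tools missing"
```

To compare against the tested versions listed above, `vagrant --version`, `VBoxManage --version` and `ansible --version` print the installed versions.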

⚠️ Vagrant might ask you for your admin password. This is because vagrant-hostsupdater is used to make the VMs reachable by their hostnames from your machine.

Init

git clone https://github.com/markush81/fastdata-cluster.git
cd fastdata-cluster
vagrant up
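Since the whole cluster needs about 18 GB of RAM, you may want to start only part of it. Vagrant accepts machine names on `up`, `halt` and `destroy`. The sketch below assumes the Vagrant machine names match the hostnames used in this README (`kafka-1`, `cassandra-1`, `hadoop-1`, ...) and that provisioning a subset works on its own, which has not been verified here. The helpers `machines_for` and `run` are illustrative; commands are only printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Build the three machine names of one service group, e.g. "kafka-1 kafka-2 kafka-3".
# Assumption: Vagrant machine names equal the hostnames used in this README.
machines_for() {
    echo "$1-1 $1-2 $1-3"
}

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

kafka_machines=$(machines_for kafka)

# Run from the cloned repository directory (where the Vagrantfile lives).
run vagrant status
# Word splitting of $kafka_machines is intended here.
run vagrant up $kafka_machines
run vagrant halt $kafka_machines
```

`vagrant destroy -f` removes all VMs again once you are done.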

Cluster

If everything goes well, the result should be:

FastData Cluster

Coordinates

Servers

| IP | Hostname | Description | Settings |
| --- | --- | --- | --- |
| 192.168.10.2 | kafka-1 | running a kafka broker | 1024 MB RAM |
| 192.168.10.3 | kafka-2 | running a kafka broker | 1024 MB RAM |
| 192.168.10.4 | kafka-3 | running a kafka broker | 1024 MB RAM |
| 192.168.10.5 | cassandra-1 | running a cassandra node | 1024 MB RAM |
| 192.168.10.6 | cassandra-2 | running a cassandra node | 1024 MB RAM |
| 192.168.10.7 | cassandra-3 | running a cassandra node | 1024 MB RAM |
| 192.168.10.8 | hadoop-1 | running a yarn resourcemanager and nodemanager, hdfs namenode, spark distribution, flink distribution | 4096 MB RAM |
| 192.168.10.9 | hadoop-2 | running a yarn nodemanager, hdfs datanode | 4096 MB RAM |
| 192.168.10.10 | hadoop-3 | running a yarn nodemanager, hdfs datanode | 4096 MB RAM |
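The memory settings in this table add up to the roughly 18 GB mentioned in the prerequisites. As a quick check of the arithmetic (variable names are illustrative):

```shell
#!/bin/sh
# RAM per VM in MB, taken from the Servers table.
small_vm_mb=1024   # each kafka-* and cassandra-* VM (6 in total)
large_vm_mb=4096   # each hadoop-* VM (3 in total)

total_mb=$((6 * small_vm_mb + 3 * large_vm_mb))
echo "Total: ${total_mb} MB ($((total_mb / 1024)) GB)"
# prints "Total: 18432 MB (18 GB)"
```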

Connections

| Name | Address |
| --- | --- |
| Zookeeper | kafka-1:2181,kafka-2:2181,kafka-3:2181 |
| Kafka Brokers | kafka-1:9092,kafka-2:9092,kafka-3:9092 |
| Cassandra Hosts | cassandra-1,cassandra-2,cassandra-3 |
| YARN Resource Manager | http://hadoop-1:8088 |
| HDFS Namenode UI | http://hadoop-1:9870 |
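To see whether these endpoints are reachable from your host, you could split the connection strings and probe each host and port. This is a rough sketch, not part of the repo: `endpoints` and `probe` are hypothetical helpers, and it assumes an `nc` that supports `-z` and `-w` (the common netcat variants do). Probing only happens when `PROBE=1` is set.

```shell
#!/bin/sh
# Connection strings copied from the Connections table.
zookeeper="kafka-1:2181,kafka-2:2181,kafka-3:2181"
kafka="kafka-1:9092,kafka-2:9092,kafka-3:9092"

# Split a comma-separated "host:port" list into one "host port" pair per line.
endpoints() {
    echo "$1" | tr ',' '\n' | tr ':' ' '
}

# Try to open a TCP connection to every endpoint in the list.
# nc -z only tests the connection, -w 2 gives up after two seconds.
probe() {
    endpoints "$1" | while read -r host port; do
        if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
            echo "open:        $host:$port"
        else
            echo "unreachable: $host:$port"
        fi
    done
}

endpoints "$zookeeper"

# Set PROBE=1 to actually connect (needs nc and a running cluster).
if [ "${PROBE:-0}" = "1" ]; then
    probe "$zookeeper"
    probe "$kafka"
fi
```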

Usage

Cassandra

lucky:~ markus$ vagrant ssh cassandra-1
[vagrant@cassandra-1 ~]$ cqlsh
Connected to analytics at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 4.0-beta4 | CQL spec 3.4.5 | Native protocol v4]
Use HELP for help.
cqlsh>
cqlsh> CREATE KEYSPACE example WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
cqlsh> USE example;
cqlsh:example> CREATE TABLE users (id UUID PRIMARY KEY, lastname text, firstname text );
cqlsh:example> INSERT INTO users (id, lastname, firstname) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'Mustermann','Max') USING TTL 86400 AND TIMESTAMP 123456789;
cqlsh:example> SELECT * FROM users;

 id                                   | firstname | lastname
--------------------------------------+-----------+------------
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |       Max | Mustermann

(1 rows)

Check Cluster Status:

[vagrant@cassandra-1 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns  Host ID                               Rack
UN  192.168.10.5  105.69 KiB  16      ?     74e6aff4-3561-4f48-bdbb-d030a9da0c01  rack1
UN  192.168.10.7  100.65 KiB  16      ?     3b428824-a9f2-4a49-ae1d-3639fc584e92  rack1
UN  192.168.10.6  100.66 KiB  16      ?     4418963f-5e94-4046-9cc1-f9614c6eae6e  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
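The note at the end of the output appears because `nodetool status` was called without a keyspace, so the Owns column stays at `?`. Passing a keyspace name makes nodetool report effective ownership for that keyspace. A small sketch, assuming the `example` keyspace from the cqlsh session above still exists:

```shell
#!/bin/sh
# Keyspace created in the cqlsh session above.
keyspace="example"

# Meant to be run inside a Cassandra VM, e.g. after `vagrant ssh cassandra-1`.
if command -v nodetool >/dev/null 2>&1; then
    # With a keyspace argument, Owns reflects that keyspace's replication settings.
    nodetool status "$keyspace"
else
    echo "nodetool not found; run this on one of the cassandra-* VMs"
fi
```

With a replication factor of 2 on three nodes, the Owns percentages should add up to roughly 200%.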

Zookeeper

[vagrant@kafka-1 ~]$ zookeeper-shell.sh kafka-1:2181/
Connecting to kafka-1:2181/
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
ls /
[admin, brokers, cluster, config, consumers, controller, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
ls /brokers/ids
[0, 1, 2]

Kafka

Topic Creation

lucky:~ markus$ vagrant ssh kafka-1
[vagrant@kafka-1 ~]$ kafka-topics.sh --create --zookeeper kafka-1:2181 --replication-factor 2 --partitions 6 --topic sample
Created topic "sample".
[vagrant@kafka-1 ~]$ kafka-topics.sh --zookeeper kafka-1 --topic sample --describe
Topic:sample	PartitionCount:6	ReplicationFactor:2	Configs:
	Topic: sample	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: sample	Partition: 1	Leader: 2	Replicas: 2,3	Isr: 2,3
	Topic: sample	Partition: 2	Leader: 3	Replicas: 3,1	Isr: 3,1
	Topic: sample	Partition: 3	Leader: 1	Replicas: 1,3	Isr: 1,3
	Topic: sample	Partition: 4	Leader: 2	Replicas: 2,1	Isr: 2,1
	Topic: sample	Partition: 5	Leader: 3	Replicas: 3,2	Isr: 3,2
[vagrant@kafka-1 ~]$
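The `--zookeeper` option of `kafka-topics.sh` is deprecated in newer Kafka releases and was removed in Kafka 3.0; since Kafka 2.2 the tool also accepts `--bootstrap-server`. Which Kafka version this cluster installs is not stated here, so treat the following as a sketch that assumes 2.2 or newer. The topic name `sample2` is made up to avoid clashing with the topic created above.

```shell
#!/bin/sh
# Hypothetical second topic, created through a broker instead of ZooKeeper.
topic="sample2"
bootstrap="kafka-1:9092"

# Meant to be run inside a Kafka VM, e.g. after `vagrant ssh kafka-1`.
if command -v kafka-topics.sh >/dev/null 2>&1; then
    kafka-topics.sh --create --bootstrap-server "$bootstrap" \
        --replication-factor 2 --partitions 6 --topic "$topic"
    kafka-topics.sh --describe --bootstrap-server "$bootstrap" --topic "$topic"
else
    echo "kafka-topics.sh not found; run this on one of the kafka-* VMs"
fi
```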

Producer

[vagrant@kafka-1 ~]$ kafka-console-producer.sh --broker-list kafka-1:9092,kafka-3:9092 --topic sample
Hey, is Kafka up and running?

Consumer

[vagrant@kafka-1 ~]$ kafka-console-consumer.sh --bootstrap-server kafka-1:9092,kafka-3:9092 --topic sample --from-beginning
Hey, is Kafka up and running?
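The producer and consumer above are interactive. For a scripted smoke test, the console producer can read the message from stdin and the consumer can be told to exit after one message. A minimal sketch, reusing the topic and brokers from above (the variable names are illustrative):

```shell
#!/bin/sh
topic="sample"
brokers="kafka-1:9092,kafka-3:9092"
message="Hey, is Kafka up and running?"

# Meant to be run inside a Kafka VM, e.g. after `vagrant ssh kafka-1`.
if command -v kafka-console-producer.sh >/dev/null 2>&1; then
    # The console producer sends each line read from stdin as one message.
    echo "$message" | kafka-console-producer.sh --broker-list "$brokers" --topic "$topic"
    # Read one message from the start of the topic, then exit;
    # give up if nothing arrives within 10 seconds.
    kafka-console-consumer.sh --bootstrap-server "$brokers" --topic "$topic" \
        --from-beginning --max-messages 1 --timeout-ms 10000
else
    echo "Kafka CLI tools not found; run this on one of the kafka-* VMs"
fi
```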

YARN

The YARN ResourceManager UI can be accessed at http://hadoop-1:8088. From there you can navigate to your application.

Spark

Spark Examples

lucky:~ markus$ vagrant ssh hadoop-1
[vagrant@hadoop-1 ~]$ spark-submit --master yarn --class org.apache.spark.examples.SparkPi --deploy-mode cluster --driver-memory 512M --executor-memory 512M --num-executors 2 /usr/local/spark-3.0.2-bin-without-hadoop/examples/jars/spark-examples_2.12-3.0.2.jar 1000
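Because the job runs in cluster deploy mode, the computed value is not printed in your terminal but ends up in the driver's YARN container log. One way to find it is sketched below; it assumes YARN log aggregation is enabled on this cluster (otherwise use the ResourceManager UI), and the application ID is a placeholder you need to replace.

```shell
#!/bin/sh
# Placeholder: use the ID printed by spark-submit or shown in the ResourceManager UI.
app_id="application_XXXXXXXXXXXXX_NNNN"

# Meant to be run on hadoop-1, where the Hadoop CLI tools are installed.
if command -v hadoop >/dev/null 2>&1 && command -v yarn >/dev/null 2>&1; then
    # List finished applications to find the SparkPi run and its ID.
    yarn application -list -appStates FINISHED
    # SparkPi prints a line starting with "Pi is roughly" to the driver's stdout.
    yarn logs -applicationId "$app_id" | grep "Pi is roughly"
else
    echo "hadoop/yarn not found; run this on hadoop-1"
fi
```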

Flink

Flink Example Run

Access Flink UI:

Open http://hadoop-1:8088/cluster, click the ID link of the "Flink session cluster" application, and then follow "Tracking URL: ApplicationMaster".

Submit a job:

[vagrant@hadoop-1 ~]$ HADOOP_CLASSPATH=$(hadoop classpath) flink run /usr/local/flink-1.12.1/examples/streaming/WordCount.jar

Further Links
