This tutorial walks you through setting up a fully distributed, multi-broker Apache Kafka cluster.
Abstract
Apache Kafka is an open source, distributed, high-throughput publish-subscribe messaging system. It is often used in real-time stream processing systems. Apache Kafka can be deployed in either of the following two modes -
- Pseudo-distributed multi-broker cluster - All Kafka brokers of the cluster are deployed on a single machine.
- Fully distributed multi-broker cluster - Each Kafka broker of the cluster is deployed on a separate machine.
This tutorial provides instructions for setting up a fully distributed multi-broker cluster. The instructions are very similar to those for a pseudo-distributed multi-broker cluster.
Pre-requisites
Here are the software and hardware requirements for following the instructions in this tutorial -
- 2 physical or virtual machines, ideally each with 4 GB RAM, 2 CPU cores and 20 GB of disk space
- Linux operating system, as Apache Kafka does not officially support Windows yet
- JDK 8 with JAVA_HOME pointing to it
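Before going further, it can help to confirm the JDK requirement on both machines. The helper below is a minimal sketch, not part of Kafka; the function name check_java_home is made up for this tutorial, and it only checks that JAVA_HOME is set and contains an executable bin/java.

```shell
# Hypothetical helper (not shipped with Kafka): reports whether JAVA_HOME
# is set and points at a directory containing an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
  elif [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable found at $JAVA_HOME/bin/java"
  else
    echo "ok"
  fi
}

# Usage on each machine:
#   check_java_home
#   "$JAVA_HOME/bin/java" -version   # should report a 1.8 version for JDK 8
```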
Installing Apache Kafka
Installing Apache Kafka is as simple as downloading its binaries and extracting them to your file system.
You can download the latest version of Apache Kafka from the official website. You will see multiple binary downloads for different Scala versions (2.10 and 2.11). If you are going to use the Scala APIs, download the one that matches your Scala version. If you are using the Java APIs, you can download any of these.
At the time of writing this tutorial, the latest version is 0.10.0.1, so we will be installing this version.
Once you have downloaded the binary file, extract it to the directory you would like to run it from. In my case, I have extracted it to /opt/big-data/kafka/kafka_2.11-0.10.0.1.
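The download and extraction can also be scripted so that both machines end up with the same layout. This is a sketch under a few assumptions: the Apache archive URL is my choice of download location rather than something this tutorial prescribes, curl and tar are assumed to be installed, and the function names are made up for illustration.

```shell
KAFKA_VERSION="0.10.0.1"
SCALA_VERSION="2.11"
INSTALL_DIR="/opt/big-data/kafka"

# Builds the archive file name from a Kafka version ($1) and a Scala version ($2).
kafka_archive_name() {
  echo "kafka_${2}-${1}.tgz"
}

# Downloads the archive and extracts it under INSTALL_DIR.
# Assumption: the release is fetched from the Apache archive site.
install_kafka() {
  archive=$(kafka_archive_name "$KAFKA_VERSION" "$SCALA_VERSION")
  mkdir -p "$INSTALL_DIR"
  curl -fLO "https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/${archive}"
  tar -xzf "$archive" -C "$INSTALL_DIR"
}

# Usage on each machine:
#   install_kafka
#   cd "$INSTALL_DIR/kafka_${SCALA_VERSION}-${KAFKA_VERSION}"
```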
Configuring Apache ZooKeeper
We first need to check the configuration for Apache ZooKeeper. If you are using a separate ZooKeeper cluster, please skip this step and the next one, as both relate to the ZooKeeper bundled with Kafka.
We can configure and start ZooKeeper on any one of the machines. You can check the ZooKeeper configuration by executing the following command from the Kafka home directory (for me - /opt/big-data/kafka/kafka_2.11-0.10.0.1) -
vi config/zookeeper.properties
And check the following properties -
- dataDir - It should point to a directory where you want ZooKeeper to save its data.
- clientPort - Defaults to 2181. Leave it as is.
Here is how these properties have been configured in my case -
# this should ideally be a well maintained and properly backed up directory
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
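If you would rather not open an editor just to check these two values, a small helper can print them. This is only a sketch; get_prop is a made-up name, and it assumes simple key=value lines like the ones shown above.

```shell
# Hypothetical helper: prints the value assigned to key $2 in properties
# file $1. Comment lines are skipped because their first field never equals
# the key; if the key appears more than once, the last assignment wins.
get_prop() {
  awk -F= -v key="$2" '
    $1 == key { sub(/^[^=]*=/, ""); value = $0 }
    END { print value }
  ' "$1"
}

# Usage from the Kafka home directory:
#   get_prop config/zookeeper.properties dataDir
#   get_prop config/zookeeper.properties clientPort
```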
Starting up Apache ZooKeeper
We can now start ZooKeeper by running the following command from the Kafka home directory -
./bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
You can then use the command below to verify that it has started -
jps
You will see a process called QuorumPeerMain if ZooKeeper has started successfully -
10956 QuorumPeerMain
We will use the <zookeeper.ip> placeholder for the IP address of the machine running ZooKeeper. Replace it with your ZooKeeper machine's actual IP address.
Configuring Apache Kafka
We will be creating a cluster of two Kafka instances (brokers) running on two machines. We will mainly focus on the following properties -
- broker.id - Id of the broker, i.e. an integer. Each broker in a cluster needs to have a unique id.
- log.dirs - Directory where Kafka stores its messages. Not to be confused with the usual application log files.
- port - Port on which Kafka will accept connections from producers and consumers.
- zookeeper.connect - Comma-separated list of ZooKeeper nodes, e.g. hostname1:port1,hostname2:port2. In our case, we will set it to <zookeeper.ip>:2181, because both brokers must connect to the same ZooKeeper instance.
Follow the instructions below on each of the machines to configure Kafka -
- The Kafka properties file is already present in the config directory of our Kafka installation and can be edited using the command below from the Kafka home directory -
vi config/server.properties
Properties for the broker on the first VM -
broker.id=0
# The port the socket server listens on
port=9092
# A comma separated list of directories under which to store log files
# this should ideally be a well maintained and properly backed up directory
log.dirs=/tmp/kafka-logs
zookeeper.connect=<zookeeper.ip>:2181
Properties for the broker on the second VM (notice that the broker id is different from the first broker's) -
broker.id=1
# The port the socket server listens on
port=9092
# A comma separated list of directories under which to store log files
# this should ideally be a well maintained and properly backed up directory
log.dirs=/tmp/kafka-logs
zookeeper.connect=<zookeeper.ip>:2181
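Since the two files differ only in broker.id, the edits can also be applied from a script instead of vi. The helper below is a sketch with a made-up name (set_prop); it assumes plain key=value lines and rewrites the file through a temporary copy.

```shell
# Hypothetical helper: sets key $2 to value $3 in properties file $1.
# An existing assignment is replaced in place; a missing key is appended.
set_prop() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp)
  awk -F= -v key="$key" -v value="$value" '
    $1 == key { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage on the second VM, from the Kafka home directory:
#   set_prop config/server.properties broker.id 1
#   set_prop config/server.properties zookeeper.connect "<zookeeper.ip>:2181"
```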
Starting up Apache Kafka Cluster
Finally, it is time to start our Apache Kafka brokers. Once the brokers are configured, starting Apache Kafka is as simple as executing the following command on each of the machines -
# start broker
./bin/kafka-server-start.sh -daemon config/server.properties
You can use the jps command to check whether the Kafka broker process is running; it is listed as Kafka.
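jps only tells us about the local machine. To see whether both brokers have registered with ZooKeeper, one option is to list /brokers/ids with the zookeeper-shell.sh script that ships with Kafka. The sketch below assumes the shell prints the ids as a bracketed list such as [0, 1] after its connection messages, which is what I have seen with this version; parse_broker_ids is a made-up helper name.

```shell
# Hypothetical helper: reads zookeeper-shell output on stdin and prints the
# ids from the last bracketed list, one per line.
parse_broker_ids() {
  grep '^\[.*\]$' | tail -n 1 | sed 's/[][ ]//g' | tr ',' '\n'
}

# Usage from the Kafka home directory on any machine:
#   ./bin/zookeeper-shell.sh "<zookeeper.ip>:2181" ls /brokers/ids | parse_broker_ids
# With both brokers from this tutorial running, the ids 0 and 1 are expected.
```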
Testing Apache Kafka Cluster
We have successfully started two Kafka brokers on two different machines and checked that the Kafka and ZooKeeper processes are running. However, we still need to check that the cluster is functioning properly. To do that, we will use the utilities that ship with Kafka to create a topic, send messages and consume them.
We will start by creating a topic named test-topic using the command below on any of the machines (change the ZooKeeper address to match your setup) -
./bin/kafka-topics.sh --create --zookeeper <zookeeper.ip>:2181 --topic test-topic --partitions 2 --replication-factor 2
# you will get the message below if it is created successfully
Created topic "test-topic".
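Because the topic was created with a replication factor of 2, it is worth checking that each partition really has both brokers in its in-sync replica set. The kafka-topics.sh script has a --describe option for this. The helper below is a sketch that counts partitions whose Isr list is shorter than their Replicas list; count_under_replicated is a made-up name, and the parsing assumes the "Replicas:" and "Isr:" labels that the describe output uses in this version.

```shell
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin and
# prints how many partitions have fewer in-sync replicas than assigned replicas.
count_under_replicated() {
  awk '
    /Partition:/ {
      replicas = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Replicas:") replicas = $(i + 1)
        if ($i == "Isr:")      isr = $(i + 1)
      }
      if (split(isr, a, ",") < split(replicas, b, ",")) bad++
    }
    END { print bad + 0 }
  '
}

# Usage from the Kafka home directory:
#   ./bin/kafka-topics.sh --describe --zookeeper "<zookeeper.ip>:2181" --topic test-topic | count_under_replicated
# A result of 0 means every partition has all of its replicas in sync.
```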
We will now send a couple of sample messages to this newly created topic using the Kafka console producer utility. After executing the command below, type each message and press Enter to send it.
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic
# type your message and press enter
First test message
Second test message
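The next step is to consume these messages with the Kafka console consumer utility using the command below. The messages that we sent using the producer will be printed on the console once the command runs successfully. Because test-topic has two partitions and Kafka only guarantees ordering within a partition, the messages may appear in a different order from the one in which they were sent -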
./bin/kafka-console-consumer.sh --zookeeper <zookeeper.ip>:2181 --topic test-topic --from-beginning
# below should be the output of this command
First test message
Second test message
We can hence conclude that our Apache Kafka cluster is ready for our applications.
Thank you for reading through the tutorial. If you have any feedback, questions or concerns, please share them in the comments and we will get back to you as soon as possible.