    Setting up Apache Storm Cluster
    Published on: 2018-05-25 16:05:43
    Posted By: Amit Kumar

    This article provides step-by-step instructions to configure and start up an Apache Storm 0.9.4 multi-node cluster. Additionally, it covers how to amend the storm script so that the cluster nodes can be started without supervision.

    Abstract

    Apache Storm is a distributed framework for real-time processing of Big Data, just as Apache Hadoop is a distributed framework for batch processing. Apache Storm works on the data parallelism principle, wherein the same code is executed on multiple nodes with different input data.

    This article assumes that you have a basic idea of the technical architecture and components of Apache Storm. If you are totally new to Apache Storm, you are strongly recommended to read the article - Introduction to Apache Storm.

    Pre-requisites

    The first thing that we need in order to install Apache Storm is multiple machines. In this tutorial, we will be using the following virtual machines to install Apache Storm -

    Parameter Name       Virtual Machine 1        Virtual Machine 2
    Name                 VM1                      VM2
    IP Address           192.168.111.130          192.168.111.132
    Operating System     Ubuntu-14.04.1-64bit     Ubuntu-14.04.1-64bit
    No. of CPU Cores     4                        4
    RAM                  6 GB                     6 GB

    Apart from the above machines, please ensure that the following prerequisites have been fulfilled so that you are able to follow this article without any issues -

    1. JDK 6 or higher installed on all the virtual machines
    2. JAVA_HOME variable set to the path where the JDK is installed
    3. Python 2.6.6 installed on all the virtual machines
    4. Apache ZooKeeper installed on at least VM1. In case Apache ZooKeeper is not installed, you may install it by following the instructions specified here
    5. Root access on all the virtual machines, as all the steps should ideally be performed by the root user
    6. Updated /etc/hosts file on both the virtual machines with the IP address of the other virtual machine. E.g. /etc/hosts on VM1 will need to have the IP address of VM2 along with its hostname (VM2). In my case, this additional line in the VM1 hosts file looks like 192.168.111.132 VM2. A quick way to verify these prerequisites is sketched right after this list.
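
    As a quick sanity check, the commands below can be run on each VM to verify the prerequisites. This is a minimal sketch; the JDK path used for JAVA_HOME is an assumption and will differ depending on how Java was installed on your machines.

    Prerequisite checks - run on each VM

    # Confirm JDK 6+ and Python 2.6.6 are available
    java -version
    python --version

    # Set JAVA_HOME if it is not already set (the JDK path below is an assumption)
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    echo "export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64" >> ~/.bashrc

    # On VM1, add an entry for VM2 (run the equivalent for VM1 on VM2)
    echo "192.168.111.132 VM2" >> /etc/hosts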
    Installing Apache Storm

    The first step to install Apache Storm is to download its binaries on both the virtual machines. In this article, we will be installing Apache Storm 0.9.4, which can be downloaded from here.

    Once the binaries have been downloaded on the virtual machines, you can extract them to the directory where you would like Apache Storm to be installed. We will refer to this directory as $Storm_Base_Dir throughout this tutorial.
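
    For reference, here is a minimal sketch of the download and extraction steps; the archive mirror URL and the /opt/storm install location are assumptions, so adjust them to your environment.

    Download and extract Apache Storm 0.9.4 - run on VM1 and VM2

    # Download the 0.9.4 binary release (mirror URL is an assumption)
    wget http://archive.apache.org/dist/storm/apache-storm-0.9.4/apache-storm-0.9.4.tar.gz

    # Extract into the directory referred to as $Storm_Base_Dir in this tutorial
    mkdir -p /opt/storm
    tar -xzf apache-storm-0.9.4.tar.gz -C /opt/storm --strip-components=1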

    Configuring Multi-node Storm Cluster

    Once the Apache Storm binaries have been extracted on all the virtual machines, the next step is to configure them. The diagram below depicts the deployment architecture that we will be setting up -

    Since a Storm cluster has two types of nodes - Nimbus (master node) and Supervisor (slave node) - there will be some difference in how we configure these nodes. We will be configuring VM1 as Nimbus and VM2 as Supervisor.

    First, let's look at the common configuration that needs to be done on all the nodes of the cluster. Below are the properties that need to be set in the Storm configuration file called storm.yaml, located in the directory $Storm_Base_Dir/conf. Please create the storm.yaml file if it is not already present in the conf directory.

    Storm Common Configuration - $Storm_Base_Dir/conf/storm.yaml
    # ZooKeeper servers used by the cluster (VM1 in our setup)
    storm.zookeeper.servers:
         - "192.168.111.130"
    
    # Host running the Nimbus (master) daemon
    nimbus.host: "192.168.111.130"
    
    # Local directory where Storm stores its working data
    storm.local.dir: "<path-to-data-directory>"
    

    In the above configuration, you would need to replace <path-to-data-directory> with the path to the directory where you would like Storm to save its data. Ideally, you should create a data directory inside the Storm base directory.
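
    For example, assuming $Storm_Base_Dir is exported as an environment variable pointing at your install directory, the data directory could be created as follows (the name storm-data is just an illustration):

    Create data directory - run on both VMs

    # Create the data directory referenced by storm.local.dir in storm.yaml
    mkdir -p $Storm_Base_Dir/storm-data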

    Once the common configuration is done on all the nodes, the following additional property needs to be added to all Supervisor (slave) nodes. This property defines the port numbers used by the Supervisor's workers. By default, each Supervisor has four workers. To customize the number of workers, specify as many unique port numbers as the number of workers you would like the Supervisor to run. The property below configures the Supervisor to have four workers, as it specifies four port numbers.

    Supervisor Additional Configuration - $Storm_Base_Dir/conf/storm.yaml
    
    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703
    
    
    Starting Up Storm Cluster

    Once you are all set up, the next step is to start the cluster. Here is the command to start Nimbus under supervision (you will not get control back after you run the script, i.e. the Nimbus process does not run as a daemon, which makes it suitable for running under a process supervision tool) -

    $Storm_Base_Dir/bin on VM1
    
    ./storm nimbus
    
    

    The next step is to start the Supervisor nodes. Below is the command that you need to run on your Supervisor nodes to start the Supervisor process under supervision -

    $Storm_Base_Dir/bin on VM2
    
    ./storm supervisor
    
    
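    To quickly confirm that the daemons came up, you can look for the Storm JVM processes on each VM. This is a minimal sketch; jps ships with the JDK, and the class names shown are the ones launched by the Storm 0.9.x scripts.

    Verify running daemons

    # On VM1 - the Nimbus daemon should be listed
    jps -l | grep backtype.storm.daemon.nimbus

    # On VM2 - the Supervisor daemon should be listed
    jps -l | grep backtype.storm.daemon.supervisor
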

    The Storm distribution also contains a UI that can be used to monitor Nimbus, Supervisors and topologies, among other things. Here is the command to start this UI. The command can be run on any node (Nimbus or Supervisor), but we will be starting it on Nimbus (VM1) -


    $Storm_Base_Dir/bin on VM1
    
    ./storm ui
    
    

    Once the UI command has run successfully, the UI can be accessed using the URL http://<nimbus-host>:8080/. In my case, it is http://192.168.111.130:8080/
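
    The UI also exposes a REST API (available from Storm 0.9.2 onwards), so the cluster state can be checked from the command line as well. A minimal sketch, assuming the default UI port of 8080:

    # Query the cluster summary endpoint of the Storm UI REST API
    curl http://192.168.111.130:8080/api/v1/cluster/summary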

    Starting Storm Cluster without Supervision

    The Storm community strongly recommends starting Storm under supervision, as Storm is fail-fast and its daemons shut down whenever an unexpected error reaches them. However, if you still want to start Storm without supervision, you may do so by following the instructions below -

    • Go to $Storm_Base_Dir/bin on all nodes and create a copy of the storm script with the name storm-daemon (or any name you prefer)
    • In the copied script, change the default value of the fork parameter of the exec_storm_class function to True
    • In the same exec_storm_class function, replace the line os.spawnvp(os.P_WAIT, JAVA_CMD, all_args) with os.spawnvp(os.P_NOWAIT, JAVA_CMD, all_args), as shown in the sketch below
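
    The following shell sketch automates these edits. It assumes the storm script in your 0.9.4 distribution contains the fork=False default and the exact os.P_WAIT line quoted above, so please verify the substitutions after running it.

    Create storm-daemon script - run in $Storm_Base_Dir/bin on all nodes

    # Copy the original storm script
    cp storm storm-daemon

    # Default the fork parameter of exec_storm_class to True
    sed -i 's/fork=False/fork=True/' storm-daemon

    # Spawn the JVM without waiting for it, so the script returns immediately
    sed -i 's/os.spawnvp(os.P_WAIT, JAVA_CMD, all_args)/os.spawnvp(os.P_NOWAIT, JAVA_CMD, all_args)/' storm-daemon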

    Thank you for reading through the tutorial. In case of any feedback, questions or concerns, you can share them with us through your comments and we will get back to you as soon as possible.
