    Creating index in Elasticsearch Cluster
    Published on: 8th March 2020
    Posted By: Amit Kumar

    This article explains how to create an index in an Elasticsearch cluster.

    Introduction


    Elastic is one of the most popular sets of products in the search domain, thanks to its ease of setup and the many out-of-the-box tools and features available for data ingestion (Logstash, Beats), search (Elasticsearch) and visualization (Kibana), among others. An Elasticsearch cluster stores data as JSON documents in a structure called an index. An index is comparable to a table in the RDBMS world and helps us categorize and manage documents for search efficiency.

    Hence, creating an index is the first thing to do when experimenting or working with Elasticsearch, and that is exactly what we will do in this article.

     

    Pre-requisites


    1. An Elasticsearch cluster. You can get started with Elasticsearch using Docker by following Install Elasticsearch with Docker.
    2. An HTTP client such as cURL or Postman to make calls to the Elasticsearch cluster.

     

    Creating an Index in Elasticsearch Cluster


    Let's understand the following basic concepts before we move on to creating an index, as we will be tweaking these settings:

    Index Settings

    Shard
    Setting name: number_of_shards
    Default value: 1
    Description: A shard is an atomic unit of storage in Elasticsearch. Shards distribute an index's data across multiple nodes in the cluster, making document search horizontally scalable. We can configure an index to have multiple shards based on the volume of data and the number of queries.

    Please note that the number of shards cannot be changed in place once the index has been created. The only way is to create a new index with the desired number of shards and move the data over to it. Much of this work is handled by Elasticsearch through the Split and Shrink APIs.

    Replica
    Setting name: number_of_replicas
    Default value: 1
    Description: A replica is an additional copy of a shard's data, used for failover and for scaling read throughput. We need at least two nodes in the cluster for replicas to be created, since a replica on the same node as its primary shard is not useful for failover.
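
    To make the two settings concrete, here is a small arithmetic sketch of how they combine. The function names are hypothetical helpers written for this article, not part of Elasticsearch, and the node count assumes the default behaviour where a replica is never placed on the same node as its primary.

```shell
#!/bin/sh
# Hypothetical helper: total shard copies held by the cluster for one index.
# Each primary shard has number_of_replicas extra copies, so:
#   total = primaries * (1 + replicas)
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Hypothetical helper: minimum number of nodes needed so that every copy of
# a shard can sit on a different node (assumes default allocation rules).
min_nodes_for_all_copies() {
  echo $(( 1 + $1 ))
}

total_shard_copies 1 1        # default settings: prints 2
total_shard_copies 2 2        # 2 shards, 2 replicas: prints 6
min_nodes_for_all_copies 2    # 2 replicas: prints 3
```

    With the defaults, a single index therefore occupies two shard copies, and both can only be assigned once the cluster has at least two nodes.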

     

    Now let's see what a request for index creation looks like:

    PUT http://<host>:<port>/<index-name>

    And here is the cURL equivalent:

    curl -X PUT http://<host>:<port>/<index-name>

     

    Finally, create an index called products with the default settings using the following cURL request:

    curl -X PUT http://<host>:<port>/products

     

    Similarly, we can create an index with a modified number of shards and/or replicas. Here is the cURL request for creating an index named hotels with modified settings:

    curl -X PUT http://<host>:<port>/hotels -H "Content-Type: application/json" -d "{\"settings\": {\"number_of_shards\": 2,\"number_of_replicas\": 2}}"
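
    The escaped quotes in the request above are easy to get wrong. One way to avoid them, sketched below, is to build the JSON body in a small shell function. The names ES_URL, index_settings_body and create_index are hypothetical, and http://localhost:9200 is only an assumed default address; substitute your own host and port.

```shell
#!/bin/sh
# Assumption: cluster address; override by exporting ES_URL beforehand.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: prints the JSON settings body for the given
# number of shards ($1) and number of replicas ($2).
index_settings_body() {
  printf '{"settings": {"number_of_shards": %d, "number_of_replicas": %d}}\n' "$1" "$2"
}

# Hypothetical helper: creates index $1 with $2 shards and $3 replicas.
# This sends the same PUT request as the cURL command above.
create_index() {
  curl -X PUT "${ES_URL}/$1" \
    -H "Content-Type: application/json" \
    -d "$(index_settings_body "$2" "$3")"
}

# Print the body that would be sent for the hotels index.
index_settings_body 2 2
```

    Calling create_index hotels 2 2 would then send the request. Printing the body first makes it easy to check the JSON before anything reaches the cluster.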

     

    Listing Indices of Elasticsearch Cluster


    Once our indices are created, we can verify them using the cURL request below:

    curl -X GET "http://<host>:<port>/_cat/indices?v"

    Here is the output of the above command on my machine:

    health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   hotels   vzvOPzPHQ1G8r4zJy0Z4tw   2   2          0            0       920b           460b
    green  open   products yiO69_vBQEiog1AbcOkJPA   1   1          0            0       566b           283b

    As we can see, our products and hotels indices have been created.
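
    When a cluster holds many indices, it can help to list just one of them and only the columns of interest. The sketch below builds such a URL using the index name in the path and the h parameter of the _cat API. ES_URL and cat_indices_url are hypothetical names, and http://localhost:9200 is an assumed default address.

```shell
#!/bin/sh
# Assumption: cluster address; override by exporting ES_URL beforehand.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: prints a _cat/indices URL restricted to index $1
# and to the comma-separated column list $2, with a header row (v).
cat_indices_url() {
  printf '%s/_cat/indices/%s?v&h=%s\n' "$ES_URL" "$1" "$2"
}

# Print the URL for the hotels index, showing four of the columns above.
cat_indices_url hotels health,index,pri,rep

# To run the request, quote the URL so the shell does not interpret ? or &:
#   curl -X GET "$(cat_indices_url hotels health,index,pri,rep)"
```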

    Please note that if you are using a single-node cluster, the health of your indices will be yellow: replica copies need additional nodes, since a replica on the same node as its primary does not provide failover.
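
    On a single-node development cluster, one possible way to get a green index is to set its number of replicas to 0 through the update index settings endpoint (PUT /<index-name>/_settings). The sketch below is illustrative only: ES_URL, replica_settings_body and set_replicas are hypothetical names, http://localhost:9200 is an assumed default address, and running without replicas gives up failover, so this is not a suggestion for production use.

```shell
#!/bin/sh
# Assumption: cluster address; override by exporting ES_URL beforehand.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: prints the JSON body that sets number_of_replicas to $1.
replica_settings_body() {
  printf '{"index": {"number_of_replicas": %d}}\n' "$1"
}

# Hypothetical helper: updates the replica count ($2) of an existing index ($1).
set_replicas() {
  curl -X PUT "${ES_URL}/$1/_settings" \
    -H "Content-Type: application/json" \
    -d "$(replica_settings_body "$2")"
}

# Print the body that would be sent to drop replicas on a single node.
replica_settings_body 0
```

    Calling set_replicas hotels 0 would send the request. Unlike the number of shards, the number of replicas can be changed after the index has been created.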

     

    Thank you for reading through the tutorial. If you have any feedback, questions or concerns, please share them in the comments and we will get back to you as soon as possible.
