This article explains how to create an index in an Elasticsearch cluster.
Introduction
Elastic offers one of the most popular sets of products in the search domain, thanks to easy setup and the many out-of-the-box tools and features it provides for data ingestion (Logstash, Beats), search (Elasticsearch) and visualization (Kibana), among others. An Elasticsearch cluster stores data as JSON documents in a structure called an index. An index is comparable to a table in the RDBMS world and helps us categorize and manage documents for search efficiency.
Hence, creating an index is the first thing to do when experimenting or working with Elasticsearch, and that is exactly what we will do in this article.
Pre-requisites
- An Elasticsearch cluster. You can get started with Elasticsearch on Docker by following Install Elasticsearch with Docker.
- An HTTP client such as cURL or Postman to make calls to the Elasticsearch cluster.
Creating an Index in an Elasticsearch Cluster
Let's understand the following basic concepts before we move on to creating an index, since we will be tweaking the corresponding settings:
Concept | Description | Default Value | Setting Name |
---|---|---|---|
Shard | A shard is an atomic unit of storage in Elasticsearch. Shards distribute an index's data across multiple nodes in the cluster, which makes document search horizontally scalable. We can configure an index to have multiple shards based on the volume of data and the number of queries. The number of shards cannot be changed in place once the index has been created; the only way is to create a new index with the desired number of shards and move the data over to it. Elasticsearch abstracts this work behind its Split and Shrink APIs. | 1 | number_of_shards |
Replica | A replica is an additional copy of a shard's data, used for failover and for scaling read throughput. We need at least two nodes in the cluster for replicas to be created, since a replica on the same node as its primary shard is not useful for failover. | 1 | number_of_replicas |
Now let's see what an index creation request looks like:
PUT http://<host>:<port>/<index-name>
And here is the cURL equivalent:
curl -X PUT http://<host>:<port>/<index-name>
Finally, create an index called products with default settings using the following cURL request:
curl -X PUT http://<host>:<port>/products
Similarly, we can create an index with a modified number of shards and/or replicas. Here is the cURL request for creating an index named hotels with two shards and two replicas per shard:
curl -X PUT http://<host>:<port>/hotels -H "Content-Type: application/json" -d "{\"settings\": {\"number_of_shards\": 2,\"number_of_replicas\": 2}}"
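Escaping every quote inside the -d argument is easy to get wrong. The request above could be wrapped as follows; this is a minimal sketch assuming a POSIX shell. The function names index_settings_body and create_index and the ES_URL variable are hypothetical helpers introduced here for illustration, not part of Elasticsearch or cURL.

```shell
# Hypothetical helper: prints the JSON settings body for an index creation
# request. $1 = number of shards, $2 = number of replicas per shard.
index_settings_body() {
  printf '{"settings": {"number_of_shards": %d, "number_of_replicas": %d}}\n' "$1" "$2"
}

# Hypothetical wrapper: creates index $1 with $2 shards and $3 replicas.
# ES_URL is an assumed variable holding the cluster address,
# i.e. http://<host>:<port>.
create_index() {
  curl -X PUT "${ES_URL}/$1" \
    -H "Content-Type: application/json" \
    -d "$(index_settings_body "$2" "$3")"
}
```

With ES_URL set to your cluster address, running `create_index hotels 2 2` should send the same request as the cURL command above.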
Listing Indices of an Elasticsearch Cluster
Once we are done creating our indices, we can verify them using the cURL request below. The URL is quoted so that the shell does not try to interpret the ? character:
curl -X GET "http://<host>:<port>/_cat/indices?v"
Here is the output of the above command on my machine:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open hotels vzvOPzPHQ1G8r4zJy0Z4tw 2 2 0 0 920b 460b
green open products yiO69_vBQEiog1AbcOkJPA 1 1 0 0 566b 283b
As we can see, our products and hotels indices have been created.
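When a cluster holds many indices, scanning the health column by eye gets tedious. The listing can be filtered with awk; this is a minimal sketch that assumes the column layout shown above (health first, index name third) and that all indices are open. The function name non_green_indices is a hypothetical helper, not an Elasticsearch feature.

```shell
# Hypothetical helper: reads the output of _cat/indices?v on stdin and
# prints the name of every index whose health is not green.
# NR > 1 skips the header row; $1 is the health column, $3 the index name.
non_green_indices() {
  awk 'NR > 1 && $1 != "green" { print $3 }'
}
```

It can be used as `curl -s "http://<host>:<port>/_cat/indices?v" | non_green_indices`. Fed the output shown above, it prints hotels.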
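Please note that if you are using a single-node cluster, the health of your indices will be yellow: replica copies need additional nodes, because a replica on the same node as its primary shard provides no failover capability. The same applies whenever an index asks for more replicas than the other nodes can hold, which would explain why hotels, with two replicas per shard, is yellow in the output above while products is green.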
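Unlike number_of_shards, number_of_replicas can be changed on an existing index through the update index settings API (PUT /<index-name>/_settings). On a single-node test cluster, setting it to 0 is one way to get a green index. The sketch below assumes a POSIX shell; replica_settings_body, set_replicas and ES_URL are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: prints the body of an update-settings request that
# changes only the replica count. $1 = number of replicas per shard.
replica_settings_body() {
  printf '{"index": {"number_of_replicas": %d}}\n' "$1"
}

# Hypothetical wrapper: sets the replica count of index $1 to $2.
# ES_URL is an assumed variable holding the cluster address,
# i.e. http://<host>:<port>.
set_replicas() {
  curl -X PUT "${ES_URL}/$1/_settings" \
    -H "Content-Type: application/json" \
    -d "$(replica_settings_body "$2")"
}
```

For example, `set_replicas hotels 0` would drop all replicas of the hotels index. This suits local experiments only, since an index without replicas has no failover copy.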
Thank you for reading through the tutorial. If you have any feedback, questions or concerns, please share them in the comments and we will get back to you as soon as possible.