    Introduction to Big Data Analytics
    Published on: 24th February 2015

    This article provides an introduction to Big Data Analytics by briefly explaining the concept behind it, along with the characteristics, drivers, challenges and objectives of Big Data.

    What is Big Data?

    Every day, we create quintillions of bytes of data in a variety of forms: climate information gathered by sensors, posts to social media websites, digital pictures and videos, purchase transaction records, web logs, and cell phone GPS signals. This data is called "Big Data".

    A few examples of organizations dealing with Big Data are as follows:

    1. Facebook handles 50 billion photos from its site members.
    2. Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
    3. FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide.

    "Big Data", however, does not refer only to huge volumes of data manifested in many forms. It also refers to the practice of analysing this data and using the analysis to drive strategic decisions, such as introducing a new category of products, restructuring the organization, or improving customer care services by analysing customer care recordings and logs.

    Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques. It requires different techniques, tools, algorithms and architectures. Some of the Big Data tools and technologies are Apache Hadoop, Apache Spark, the R language and Apache ZooKeeper.
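
    As a minimal illustration of how such tools process large data sets in parallel, the sketch below uses Apache Spark's Python API (PySpark) to count word frequencies across a set of log files. The input path and application name are hypothetical placeholders chosen for this example, not something prescribed by this article.

        # Minimal PySpark sketch: count word frequencies across log files.
        # The path "hdfs:///logs/*.txt" and the app name are hypothetical
        # placeholders; adjust them to your own environment.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

        lines = spark.sparkContext.textFile("hdfs:///logs/*.txt")   # read input splits in parallel
        counts = (lines.flatMap(lambda line: line.split())          # break each line into words
                       .map(lambda word: (word, 1))                 # emit (word, 1) pairs
                       .reduceByKey(lambda a, b: a + b))            # sum the counts per word

        for word, count in counts.take(10):                         # inspect a small sample
            print(word, count)

        spark.stop()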

    Characteristics of Big Data

    Having briefly introduced Big Data, let's look at the various characteristics that define it:

    1. Volume- This represents the size of the data, which determines the value and potential of the data under consideration.
    2. Variety- This refers to the categories and formats the data belongs to, which helps data analysts use the data effectively and underlines the importance of Big Data. Examples of such categories are emails, text messages and documents.
    3. Velocity- This refers to the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
    4. Veracity- This characteristic reflects the fact that the quality of the captured data can vary greatly, and thus the accuracy of any analysis depends on the veracity of the source data.
    5. Complexity- Big Data management is a very complex process, as large volumes of data coming from multiple sources need to be linked, connected and correlated in order to extract the required information.

    Although all five factors above characterize Big Data, only the first three (Volume, Variety and Velocity, commonly known as the 3 V's) are widely discussed and popular. The diagram below depicts how data is increasing along these three scales:

    [Diagram: growth of data along the Volume, Variety and Velocity scales]

    Why Big Data?

    Next, one may ask why we suddenly care so much about this data, since large amounts of data were available for processing and analysis before as well. In this section, we will discuss the following factors that have contributed to the advent of the Big Data practice:

    1. Increase in storage capacities
    2. Increase in processing power
    3. Availability of data
    4. Roughly 90% of the data in the world has been created in the last couple of years alone

    Challenges with Big Data

    Now that we know why we need to deal with Big Data, let's talk about the challenges we face while processing, analysing and utilizing it:

    1. Scale- Accessing the level of detail needed from sheer volumes of data at high speed.
    2. Performance- In an online world where nanosecond delays can cost you sales, big data must move at extremely high velocities no matter how much you scale or what workloads your database must handle. The data handling hoops of RDBMS solutions put a serious drag on performance.
    3. Workload Diversity- Big data comes in all shapes and sizes. Rigid schemas have no place here; you instead need a more flexible design, and a technology that fits this type of data (see the sketch after this list).
    4. Manageability- Staying ahead of big data using RDBMS technology is a costly, time-consuming and often futile endeavour.
    5. High Availability- Data-driven applications relying on big data to feed essential revenue-generating business applications need higher availability than traditional solutions provide.
    6. Cost- Meeting the above listed challenges with RDBMS can cost a pretty penny.
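
    To illustrate the workload-diversity point above, here is a small, purely hypothetical sketch (not taken from this article) of how records of varying shape can be handled without a rigid schema, in the spirit of document-oriented stores: each record is treated as a free-form document, and the fields present may differ from record to record.

        # Hypothetical illustration of schema flexibility: records from different
        # sources carry different fields, and we process them without a fixed schema.
        import json

        records = [
            '{"type": "email", "from": "user@example.com", "subject": "Order #42"}',
            '{"type": "click", "url": "/products/42", "timestamp": 1424736000}',
            '{"type": "call", "duration_sec": 312, "agent": "A-17"}',
        ]

        for raw in records:
            doc = json.loads(raw)                  # each record is a free-form document
            kind = doc.get("type", "unknown")      # the fields present vary by record type
            print(kind, {k: v for k, v in doc.items() if k != "type"})
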
    Objectives of Big Data

    In this section, we will explain the objectives that we strive to achieve while dealing with Big Data:

    1. Cost reduction- Processing capacity (MIPS, Millions of Instructions Per Second) and terabyte-scale storage for structured data are now cheaply delivered through big data technologies like Hadoop clusters. This is mainly because Big Data technologies can run on commodity hardware by employing techniques such as data sharding and distributed computing (see the sketch after this list).
    2. Faster processing speeds- Big data technologies have also helped reduce large-scale analytics processing from hours to minutes. These technologies have also been instrumental in real-time analytics, cutting processing times down to seconds.
    3. Big Data based offerings- Big data technologies have enabled organizations to leverage big data in developing new product and service offerings. The best example may be LinkedIn, which has used big data and data scientists to develop a broad array of product offerings and features, including People You May Know, Groups You May Like, Jobs You May Be Interested In, Who has Viewed My Profile, and several others. These offerings have brought millions of new customers to LinkedIn.
    4. Supporting internal business decisions- Just like traditional data analytics, big data analytics can be employed to support business decisions, even when the data sources are new and less structured. For example, any data that can shed light on customer satisfaction is helpful, and much of the data from customer interactions is unstructured, such as website clicks, transaction records, and voice recordings from call centres.
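
    As a rough, hypothetical sketch of the data sharding and distributed computing mentioned in the first objective above, the example below splits a data set into shards and processes them in parallel worker processes before merging the partial results. Real systems such as Hadoop or Spark distribute the shards across many commodity machines rather than local processes; the data and shard count here are made up for illustration.

        # Hypothetical sketch of sharding plus parallel processing on one machine;
        # frameworks like Hadoop or Spark apply the same idea across many machines.
        from multiprocessing import Pool
        from collections import Counter

        def process_shard(shard):
            """Map step: count words in one shard of the data."""
            counts = Counter()
            for line in shard:
                counts.update(line.split())
            return counts

        if __name__ == "__main__":
            data = ["big data needs big storage", "big data needs fast processing"] * 1000
            shards = [data[i::4] for i in range(4)]          # split the data set into 4 shards

            with Pool(processes=4) as pool:
                partials = pool.map(process_shard, shards)   # process shards in parallel

            total = sum(partials, Counter())                 # reduce step: merge partial counts
            print(total.most_common(3))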

    Thank you for reading through the tutorial. In case of any feedback, questions or concerns, you can communicate them to us through your comments and we will get back to you as soon as possible.
