
How the Problem of Big Data Is Managed by the Big Giants of the Industry

Ashwani Singh

--

Before discussing the problem, we first need to understand a few points:

What is Big Data?

What are its characteristics?

What are the impacts of Big Data on the industry?

After covering these points, we will discuss how companies manage the Big Data problem.

What is Big Data?

Big data is a field that deals with ways to systematically analyze and extract information from data sets that are too large or complex to be handled by traditional data-processing methods of data collection and manipulation.

Big data refers to datasets that are not only large but also high in variety and velocity, which makes them difficult to handle with traditional tools and techniques. Due to the rapid growth of such data, solutions must be studied and provided in order to handle these datasets and extract value and knowledge from them.

Big Data includes two types of data:

→ Structured Data

→ Unstructured Data

Structured data is the type of data that usually resides in relational databases (RDBMS). Fields store length-delineated data such as phone numbers, Social Security numbers, or ZIP codes. Even text strings of variable length, like names, are contained in records, making the data a simple matter to search.

Data may be human- or machine-generated, as long as it is created within an RDBMS structure. This format is eminently searchable, both with human-generated queries and via algorithms that use the data type and field names, such as alphabetical or numeric, currency or date.

Structured Query Language (SQL) enables queries on this type of structured data within relational databases.
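To make this concrete, here is a minimal sketch of querying structured data with SQL, using Python's built-in sqlite3 module. The customers table and its fields are hypothetical examples, not from any particular system:

```python
import sqlite3

# A hypothetical, minimal example of structured data in a relational database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Every row conforms to a fixed schema: typed, length-delineated fields.
cur.execute("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        phone TEXT,         -- e.g. '555-0123'
        zip_code TEXT       -- e.g. '10001'
    )
""")
cur.executemany(
    "INSERT INTO customers (name, phone, zip_code) VALUES (?, ?, ?)",
    [("Alice", "555-0123", "10001"), ("Bob", "555-0456", "94105")],
)

# Because the schema is known in advance, queries can filter on any field directly.
for row in cur.execute("SELECT name, phone FROM customers WHERE zip_code = '10001'"):
    print(row)  # ('Alice', '555-0123')

conn.close()
```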

Some relational databases do store or point to unstructured data, as in customer relationship management (CRM) applications. The integration can be awkward at best, since memo fields do not lend themselves to traditional database queries. Still, most CRM data is structured.

Unstructured data is essentially everything else. Unstructured data has internal structure but is not organized via pre-defined data models or schemas. It may be textual or non-textual, and human- or machine-generated. It may also be stored in a non-relational (NoSQL) database; a small sketch of this follows the list below.

It includes :

Text Files

Email

Social Media

Websites

Mobile Data & Media

Satellite Imagery

Scientific Data

Sensor Data

Surveillance Data
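As a contrast to the SQL example above, here is a small sketch of why such data resists a fixed schema. The documents are hypothetical; the point is that each record carries different fields, so NoSQL stores typically keep them as-is (for example, as JSON) and interpret the structure at read time:

```python
import json

# Hypothetical unstructured/semi-structured records: no two share the same
# fields, so they cannot be forced into one fixed relational schema.
documents = [
    {"type": "email", "from": "alice@example.com", "body": "Report attached."},
    {"type": "tweet", "user": "@bob", "text": "Big data is everywhere!", "likes": 42},
    {"type": "sensor", "device_id": 7, "readings": [21.5, 21.7, 21.6]},
]

# Document stores keep each record as-is; structure is applied at read time.
for doc in documents:
    print(json.dumps(doc))

# Querying means inspecting whatever fields happen to exist.
tweets = [d for d in documents if d.get("type") == "tweet"]
print(len(tweets), "tweet(s) found")
```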

What are the characteristics of Big Data?

The four V's of Big Data describe its key features:

→ Velocity: This is essentially the Input/Output (I/O) problem: the speed of storing data on disk and the speed of reading and processing data from disk. The larger the data, the harder it becomes to sustain the required velocity; a back-of-envelope calculation after this list shows why.

→ Variety: Here we consider the nature and type of data. Traditional RDBMS could handle structured data efficiently and effectively, but much of today's data is unstructured and does not fit that model.

→ Volume: This is the quantity of generated and stored data. The size of the data determines its value and potential insight, and whether it can be considered big data at all.

→ Veracity: This extends the definition of big data to cover data quality and data value: how trustworthy and useful the data is.
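Here is the back-of-envelope calculation promised above. The figures are assumptions for illustration (a 1 TB dataset and 100 MB/s of sustained disk throughput), but they show how volume and velocity combine into an I/O problem:

```python
# Illustrative, assumed figures: 1 TB of data, 100 MB/s sustained throughput
# for a single commodity hard disk.
data_size_mb = 1_000_000        # 1 TB expressed in MB
disk_throughput_mb_s = 100      # one disk

single_disk_seconds = data_size_mb / disk_throughput_mb_s
print(f"One disk:  {single_disk_seconds / 3600:.1f} hours to read 1 TB")

# The same data striped across 100 disks read in parallel:
num_disks = 100
parallel_seconds = single_disk_seconds / num_disks
print(f"100 disks: {parallel_seconds / 60:.1f} minutes to read 1 TB")
```

Reading the data from one disk takes close to three hours; spreading it across 100 disks read in parallel cuts that to under two minutes. This is exactly the motivation for the distributed clusters discussed below.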

What are the impacts of Big Data on the industry?

All the big giants of the IT industry, like Google, Facebook, and Microsoft, provide services that are used globally. They hold huge amounts of data in their data centers, and they need to manage and manipulate that data efficiently and effectively.

Big data has increased the demand for information-management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year, about twice as fast as the software business as a whole.

Facebook has revealed some big stats on its big data: its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour.

A data center normally holds petabytes to exabytes of data. Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.

Hence, to solve the I/O problems of Big Data, these companies use a master-slave topology and distributed storage clusters, processing data in parallel with frameworks such as MapReduce, as sketched below.
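As a rough sketch of the idea (not any company's actual implementation), the MapReduce model splits the input into chunks, lets the slave nodes process their chunks in parallel, and merges the partial results at the master. Python's multiprocessing pool stands in here for a real cluster:

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(chunk: str) -> Counter:
    """Map step: each worker ('slave') counts the words in its own chunk."""
    return Counter(chunk.split())

def word_count(text: str, num_workers: int = 4) -> Counter:
    # Split the input into roughly equal chunks, one per worker.
    lines = text.splitlines()
    step = max(1, len(lines) // num_workers)
    chunks = ["\n".join(lines[i:i + step]) for i in range(0, len(lines), step)]

    # The 'master' farms chunks out to the workers and gathers partial counts.
    with Pool(num_workers) as pool:
        partial_counts = pool.map(map_chunk, chunks)

    # Reduce step: merge the partial results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

if __name__ == "__main__":
    sample = "big data needs big clusters\nbig clusters need big pipes"
    print(word_count(sample).most_common(3))
```

The same split-process-merge pattern is what lets a cluster read and process data at many times the speed of a single machine, addressing the velocity and volume problems described above.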

Thank you for reading.
