Big Data In 5 Minutes | What Is Big Data?| Big Data Analytics | Big Data Tutorial | Simplilearn

Simplilearn
10 Dec 2019 · 05:12

Summary

TLDR: This video explains big data, starting with the amount of data generated by smartphones and the internet every minute. It introduces the five Vs (volume, velocity, variety, veracity, and value) as the key characteristics of big data and uses healthcare as an example of its benefits, such as faster disease detection and reduced costs. It explains how frameworks like Hadoop store and process big data in a distributed, parallel manner using the Hadoop Distributed File System (HDFS) and MapReduce. The video also covers big data's applications in gaming and disaster management, emphasizing its potential to transform various industries.

Takeaways

  • Smartphones generate a great deal of data: the video puts it at approximately 40 exabytes per month for a single user, then multiplies that by the roughly 5 billion smartphone users worldwide.
  • The internet generates an enormous amount of data every minute, with millions of activities across Snapchat, Google, Facebook, YouTube, and email.
  • Big data is characterized by the five Vs (Volume, Velocity, Variety, Veracity, and Value), which help classify data as big data and describe its nature.
  • The healthcare industry is an example where big data plays a crucial role: hospitals and clinics generate large volumes of patient data that can be analyzed for faster disease detection, better treatment, and reduced costs.
  • Frameworks such as Hadoop store and process big data in a distributed, efficient manner; Hadoop stores data in the Hadoop Distributed File System (HDFS) and processes it with MapReduce.
  • HDFS breaks large files into smaller chunks, stores them on different nodes, and replicates them across nodes for redundancy and fault tolerance.
  • MapReduce processes big data by breaking a task into smaller sub-tasks, processing them in parallel on multiple machines, and assembling the results.
  • Big data analysis has practical applications in many industries, including gaming, where user behavior data can be used to improve game design and reduce customer churn.
  • Big data was instrumental in disaster management during Hurricane Sandy in 2012, helping predict the storm's landfall five days in advance and supporting preparation and response.
  • The video ends with a quiz asking which statement about HDFS is incorrect, testing whether viewers can distinguish storage (HDFS) from processing (MapReduce).
  • The video encourages engagement by asking viewers about the future impact of big data and offering Amazon gift vouchers to three winners of the quiz.

Q & A

  • How much data does a single smartphone user generate approximately every month?

    -According to the video, a single smartphone user generates approximately 40 exabytes of data every month.

  • What is the term used to describe the massive amount of data generated by billions of smartphone users?

    -The term used to describe this massive amount of data is 'big data'.

  • What are the five characteristics, or '5 V's, of big data?

    -The five characteristics of big data are Volume, Velocity, Variety, Veracity, and Value.

  • How much data is generated per minute on the internet according to the script?

    -Per minute on the internet, 2.1 million snaps are shared on Snapchat, 3.8 million search queries are made on Google, 1 million people log onto Facebook, 4.5 million videos are watched on YouTube, and 188 million emails are sent.

  • What is the role of the Hadoop Distributed File System (HDFS) in big data storage?

    -HDFS is the storage layer of Hadoop that stores big data in a distributed manner, breaking down large files into smaller chunks and storing them across multiple data nodes.

  • What technique does Hadoop use to process big data?

    -Hadoop uses the MapReduce technique to process big data, which involves breaking a lengthy task into smaller tasks and processing them in parallel across different machines.

  • How does the MapReduce technique contribute to the processing of big data?

    -The MapReduce technique contributes by enabling parallel processing of data, making the processing faster and more efficient by distributing tasks across multiple machines.

  • What is the significance of the 'Veracity' of data in big data analysis?

    -Veracity refers to the accuracy and trustworthiness of the generated data, which is crucial for ensuring the reliability of the insights and decisions derived from big data analysis.

  • How does big data analysis benefit the healthcare industry?

    -Big data analysis in healthcare can enable faster disease detection, better treatment options, and reduced costs by analyzing vast volumes of patient records and test results.

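  • Which of the following statements about HDFS, from the video's closing quiz, is not correct? (a) HDFS is the storage layer of Hadoop; (b) data gets stored in a distributed manner in HDFS; (c) HDFS performs parallel processing of data; (d) smaller chunks of data are stored on multiple data nodes in HDFS.

    -Statement (c), "HDFS performs parallel processing of data", is incorrect: HDFS handles storage, while parallel processing is done by MapReduce. The video itself leaves the answer for viewers to post in the comments.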

  • How was big data utilized during Hurricane Sandy in 2012?

    -During Hurricane Sandy in 2012, big data was used for disaster management to gain a better understanding of the storm's effect on the east coast of the U.S. and to predict the hurricane's landfall five days in advance.
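The per-minute internet figures quoted above are easier to compare once converted to other time scales. The following Python sketch does that arithmetic; the dictionary and the helper names `per_second` and `per_day` are illustrative, and scaling to a full day assumes the per-minute rate holds constant around the clock, which real traffic does not.

```python
# Per-minute internet activity figures quoted in the video.
PER_MINUTE = {
    "Snapchat snaps shared": 2_100_000,
    "Google search queries": 3_800_000,
    "Facebook logins": 1_000_000,
    "YouTube videos watched": 4_500_000,
    "emails sent": 188_000_000,
}

MINUTES_PER_DAY = 24 * 60


def per_second(count_per_minute: int) -> float:
    """Average per-second rate for a per-minute count."""
    return count_per_minute / 60


def per_day(count_per_minute: int) -> int:
    """Scale a per-minute count to one day, assuming a constant rate."""
    return count_per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    for activity, count in PER_MINUTE.items():
        print(f"{activity}: ~{per_second(count):,.0f} per second, ~{per_day(count):,} per day")
```

Under that constant-rate assumption, 188 million emails per minute works out to roughly 270 billion emails per day.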

Outlines

00:00

The Phenomenon of Smartphone Data Generation

This segment covers the amount of data generated by smartphones, which the video puts at approximately 40 exabytes per month per user, and the scale reached when that figure is multiplied by the global number of smartphone users: more than traditional computing systems can handle. It introduces big data and its five Vs (volume, velocity, variety, veracity, and value) and illustrates them with the healthcare industry, where analyzing big data can enable faster disease detection, better treatment, and reduced costs. It then names frameworks such as Cassandra, Hadoop, and Spark for storing and processing big data, and explains how Hadoop's distributed file system (HDFS) and the MapReduce technique store and process it efficiently.

03:10

Applications and Impact of Big Data

This segment covers big data's applications and impact. In gaming, analyzing player data helps designers improve the user experience and reduce churn rates; during Hurricane Sandy in 2012, big data helped predict the storm's landfall five days in advance so that necessary precautions could be taken. The segment ends with a quiz about the Hadoop Distributed File System (HDFS), inviting viewers to answer in the comments for a chance to win Amazon gift vouchers, and encourages viewers to like, share, subscribe, and turn on notifications for new content.
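The gaming example comes down to counting where players drop off. Below is a minimal Python sketch of that aggregation, assuming a hypothetical telemetry schema (`PlayerEvent` with `player_id`, `stage`, and `action` fields); the video does not describe the actual analytics behind Halo 3 or Call of Duty, so this only illustrates the idea.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class PlayerEvent(NamedTuple):
    """One telemetry record (hypothetical schema)."""
    player_id: str
    stage: str
    action: str  # e.g. "pause", "restart", or "quit"


def drop_off_by_stage(events: Iterable[PlayerEvent], action: str = "quit") -> Counter:
    """Count how often the given action occurs at each stage of the game."""
    return Counter(event.stage for event in events if event.action == action)


def worst_stage(events: Iterable[PlayerEvent], action: str = "quit") -> Optional[str]:
    """Return the stage with the most occurrences of `action`, or None if there are none."""
    counts = drop_off_by_stage(events, action)
    if not counts:
        return None
    stage, _ = counts.most_common(1)[0]
    return stage


if __name__ == "__main__":
    sample = [
        PlayerEvent("p1", "level-3", "quit"),
        PlayerEvent("p2", "level-3", "quit"),
        PlayerEvent("p3", "level-1", "pause"),
        PlayerEvent("p4", "level-5", "quit"),
    ]
    print(drop_off_by_stage(sample))
    print(worst_stage(sample))
```

This counts events rather than distinct players; to count each player at most once per stage, deduplicate `(player_id, stage)` pairs before counting.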

Keywords

Smartphones

Smartphones are mobile devices that have computing capabilities and internet connectivity, allowing users to perform various tasks such as making phone calls, sending texts, and accessing the internet. In the context of the video, they are highlighted as significant data generators, creating approximately 40 exabytes of data per month per user, which contributes to the concept of big data.

Big Data

Big data refers to data sets so large, fast-moving, or varied that traditional computing systems struggle to store and process them; the data can be structured, semi-structured, or unstructured. It is characterized by the five Vs: volume, velocity, variety, veracity, and value. The video uses big data as its central theme, discussing its generation, challenges, and applications across various industries.

Exabytes

An exabyte is a unit of information equal to one billion gigabytes (10^18 bytes). The script says a single smartphone user generates approximately 40 exabytes of data per month; taken literally, that is 40 billion gigabytes per user every month, far more than any single phone produces, so the figure works as an illustration of scale rather than as a per-user measurement.
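The unit arithmetic behind the smartphone figure can be checked directly. This Python sketch assumes decimal (SI) prefixes, where each step from gigabyte to terabyte to petabyte to exabyte to zettabyte is a factor of 1,000; binary units such as the exbibyte would give somewhat different numbers. The `convert` helper is hypothetical.

```python
# Decimal (SI) prefixes: power of 10 for each unit, measured in bytes.
SI_EXPONENT = {"B": 0, "KB": 3, "MB": 6, "GB": 9, "TB": 12, "PB": 15, "EB": 18, "ZB": 21}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity of data between decimal units, e.g. EB -> GB."""
    shift = SI_EXPONENT[from_unit] - SI_EXPONENT[to_unit]
    if shift >= 0:
        return value * 10 ** shift
    # Divide by a positive power of 10 instead of multiplying by a fraction,
    # which keeps results like 2314 / 1000 exactly as Python would print them.
    return value / 10 ** (-shift)


if __name__ == "__main__":
    per_user_eb = 40            # figure quoted in the video, per user per month
    users = 5_000_000_000       # smartphone users quoted in the video
    print(convert(per_user_eb, "EB", "GB"))          # 40000000000
    print(convert(per_user_eb * users, "EB", "ZB"))  # 200000000.0
    print(convert(2314, "EB", "ZB"))                 # 2.314 (healthcare figure)
```

Read per user, the quoted figure would add up to 200 million zettabytes per month across 5 billion users, which is why the per-user reading does not hold up.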

Five V's

The concept of the five V's is a framework used to describe the characteristics of big data. They are volume (size of data), velocity (speed of data generation), variety (types of data), veracity (accuracy and trustworthiness of data), and value (the usefulness of data). The video explains each V with examples from the healthcare industry.
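The variety dimension is easiest to see by handling one record of each kind the video mentions. This Python sketch parses CSV text as a stand-in for structured spreadsheet records (reading real Excel files needs a third-party library), a log line in a hypothetical `timestamp LEVEL message` format as semi-structured data, and raw bytes as unstructured data such as an X-ray image. The function names and the log format are assumptions for illustration.

```python
import csv
import io
import re
from typing import Dict, List, Optional

# Hypothetical log format: "<timestamp> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse_structured(csv_text: str) -> List[Dict[str, str]]:
    """Structured data: fixed, named columns (CSV standing in for spreadsheet records)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def parse_semi_structured(log_line: str) -> Optional[Dict[str, str]]:
    """Semi-structured data: a few recognizable fields around free text; None if it doesn't fit."""
    match = LOG_PATTERN.match(log_line)
    return match.groupdict() if match else None


def describe_unstructured(blob: bytes) -> Dict[str, int]:
    """Unstructured data (e.g. an image): no fields to parse, only generic metadata."""
    return {"size_bytes": len(blob)}


if __name__ == "__main__":
    print(parse_structured("patient_id,test\nP1,blood\nP2,x-ray\n"))
    print(parse_semi_structured("2019-12-10T05:12 INFO record saved"))
    print(describe_unstructured(b"\x89PNG\r\n\x1a\n"))
```

The point of the contrast is that only the structured record arrives with a schema; the other two need a parser or further analysis before they yield fields.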

Healthcare Industry

The healthcare industry is the video's main example of big data in practice. Hospitals and clinics generate massive volumes of data in the form of patient records and test results, which can be analyzed to speed up disease detection, improve treatment, and reduce costs.

Hadoop

Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is highlighted in the video as a tool for storing and processing big data through its distributed file system and MapReduce programming model.

Hadoop Distributed File System (HDFS)

HDFS is the primary distributed storage system used by Hadoop applications. It is designed to store large data sets reliably across clusters of commodity hardware. The video explains how HDFS stores data in a distributed manner, ensuring data safety even if one machine fails.
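The storage scheme described above (split a file into blocks, copy each block to several data nodes, survive a node failure) can be simulated in a few lines. This Python sketch is not HDFS code: the function names are hypothetical, placement is simple round-robin rather than the rack-aware policy HDFS's NameNode applies, and the tiny block size stands in for HDFS's much larger default (128 MB in recent Hadoop versions, with a default replication factor of 3).

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Break a file's contents into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes using round-robin placement."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"patient records and test results", block_size=8)
    nodes = ["datanode1", "datanode2", "datanode3", "datanode4"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    print(placement)
    print(readable_after_failure(placement, {"datanode2"}))  # one machine down
```

Because each block lives on three distinct nodes, any two node failures still leave every block readable; losing three specific nodes can make some blocks unavailable.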

MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large datasets. In the video, it is described as a technique that breaks down lengthy tasks into smaller ones, which are then processed in parallel by multiple machines, making data processing faster and more efficient.
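The classic illustration of MapReduce is counting words. This Python sketch runs the map, shuffle, and reduce phases one after another in a single process so the data flow is visible; in Hadoop the map and reduce tasks would run on different machines, and Hadoop's own Java API (built around `Mapper` and `Reducer` classes) looks different. The function names here are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is data"]))
```

Each `map_phase` call touches only its own document and each key is reduced independently, which is what lets a framework spread both phases across machines.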

Parallel Processing

Parallel processing is a method in computer science where many calculations are carried out simultaneously. The video uses the example of MapReduce to illustrate how parallel processing can be used to process big data more quickly by distributing tasks across multiple machines.
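The video's picture of parallel processing (task A split into sub-tasks B, C, and D, handled by three workers, results assembled at the end) maps onto a worker pool. This Python sketch uses `concurrent.futures.ThreadPoolExecutor` with three workers on one machine as a stand-in for three machines; the sub-task (summing squares) and the helper names are hypothetical. Because of CPython's global interpreter lock, threads will not speed up pure-Python arithmetic; `ProcessPoolExecutor` offers the same interface for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_task(data: List[int], parts: int) -> List[List[int]]:
    """Split one large task (task A) into up to `parts` smaller sub-tasks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def sub_task(chunk: List[int]) -> int:
    """Work done independently by one worker: the sum of squares of its chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: List[int], workers: int = 3) -> int:
    """Run the sub-tasks concurrently, then assemble the partial results."""
    chunks = split_task(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sub_task, chunks))
    return sum(partial_results)  # assemble step


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

The final answer matches the sequential computation because addition lets the partial sums be combined in any grouping.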

Data Nodes

In the context of HDFS, data nodes are the machines where the actual data is stored. The video mentions that in HDFS, smaller chunks of data are stored on multiple data nodes, which contributes to the distributed storage and fault tolerance of the system.

Disaster Management

Disaster management refers to the organizational and community efforts to prepare for, respond to, and recover from disasters. The video provides an example of how big data was used during Hurricane Sandy in 2012 to predict the storm's landfall and take necessary measures, showcasing the practical applications of big data analysis.

Highlights

According to the video, a single smartphone user generates approximately 40 exabytes of data per month.

Multiplied across 5 billion smartphone users, this adds up to a massive amount of data, termed big data.

Big data is characterized by the 5 V's: Volume, Velocity, Variety, Veracity, and Value.

Every minute on the internet, 2.1 million snaps are shared on Snapchat, 3.8 million searches are made on Google, 1 million people log onto Facebook, 4.5 million videos are watched on YouTube, and 188 million emails are sent.

Huge volumes of data are generated in healthcare, with 2,314 exabytes collected annually in patient records and test results.

Velocity refers to the high speed at which big data is generated.

Variety encompasses structured, semi-structured, and unstructured data types.

Veracity is the accuracy and trustworthiness of the generated data.

Analyzing big data in healthcare enables faster disease detection, better treatment, and reduced costs.

Frameworks like Cassandra, Hadoop, and Spark are used to store and process big data.

Hadoop uses Hadoop Distributed File System (HDFS) to store data in a distributed manner.

HDFS breaks down large files into smaller chunks and stores copies across different nodes for redundancy.

MapReduce is a technique used for parallel processing of big data.

Parallel processing allows for faster and more efficient data processing by distributing tasks across multiple machines.

Big data analysis in gaming helps improve user experience and reduce customer churn rate.

Big data was instrumental in disaster management during Hurricane Sandy, enabling better understanding and prediction of the storm's impact.

The Hadoop Distributed File System (HDFS) is the storage layer of Hadoop.

Smaller chunks of data are stored on multiple data nodes in HDFS.

HDFS does not perform parallel processing of data; that is a function of MapReduce.

Transcripts

00:00

We all use smartphones, but have you ever wondered how much data they generate in the form of texts, phone calls, emails, photos, videos, searches, and music? Approximately 40 exabytes of data gets generated every month by a single smartphone user. Now imagine this number multiplied by 5 billion smartphone users. That's a lot for our minds to even process, isn't it? In fact, this amount of data is quite a lot for traditional computing systems to handle, and this massive amount of data is what we term big data.

00:31

Let's have a look at the data generated per minute on the internet: 2.1 million snaps are shared on Snapchat, 3.8 million search queries are made on Google, one million people log onto Facebook, 4.5 million videos are watched on YouTube, and 188 million emails are sent. That's a lot of data.

00:55

So how do you classify any data as big data? This is possible with the concept of five V's: volume, velocity, variety, veracity, and value.

01:09

Let us understand this with an example from the healthcare industry. Hospitals and clinics across the world generate massive volumes of data: 2,314 exabytes of data are collected annually in the form of patient records and test results. All this data is generated at a very high speed, which contributes to the velocity of big data. Variety refers to the various data types, such as structured, semi-structured, and unstructured data; examples include Excel records, log files, and X-ray images. The accuracy and trustworthiness of the generated data is termed veracity. Analyzing all this data will benefit the medical sector by enabling faster disease detection, better treatment, and reduced cost. This is known as the value of big data.

02:00

But how do we store and process this big data? To do this job, we have various frameworks such as Cassandra, Hadoop, and Spark. Let us take Hadoop as an example and see how Hadoop stores and processes big data.

02:16

Hadoop uses a distributed file system known as the Hadoop Distributed File System to store big data. If you have a huge file, your file will be broken down into smaller chunks and stored in various machines. Not only that: when you break the file, you also make copies of it, which go into different nodes. This way, you store your big data in a distributed way and make sure that even if one machine fails, your data is safe on another.

02:44

The MapReduce technique is used to process big data. A lengthy task, A, is broken into smaller tasks B, C, and D. Now, instead of one machine, three machines take up each task, complete it in a parallel fashion, and assemble the results at the end. Thanks to this, the processing becomes easy and fast. This is known as parallel processing.

03:10

Now that we have stored and processed our big data, we can analyze this data for numerous applications. In games like Halo 3 and Call of Duty, designers analyze user data to understand at which stage most of the users pause, restart, or quit playing. This insight can help them rework the storyline of the game and improve the user experience, which in turn reduces the customer churn rate.

03:36

Similarly, big data also helped with disaster management during Hurricane Sandy in 2012. It was used to gain a better understanding of the storm's effect on the east coast of the U.S., and necessary measures were taken. It could predict the hurricane's landfall five days in advance, which wasn't possible earlier.

03:55

These are some of the clear indications of how valuable big data can be once it is accurately processed and analyzed.

04:02

So here's a question for you: which of the following statements is not correct about the Hadoop Distributed File System (HDFS)?

A. HDFS is the storage layer of Hadoop.
B. Data gets stored in a distributed manner in HDFS.
C. HDFS performs parallel processing of data.
D. Smaller chunks of data are stored on multiple data nodes in HDFS.

04:29

Give it a thought and leave your answers in the comment section below. Three lucky winners will receive Amazon gift vouchers. Now that you have learned what big data is, what do you think will be the most significant impact of big data in the future? Let us know in the comments below. If you enjoyed this video, it would only take a few seconds to like and share it. Also, subscribe to our channel if you haven't yet, and hit the bell icon to get instant notifications about our new content. Stay tuned and keep learning.

05:04

[Music]

