Lec-114: What is RAID? RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, Nested RAID 10 Explained

Gate Smashers
8 Oct 2021 · 14:34

Summary

TLDR: This educational video explains RAID technology and its significance in data storage for performance and security. It covers various RAID levels, including RAID 0 for performance via data striping, RAID 1 for data security through mirroring, and RAID 5 for a balance of both by distributing parity across disks. The script uses real-life examples like Facebook's outage to emphasize the importance of data availability and introduces RAID 6, which offers double parity for even greater fault tolerance. The video is tailored for students, professionals preparing for exams or interviews, and anyone interested in data storage solutions.

Takeaways

  • 💾 RAID stands for Redundant Array of Independent Disks (or Inexpensive Disks), which is a method of storing data across multiple hard drives.
  • 🔄 RAID 0, or data striping, splits data into pieces and stores them across different disks to enhance performance but offers no data redundancy.
  • 🔄 RAID 1, or mirroring, creates exact copies of data on separate disks to ensure data availability and security in case of disk failure.
  • 🔄 RAID 0+1 (or 1+0) combines both data striping and mirroring to balance performance and data security.
  • 🔄 RAID 3 stripes data across disks with a dedicated parity disk, allowing recovery from a single disk failure, but the parity disk can become a bottleneck because every write must update it.
  • 🔄 RAID 4 is similar to RAID 3: it also keeps parity on a single dedicated disk, so the bottleneck remains. The video notes there is little practical difference between the two.
  • 🔄 RAID 5 distributes parity information across all disks, ensuring equal utilization and enhancing performance compared to RAID 3 and 4.
  • 🔄 RAID 6 calculates two parities and stores them across all disks, allowing for recovery from up to two disk failures.
  • 💡 The choice of RAID level depends on the balance between performance, data security, and cost, with each level offering different advantages and trade-offs.
  • 🌐 Real-world examples, such as Facebook, WhatsApp, and Instagram outages, highlight the importance of data availability and the potential financial and credibility impact of downtime.
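The trade-offs in the takeaways above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the video: the function name `raid_summary` and its parameters are hypothetical, and it assumes equal-sized disks and two-way mirrors for RAID 10.

```python
def raid_summary(level: str, disks: int, disk_tb: float) -> dict:
    """Usable capacity and guaranteed fault tolerance for common RAID levels.

    Assumes all disks have the same size; RAID 10 is assumed to use
    two-way mirrors, so only one failure is guaranteed to be survivable.
    """
    if level == "0":
        data_disks, tolerated = disks, 0          # pure striping, no redundancy
    elif level == "1":
        data_disks, tolerated = 1, disks - 1      # every disk holds the same copy
    elif level in ("3", "4", "5"):
        data_disks, tolerated = disks - 1, 1      # one disk's worth of parity
    elif level == "6":
        data_disks, tolerated = disks - 2, 2      # two disks' worth of parity
    elif level == "10":
        data_disks, tolerated = disks // 2, 1     # half the disks are mirrors
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return {"usable_tb": data_disks * disk_tb, "tolerated_failures": tolerated}
```

For example, `raid_summary("5", 4, 1.0)` reports 3 TB usable out of 4 TB raw, while `raid_summary("10", 4, 1.0)` reports 2 TB, which is the cost side of the performance, security, and cost balance mentioned above.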

Q & A

  • What does RAID stand for?

    -RAID stands for Redundant Array of Independent Disks, or sometimes Redundant Array of Inexpensive Disks.

  • Why is redundancy important in RAID?

    -Redundancy in RAID is important for duplicating data across multiple disks to ensure data availability and fault tolerance, which helps in case one or more disks fail.

  • What are the two main factors companies consider when using RAID?

    -Companies consider performance, which includes fast read and write speeds, and security or availability, ensuring data is accessible 24/7.

  • What is the significance of the Facebook, WhatsApp, and Instagram outage mentioned in the script?

    -The outage of these platforms for 6-7 hours resulted in a significant loss of revenue and credibility, highlighting the importance of data availability and the impact of downtime on businesses.

  • What is RAID 0 and how does it improve performance?

    -RAID 0 is a level where data is broken into pieces and distributed across multiple disks. This data striping allows for parallel reading and writing, which increases performance and throughput.

  • How does RAID 1 provide data security?

    -RAID 1, also known as mirroring, involves creating exact copies of data on separate disks. This ensures that if one disk fails, the data is still accessible from the other copies.

  • What is the difference between RAID 1 and RAID 0+1?

    -RAID 1 only mirrors data, while RAID 1+0 (which the video treats as interchangeable with 0+1) is a nested RAID level that combines mirroring and striping. Data is broken into pieces that are striped across disks, and each piece is also kept as a mirrored copy, offering both performance and data security.

  • What is RAID 3 and how does it use parity for data recovery?

    -RAID 3 is a level where data is divided into blocks and distributed across multiple disks, with a dedicated disk for storing parity information. Parity is used to recover data if one disk fails, but if two disks fail, data recovery is not possible.

  • How does RAID 4 differ from RAID 3?

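    -RAID 4 is similar to RAID 3: both stripe data across disks and keep parity on a single dedicated disk, so the parity-disk bottleneck remains. In the standard definitions RAID 3 stripes at the byte level and RAID 4 at the block level; the video notes there is little practical difference between them and that the bottleneck is solved only by RAID 5.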

  • What is RAID 5 and how does it distribute parity?

    -RAID 5 distributes parity information across all disks in the array, rather than storing it on a single disk. This prevents any single disk from becoming a bottleneck and ensures that all disks are utilized equally.

  • What advantage does RAID 6 offer over RAID 5?

    -RAID 6 offers the advantage of having two parity blocks, which allows for the recovery of data even if two disks fail simultaneously, providing an additional layer of fault tolerance.

Outlines

00:00

💾 Introduction to RAID

The video begins with an introduction to RAID, which stands for Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks. The presenter explains that RAID is used to store data across multiple disks, which are independent of each other, meaning the failure of one disk does not affect the others. The term 'inexpensive' refers to the cost-effective nature of disks compared to other forms of memory like registers or RAM. The importance of RAID is highlighted in terms of its relevance for competitive exams, college studies, and interviews. The video emphasizes the need for both performance and data availability, using the example of Facebook, WhatsApp, and Instagram's downtime causing significant financial losses. The presenter also touches on the increasing storage capacities in laptops as an indicator of the cost reduction in disk technology.

05:01

🔄 RAID 0: Data Striping for Performance

The second paragraph delves into RAID 0, also known as data striping. In RAID 0, data is broken down into pieces and distributed across multiple disks, which can improve read and write performance as these operations can be done in parallel. The video uses a diagram to illustrate how data is split and stored across different disks. The benefit of RAID 0 is its ability to enhance performance and throughput. However, it does not provide data redundancy, meaning if one disk fails, the entire data set is lost. The video also mentions RAID 1, which is mirroring, where data is duplicated across disks to ensure availability and security, contrasting it with RAID 0's focus on performance over redundancy.
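The splitting described here can be sketched in a few lines. This is only an illustration under simple assumptions: the function names `stripe` and `read_back` are hypothetical, disks are modelled as in-memory lists, and chunks are placed round-robin, mirroring the video's A1, A2, A3, ... diagram.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and place them round-robin on disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading one chunk per disk per row."""
    rows = max((len(d) for d in disks), default=0)
    out = []
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)
```

With two disks and 2-byte chunks, `stripe(b"A1A2A3A4A5A6", 2, 2)` puts A1, A3, A5 on the first disk and A2, A4, A6 on the second. In a real array each row could be read from both disks at the same time, which is where the throughput gain comes from; losing either list makes `read_back` impossible, which is the missing redundancy.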

10:03

🔄 RAID 1 and RAID 1+0: Mirroring and Nested RAID

This paragraph discusses RAID 1, which involves mirroring data across two or more disks, providing data security and availability. If one disk fails, the data can still be accessed from the mirror. The video points out that while this does increase cost due to the need for additional disks, the use of inexpensive disks makes it a viable option. The concept of RAID 1+0, or nested RAID, is introduced as a combination of RAID 0 and RAID 1. It involves both mirroring and striping data, providing both high performance and data security. The video explains that this configuration is commonly used in email and web servers due to its balance of performance and reliability.
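One way to picture the nested layout is as a stripe across mirrored pairs. The sketch below assumes two-way mirrors and in-memory lists for disks; `raid10_write` and `raid10_read` are hypothetical names chosen for illustration, not part of any real tool.

```python
def raid10_write(chunks: list[str], num_pairs: int) -> list[tuple[list[str], list[str]]]:
    """Stripe chunks across mirror pairs; each chunk is stored on both disks of its pair."""
    pairs: list[tuple[list[str], list[str]]] = [([], []) for _ in range(num_pairs)]
    for i, chunk in enumerate(chunks):
        primary, mirror = pairs[i % num_pairs]
        primary.append(chunk)
        mirror.append(chunk)
    return pairs


def raid10_read(pairs, failed=frozenset()) -> list[str]:
    """Read all chunks in order, using whichever disk of each pair is still alive.

    `failed` is a set of (pair_index, side) tuples, where side is 0 or 1.
    """
    survivors = []
    for p, sides in enumerate(pairs):
        alive = [sides[s] for s in (0, 1) if (p, s) not in failed]
        if not alive:
            raise IOError(f"both disks of mirror pair {p} failed")
        survivors.append(alive[0])
    out = []
    for row in range(max((len(d) for d in survivors), default=0)):
        for disk in survivors:
            if row < len(disk):
                out.append(disk[row])
    return out
```

With four chunks and two pairs, A1 and A3 land on the first pair and A2 and A4 on the second. The array survives one failed disk in every pair, but not the loss of both disks in the same pair.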

🔄 RAID 3, 4, 5, and 6: Advanced RAID Levels

The final paragraph covers more advanced RAID levels, starting with RAID 3, which stripes data across disks and keeps parity on a single dedicated disk. The parity allows recovery of data if one disk fails, but if two disks fail, recovery is not possible. The dedicated parity disk is also a potential bottleneck, because every write must update it. RAID 4 follows the same concept and, as the video notes, differs little from RAID 3, so it does not remove the bottleneck. RAID 5 solves the problem by distributing the parity information across all disks, preventing any single disk from being overused. Lastly, RAID 6 introduces double parity, allowing for the recovery of data even if two disks fail. The video concludes by emphasizing the importance of understanding these RAID levels for various applications and potential interview questions.
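The presenter's own worked example for single parity uses plain addition: data values 1, 2, and 3 give a parity of 6, and a lost value is found by subtracting the survivors from the parity. A minimal sketch of that arithmetic follows; the function names are hypothetical, and addition is the video's teaching simplification (real arrays typically use XOR).

```python
def sum_parity(blocks: list[int]) -> int:
    """Parity as used in the video's example: the sum of all data values."""
    return sum(blocks)


def recover_one(surviving: list[int], parity: int) -> int:
    """Recover the single missing value: parity minus the sum of the survivors."""
    return parity - sum(surviving)


parity = sum_parity([1, 2, 3])        # 1 + 2 + 3 = 6
lost = recover_one([2, 3], parity)    # 6 - 5 = 1
```

If two values are lost, only their sum is known (for instance x + y = 3 when 1 and 2 are missing), so several answers fit and the data cannot be recovered from one parity alone.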

Keywords

💡RAID

RAID stands for Redundant Array of Independent Disks (or Inexpensive Disks). It is a data storage virtualization technology that combines multiple physical disk drive components into one logical unit for the purposes of data redundancy, performance improvement, or both. In the video, RAID is the central theme, with various RAID levels discussed to explain how they provide different levels of data security and performance.

💡Redundancy

Redundancy in the context of RAID refers to the duplication of data across multiple disks to ensure data availability in the event of a disk failure. This concept is crucial for understanding RAID's purpose, as it directly relates to the video's discussion on data availability and fault tolerance. For example, RAID 1 uses mirroring to create a redundant copy of data on another disk.

💡Performance

In the video, performance refers to the speed and efficiency of data read and write operations. RAID configurations like RAID 0 are designed to enhance performance by striping data across multiple disks, allowing for parallel operations that increase throughput. The script mentions how companies value performance for quick data access and manipulation.

💡Availability

Availability in the context of the video refers to the constant accessibility of data, which is critical for business operations. The video uses the example of Facebook, WhatsApp, and Instagram's downtime to illustrate the importance of data availability and how it impacts business continuity and revenue.

💡Data Striping

Data striping is a RAID technique where data is divided into blocks and distributed across multiple disks. This method is used in RAID 0 to enhance performance by allowing simultaneous read and write operations on different disks. The video explains how striping can lead to faster data access but does not provide data redundancy.

💡Mirroring

Mirroring, as discussed in the video, is a RAID technique where data is copied exactly onto another disk. This is used in RAID 1 to provide data redundancy, ensuring that if one disk fails, the data can still be accessed from the mirror. The video emphasizes the balance between cost and data security that mirroring provides.
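As a rough illustration of how a mirror keeps data reachable after a failure, the sketch below models each disk as a dictionary. The class name `Mirror` and its methods are hypothetical and exist only for this example.

```python
class Mirror:
    """Toy RAID 1 set: every write goes to all healthy disks, reads use any healthy disk."""

    def __init__(self, copies: int = 2):
        self.disks = [dict() for _ in range(copies)]
        self.failed: set[int] = set()

    def write(self, block_id: str, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_id] = data      # same data on every healthy disk

    def fail(self, disk_index: int) -> None:
        self.failed.add(disk_index)        # simulate a disk failure

    def read(self, block_id: str) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_id]      # first surviving copy is enough
        raise IOError("all mirrors failed")
```

After `m = Mirror(2)` and `m.write("A", b"data")`, calling `m.fail(0)` still leaves `m.read("A")` working. Only when every copy has failed does the read raise an error, which matches the presenter's point that more mirrors lower the probability of loss without ever reducing it to zero.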

💡Parity

Parity in RAID is a method of error detection and correction where a parity bit or block is calculated from the data and stored separately. RAID levels like RAID 3, 4, and 5 use parity to enable data recovery in the event of a single disk failure. The video explains how parity works and its role in maintaining data integrity.
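In practice, single parity is usually computed with bitwise XOR rather than the addition used in the video's example. The sketch below assumes equal-length blocks; `xor_blocks` is a hypothetical helper name.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x01\x02", b"\x0f\x10", b"\xaa\x55"]
parity = xor_blocks(data)

# If the second block is lost, XOR-ing the survivors with the parity rebuilds it,
# because every surviving value cancels itself out.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The same function serves for both computing parity and rebuilding a lost block, which is one reason XOR is a convenient choice for this job.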

💡Bottleneck

A bottleneck in the context of RAID refers to a point of congestion where the system's performance is limited by a particular component. The video mentions that in RAID 3, the parity disk can become a bottleneck because every write to any data disk must also update the parity stored on this one disk, potentially slowing down the system.
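A small sketch can show why the dedicated parity disk is touched so often. It assumes XOR parity and counts one write per disk touched; `update_parity` and `dedicated_parity_writes` are hypothetical names used only here.

```python
from collections import Counter


def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """XOR parity update for a single changed block: remove the old value, add the new one."""
    return old_parity ^ old_data ^ new_data


def dedicated_parity_writes(target_disks: list[int], parity_disk: int) -> Counter:
    """Count writes per disk when every data write also updates one parity disk."""
    writes: Counter = Counter()
    for disk in target_disks:
        writes[disk] += 1          # the data block itself
        writes[parity_disk] += 1   # the parity block on the dedicated disk
    return writes
```

Spreading six writes evenly over data disks 0, 1, and 2 with parity on disk 3 gives two writes to each data disk but six to the parity disk, which is the overuse described above.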

💡RAID 5

RAID 5 is a RAID level that combines the striping of RAID 0 with distributed parity. The video explains how RAID 5 distributes parity across all disks, preventing any single disk from becoming a bottleneck and improving overall performance and fault tolerance compared to RAID 3 and RAID 4.
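The distribution of parity can be sketched as a rotation: each stripe places its parity block on a different disk. The rotation rule below is one possible choice made for illustration (real implementations offer several layouts), and the function names are hypothetical.

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Disk index holding parity for a given stripe (one possible rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def raid5_layout(num_stripes: int, num_disks: int) -> list[list[str]]:
    """Return a table of block labels: 'P<s>' for parity, 'D<s>.<k>' for data."""
    rows = []
    for s in range(num_stripes):
        p = raid5_parity_disk(s, num_disks)
        labels = (f"D{s}.{k}" for k in range(num_disks - 1))
        rows.append([f"P{s}" if d == p else next(labels) for d in range(num_disks)])
    return rows
```

Calling `raid5_layout(4, 4)` places the parity of stripe 0 on the last disk, stripe 1 on the third, and so on, so over four stripes every disk holds exactly one parity block and parity updates are shared evenly.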

💡RAID 6

RAID 6 is a RAID level that extends RAID 5 by calculating two parity blocks. This provides an additional layer of fault tolerance, allowing the system to sustain the failure of two disks without data loss. The video highlights RAID 6's advantage in high-availability storage systems where data integrity is critical.
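The presenter explains double parity as having two equations for two unknowns. The toy sketch below follows that analogy with ordinary integers: the first parity is a plain sum and the second a weighted sum. The weights and function names are assumptions made for illustration; real RAID 6 implementations typically use Galois-field arithmetic rather than integer weights.

```python
def two_parities(data: list[int]) -> tuple[int, int]:
    """P is the plain sum; Q weights each value by its position (1, 2, 3, ...)."""
    p = sum(data)
    q = sum((i + 1) * d for i, d in enumerate(data))
    return p, q


def recover_two(data: list, a: int, b: int, p: int, q: int) -> tuple[int, int]:
    """Recover the values at positions a and b (marked None in data).

    Solves  x + y = s1  and  (a+1)*x + (b+1)*y = s2  for x (at a) and y (at b).
    """
    s1 = p - sum(d for d in data if d is not None)
    s2 = q - sum((i + 1) * d for i, d in enumerate(data) if d is not None)
    y = (s2 - (a + 1) * s1) // (b - a)
    x = s1 - y
    return x, y
```

For data 1, 2, 3 the parities are P = 6 and Q = 14. If the first and third values are lost, the two equations x + y = 4 and x + 3y = 10 give x = 1 and y = 3, which a single parity could not determine.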

Highlights

Introduction to RAID (Redundant Array of Independent Disks) and its importance for data storage.

Explanation of the term 'redundant' in RAID and its significance for data duplication.

Discussion on the cost-effectiveness of disks and their role in RAID configurations.

Importance of RAID for performance and data availability in organizations.

Real-world example of Facebook, WhatsApp, and Instagram outage highlighting the need for RAID.

Description of RAID 0 and its method of data striping for improved performance.

Advantages of RAID 0 in terms of read and write speed due to parallel operations.

Introduction to RAID 1 and its mirroring technique for data redundancy.

Cost-benefit analysis of using inexpensive disks for RAID 1 configurations.

Explanation of data security and availability in RAID 1 with real-world examples.

Combination of RAID 0 and RAID 1 to form RAID 10, offering both performance and data security.

Description of RAID 3 and its block-level data striping with parity for fault tolerance.

Challenges with RAID 3, such as the single parity disk becoming a bottleneck.

Introduction to RAID 4, which follows the same concept as RAID 3, including the dedicated parity disk.

Explanation of RAID 5, which distributes parity across all disks to prevent bottlenecks.

Description of RAID 6, which calculates two parities to allow recovery from the failure of two disks.

Practical applications of different RAID levels in email and web servers.

Summary of RAID levels and their respective advantages for data storage solutions.

Transcripts

play00:00

Dear student, welcome to Gate Smashers

play00:02

In this video I'm going to explain

play00:04

RAID

play00:04

That is, Redundant Array of Independent Disks

play00:08

or Redundant Array of Inexpensive Disks

play00:12

So guys here I'll discuss about different RAID levels

play00:17

with real life examples

play00:18

So this video from competitive exams point of view, for college and university and for interviews

play00:24

is very important

play00:25

So guys, like the video and subscribe to the channel

play00:28

if you haven't already, and if you have, do subscribe from another device

play00:32

Subscribers are very important

play00:34

Let's start. First of all,

play00:36

here we are writing Redundant array

play00:39

redundant means

play00:41

redundant means copy

play00:43

it means duplication

play00:45

means multiple copies of 1 thing

play00:47

so here

play00:49

redundancy

play00:51

duplication

play00:53

that is of

play00:54

that is of disk

play00:57

and why we're using disk

play00:59

we're using disk for storing the data

play01:02

so we are storing data in disk

play01:05

that disk is independent

play01:08

let's say I have 5 disks

play01:10

and array means

play01:13

that we don't have just 1-2 disks but multiple disks

play01:18

so this disk

play01:19

first this is independent

play01:21

it means it is not the case that if 1 shuts down

play01:25

then the rest will also shut down automatically; there is no dependency between them

play01:30

and second why it is called inexpensive

play01:33

because if we talk about the memory hierarchy

play01:36

it is said that registers are very expensive

play01:39

then cache

play01:40

then RAM

play01:42

So these memories are more costly

play01:45

but we talk about disk

play01:47

So the cost of disk at this time is low

play01:51

and it is gradually decreasing even further

play01:55

when you've purchased your laptop

play01:57

so you have also seen that first there was 256 GB and then 512 GB

play02:01

then 1 TB, and nowadays if you buy a normal laptop, 1 TB comes by default

play02:06

it may be like you can take 2 TB in future also

play02:10

So it means that we're using independent and inexpensive disks

play02:15

for storing the data

play02:16

but why we're doing

play02:19

what is need of this, this is the first question that arise

play02:22

So guys, whenever we think from a company's or organisation's point of view

play02:27

so we need 2 things

play02:30

first is performance

play02:33

performance in case of read and write

play02:36

means you from data

play02:38

you want to read data, want to read data from disk

play02:42

and write

play02:44

means want to change data in disk

play02:46

some transaction has changed something

play02:48

you want Read and Write

play02:50

to be fast

play02:51

so you want performance

play02:53

Second you want security

play02:55

availability

play02:56

that data should be available 24*7

play02:59

and its real life example I'll give to you

play03:02

you have heard that a few days ago

play03:05

Facebook, WhatsApp and Instagram

play03:08

were down for 6-7 hours

play03:10

and you might have heard this also

play03:12

and because of that, Mark Zuckerberg's income

play03:16

how much loss did he face

play03:18

he faced 7 billion loss

play03:20

7 billion dollar loss

play03:23

from 3rd place he has shifted to 5th place

play03:26

only because for 6-7 hours

play03:29

services were down

play03:31

and what is there in that services

play03:32

what we do in whatsapp

play03:34

sharing data only

play03:35

what is there in facebook

play03:37

our profile and friends profile

play03:39

that is also data

play03:41

and if we talk about instagram

play03:42

in that also there is only photos and videos, that is also data only

play03:46

so it is dealing with data only

play03:47

and only because of this small down

play03:51

so because of that there is 7 billion of loss

play03:55

so guys

play03:56

thats why companies want

play03:58

their data to be available 24*7

play04:01

and if it is not available, it would be a great loss for the company

play04:07

so that loss

play04:08

can be in term of money also

play04:10

and it can be in terms of credibility also

play04:13

means trust of them

play04:14

I'll give one more example of this

play04:16

if you've heard about telegram

play04:19

So telegram in only those 6-7 hours

play04:22

60-70 million accounts were added

play04:25

when facebook and all were down

play04:27

that 6-7 hours

play04:29

60-70 million new accounts were created

play04:33

So see, one company gained a good benefit and another one faced a loss

play04:37

and only because of availability and performance

play04:41

so let's start, now that you've understood the story

play04:44

that why we are using and what is its purpose

play04:46

So we'll talk about the different levels

play04:49

So in RAID we have different levels

play04:52

that how we are keeping the data

play04:54

So first level in this is RAID 0

play04:58

What we do in it

play04:59

data striping

play05:01

means we are breaking data, breaking data in pieces

play05:04

and keep in different disk

play05:06

like you can see in diagram

play05:08

that in RAID 0, original data that I've lets say it is A

play05:12

B,C,D

play05:15

means you can put data in this way

play05:17

what I did, I broke A in A1 and A2

play05:21

B into B1 and B2, means you can make multiple pieces also

play05:25

but if you see in simple diagram

play05:27

So I break A into A1 and A2

play05:30

then on A3 and A4 and A5 and A6

play05:35

I mean to say that breaking data in pieces

play05:37

and keeping that data in different disk

play05:40

so what is the use of this

play05:42

its benefit is performance

play05:44

RAID 0 gives us

play05:46

performance

play05:47

because if you want to read, you want to read data of A

play05:51

so parallelly you can read the data from both as it is independent disk

play05:55

parallelly you can read data from this

play05:57

So your performance will be fast

play05:59

similarly, if you want to write

play06:01

you want change something in A7

play06:03

and in A8

play06:05

so both writes will happen in parallel

play06:08

so this give high performance and throughput

play06:13

RAID 0

play06:14

So remember this point

play06:16

then

play06:17

so it can be asked from you that next level is RAID 1

play06:20

What is purpose of RAID 1, it is called mirroring

play06:23

as RAID 0 is called data striping

play06:24

here it is called mirroring

play06:26

here we are not breaking data

play06:28

same data copy

play06:30

is shifting to another disk

play06:32

means I've a data A,B,C

play06:34

I've made a mirror of that, mirror means like there is mirror image

play06:38

copy of that data

play06:40

is kept on another disk also

play06:42

So you will ask sir it will cost a lot

play06:45

cost is there

play06:46

but again inexpensive disk

play06:49

means we're choosing that disk or memory

play06:51

whose price is not that much high

play06:53

we are not using SSD, cache, RAM

play06:56

we're using that disk

play06:57

whose performance is also good

play06:59

and price is also not that much high

play07:01

So we're keeping multiple copies

play07:03

so you got its simple advantage

play07:06

it is not pointing out on performance

play07:08

here it talks about availability

play07:10

data security

play07:12

lets say if one disk failed

play07:15

still I can access that data because in other disk that work is going

play07:20

this is a simple example with 2 disks

play07:22

companies keep 3-4 mirrors and distribute in whole globe

play07:27

if there is problem in one geographical area

play07:30

So it can fetch data from different geographical area

play07:33

but the idea must be clear from here

play07:36

So mirroring work is to secure data in a way

play07:38

in case of

play07:40

failure we can access data from another

play07:43

and in case if this is coming in your mind that if both disk fails

play07:46

So guys there is no solution of this, you can give 10 mirrors

play07:50

then also you can say and what if 10 mirror fails

play07:53

so that is a worst case

play07:55

but still we are reducing the probability

play07:58

if failure occurred

play08:00

whether I'll able to access my data or not

play08:02

so we are increasing its probability

play08:04

that yes I can still access my data

play08:07

so we're improving that thing

play08:10

so this is combination

play08:12

means here you can see RAID 1+0

play08:15

at some places 10 is also written

play08:16

so we don't call it 10

play08:17

it is not 10 but is 1+0

play08:19

1+0 means we've mix 1 and 0

play08:22

So it can be written as 0+1 or as 1+0

play08:25

but what we did in 1+0

play08:28

that data is here mirrored also

play08:31

and along with this it is striped also

play08:34

see A1 and A2

play08:35

means we break the data also, original data was A

play08:39

it is broken in 2 parts A1 and A2

play08:42

and kept a copy of that also

play08:45

and its further part also made A3 and A4

play08:48

and kept a copy of that, means striping is also done, as the data is broken

play08:52

and mirroring also

play08:53

that’s why we say

play08:54

nested raid because both combination

play08:56

and it is advantageous

play08:59

very useful

play09:00

and real life application

play09:02

whether you talk about email server or web server

play09:06

there it is mostly used

play09:08

because this raid 0 it give high performance

play09:11

so it is using that also

play09:13

and Raid 1 data security

play09:15

so it is using this also

play09:17

so it is the combination of both

play09:19

then next if we talk about RAID 3

play09:22

RAID 2 also exists but has become obsolete; in that we break data at the bit level

play09:27

that is not used here

play09:29

Now we'll come to RAID 3

play09:30

in this we break data in block level

play09:33

I've data A

play09:34

I've divided that into blocks

play09:37

blocks means

play09:38

broken data in pieces

play09:41

in pieces

play09:42

and kept the data on different disks

play09:46

A1 here and A2 here and A3 here

play09:49

A4 here A5 here and A6 here

play09:51

Breaking original data into 6 pieces and kept it like this

play09:55

So at the end you might be wondering what this A is

play09:58

this is parity bit, means taking parity of data

play10:03

and stored in a different disk

play10:05

all data

play10:07

what we need to do is that along with storing the original data

play10:11

its parity is also stored, why we do this

play10:14

parity storing means

play10:16

if one of the disk is failed

play10:18

if one of the disk is failed

play10:20

Still I can recover the data from

play10:23

parity

play10:23

I'll give a simple example

play10:25

lets say data is 1

play10:26

and this is 2

play10:28

and this is data 3

play10:29

just for an example, I'll tell you with a simple example

play10:31

so if anyone ask you can easily answer that

play10:34

So method of finding parity is adding all of three

play10:38

So 1+2+3 will be

play10:43

it will be 6

play10:44

so parity 6 we've stored here

play10:46

so like this we have found its parity

play10:49

now lets say in future A1 fails

play10:52

A1 disk fails

play10:54

so now how I'll recover the data

play10:57

with the help of parity

play10:58

Parity is 6

play11:00

and I know how parity was found 1+2+3

play11:03

so what I'll do

play11:05

I've to find this one

play11:06

take the rest of the data

play11:09

its sum is 5

play11:10

So we'll subtract it from the parity, so it will be 1

play11:13

So see its data is 1

play11:15

this I'm telling in a form of example

play11:18

that if 1 disk is failed

play11:20

Still I can recover the data from the parity

play11:23

only 1 disk but if 2 disk fail

play11:25

then we can't do anything

play11:26

then we can't do anything

play11:28

then we can't find with parity as there will be confusion

play11:32

then multiple options can come, so here

play11:34

So here you got the advantage that we broke block level data

play11:39

in blocks

play11:40

made blocks of original and made one parity

play11:43

but what is the problem here

play11:45

problem is that all the parity is on 1 parity disk

play11:49

data is in hard disk

play11:50

kept parity in different disk

play11:53

problem from this is that in case

play11:56

in case a problem occurs in the parity disk

play11:59

then none of the parities can be used

play12:03

second, a bottleneck can occur

play12:06

if anyone read

play12:07

or write, lets say here also write and here also

play12:10

and here also

play12:11

whenever you write or change

play12:14

so you need to change in parity also

play12:15

Parity will change always

play12:17

lets say if I've here instead of 2 I've done 20

play12:19

then new parity I've to calculate

play12:22

so whenever you'll write

play12:24

then you've to take parity from here

play12:26

So this disk will be used a lot

play12:29

because of this it can become a bottleneck

play12:32

it will be used so heavily that it can create problems in accessibility

play12:35

So this problem can come and this problem is solved by next one

play12:39

RAID 4 says

play12:40

what we're doing in RAID 4

play12:42

same concept

play12:44

but in this we've saved parity in such way

play12:48

in actual there was not much difference in RAID 3 and 4

play12:52

but this problem is solved by

play12:54

RAID 5

play12:56

what RAID 5 did

play12:57

All these parities

play13:01

are distributed

play13:03

Parities are distributed

play13:06

Parity is not on 1 disk

play13:07

Now the parities are distributed across all disks

play13:12

now there is data and parity both

play13:16

so what will happen because of this

play13:18

because of this no disk will be over utilized

play13:22

all disks will be equally utilized, and write operations will perform well because of this

play13:26

as compared to RAID 3 and 4

play13:28

finally there is RAID 6, this is last,in raid 6 there is little difference

play13:34

in this we are calculating 2 parities

play13:37

instead of 1 we are calculating 2 parities

play13:39

A and A

play13:41

this data A1,A2 and A3

play13:44

calculated 2 parities of this

play13:46

calculating 2 parities means, as in the earlier one, if 1 disk fails

play13:51

I can recover from 1 parity

play13:53

but if 2 disk fail

play13:55

then it is not possible

play13:56

but here advantage is

play13:58

that if 2 disk also fails

play14:01

still I can recover the data from the parity

play14:04

because I've 2 parities

play14:05

So there are 2 equations

play14:06

so from 2 equations I can find out the value, like we normally do

play14:11

that in one equation x, y value is this and in second it is this

play14:14

so obviously we

play14:15

find out them

play14:17

I mean to say

play14:18

that in RAID 6 the advantage is

play14:20

that from 2 parities if 2 disk fails

play14:23

then that you can easily recover

play14:25

So from all these parameter whatever question comes you can easily explain

play14:32

Thank you

Related Tags
RAID Levels, Data Storage, Performance, Security, Redundancy, Disk Arrays, Tech Tutorial, Data Recovery, Tech Education, Server Management