What is Normalization in SQL? | Database Normalization Forms - 1NF, 2NF, 3NF, BCNF | Edureka

edureka!
26 Oct 201912:44

Summary

TLDRThis video script by Predict from Ed Eureka dives into the concept of database normalization, a systematic approach to organizing data and eliminating redundancy. It covers the basics of normalization, including First, Second, and Third Normal Forms, as well as the Boyce-Codd Normal Form. The script explains how normalization addresses data anomalies like insertion, update, and deletion issues, ensuring data integrity and logical sense. With simple examples, it guides viewers through the process of achieving each normal form, emphasizing the importance of atomicity, avoiding partial dependencies, and ensuring non-prime attributes' dependencies are direct and logical.

Takeaways

  • 😀 Database normalization is a systematic approach to organizing data and eliminating redundancy by decomposing tables.
  • 🔑 The primary purpose of normalization is to remove repeated data and ensure data dependencies make logical sense.
  • 📊 Normalization helps to address data anomalies such as insertion, update, and deletion anomalies, which can cause data inconsistency and inefficiency.
  • 📝 There are different levels of normal forms, including First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF).
  • 🧩 In 1NF, the table should have atomic values, meaning no cell should hold multiple values, and each set of related data is identified with a primary key.
  • 🔄 2NF requires a table to be in 1NF and to eliminate partial dependencies where non-prime attributes depend on only a part of the primary key.
  • 🔄 3NF builds on 2NF by ensuring that non-prime attributes are not transitively dependent on other non-prime attributes, but only on the primary key.
  • 🔍 BCNF is an extension of 3NF, addressing cases where functional dependencies violate the rule that every determinant must be a super key.
  • 🛠️ Normalization is crucial for maintaining data integrity, reducing the need for restructuring, and making the data model more informative.
  • 📈 The process of normalization involves breaking down tables to eliminate data redundancy and anomalies, thus improving the overall database design.
  • 💻 Understanding and applying normalization principles is essential for database administrators and designers to manage large datasets effectively.

Q & A

  • What is normalization in the context of databases?

    -Normalization is a systematic approach to organizing data in a database by decomposing tables to eliminate data redundancy. It involves a multi-step process that puts data into a tabular form, removing duplicated data from relational tables.

  • Why is normalization necessary for database management?

    -Normalization is necessary to eliminate repeated data and ensure data dependencies make logical sense. It helps to prevent data anomalies, reduce the need for restructuring over time, and make the data model more informative.

  • What are the different types of normalization forms discussed in the script?

    -The script discusses First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF).

  • What is the main goal of achieving the First Normal Form (1NF)?

    -The main goal of achieving 1NF is to ensure atomicity, meaning that values in the table should not be further divided and a single cell cannot hold multiple values.

  • Can you explain the concept of 'partial dependency' in the context of Second Normal Form (2NF)?

    -In 2NF, partial dependency refers to a situation where a non-prime attribute (an attribute not part of any candidate key) is dependent on a proper subset of a candidate key, rather than the entire candidate key.

  • What is the primary condition for a table to be in Third Normal Form (3NF)?

    -For a table to be in 3NF, it must first be in 2NF, and second, it must not have any transitive dependency where a non-prime attribute is dependent on another non-prime attribute.

  • What are the main problems that normalization aims to solve, as mentioned in the script?

    -Normalization aims to solve data anomalies such as insertion, update, and deletion anomalies, which can lead to data inconsistency and inefficiency in database management.

  • How does the Boyce-Codd Normal Form (BCNF) differ from Third Normal Form (3NF)?

    -BCNF is a higher version of 3NF and addresses certain types of anomalies not dealt with by 3NF. In BCNF, every functional dependency implies that the attribute(s) on the left side of the dependency must be a super key of the table.

  • What is a 'super key' in the context of Boyce-Codd Normal Form (BCNF)?

    -A super key in BCNF is a combination of one or more attributes that uniquely identifies rows in a table.

  • Can you provide an example of how to apply the concepts of normalization to a database table?

    -An example from the script is the employee table with columns for employee ID, name, phone number, and salary. If the phone number column has multiple values, it violates 1NF. To achieve 1NF, the phone number column could be split into separate rows or a new table with a foreign key reference to the employee ID.

  • What are the benefits of normalizing a database according to the script?

    -The benefits of normalizing a database include eliminating data redundancy, reducing the risk of data anomalies, improving data integrity, and making the database more efficient and easier to maintain.

Outlines

00:00

😀 Introduction to Database Normalization

The video script introduces the concept of database normalization, a systematic approach to organizing data to eliminate redundancy and improve data integrity. The session is led by Predict from Ed Eureka, who outlines the agenda, which includes understanding normalization, exploring various normal forms such as first, second, third normal forms, and the Boyce-Codd normal form. The importance of normalization is emphasized to avoid data anomalies like insertion, update, and deletion anomalies, which can lead to data inconsistencies and inefficiencies. The script also encourages viewers to subscribe to the channel for updates.

05:03

📚 Achieving First Normal Form (1NF)

This paragraph delves into the specifics of the first normal form (1NF), which addresses the issue of atomicity in database tables. It explains that a table should not contain composite or multivalued attributes, and that each cell should hold a single value. The process of achieving 1NF involves removing repeating groups and creating separate tables for related data, each identified by a primary key. An example is provided where an employee table with a phone number column containing multiple values is split into a table that adheres to 1NF, ensuring no cell contains multiple values.

10:05

🔑 Understanding Second Normal Form (2NF)

The second paragraph explains the second normal form (2NF), which builds upon 1NF by eliminating partial dependencies. It defines partial dependency as a situation where a non-prime attribute is dependent on only a part of a composite primary key. To satisfy 2NF, the table must be in 1NF and have no partial dependencies. An example is given where a table with a composite primary key of employee ID and department ID is split into two tables to remove the partial dependency of the office location on department ID alone. The result is two tables, each with a primary key and non-key attributes fully functionally dependent on that key.

🔄 Transitioning to Third Normal Form (3NF)

The third normal form (3NF) is discussed in this paragraph, focusing on reducing data duplication and ensuring referential integrity by eliminating transitive dependencies. A table must be in 2NF and have no non-prime attribute transitively dependent on another non-prime attribute to be in 3NF. The explanation simplifies the concept by using an example where a table with a transitive dependency between student ID, subject ID, and subject is split into two tables to achieve 3NF. The result is that all non-key attributes in the first table are dependent only on the primary key, and in the second table, the subject is dependent only on subject ID.

🎓 Boyce-Codd Normal Form (3.5NF)

The final paragraph introduces the Boyce-Codd normal form (3.5NF), which is an extension of 3NF designed to address certain anomalies not covered by 3NF. In 3.5NF, every non-trivial functional dependency implies that the left side of the dependency is a super key. The paragraph uses an example of a table with student ID, subject, and professor, where the professor is dependent on the subject, which is a prime attribute. To satisfy 3.5NF, the table is divided into two, with the second table having professor ID as a super key, thus removing the non-prime attribute's functional dependency. The session concludes with an invitation for viewers to ask questions and engage with the content.

Mindmap

Keywords

💡Normalization

Normalization is a systematic approach in database management to organize data by decomposing tables and eliminating redundancy. It is central to the video's theme as it is the primary technique discussed for organizing data in a database. The script mentions various types of normalization, such as first, second, third normal forms, and Boyce-Codd normal form, to illustrate the process of refining data structures for efficiency and consistency.

💡Data Redundancy

Data redundancy refers to the duplication of data across a database, which can lead to inefficiencies and data anomalies. In the context of the video, normalization is presented as a solution to reduce data redundancy, ensuring that each piece of information is stored only once, thus simplifying data management and reducing storage requirements.

💡First Normal Form (1NF)

First Normal Form is the initial step in the normalization process, focusing on atomicity, which means that table cells should not contain multiple values. The video script uses the example of an employee table to demonstrate how to achieve 1NF by ensuring that each cell contains a single value and splitting composite attributes into separate tables.

💡Second Normal Form (2NF)

Second Normal Form builds upon the first by addressing partial dependencies, where non-prime attributes are dependent on only a part of a composite primary key. The script explains 2NF by illustrating how to split a table to ensure that non-key attributes are fully functionally dependent on the primary key, thus eliminating partial dependencies.

💡Third Normal Form (3NF)

Third Normal Form further refines the database by eliminating transitive dependencies, where non-prime attributes depend on other non-prime attributes through the primary key. The video script provides an example of a student-subject relationship table and shows how to restructure it to ensure that only prime attributes determine non-prime attributes, achieving 3NF.

💡Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form, also known as 3.5NF, is an extension of third normal form that deals with certain types of anomalies not addressed by 3NF. The script explains BCNF by showing how to restructure a table to ensure that every non-trivial functional dependency is a super key, thus avoiding functional dependencies on non-prime attributes.

💡Data Anomalies

Data anomalies are problems that arise from improper data organization, such as insertion, update, and deletion anomalies. The video script discusses these anomalies to highlight the need for normalization, explaining how they can lead to data inconsistencies and the necessity of restructuring to maintain data integrity.

💡Composite Key

A composite key is a primary key made up of two or more attributes that together uniquely identify a record in a table. The video script uses the concept of a composite key to explain how partial dependencies can occur in second normal form and how to resolve them by splitting the table.

💡Functional Dependency

Functional dependency is a relationship between attributes in a table where the value of one attribute determines the value of another. The video script discusses functional dependencies in the context of normalization, explaining how they should be managed to achieve the various normal forms and avoid data anomalies.

💡Super Key

A super key is any set of attributes that uniquely identifies a record in a table, which may include a combination of one or more attributes. The script introduces the concept of a super key in the context of Boyce-Codd normal form, emphasizing that every functional dependency should be on a super key to satisfy BCNF.

💡Insertion Anomaly

Insertion anomaly occurs when new data cannot be added to a table because of missing related information. The video script provides an example of an insertion anomaly where adding a new employee without a department would require setting the department information as null, illustrating the problem and the need for normalization.

💡Update Anomaly

Update anomaly happens when changes in the data require updating multiple records, risking data inconsistency if some records are missed. The script uses the example of updating Mr. Ratchett's department change to explain how update anomalies can occur and the importance of normalization in preventing them.

💡Deletion Anomaly

Deletion anomaly arises when deleting a record also removes associated information that should be retained. The video script describes a scenario where deleting employee records at the end of a financial year would also delete department information, demonstrating the need for normalization to avoid such data loss.

Highlights

Normalization is a systematic approach to organizing data in a database to eliminate data redundancy.

There are different types of normalization including first, second, third normal form, and Boyce-Codd normal form.

Normalization helps to eliminate repeated data and ensure data dependencies make logical sense.

Data anomalies such as insertion, update, and deletion anomalies can occur if a table is not properly normalized.

First normal form (1NF) tackles the problem of atomicity, ensuring no cell holds multiple values.

Second normal form (2NF) requires tables to be in 1NF and to not contain partial dependencies.

In 2NF, non-prime attributes should be fully dependent on the primary key, not just a part of it.

Third normal form (3NF) further reduces data duplication and ensures referential integrity.

3NF requires tables to be in 2NF and that no non-prime attribute is transitively dependent on another non-prime attribute.

Boyce-Codd normal form (BCNF) is a higher version of 3NF, addressing anomalies not covered by 3NF.

In BCNF, every functional dependency implies that the determinant must be a super key of the table.

Normalization reduces the need for restructuring over time and makes the data model more informative.

Examples are provided to illustrate how to achieve each normal form by decomposing tables.

The video explains how to identify and correct data anomalies through the process of normalization.

Understanding normalization helps in managing large datasets and maintaining data consistency.

The session concludes with an invitation for viewers to ask questions and engage with the content.

The presenter encourages viewers to like, comment, and subscribe for more educational content.

The video aims to provide a clear understanding of normalization and its practical applications in database management.

Transcripts

play00:00

[Music]

play00:11

dito in the database is stored in terms

play00:13

of enormous quantity retrieving certain

play00:16

data will be a tedious task if the data

play00:18

is not organized correctly with the help

play00:21

of normalization we can organize this

play00:23

data and also reduce the redundant data

play00:26

hey guys this is predict from Ed Eureka

play00:29

and I welcome you all to this

play00:30

interesting session on normalization in

play00:32

SQL in this session I'll explain

play00:35

everything that is related to

play00:36

normalization with simple examples that

play00:39

are easy to remember firstly let's look

play00:41

at the agenda for today's session so

play00:44

we're going to start off with

play00:45

understanding normalization and moving

play00:47

further we shall look at various types

play00:49

of normalization and those are first

play00:51

normal form second normal form third

play00:54

normal form and boyce-codd normal form I

play00:56

hope you guys are clear with the agenda

play00:59

but before moving further if you haven't

play01:01

subscribed to a channel then do

play01:03

subscribe to never miss out an update

play01:04

with that being said let's get started

play01:07

the first topic in today's session is

play01:09

what is normalization database

play01:12

normalization is a technique of

play01:14

organizing the data in the database it

play01:17

is a systematic approach of decomposing

play01:19

tables to eliminate data redundancy it

play01:21

is a multi-step process that puts data

play01:24

into tabular form removing the

play01:26

duplicated data from its relational

play01:28

tables on the screen we just saw that

play01:31

the table is getting decomposed into two

play01:33

smaller table is it really necessary to

play01:36

normalize the table that is present on

play01:38

the database well every table in the

play01:40

database has to be in the normal form so

play01:43

normalization is used mainly for two

play01:45

purpose so the first one is it is used

play01:48

to eliminate repeated data having

play01:50

repeated data in the system not only

play01:52

makes the process flow but will cause

play01:54

trouble during the later part of

play01:55

transactions and second one is to ensure

play01:58

the data dependencies make some logical

play02:00

sense yes usually the data is stored in

play02:03

database with certain logic huge data

play02:06

sets without any purpose are completely

play02:08

waste it's like having an abundant

play02:11

resource without any application

play02:13

the data that we have should make some

play02:15

logical sense normalization came into

play02:17

existence because of the problems that

play02:19

occurred on data now let's look at those

play02:22

problems and these are known as data

play02:24

anomalies if a table is not properly

play02:27

normalized and has data redundancy then

play02:30

it will not only eat up the extra memory

play02:32

space but will also make it difficult to

play02:34

handle and update the data base let's

play02:37

look at the first anomaly that is

play02:39

insertion anomaly suppose for a new

play02:42

position in a company mr. Ratchett is

play02:44

selected but the department has not been

play02:46

allotted for him in that case if we want

play02:49

to update his information to the

play02:50

database we need to set the department

play02:52

information as null similarly if we have

play02:55

to insert data of thousand employees who

play02:58

are in similar situation then the

play03:00

department information will be repeated

play03:02

for all those thousand employees this

play03:05

scenario is a classical example of

play03:07

insertion anomalies the next one is

play03:09

update anomaly what if mr. Ratchett

play03:12

leaves the company or is in no longer

play03:15

the head of the marketing department in

play03:16

that case all the employee records will

play03:19

have to be updated and if by mistake we

play03:22

miss any record it will lead to data

play03:24

inconsistency this is nothing but

play03:26

updation anomaly and the final one is

play03:30

deletion anomaly in our employee table

play03:33

two different pieces of information are

play03:35

kept together that is employee

play03:37

information and Department information

play03:39

hence at the end of financial year

play03:42

if employee records are deleted we will

play03:44

also lose the department information

play03:46

this is nothing but deletion anomaly so

play03:49

these were some of the problems that

play03:51

occurred while managing the data to

play03:53

eliminate all these anomalies

play03:55

normalization came into existence

play03:57

there are many normal forms which are

play03:59

still under development but let's focus

play04:01

on the very basic and the essential ones

play04:03

only so we will be talking about first

play04:06

normal form second normal form third

play04:09

normal form and finally end this session

play04:11

with boyce-codd normal form so without

play04:14

wasting for the time let's proceed your

play04:16

first normal form

play04:17

in first normal form we tackle the

play04:20

problem of atomicity here at alma city

play04:23

means values in the table should not be

play04:25

further divided in simple terms a single

play04:28

cell cannot hold multiple values if a

play04:31

table contains a composite or

play04:32

multivalued attributes it violates the

play04:35

first normal form so the following

play04:37

functions will be performed in first

play04:39

normal form the first one is it removes

play04:41

repeating groups from the table and next

play04:44

it creates a separate table for each set

play04:46

of related data and finally it

play04:49

identifies each set of related data with

play04:51

the primary key to understand this in a

play04:54

better way let's look at the given table

play04:56

in the employee table we have employee

play04:59

ID employee name phone number and salary

play05:03

as columns we can clearly see that the

play05:05

phone number column has two values thus

play05:08

it violates the first normal form now if

play05:11

we apply the first normal form to the

play05:13

above table we get the following result

play05:15

in this table each and every row is

play05:18

listing that is no cell has multiple

play05:21

values the table has achieved a dhama

play05:24

city first normal form is simple and can

play05:27

be easily identified in the table we can

play05:30

clearly see there is no multiple values

play05:32

in each and every column thus the first

play05:35

normal form is achieved now let's move

play05:38

to the second normal form second normal

play05:40

form was originally defined by EF chord

play05:43

in 1971 a table is said to be in second

play05:47

normal form only when it fulfills the

play05:49

following condition the first condition

play05:52

is it has to be in first normal form and

play05:54

the second one is the table also should

play05:57

not contain partial dependency here

play06:00

partial dependency means the proper

play06:02

subset of a candidate key determines a

play06:05

non-prime attribute so what is a

play06:07

non-prime attribute let's understand

play06:10

this in a simple way attributes that

play06:12

form a candidate key in a table or

play06:14

called Prime attributes and the rest of

play06:16

the attributes of the relation are non

play06:18

prime for a table prime attributes can

play06:21

be like employee ID and Department ID

play06:23

and the non prime attributes can be like

play06:26

office location to understand second

play06:29

normal form let's consider this table

play06:31

this table has a composite primary key

play06:34

that is employee ID and department ID

play06:37

makes a primary key the non key

play06:39

attribute is office location in this

play06:42

case office location only depends on

play06:44

department ID which is only the power of

play06:47

primary key

play06:48

therefore this table does not satisfy

play06:50

the second normal form so what to do in

play06:53

such scenario the answer is simple

play06:56

split the table accordingly to bring

play06:59

this table to second normal form we need

play07:01

to break the table into two parts which

play07:03

will give the following tables the first

play07:05

table has employee ID and department ID

play07:08

as columns the second one has department

play07:11

ID and office location as columns as you

play07:14

can see we have removed the partial

play07:16

functional dependency that we initially

play07:18

had now in the table the column office

play07:21

location is fully dependent on the

play07:23

primary key of that table which is

play07:25

nothing but department ID I hope you

play07:28

understood second normal form now that

play07:30

we have learned first normal form and

play07:32

second normal form let's head to the

play07:34

next part of this normalization next

play07:37

topic is third normal form third normal

play07:39

form is a normal form that is used in

play07:42

normalizing the table to reduce the

play07:44

duplication of data and ensure

play07:45

referential integrity the following

play07:48

condition has to be met by the table to

play07:50

be in third normal form and the first

play07:52

condition is the table has to be in

play07:54

second normal form and the second

play07:56

condition is no non-prime attribute is

play07:59

transitively dependent on any non-prime

play08:02

attribute which depends on other non

play08:04

prime attributes

play08:05

I know it's bit confusing so let me make

play08:08

it simple for you it's like if C is

play08:10

dependent on B and in turn B is

play08:13

dependent on a then transitively C is

play08:16

dependent on a this should not happen in

play08:18

third normal form all the non prime

play08:21

attributes must depend only on the prime

play08:23

attributes

play08:25

these are the two necessary condition

play08:26

that needs to be attained so why was the

play08:29

normal form design firstly to eliminate

play08:32

undesirable data anomalies next one is

play08:35

to reduce the need for restructuring

play08:37

over time and finally to make the data

play08:40

model more informaiton since we have

play08:43

understood the third normal form let's

play08:45

look at the example table in the above

play08:47

table student ID determines subject ID

play08:50

and subject ID determine subject

play08:52

therefore student ID determines subject

play08:55

yr subject ID this implies that we have

play08:58

transitive functional dependency and

play09:00

this table does not satisfy the third

play09:03

normal form now in order to achieve the

play09:06

normal form we need to divide the table

play09:08

as shown below

play09:09

firstly let's divide the table and store

play09:12

student ID student name subject ID and

play09:15

address in it all the columns are

play09:17

referring to the primary key which is

play09:19

student ID let the second table have

play09:22

subject ID and subject column so subject

play09:25

is dependent only on subject ID and not

play09:28

on student ID as you can see from the

play09:31

above table all the non-key attributes

play09:33

are now fully functionally dependent

play09:35

only on the primary key in the first

play09:37

table column such as student name

play09:40

subject ID and address are only

play09:42

dependent on student ID in the second

play09:45

table subject is only dependent on

play09:47

subject ID with this being understood

play09:50

now we can proceed further to next

play09:52

normal form that is Boyce Codd normal

play09:53

form this is also known as 3.5 normal

play09:57

form it is the higher version of third

play10:00

normal form and was developed by Raymond

play10:02

F boys and Edgar F Codd to address

play10:05

certain types of anomalies which were

play10:07

not dealt with third normal form before

play10:09

proceeding to Boyce Codd normal form the

play10:11

table has to satisfy third normal form

play10:13

in Boyce Codd normal form if every

play10:17

functional dependency that is a implies

play10:19

B then a has to be the super key of that

play10:22

particular table so what is a super key

play10:25

a super key is a group of single or

play10:27

multiple keys which identifies rows in a

play10:30

table let's look at the table to clearly

play10:33

understand Boyce Codd normal form

play10:35

in the given table one student can

play10:38

enroll for multiple subjects there can

play10:40

be multiple professor teaching one

play10:42

subject and for each subject a professor

play10:45

is assigned to the student these are the

play10:48

necessary condition of the stable in

play10:50

this table all the normal forms are

play10:52

satisfied except boyce-codd normal form

play10:55

Y as you can see that student ID and

play10:58

subject form the primary key which means

play11:01

that the subject column is prime

play11:03

attribute but there is one more

play11:05

dependency that is professor is

play11:07

depending on subject and well subject is

play11:10

a prime attribute professor is a

play11:12

non-prime attribute which is not allowed

play11:14

by boyce-codd normal form now in order

play11:17

to satisfy the boyce-codd normal form we

play11:20

will be dividing the table into two

play11:21

parts the table at the top will hold

play11:24

student ID which already exists and we

play11:27

will create a new column that is

play11:28

professor ID and in the second table

play11:31

which is below we'll have the columns

play11:33

professor ID professor and subject

play11:36

columns why do we need to have a new

play11:38

column that is professor ID by doing

play11:41

this we are removing the non prime

play11:43

attributes functional dependency in the

play11:46

second table professor ID will be the

play11:48

super key of that table

play11:49

and remaining column will be

play11:51

functionally dependent on it by doing

play11:53

this we are satisfying boyce-codd normal

play11:55

form so this brings us to the end of

play11:58

this session I hope you have clearly

play12:00

understood the normalization and its

play12:02

different types if you have any queries

play12:05

or doubts regarding this session please

play12:06

let me know in the comment section and

play12:08

I'll get back to you with an answer

play12:10

thank you guys for watching this video

play12:12

and have a great day

play12:14

I hope you have enjoyed listening to

play12:16

this video please be kind enough to like

play12:18

it and you can comment any of your

play12:21

doubts and queries and we will reply

play12:23

them at the earliest do look out for

play12:26

more videos in our playlist and

play12:27

subscribe to any rekha channel to learn

play12:30

more happy learning

Rate This

5.0 / 5 (0 votes)

الوسوم ذات الصلة
SQLNormalizationData RedundancyDatabase Design1NF2NF3NFBoyce-CoddData IntegrityDatabase ManagementData Anomalies
هل تحتاج إلى تلخيص باللغة الإنجليزية؟