Learn Database Normalization - 1NF, 2NF, 3NF, 4NF, 5NF
Summary
TLDR: This video script explores the concept of database normalization, explaining its purpose, process, and benefits. It covers the five normal forms, from First to Fifth Normal Form, using practical examples to illustrate how normalization prevents data anomalies and improves database integrity. The script simplifies complex concepts, ensuring viewers understand the importance of normalization in maintaining accurate and consistent data.
Takeaways
- Normalization is a database design process aimed at structuring a database to reduce redundancy and improve data integrity.
- The purpose of normalization is to organize data to prevent anomalies such as contradictory information and to ensure that the database accurately represents reality.
- Normalization involves applying a series of normal forms, from First Normal Form (1NF) up to Fifth Normal Form (5NF), each building on the previous one to guarantee a higher level of data integrity.
- First Normal Form (1NF) requires that a table not use row order to convey information, not mix data types within a column, have a primary key, and contain no repeating groups.
- Second Normal Form (2NF) builds on 1NF by ensuring that every non-key attribute depends on the entire primary key, preventing the partial dependencies that lead to insertion, update, and deletion anomalies.
- Third Normal Form (3NF) further refines the design by eliminating transitive dependencies: non-key attributes must depend only on the primary key, not on other non-key attributes.
- Fourth Normal Form (4NF) addresses multivalued dependencies: a table is in 4NF only if every multivalued dependency in it is a dependency on the key.
- Fifth Normal Form (5NF) concerns tables that can be logically deduced by joining other tables; a table is in 5NF only if it cannot be described as the result of such a join.
- Normalization makes a database design more robust against the insertion, update, and deletion anomalies that can occur when a database is not properly structured.
- Understanding and applying the normal forms is crucial for database designers who want a well-structured database that is both efficient and reliable in representing and managing data.
- The principles of normalization are fundamental to relational database theory and essential for maintaining data consistency and avoiding logical inconsistencies.
Q & A
What is normalization in the context of relational databases?
-Normalization is the process of structuring a database table in such a way that it can't express redundant information, ensuring data integrity and consistency.
Why is normalization important in database design?
-Normalization is important to prevent data anomalies such as insertion, update, and deletion anomalies, and to ensure that the database design is logical, consistent, and easy to maintain.
What are the different normal forms in database normalization?
-The normal forms range from First Normal Form (1NF) to Fifth Normal Form (5NF), with each form providing a higher level of data integrity and reducing redundancy.
What is the primary key in a database table?
-A primary key is a column or a combination of columns that uniquely identifies each row in a table, which is a requirement for achieving First Normal Form.
What is an example of a violation of First Normal Form?
-Examples of violating 1NF include using row order to convey information, mixing data types within a column, lacking a primary key, and having repeating groups within a table.
What is a deletion anomaly and how does it relate to normalization?
-A deletion anomaly occurs when deleting a row of data leads to the loss of information that is not related to the data being deleted, which can happen if a table is not in Second Normal Form.
What is an update anomaly and how can normalization prevent it?
-An update anomaly occurs when the same fact is stored in several rows and an update changes only some of them, leaving the data contradictory (for example, a player recorded as both Intermediate and Advanced). Putting the table in Second Normal Form prevents this by storing each such fact only once.
What is the definition of Third Normal Form?
-Third Normal Form (3NF) is achieved when every non-key attribute in a table is directly dependent on the primary key, not on any other non-key attribute, thus preventing transitive dependencies.
What is a multivalued dependency and how does it relate to Fourth Normal Form?
-A multivalued dependency is a relationship in which each value of one attribute determines a specific set of values of another attribute. Fourth Normal Form (4NF) allows only multivalued dependencies on the key.
What is the purpose of Fifth Normal Form and how is it achieved?
-Fifth Normal Form (5NF) is achieved when a table cannot be logically thought of as the result of joining other tables, ensuring that the table represents atomic facts and eliminating any redundancy that could arise from joining tables.
Outlines
Introduction to Database Normalization
This paragraph introduces the concept of database normalization, explaining its purpose and benefits. Normalization is the process of structuring a database to reduce redundancy and improve data integrity. The video aims to simplify the topic by minimizing jargon and using practical examples. It covers normal forms from First through Fifth Normal Form, discussing the advantages of normalization and the potential issues arising from its absence, such as data integrity problems and bad database design.
Understanding First Normal Form (1NF)
The paragraph delves into the specifics of First Normal Form (1NF), the initial step in database normalization. It emphasizes the prohibition on using row order to convey information, the necessity of a primary key, and the avoidance of mixed data types within a column and of repeating groups. The examples include a tallest-to-shortest list of the Beatles that conveys height only through row order, and the proper way to structure a table with a primary key to ensure data consistency and prevent anomalies.
Advancing to Second Normal Form (2NF)
This section discusses the Second Normal Form (2NF), building upon 1NF by addressing the issue of non-key attributes that depend on only a part of the primary key, leading to update, insertion, and deletion anomalies. The paragraph uses the example of a player inventory table to illustrate these issues and explains that 2NF requires each non-key attribute to be dependent on the entire primary key. The solution involves creating separate tables for related but distinct pieces of information to maintain data integrity.
Achieving Third Normal Form (3NF)
The paragraph explains Third Normal Form (3NF), which focuses on eliminating transitive dependencies where a non-key attribute depends on another non-key attribute. Using a Player table example with Player_Rating and Player_Skill_Level, it shows how data inconsistencies can arise and how to correct the design by creating separate tables for Player and Player_Skill_Levels. The key takeaway is that every attribute should depend on the primary key directly and not through other attributes.
Beyond 3NF: Fourth and Fifth Normal Forms
This paragraph introduces Fourth Normal Form (4NF) and Fifth Normal Form (5NF), which address more complex scenarios not covered by 3NF. It uses the example of a birdhouse supplier's database to explain 4NF, which deals with multivalued dependencies where an attribute may have multiple values for another attribute, leading to potential inconsistencies. The solution is to separate these into distinct tables. Fifth Normal Form (5NF), also known as Project-Join Normal Form, is mentioned as the final step in ensuring a table is not the result of joining other tables, which would imply a dependency that violates normalization rules.
Conclusion and Invitation for Feedback
The final paragraph wraps up the video by summarizing the key points of database normalization, from First Normal Form to Fifth Normal Form. It reviews the rules for each normal form and emphasizes the importance of adhering to these rules to maintain a well-structured and consistent database. The speaker invites viewers to share comments, questions, or suggestions for other topics they'd like to see explained in future videos, encouraging further engagement and learning.
Keywords
Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Data Integrity
Anomalies
Primary Key
Functional Dependency
Highlights
Normalization is the process of structuring a database to minimize redundancy and dependency.
Normalization helps maintain data integrity and prevents logical inconsistencies in the database.
A database in First Normal Form (1NF) ensures no repeating groups, mixed data types, or row order significance.
A primary key is essential for a table to achieve 1NF, uniquely identifying each record.
Second Normal Form (2NF) requires that all non-key attributes depend on the entire primary key, preventing partial dependencies.
Third Normal Form (3NF) further ensures that non-key attributes do not depend on other non-key attributes, eliminating transitive dependencies.
Boyce-Codd Normal Form is a slightly stronger version of 3NF, in which every attribute (not just every non-key attribute) must depend on the key, the whole key, and nothing but the key.
Fourth Normal Form (4NF) addresses multivalued dependencies, allowing only those on the key.
Fifth Normal Form (5NF) ensures that a table cannot be logically deduced from joining other tables, avoiding join anomalies.
Normalization protects against insertion, update, and deletion anomalies by adhering to the normal forms.
Examples given in the video illustrate the practical application of normal forms in database design.
The video simplifies complex database concepts, making normalization principles accessible to a broader audience.
Data anomalies such as contradictory information are prevented through proper normalization.
Designing a database without normalization can lead to logical impossibilities and data inconsistencies.
The video provides a clear understanding of what is gained by normalization and what is lost by failing to normalize.
The Decomplexify series aims to demystify complex topics, offering simplicity to complex subjects like database normalization.
The video concludes with a review of all normal forms, reinforcing the importance of normalization in database design.
Transcripts
If you've had some exposure to relational databases, you've probably come across the term
"normalization". But what is normalization? Why do we do it? How do we do it? And what
bad things can happen if we don't do it? In this video, we're going to explore database
normalization from a practical perspective. We'll keep the jargon to a minimum,
and we'll use lots of examples as we go. By the end of it, you'll understand the so-called normal
forms from First Normal Form all the way up to Fifth Normal Form, and you'll have a clear sense
of what we gain by doing normalization, and what we lose by failing to do it.
This is Decomplexify, bringing a welcome dose of simplicity to complex topics.
Data: it's everywhere. And some of it is wrong.
By and large, even a good database design can't protect against bad data.
But there are some cases of bad data that a good database design can protect against. These are
cases where the data is telling us something that logically cannot possibly be true:
One customer with two dates of birth is logically impossible. It's what we might
call a failure of data integrity. The data can't be trusted because it disagrees with itself.
When data disagrees with itself, that's more than just a problem of bad data.
It's a problem of bad database design.
Specifically, it's what happens when a database design isn't properly normalized.
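To make "data that disagrees with itself" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Customer table, its columns, and the dates are hypothetical. With nothing declared as a key, both birth dates for customer 1001 are accepted, and the contradiction can only be found after the fact with a GROUP BY query.

```python
import sqlite3

# In-memory database; the Customer table and its dates are hypothetical.
conn = sqlite3.connect(":memory:")
# No primary key, so nothing stops two rows for the same customer.
conn.execute("CREATE TABLE Customer (Customer_ID INTEGER, Date_Of_Birth TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(1001, "1980-04-12"), (1001, "1985-09-30"), (1002, "1990-01-01")],
)

# Detect customers whose rows disagree about their date of birth.
conflicts = conn.execute(
    """
    SELECT Customer_ID, COUNT(DISTINCT Date_Of_Birth)
    FROM Customer
    GROUP BY Customer_ID
    HAVING COUNT(DISTINCT Date_Of_Birth) > 1
    """
).fetchall()
print(conflicts)  # [(1001, 2)]
```

A query like this only finds contradictions after they have been stored; normalization aims to make them impossible to store in the first place.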
So what does normalization mean? When you normalize a database table,
you structure it in such a way that it can't express redundant information.
So, for example, in a normalized table, you wouldn't be able to give Customer 1001 two dates
of birth even if you wanted to. Very broadly, the table can only express one version of the truth.
Normalized database tables are not only protected from contradictory data, they're also:
- easier to understand
- easier to enhance and extend
- protected from insertion anomalies, update anomalies, and deletion anomalies (more on these later)
How do we determine whether a table isn't normalized enough? In other words, how do
we determine if there's a danger that redundant data could creep into the table? Well, it turns
out that there are sets of criteria we can use to assess the level of danger. These sets of criteria
have names like "first normal form", "second normal form", "third normal form", and so on.
Think of these normal forms by analogy to safety assessments. We might imagine an engineer doing a
very basic safety assessment on a bridge. Let's say the bridge passes the basic assessment,
which means it achieves "Safety Level 1: Safe for Pedestrian Traffic".
That gives us some comfort, but suppose we want to know whether cars can safely drive across the bridge.
To answer that question, we need the engineer to perform an even stricter assessment of the bridge.
Let's imagine that the engineer goes ahead and does this stricter assessment, and again the
bridge passes, achieving "Safety Level 2: Safe for Cars". If even this doesn't satisfy us,
we might ask the engineer to assess the bridge for "Safety Level 3: Safe for Trucks". And so on.
The normal forms of database theory work the same way.
If we discover that a table meets the requirements of first normal form,
that's a bare minimum safety guarantee. If we further discover that the table meets
the requirements of second normal form, that's an even greater safety guarantee. And so on.
So let's begin at the beginning, with First Normal Form.
Suppose you and I are both confronted by this question:
"Who were the members of the Beatles?" You might answer "John, Paul, George, and Ringo".
I might answer "Paul, John, Ringo, and George". Of course, my answer and your answer are
equivalent, despite having the names in a different order.
When it comes to relational databases, the same principle applies. Let's record the names of the
Beatles in a table, and then let's ask the database to return those names back to us.
The results will get returned to us in an arbitrary order. For example, they might
get returned like this. Or like this.
Or in any other order. There is no "right" order. Are there ever situations where there's a right
order? Suppose we write down the members of the Beatles from tallest to shortest,
like this. We title our list "Members Of The Beatles From Tallest To Shortest".
In this list, it's not just the names that convey meaning. The order of the names conveys
meaning too. Paul is the tallest, John is the second-tallest, and so on. Lists like this are
totally comprehensible to us, but they're not normalized. Remember, there's no such thing as row
order within a relational database table. So here we have our first violation of First Normal Form.
When we use row order to convey information, we're violating First Normal Form.
The solution is very simple. Be explicit: if we want to capture height information, we should
devote a separate column to it, like this. Or even better, like this.
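A minimal sketch of the "be explicit" fix, using Python's standard sqlite3 module and the Beatle_Height and Height_In_Cm names that appear in the video. The centimetre values are made-up placeholders (only the Paul-then-John ordering comes from the video). The rows are inserted in scrambled order, and the tallest-to-shortest list is requested with ORDER BY rather than read off the storage order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beatle_Height (Beatle TEXT PRIMARY KEY, Height_In_Cm INTEGER)")
# Placeholder heights, inserted in deliberately scrambled order.
conn.executemany(
    "INSERT INTO Beatle_Height VALUES (?, ?)",
    [("Ringo", 170), ("Paul", 180), ("George", 175), ("John", 179)],
)

# Row order in a table carries no meaning, so ask for the order explicitly.
tallest_first = [
    name
    for (name,) in conn.execute(
        "SELECT Beatle FROM Beatle_Height ORDER BY Height_In_Cm DESC"
    )
]
print(tallest_first)  # ['Paul', 'John', 'George', 'Ringo']
```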
So far, we've seen one way in which a design can fail to achieve
First Normal Form. But there are others. A second way of violating First Normal Form
involves mixing data types. Suppose our Beatle_Height dataset looked like this.
If you're accustomed to spreadsheets, you'll be aware that they typically won't stop you from
having more than one datatype within a single column; for example, they won't stop you from
storing both numbers and strings in a column. But in a relational database, you're not allowed to be
cagey or ambiguous about a column's data type. The values that go in the Height_In_Cm column
can't be a mix of integers and strings. Once you define Height_In_Cm as being an integer column,
then every value that goes into that column will be an integer: no strings, no timestamps,
no data types of any kind other than integers. So: mixing datatypes within a column
is a violation of First Normal Form, and in fact the database platform won't even let you do it.
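One caveat to "the database platform won't even let you do it": SQLite is a well-known exception. Its ordinary tables use type affinity, so a column declared INTEGER will quietly keep a value like '180cm' as text. The sketch below (the Loose table name and the values are hypothetical) shows that behavior and one common remedy, a CHECK constraint on SQLite's built-in typeof() function. Recent SQLite versions also offer STRICT tables, which enforce declared types directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without a constraint, SQLite stores '180cm' as TEXT in an INTEGER column.
conn.execute("CREATE TABLE Loose (Beatle TEXT PRIMARY KEY, Height_In_Cm INTEGER)")
conn.execute("INSERT INTO Loose VALUES ('Paul', '180cm')")
loose_type = conn.execute("SELECT typeof(Height_In_Cm) FROM Loose").fetchone()[0]
print(loose_type)  # text

# With a CHECK on typeof(), only integer values are accepted.
conn.execute(
    """
    CREATE TABLE Beatle_Height (
        Beatle       TEXT PRIMARY KEY,
        Height_In_Cm INTEGER CHECK (typeof(Height_In_Cm) = 'integer')
    )
    """
)
conn.execute("INSERT INTO Beatle_Height VALUES ('John', 179)")  # placeholder value
try:
    conn.execute("INSERT INTO Beatle_Height VALUES ('Paul', '180cm')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```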
A third way of violating First Normal Form is by designing a table without a primary key. A primary
key is a column, or combination of columns, that uniquely identifies a row in the table.
For example, in the table Beatle_Height, our intention is that each row should tell
us about one particular Beatle, so we ought to designate "Beatle" as the primary key of the
Beatle_Height table. The database platform will need to know about our choice of primary key,
so we'll want to get the primary key into the database by doing something like this.
With the primary key in place, the database platform will prevent multiple
rows for the same Beatle from ever being inserted. That's a good thing,
because multiple rows for the same Beatle would be nonsensical, and perhaps contradictory.
Obviously, a Beatle can't have two different heights at once.
Every table we design should have a primary key. If it doesn't, it's not in First Normal Form.
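The transcript doesn't reproduce the statement the video uses to declare the key, so here is one way to do it in SQLite, declaring the key when the table is created. The heights are placeholders. The second insert for Paul is rejected with sqlite3.IntegrityError, so the table keeps a single height per Beatle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Beatle is the primary key. NOT NULL is spelled out because SQLite, for
# historical reasons, otherwise allows NULL in a non-INTEGER primary key.
conn.execute(
    """
    CREATE TABLE Beatle_Height (
        Beatle       TEXT    PRIMARY KEY NOT NULL,
        Height_In_Cm INTEGER NOT NULL
    )
    """
)
conn.execute("INSERT INTO Beatle_Height VALUES ('Paul', 180)")  # placeholder height

rejected = False
try:
    # A second, contradictory height for the same Beatle.
    conn.execute("INSERT INTO Beatle_Height VALUES ('Paul', 175)")
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT Beatle, Height_In_Cm FROM Beatle_Height").fetchall()
print(rejected, rows)  # True [('Paul', 180)]
```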
The last way of failing to achieve First Normal Form involves the notion of
"repeating groups". Suppose we're designing a database for an online multiplayer game.
At a given time, each player has a number of items of different types, like arrows,
shields, and copper coins. We might represent the situation like this.
A player's inventory is what we call a "repeating group". Each inventory contains potentially many
different types of items: arrows, shields, copper coins, and so on; and in fact there
may be hundreds of different types of items that a player might have in their inventory.
We could design a database table that represents the Inventory as a string of text:
But this is a terrible design because there's no easy way of querying it.
For example, if we want to know which players currently have more than 10 copper coins,
then having the inventory data lumped together in a text string
will make it very impractical to write a query that gives us the answer.
We might be tempted to represent the data like this.
This lets us record up to 4 items per inventory. But given that a player can have an inventory
consisting of hundreds of different types of items, how practical is it going to be to design
a table with hundreds of columns? Even if we were to go ahead and create a super-wide table to hold
all possible inventory data, querying it would still be extremely awkward.
The bottom line is that storing a repeating group of data items on a single row violates First
Normal Form. So what sort of alternative design would respect First Normal Form?
It would be this. To communicate the fact that
trev73 owns 3 shields, we have a row for Player "trev73", Item_Type "shields", Item_Quantity 3.
To communicate the fact that trev73 also owns 5 arrows,
we have a row for Player "trev73", Item_Type "arrows", Item_Quantity 5. And so on.
And because each row in the table tells us about one unique combination of Player
and Item_Type, the primary key is the combination of Player and Item_Type.
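A sketch of the First Normal Form inventory design in SQLite, with the composite primary key (Player, Item_Type). The trev73 shields and arrows rows and the jdog21 amulets row come from the video; the other rows and all copper-coin quantities are placeholders. With one row per item type, the "more than 10 copper coins" question becomes an ordinary WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (Player, Item_Type): no repeating group on a single row.
conn.execute(
    """
    CREATE TABLE Player_Inventory (
        Player        TEXT    NOT NULL,
        Item_Type     TEXT    NOT NULL,
        Item_Quantity INTEGER NOT NULL,
        PRIMARY KEY (Player, Item_Type)
    )
    """
)
conn.executemany(
    "INSERT INTO Player_Inventory VALUES (?, ?, ?)",
    [
        ("trev73", "shields", 3),
        ("trev73", "arrows", 5),
        ("trev73", "copper coins", 30),  # placeholder quantity
        ("trev73", "amulets", 1),        # placeholder row
        ("jdog21", "amulets", 2),
        ("jdog21", "arrows", 12),        # placeholder row
        ("gila19", "copper coins", 18),  # placeholder quantity
    ],
)

# The query that was impractical with a text string or a super-wide table.
coin_holders = [
    player
    for (player,) in conn.execute(
        """
        SELECT Player FROM Player_Inventory
        WHERE Item_Type = 'copper coins' AND Item_Quantity > 10
        ORDER BY Player
        """
    )
]
print(coin_holders)  # ['gila19', 'trev73']
```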
So let's review what we know about First Normal Form.
1. using row order to convey information is not permitted
2. mixing data types within the same column is not permitted
3. having a table without a primary key is not permitted
4. repeating groups are not permitted
Next up: Second Normal Form.
Let's look again at our Player_Inventory table. This table is fully normalized. But suppose we
enhance the table slightly. Let's imagine that every player has a rating: Beginner,
Intermediate, or Advanced. We want to record the current rating of each player, and to achieve
that, we simply include in our table an extra column called Player_Rating.
Notice what's happening here. Player jdog21 has a Player_Rating of Intermediate,
but because jdog21 has two rows in the table, both those rows have to be marked Intermediate.
Player trev73 has a Player_Rating of Advanced,
but because trev73 has four rows in the table, all four of those rows have to be marked Advanced.
This is not a good design. Why not? Well, suppose player gila19 loses all her copper coins,
leaving her with nothing in her inventory. The single entry that she did have in the
Player_Inventory table is now gone. If we try to query the database to find
out what gila19's Player Rating is, we're out of luck. We can no longer access gila19's Player
Rating because the database no longer knows it. This problem is known as a deletion anomaly.
And that's not all. Suppose jdog21 improves his rating from Intermediate to Advanced.
To capture his new Advanced rating in the Player_Inventory table,
we run an update on his two records. But let's imagine the update goes wrong.
By accident, only one of jdog21's records gets updated, and the other record gets left alone.
Now the data looks like this. As far as the database is concerned,
jdog21 is somehow both Intermediate and Advanced at the same time.
Our table design has left the door open for this type of logical inconsistency.
This problem is called an update anomaly. Or suppose a new player called tina42 comes along.
She's a Beginner and she doesn't have anything in her inventory yet. We want to record the fact
that she's a Beginner, but because she has nothing in her inventory, we can't
insert a tina42 row into the Player_Inventory table. So her rating goes unrecorded. This
problem is known as an insertion anomaly. The reason our design is vulnerable to these
problems is that it isn't in Second Normal Form. Why not? What is Second Normal Form?
Second Normal Form is about how a table's non-key columns relate to the primary key. In our table,
the non-key columns (or, to use slightly different terminology, non-key attributes) are
Item_Quantity and Player_Rating. They are columns (also called attributes) that don't belong
to the primary key. As we saw earlier, the primary key is the combination of Player and Item_Type.
Now we're in a position to give a definition of Second Normal Form.
The definition we're going to give is an informal one which leaves out some
nuances, but for most practical purposes, that shouldn't matter.
Informally, what Second Normal Form says is that each non-key attribute in the table
must be dependent on the entire primary key. How does our table measure up to this definition?
Let's examine our non-key attributes, which are the attributes Item_Quantity and Player_Rating.
Does Item_Quantity depend on the entire primary key? Yes, because an Item_Quantity is about a
specific Item_Type owned by a specific Player. We can express it like this.
The arrow signifies a dependency, or to give it its proper name, a functional dependency.
This simply means that each value of the thing on the left side of the arrow is associated with
exactly one value of the thing on the right side of the arrow. Each combination of Player_ID and
Item_Type is associated with a specific value of Item_Quantity: for example, the combination
of Player_ID jdog21 / Item_Type "amulets" is associated with an Item_Quantity of 2.
As far as Second Normal Form is concerned, this dependency is fine,
because it's a dependency on the entire primary key. But what about the other dependency?
Does Player_Rating depend on the entire primary key? No, it doesn't. Player_Rating is a property
of the Player only. In other words, for any given Player, there's one Player_Rating.
This dependency on Player is the problem. It's a problem because Player isn't the
primary key: Player is part of the primary key, but it's not the whole key.
That's why the table isn't in Second Normal Form, and that's why it's vulnerable to problems.
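The functional-dependency arrow can be tested against sample rows. The fd_holds helper below is a hypothetical illustration, not something from the video; it uses plain Python dictionaries shaped like the Player_Inventory rows, with one placeholder row. Keep in mind that data can only reveal that a dependency is violated: whether Player → Player_Rating ought to hold is a design decision about what the columns mean.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple

Row = Mapping[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if each combination of lhs values maps to exactly one rhs value."""
    seen: Dict[Tuple[object, ...], object] = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True


def make_row(player, item_type, qty, rating):
    return {"Player": player, "Item_Type": item_type,
            "Item_Quantity": qty, "Player_Rating": rating}


inventory = [
    make_row("jdog21", "amulets", 2, "Intermediate"),
    make_row("jdog21", "arrows", 12, "Intermediate"),  # placeholder row
    make_row("trev73", "shields", 3, "Advanced"),
    make_row("trev73", "arrows", 5, "Advanced"),
]

# Dependency on the whole key {Player, Item_Type}: fine for 2NF.
whole_key = fd_holds(inventory, ["Player", "Item_Type"], "Item_Quantity")
# Dependency on only part of the key: this is the 2NF violation.
part_key = fd_holds(inventory, ["Player"], "Player_Rating")

# The botched update: only one of jdog21's two rows gets changed.
inventory[0]["Player_Rating"] = "Advanced"
after_bad_update = fd_holds(inventory, ["Player"], "Player_Rating")
print(whole_key, part_key, after_bad_update)  # True True False
```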
At what point did our design go wrong, and how can we fix it? The design went wrong
when we chose to add a Player_Rating column to a table where it didn't really belong.
The fact that a Player_Rating is a property of a Player should have helped us to realise
that a Player is an important concept in its own right, so surely Player deserves its own table:
Nothing could be simpler than that. A Player table will contain one row per Player,
and in it we can include as columns the ID of the player, the rating of the player, as well
as all sorts of other properties of the player: maybe the player's date of birth, for example,
maybe the player's email address. Our other table, Player_Inventory, can stay as it was.
For both tables, we can say that there are no part-key dependencies.
In other words, it's always the case that every attribute depends on the whole primary key,
not just part of it. And so our tables are in Second Normal Form.
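A sketch of the Second Normal Form fix in SQLite, assuming the player column is named Player_ID in both tables (the video uses both Player and Player_ID). The copper-coin quantity is a placeholder. The script replays the three anomalies described earlier, and each becomes a harmless single-row operation. Note that SQLite only enforces REFERENCES constraints once PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    -- One row per player; Player_Rating depends on the whole key.
    CREATE TABLE Player (
        Player_ID     TEXT PRIMARY KEY NOT NULL,
        Player_Rating TEXT NOT NULL
    );
    -- The inventory no longer carries the rating.
    CREATE TABLE Player_Inventory (
        Player_ID     TEXT    NOT NULL REFERENCES Player (Player_ID),
        Item_Type     TEXT    NOT NULL,
        Item_Quantity INTEGER NOT NULL,
        PRIMARY KEY (Player_ID, Item_Type)
    );
    INSERT INTO Player VALUES ('jdog21', 'Intermediate'), ('gila19', 'Beginner');
    INSERT INTO Player_Inventory VALUES ('gila19', 'copper coins', 18);
    """
)

# gila19 loses her coins: deleting the inventory row keeps her rating.
conn.execute("DELETE FROM Player_Inventory WHERE Player_ID = 'gila19'")
# tina42 can be recorded even though her inventory is empty.
conn.execute("INSERT INTO Player VALUES ('tina42', 'Beginner')")
# jdog21's promotion is a single-row update, so it cannot be half-applied.
conn.execute("UPDATE Player SET Player_Rating = 'Advanced' WHERE Player_ID = 'jdog21'")

ratings = dict(conn.execute("SELECT Player_ID, Player_Rating FROM Player"))
inventory_rows = conn.execute("SELECT COUNT(*) FROM Player_Inventory").fetchone()[0]
print(ratings, inventory_rows)
```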
Now let's move on to Third Normal Form. Suppose we decide to enhance the Player table.
We decide to add a new column called Player_Skill_Level.
Imagine that in this particular multiplayer game, there's a nine-point scale for skill level.
At one extreme, a player with skill level 1 is an absolute beginner;
at the opposite extreme, a player with skill level 9 is as skilful as it's possible to be.
And let's say that we've defined exactly how Player Skill Levels relate to Player Ratings.
"Beginner" means a skill level between 1 and 3. "Intermediate" means a skill
level between 4 and 6. And "Advanced" means a skill level between 7 and 9.
But now that both the Player_Rating and the Player_Skill_Level exist in the Player table,
a problem can arise. Let's say that tomorrow, player gila19's skill level increases from 3
to 4. If that happens, we'll update her row in the Player table to reflect this new skill level.
By rights, we should also update her Player_Rating to Intermediate, but suppose something goes
wrong, and we fail to update the Player_Rating. Now we've got a data inconsistency. gila19's
Player_Rating says she's a Beginner, but her Player_Skill_Level implies she's Intermediate.
How did the design allow this to happen? Second Normal Form didn't flag up any problems. There's
no attribute here that depends only partially on the primary key. As a matter of fact,
the primary key doesn't have any parts; it's just a single attribute. And both Player_Rating
and Player_Skill_Level are dependent on it. But in what way are they dependent on it? Let's
look more closely. Player_Skill_Level is dependent on Player_ID.
Player_Rating is dependent on Player_ID too, but only indirectly, like this.
A dependency of this kind is called a transitive dependency. Player Rating depends on Player Skill
Level, which in turn depends on the primary key: Player ID. The problem is located just
here, because what Third Normal Form forbids is exactly this type of dependency: the dependency of
a non-key attribute on another non-key attribute. Because Player Rating depends on Player Skill
Level, which is a non-key attribute, this table is not in Third Normal Form.
There's a very simple way of repairing the design to get it into Third Normal Form.
We remove Player Rating from the Player table, so now the Player table looks like this.
And we introduce a new table called Player_Skill_Levels.
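One possible layout for the Third Normal Form fix, sketched in SQLite and assuming Player_Skill_Levels holds one row per skill level (the video's exact layout isn't in the transcript). The rating is no longer stored on the Player row; it is looked up by joining on Player_Skill_Level, so gila19's move from 3 to 4 cannot leave a stale rating behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Player keeps only facts that depend directly on Player_ID.
    CREATE TABLE Player (
        Player_ID          TEXT    PRIMARY KEY NOT NULL,
        Player_Skill_Level INTEGER NOT NULL CHECK (Player_Skill_Level BETWEEN 1 AND 9)
    );
    -- The rating now depends on this table's own key, the skill level.
    CREATE TABLE Player_Skill_Levels (
        Player_Skill_Level INTEGER PRIMARY KEY,
        Player_Rating      TEXT NOT NULL
    );
    """
)


def rating_for(level: int) -> str:
    # Beginner = 1-3, Intermediate = 4-6, Advanced = 7-9 (from the video).
    return "Beginner" if level <= 3 else "Intermediate" if level <= 6 else "Advanced"


conn.executemany(
    "INSERT INTO Player_Skill_Levels VALUES (?, ?)",
    [(level, rating_for(level)) for level in range(1, 10)],
)
conn.execute("INSERT INTO Player VALUES ('gila19', 3)")

RATING_QUERY = """
    SELECT s.Player_Rating
    FROM Player AS p JOIN Player_Skill_Levels AS s USING (Player_Skill_Level)
    WHERE p.Player_ID = 'gila19'
"""
before = conn.execute(RATING_QUERY).fetchone()[0]
# Her skill rises from 3 to 4: one update, and the derived rating follows.
conn.execute("UPDATE Player SET Player_Skill_Level = 4 WHERE Player_ID = 'gila19'")
after = conn.execute(RATING_QUERY).fetchone()[0]
print(before, after)  # Beginner Intermediate
```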
The Player Skill Levels table tells us everything we need to know about how to translate a player
skill level into a player rating. Third Normal Form is the culmination of everything
we've covered about database normalization so far. It can be summarised in this way: every
non-key attribute in a table should depend on the key, the whole key, and nothing but the key.
If you commit this to memory, and keep it constantly in mind while you're designing a
database, then 99% of the time you will end up with fully normalized tables.
It's even possible to shorten this guideline slightly by knocking out the phrase
"non-key", giving us the revised guideline: every attribute in a table should depend on the key, the
whole key, and nothing but the key. And this new guideline represents a slightly stronger flavor of
Third Normal Form known as Boyce-Codd Normal Form. In practice, the difference between Third Normal
Form and Boyce-Codd Normal Form is extremely small, and the chances of you ever encountering
a real-life Third Normal Form table that doesn't meet Boyce-Codd Normal Form are almost zero.
Any such table would have to have what we call multiple overlapping candidate keys, which gets
us into realms of obscurity and theoretical rigor that are a little bit beyond the scope
of this video. So as a practical matter, just follow the guideline that every attribute in a
table should depend on the key, the whole key, and nothing but the key, and you can
be confident that the table will be in both Third Normal Form and Boyce-Codd Normal Form.
In almost all cases, once youâve normalized a table this far, youâve fully normalized Â
it. There are some instances where this level of normalization isnât enough. Â
These rare instances are dealt with by Fourth and Fifth Normal Form. Â
So letâs move on to Fourth Normal Form. Weâll look at an example of a situation where Third Â
Normal Form isnât quite good enough and something a bit stronger is needed. In our example, thereâs Â
a website called DesignMyBirdhouse.com â the worldâs leading supplier of customized birdhouses. Â
On DesignMyBirdhouse.com, customers can choose from different birdhouse models, Â
and, for the model theyâve selected, they can choose both a custom color Â
and a custom style. Each model has its own range of available colors and styles. Â
One way of capturing this information is to put it all the possible Â
combinations in a single table, like this. This table is in Third Normal Form. The primary Â
key consists of all three columns: {Model, Color, Style}. Everything depends on the key, Â
the whole key, and nothing but the key. And yet this table is still vulnerable Â
to problems. Letâs look at the rows for the birdhouse model âPrairieâ: Â
The available colors for the âPrairieâ birdhouse model are brown and beige. Â
Now suppose DesignMyBirdhouse.com decides to introduce a third available color for Â
the âPrairieâ model: green. This will mean weâll have to add two extra âPrairieâ rows to the table: Â
one for green bungalow, and one for green schoolhouse. Â
If by mistake we only add a row for green bungalow, and fail to add the row for green Â
schoolhouse, then we have a data inconsistency. Available colors are supposed to be completely Â
independent of available styles. But our table is saying that a customer can choose Â
green only for the bungalow style, not for the schoolhouse style. That makes no sense. Â
The prairie birdhouse model is available in green, so all its styles should be available in green. Â
Something about the way the table is designed has allowed us to represent an impossible situation. Â
To see whatâs gone wrong, letâs have a closer look at the dependencies among Models, Â
Colors, and styles. Can we say that Color has a functional dependency on Model? Â
Actually no, because a specific Model isnât associated with just one Color. Â
And yet it does feel as though Color has some relationship to Model. How can we express it? Â
We can say that each Model has a specific set of available Colors. This kind of dependency is Â
called a multivalued dependency, and we express it with a double-headed arrow, like this: Â
And itâs equally true that each Model has a specific set of available Styles. Â
What Fourth Normal Form says is that the only kinds of multivalued dependency we're allowed to have in a table are multivalued dependencies on the key. Model is not the key, so the table Model_Colors_And_Styles_Available is not in Fourth Normal Form.
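The condition that the faulty Prairie update broke can be checked mechanically. Below is a minimal Python sketch, assuming the table is held as a set of (model, color, style) tuples; the function name `mvd_holds` and the sample rows are illustrative, not from the video. Model ↠ Color holds when, for each model, the rows pair every available color with every available style, so a table containing green bungalow but not green schoolhouse fails the test.

```python
from itertools import product

# Illustrative Prairie rows only: green bungalow was added, green schoolhouse was forgotten.
faulty_rows = {
    ("Prairie", "brown", "bungalow"),
    ("Prairie", "brown", "schoolhouse"),
    ("Prairie", "beige", "bungalow"),
    ("Prairie", "beige", "schoolhouse"),
    ("Prairie", "green", "bungalow"),
}


def mvd_holds(rows):
    """Check Model ->> Color (and therefore Model ->> Style) on (model, color, style) rows.

    For every model, the (color, style) pairs present must be exactly
    every available color combined with every available style.
    """
    for model in {m for m, _, _ in rows}:
        colors = {c for m, c, _ in rows if m == model}
        styles = {s for m, _, s in rows if m == model}
        pairs = {(c, s) for m, c, s in rows if m == model}
        if pairs != set(product(colors, styles)):
            return False
    return True


print(mvd_holds(faulty_rows))  # False: green schoolhouse is missing
print(mvd_holds(faulty_rows | {("Prairie", "green", "schoolhouse")}))  # True
```

The check only detects the inconsistency after the fact; the decomposition that follows makes it impossible to create in the first place.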
As always, the fix is to split things out into multiple tables. Now, if DesignMyBirdhouse.com expands the range of Prairie-model colors to include green, we simply add one row to the Model_Colors_Available table, and no anomalies are possible.
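A minimal sketch of the split design, using Python's built-in `sqlite3` module. The `Model_Colors_Available` name comes from the example; `Model_Styles_Available`, the column names, and the choice of SQLite are assumptions for illustration, and only the Prairie rows are included. Adding green is a single insert, and the full list of orderable combinations is recovered with a join on Model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Model_Colors_Available (
    Model TEXT NOT NULL,
    Color TEXT NOT NULL,
    PRIMARY KEY (Model, Color)
);
-- Hypothetical name for the companion table holding each model's styles.
CREATE TABLE Model_Styles_Available (
    Model TEXT NOT NULL,
    Style TEXT NOT NULL,
    PRIMARY KEY (Model, Style)
);
INSERT INTO Model_Colors_Available VALUES ('Prairie', 'brown'), ('Prairie', 'beige');
INSERT INTO Model_Styles_Available VALUES ('Prairie', 'bungalow'), ('Prairie', 'schoolhouse');
""")

# Introducing green for Prairie is one row; there is no second row to forget.
conn.execute("INSERT INTO Model_Colors_Available VALUES ('Prairie', 'green')")

# Every color/style combination per model is derived, not stored.
combos = conn.execute("""
    SELECT c.Model, c.Color, s.Style
    FROM Model_Colors_Available AS c
    JOIN Model_Styles_Available AS s ON s.Model = c.Model
    ORDER BY c.Color, s.Style
""").fetchall()

for row in combos:
    print(row)
```

Because the combinations are computed by the join, green bungalow cannot exist without green schoolhouse.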
We're now ready for Fifth Normal Form, the last normal form covered in this video. For our Fifth Normal Form example, we imagine that there are three different brands of ice cream available: Frosty's, Alpine, and Ice Queen. Each of the three brands offers a different range of flavors. Frosty's offers vanilla, chocolate, strawberry, and mint chocolate chip. Alpine offers vanilla and rum raisin. Ice Queen offers vanilla, strawberry, and mint chocolate chip.
Now we ask our friend Jason what types of ice cream he likes. Jason says: I only like vanilla and chocolate. And I only like the brands Frosty's and Alpine. We ask our other friend, Suzy, what types of ice cream she likes. Suzy says: I only like rum raisin, mint chocolate chip, and strawberry. And I only like the brands Alpine and Ice Queen. So, after a little bit of brainwork, we deduce exactly which ice cream products Jason and Suzy are willing to eat, and we express this in a table:
But time passes, tastes change, and at some point Suzy announces that she now likes Frosty's brand ice cream too. So we need to update our table. It won't come as any surprise that we might get this update wrong. We might successfully add a row for {Suzy, Frosty's, Strawberry}, but fail to add a row for {Suzy, Frosty's, Mint Chocolate Chip}. And this outcome wouldn't just be wrong; it would be logically inconsistent. We've already established that Suzy likes the Frosty's brand, that she likes the Mint Chocolate Chip flavor, and that Frosty's offers Mint Chocolate Chip, so there's no way she can dislike Frosty's Mint Chocolate Chip.
In this example, we went wrong right at the beginning, when we were given three pieces of information. First, we were told which brands offered which flavors. Second, we were told which people liked which brands. Third, we were told which people liked which flavors. From those three pieces of information, we should simply have created three tables. And that's all we needed to do: with those three tables, all the facts of the situation are represented.
If we ever want to know what specific products everyone likes, we can simply ask the database platform, expressing our question as a SQL query that logically deduces the answer by joining the tables together.
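Here is a sketch of that idea, again using Python's `sqlite3`. The table names `Brand_Flavors`, `Person_Brands`, and `Person_Flavors` are hypothetical; the rows are the brand, person, and flavor facts from the example. The query keeps a (person, brand, flavor) combination only when the person likes the brand, likes the flavor, and the brand offers that flavor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brand_Flavors  (Brand  TEXT, Flavor TEXT, PRIMARY KEY (Brand, Flavor));
CREATE TABLE Person_Brands  (Person TEXT, Brand  TEXT, PRIMARY KEY (Person, Brand));
CREATE TABLE Person_Flavors (Person TEXT, Flavor TEXT, PRIMARY KEY (Person, Flavor));
""")

# Parameterized inserts avoid having to escape the apostrophe in Frosty's.
conn.executemany("INSERT INTO Brand_Flavors VALUES (?, ?)", [
    ("Frosty's", "vanilla"), ("Frosty's", "chocolate"),
    ("Frosty's", "strawberry"), ("Frosty's", "mint chocolate chip"),
    ("Alpine", "vanilla"), ("Alpine", "rum raisin"),
    ("Ice Queen", "vanilla"), ("Ice Queen", "strawberry"),
    ("Ice Queen", "mint chocolate chip"),
])
conn.executemany("INSERT INTO Person_Brands VALUES (?, ?)", [
    ("Jason", "Frosty's"), ("Jason", "Alpine"),
    ("Suzy", "Alpine"), ("Suzy", "Ice Queen"),
])
conn.executemany("INSERT INTO Person_Flavors VALUES (?, ?)", [
    ("Jason", "vanilla"), ("Jason", "chocolate"),
    ("Suzy", "rum raisin"), ("Suzy", "mint chocolate chip"), ("Suzy", "strawberry"),
])

# Liked product = liked brand AND liked flavor AND the brand offers the flavor.
LIKED_PRODUCTS = """
    SELECT pb.Person, pb.Brand, pf.Flavor
    FROM Person_Brands AS pb
    JOIN Person_Flavors AS pf ON pf.Person = pb.Person
    JOIN Brand_Flavors  AS bf ON bf.Brand = pb.Brand AND bf.Flavor = pf.Flavor
    ORDER BY pb.Person, pb.Brand, pf.Flavor
"""
before = conn.execute(LIKED_PRODUCTS).fetchall()

# Suzy now likes Frosty's: one row in one table; the derived answer follows automatically.
conn.execute("INSERT INTO Person_Brands VALUES (?, ?)", ("Suzy", "Frosty's"))
after = conn.execute(LIKED_PRODUCTS).fetchall()

for row in after:
    print(row)
```

Before Suzy's change the query returns six products; after the single insert it returns eight, including both Frosty's strawberry and Frosty's mint chocolate chip, so the inconsistent half-update described earlier cannot happen.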
To sum things up: if we want to ensure that a table that's in Fourth Normal Form is also in Fifth Normal Form, we need to ask ourselves whether the table can be logically thought of as being the result of joining some other tables together. If it can be thought of that way, then it's not in Fifth Normal Form. If it can't be thought of that way, then it is in Fifth Normal Form.
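One concrete way to ask "can this table be thought of as a join?" is to project it onto pairs of columns and join those projections back together. The plain-Python sketch below uses hypothetical function names and tests only one candidate decomposition, into (person, brand), (person, flavor), and (brand, flavor); a full Fifth Normal Form check would have to consider every possible decomposition. The join always contains the original rows, so equality means the table is exactly that join.

```python
Row = tuple[str, str, str]  # (person, brand, flavor)


def join_of_pair_projections(rows: set[Row]) -> set[Row]:
    """Project onto the three column pairs, then natural-join the projections back."""
    person_brand = {(p, b) for p, b, _ in rows}
    person_flavor = {(p, f) for p, _, f in rows}
    brand_flavor = {(b, f) for _, b, f in rows}
    return {
        (p, b, f)
        for p, b in person_brand
        for p2, f in person_flavor
        if p == p2 and (b, f) in brand_flavor
    }


def is_join_of_pair_projections(rows: set[Row]) -> bool:
    # No spurious rows appeared, so the table is fully described by the three pair tables.
    return join_of_pair_projections(rows) == rows


# The liked-products table from the example.
liked = {
    ("Jason", "Frosty's", "vanilla"),
    ("Jason", "Frosty's", "chocolate"),
    ("Jason", "Alpine", "vanilla"),
    ("Suzy", "Alpine", "rum raisin"),
    ("Suzy", "Ice Queen", "strawberry"),
    ("Suzy", "Ice Queen", "mint chocolate chip"),
}
print(is_join_of_pair_projections(liked))  # True: decomposable, so not in 5NF

# Hypothetical table where likes are per product and cannot be deduced from pairs.
specific = {
    ("Jason", "Alpine", "vanilla"),
    ("Jason", "Frosty's", "chocolate"),
    ("Suzy", "Frosty's", "vanilla"),
}
print(is_join_of_pair_projections(specific))  # False: the join invents ("Jason", "Frosty's", "vanilla")
```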
We've now covered all the normal forms from First Normal Form to Fifth Normal Form. Let's review, keeping in mind that for a table to comply with a particular normal form, it must comply with all the lower normal forms as well. The rules for First Normal Form are:
1. Using row order to convey information is not permitted.
2. Mixing data types within the same column is not permitted.
3. Having a table without a primary key is not permitted.
4. Repeating groups are not permitted.
The rule for Second Normal Form is: each non-key attribute in the table must be dependent on the entire primary key.
The rule for Third Normal Form is: each non-key attribute in a table must depend on the key, the whole key, and nothing but the key. If we drop the phrase "non-key", we end up with an even simpler and even stronger version of Third Normal Form called Boyce-Codd Normal Form: each attribute in a table must depend on the key, the whole key, and nothing but the key.
The rule for Fourth Normal Form is that the only kinds of multivalued dependency we're allowed to have in a table are multivalued dependencies on the key. Finally, the rule for Fifth Normal Form is: it must not be possible to describe the table as being the logical result of joining some other tables together.
I hope you've found this video helpful. If you have any comments or questions on what you've just seen, by all means put them in the comments section below. And if you have any suggestions for other complex topics that you'd like to see explained on Decomplexify, again let me know in the comments. So long, and thanks for watching!