Learn Database Normalization - 1NF, 2NF, 3NF, 4NF, 5NF
Summary
TLDR本视频深入探讨了数据库规范化的概念,解释了为什么要进行规范化以及如何进行规范化。规范化旨在防止数据冗余和不一致性,提高数据完整性。视频从第一范式到第五范式,通过实例讲解了每个范式的规则及其重要性,帮助观众理解规范化带来的益处和不规范化可能引发的问题。
Takeaways
- 😀 数据库规范化是一个确保数据一致性和可维护性的过程。
- 🔍 规范化可以防止数据插入、更新和删除异常。
- 📚 第一范式(1NF)要求表的每一列都是不可分割的基本数据项,且每个字段都是原子性,不能进一步分解。
- 🔑 第二范式(2NF)要求表中的非主键列必须完全依赖于主键,不能只依赖主键的一部分。
- 🛠️ 第三范式(3NF)要求非主键列只能依赖于主键,不能依赖于其他非主键列,即禁止传递依赖。
- 🎯 第四范式(4NF)要求表中只能存在对主键的多值依赖,不能存在非主键列之间的多值依赖。
- 🌐 第五范式(5NF)要求表不能是其他表的连接结果,即表中的数据必须彼此独立,没有冗余关联。
- 🚫 违反规范化原则可能导致数据不一致,比如一个客户有两个出生日期的情况。
- 🔄 规范化可以减少数据冗余,提高数据的一致性和完整性。
- 📈 通过规范化,数据库表更容易理解、维护和扩展。
Q & A
什么是数据库的规范化(Normalization)?
-规范化是数据库设计中的一种方法,它通过确保数据的逻辑一致性和减少数据冗余来优化数据库结构。
为什么要进行数据库规范化?
-进行数据库规范化可以防止数据的插入、更新和删除异常,提高数据的一致性和完整性,使数据库结构更易于理解和扩展。
如果数据库没有进行规范化,可能会发生哪些问题?
-不进行规范化可能会导致数据不一致、冗余信息、插入异常、更新异常和删除异常等问题。
第一范式(1NF)要求数据库表满足哪些条件?
-第一范式要求表的每一列都是不可分割的基本数据项,每一列的值都是原子的,不可以再分解,并且表中的每一行都有唯一的主键。
第二范式(2NF)是如何定义的?
-第二范式要求表中的非主键属性必须完全依赖于主键,即没有部分依赖于主键的情况。
第三范式(3NF)的主要规则是什么?
-第三范式要求表中的非主键属性不能依赖于其他非主键属性,即不存在传递依赖。
什么是博伊斯-科德范式(Boyce-Codd Normal Form, BCNF)?
-博伊斯-科德范式是第三范式的一种加强形式,要求表中每个属性都直接依赖于整个主键,而非其一部分。
第四范式(4NF)的主要要求是什么?
-第四范式要求表中只允许有关于主键的多值依赖,即表中不应该存在非平凡的且非候选键之间的多值依赖。
第五范式(5NF)是如何定义的?
-第五范式要求表不能被描述为是其他表的连接结果,即表中不应该存在连接依赖,除非它依赖于候选键。
如何判断一个表是否满足第一范式?
-要判断一个表是否满足第一范式,需要检查表中的每一列是否都是原子性的,并且表中是否有一个唯一的主键。
在设计数据库时,如何避免插入异常?
-为了避免插入异常,需要确保数据库表的设计满足第二范式,即非主键属性完全依赖于主键,这样可以避免因为主键不完整而导致无法插入记录的情况。
什么是重复组(Repeating Groups)?为什么它违反了第一范式?
-重复组是指在单个行中多次出现的数据集合。它违反了第一范式,因为这种设计允许在单个行中存储多个值,而这些值应该被分割到不同的行中,每行只包含一个值,以确保数据的原子性和一致性。
Outlines
🔍 数据库规范化简介
本段介绍了数据库规范化(Normalization)的概念、目的和方法。规范化是数据库设计中用来减少数据冗余和提高数据一致性的过程。通过实例讲解了规范化的好处,包括防止数据自身矛盾、提高数据库表的可理解性、便于扩展、防止插入、更新和删除异常。同时,解释了规范化的不同级别,即从第一范式到第五范式,并用桥梁安全评估的类比来形象说明不同范式之间的关系。
📚 第一范式(1NF)详解
详细解释了第一范式(1NF)的要求,包括:1. 不能使用行顺序来传达信息;2. 同一列内不允许混合数据类型;3. 表中必须有主键;4. 不允许出现重复组。通过举例说明违反1NF的情况,如行顺序传达信息、列中数据类型混合、缺少主键和重复组。同时,提供了符合1NF的设计方法,如为身高信息添加单独的列,确保每个列都有明确的数据类型,并为每个玩家设计单独的行来避免重复组。
🛠️ 第二范式(2NF)与删除、更新异常
本段讨论了第二范式(2NF)的概念,即表中的非键属性必须完全依赖于整个主键。通过玩家库存表的例子,说明了2NF防止删除异常和更新异常的重要性。如果非键属性只依赖于主键的一部分,就会产生问题,如玩家等级信息可能会因为库存变化而丢失。正确的做法是将玩家信息和库存信息分开存储,确保每个属性都依赖于整个主键。
🌐 第三范式(3NF)与传递依赖
第三范式(3NF)要求表中的非键属性只能依赖于主键,不能有传递依赖。通过玩家技能等级和玩家等级的例子,说明了3NF的重要性。如果玩家等级依赖于玩家技能等级,而技能等级又依赖于主键(玩家ID),则会产生数据不一致的问题。解决这个问题的方法是将玩家等级从玩家表中移除,创建一个新的表来存储技能等级和等级之间的关系。
🎨 第四范式(4NF)与多值依赖
第四范式(4NF)处理的是多值依赖问题,要求表中的多值依赖必须依赖于主键。通过DesignMyBirdhouse.com网站的例子,说明了如果颜色和风格依赖于模型,但表设计不当,可能会导致数据不一致。正确的做法是将模型、颜色和风格分开存储,确保任何多值依赖都依赖于主键。
🍨 第五范式(5NF)与连接依赖
第五范式(5NF)是最高级别的规范化,要求表不能是其他表连接的结果。通过冰激淋品牌和口味的例子,说明了如果表设计不当,可能会导致数据不一致。正确的做法是将品牌、口味和个人喜好分开存储,确保表的设计不会引入逻辑上的连接依赖。
Mindmap
Keywords
💡规范化(Normalization)
💡第一正规形式(First Normal Form, 1NF)
💡第二正规形式(Second Normal Form, 2NF)
💡第三正规形式(Third Normal Form, 3NF)
💡冗余(Redundancy)
💡插入异常(Insertion Anomaly)
💡更新异常(Update Anomaly)
💡删除异常(Deletion Anomaly)
💡主键(Primary Key)
💡外键(Foreign Key)
💡多值依赖(Multivalued Dependency)
Highlights
数据库规范化的目的和好处。
规范化的基本概念和实际操作。
不进行规范化可能导致的问题。
第一范式(1NF)的定义和要求。
违反第一范式的几种情况。
第二范式(2NF)的定义和要求。
如何识别和解决插入、更新和删除异常。
第三范式(3NF)的定义和要求。
第三范式与博伊斯-科德范式(BCNF)的关系。
第四范式(4NF)的定义和要求。
多值依赖和它的处理方法。
第五范式(5NF)的定义和要求。
如何避免表的逻辑连接导致的异常。
规范化的实用指南和设计原则。
规范化对数据库性能和一致性的影响。
规范化在实际数据库设计中的应用。
作者对视频内容的总结和观众反馈邀请。
Transcripts
If you’ve had some exposure to relational databases, you’ve probably come across the term
“normalization”. But what is normalization? Why do we do it? How do we do it? And what
bad things can happen if we don’t do it? In this video, we’re going to explore database
normalization from a practical perspective. We’ll keep the jargon to a minimum,
and we’ll use lots of examples as we go. By the end of it, you’ll understand the so-called normal
forms from First Normal Form all the way up to Fifth Normal Form – and you’ll have a clear sense
of what we gain by doing normalization, and what we lose by failing to do it.
This is Decomplexify, bringing a welcome dose of simplicity to complex topics.
Data: it’s everywhere. And some of it is wrong.
By and large, even a good database design can’t protect against bad data.
But there are some cases of bad data that a good database design can protect against. These are
cases where the data is telling us something that logically cannot possibly be true:
One customer with two dates of birth is logically impossible. It’s what we might
call a failure of data integrity. The data can’t be trusted because it disagrees with itself.
When data disagrees with itself, that’s more than just a problem of bad data.
It’s a problem of bad database design.
Specifically, it’s what happens when a database design isn’t properly normalized.
So what does normalization mean? When you normalize a database table,
you structure it in such a way that can’t express redundant information.
So, for example, in a normalized table, you wouldn’t be able to give Customer 1001 two dates
of birth even if you wanted to. Very broadly, the table can only express one version of the truth.
Normalized database tables are not only protected from contradictory data, they’re also:
easier to understand easier to enhance and extend
protected from insertion anomalies, update anomalies,
and deletion anomalies (more on these later)
How do we determine whether a table isn’t normalized enough – in other words, how do
we determine if there’s a danger that redundant data could creep into the table? Well, it turns
out that there are sets of criteria we can use to assess the level of danger. These sets of criteria
have names like “first normal form”, “second normal form”, “third normal form”, and so on.
Think of these normal forms by analogy to safety assessments. We might imagine an engineer doing a
very basic safety assessment on a bridge. Let’s say the bridge passes the basic assessment,
which means it achieves “Safety Level 1: Safe for Pedestrian Traffic”.
That gives us some comfort, but suppose we want to know if cars can safely drive across the bridge?
To answer that question, we need the engineer to perform an even stricter assessment of the bridge.
Let’s imagine that the engineer goes ahead and does this stricter assessment, and again the
bridge passes, achieving “Safety Level 2: Safe for Cars”. If even this doesn’t satisfy us,
we might ask the engineer to assess the bridge for “Safety Level 3: Safe for Trucks.” And so on.
The normal forms of database theory work the same way.
If we discover that a table meets the requirements of first normal form,
that’s a bare minimum safety guarantee. If we further discover that the table meets
the requirements of second normal form, that’s an even greater safety guarantee. And so on.
So let’s begin at the beginning, with First Normal Form.
Suppose you and I are both confronted by this question:
“Who were the members of the Beatles?” You might answer “John, Paul, George, and Ringo”.
I might answer “Paul, John, Ringo, and George”. Of course, my answer and your answer are
equivalent, despite having the names in a different order.
When it comes to relational databases, the same principle applies. Let’s record the names of the
Beatles in a table, and then let’s ask the database to return those names back to us.
The results will get returned to us in an arbitrary order. For example, they might
get returned like this. Or like this.
Or in any other order. There is no “right” order. Are there ever situations where there’s a right
order? Suppose we write down the members of the Beatles from tallest to shortest,
like this. We title our list “Members Of The Beatles From Tallest To Shortest”.
In this list, it’s not just the names that convey meaning. The order of the names conveys
meaning too. Paul is the tallest, John is the second-tallest, and so on. Lists like this are
totally comprehensible to us – but they’re not normalized. Remember, there’s no such thing as row
order within a relational database table. So here we have our first violation of First Normal Form.
When we use row order to convey information, we’re violating First Normal Form.
The solution is very simple. Be explicit – if we want to capture height information, we should
devote a separate column to it – like this. Or even better, like this.
So far, we’ve seen one way in which a design can fail to achieve
First Normal Form. But there are others. A second way of violating First Normal Form
involves mixing data types. Suppose our Beatle_Height dataset looked like this.
If you’re accustomed to spreadsheets, you’ll be aware that they typically won’t stop you from
having more than one datatype within a single column – for example, they won’t stop you from
storing both numbers and strings in a column. But in a relational database, you’re not allowed to be
cagey or ambiguous about a column’s data type. The values that go in the Height_In_Cm column
can’t be a mix of integers and strings. Once you define Height_In_Cm as being an integer column,
then every value that goes into that column will be an integer – no strings, no timestamps,
no data types of any kind other than integers. So: mixing datatypes within a column
is a violation of First Normal Form, and in fact the database platform won’t even let you do it.
A third way of violating First Normal Form is by designing a table without a primary key. A primary
key is a column, or combination of columns, that uniquely identifies a row in the table.
For example, in the table Beatle_Height, our intention is that each row should tell
us about one particular Beatle, so we ought to designate “Beatle” as the primary key of the
Beatle_Height table. The database platform will need to know about our choice of primary key,
so we’ll want to get the primary key into the database by doing something like this.
With the primary key in place, the database platform will prevent multiple
rows for the same Beatle from ever being inserted. That’s a good thing,
because multiple rows for the same Beatle would be nonsensical, and perhaps contradictory.
Obviously, a Beatle can’t have two different heights at once.
Every table we design should have a primary key. If it doesn’t, it’s not in First Normal Form.
The last way of failing to achieve First Normal Form involves the notion of
“repeating groups”. Suppose we’re designing a database for an online multiplayer game.
At a given time, each player has a number of items of different types, like arrows,
shields, and copper coins. We might represent the situation like this.
A player’s inventory is what we call a “repeating group”. Each inventory contains potentially many
different types of items: arrows, shields, copper coins, and so on; and in fact there
may be hundreds of different types of items that a player might have in their inventory.
We could design a database table that represents the Inventory as a string of text:
But this is a terrible design because there’s no easy way of querying it.
For example, if we want to know which players currently have more than 10 copper coins,
then having the inventory data lumped together in a text string
will make it very impractical to write a query that gives us the answer.
We might be tempted to represent the data like this.
This lets us record up to 4 items per inventory. But given that a player can have an inventory
consisting of hundreds of different types of items, how practical is it going to be to design
a table with hundreds of columns? Even if we were to go ahead and create a super-wide table to hold
all possible inventory data, querying it would still be extremely awkward.
The bottom line is that storing a repeating group of data items on a single row violates First
Normal Form. So what sort of alternative design would respect First Normal Form?
It would be this. To communicate the fact that
trev73 owns 3 shields, we have a row for Player “trev73”, Item_Type “shields”, Item_Quantity 3.
To communicate the fact that trev73 also owns 5 arrows,
we have a row for Player “trev73”, Item_Type “arrows”, Item_Quantity 5. And so on.
And because each row in the table tells us about one unique combination of Player
and Item_Type, the primary key is the combination of Player and Item_Type.
So let’s review what we know about First Normal Form.
1. using row order to convey information is not permitted
2. mixing data types within the same column is not permitted
3. having a table without a primary key is not permitted
4. repeating groups are not permitted Next up: Second Normal Form.
Let’s look again at our Player Inventory table. This table is fully normalized. But suppose we
enhance the table slightly. Let’s imagine that every player has a rating: Beginner,
Intermediate, or Advanced. We want to record the current rating of each player – and to achieve
that, we simply include in our table an extra column called Player_Rating.
Notice what’s happening here. Player jdog21 has a Player_Rating of Intermediate,
but because jdog21 has two rows in the table, both those rows have to be marked Intermediate.
Player trev73 has a Player_Rating of Advanced,
but because trev73 has four rows in the table, all four of those rows have to be marked Advanced.
This is not a good design. Why not? Well, suppose player gila19 loses all her copper coins,
leaving her with nothing in her inventory. The single entry that she did have in the
Player_Inventory table is now gone. If we try to query the database to find
out what gila19’s Player Rating is, we’re out of luck. We can no longer access gila19’s Player
Rating because the database no longer knows it. This problem is known as a deletion anomaly.
And that’s not all. Suppose jdog21 improves his rating from Intermediate to Advanced.
To capture his new Advanced rating in the Player_Inventory table,
we run an update on his two records. But let’s imagine the update goes wrong.
By accident, only one of jdog21’s records gets updated, and the other record gets left alone.
Now the data looks like this. As far as the database is concerned,
jdog21 is somehow both Intermediate and Advanced at the same time.
Our table design has left the door open for this type of logical inconsistency.
This problem is called an update anomaly. Or suppose a new player called tina42 comes along.
She’s a Beginner and she doesn’t have anything in her inventory yet. We want to record the fact
that she’s a Beginner, but because she has nothing in her inventory, we can’t
insert a tina42 row into the Player_Inventory table. So her rating goes unrecorded. This
problem is known as an insertion anomaly. The reason our design is vulnerable to these
problems is that it isn’t in Second Normal Form. Why not? What is Second Normal Form?
Second Normal Form is about how a table’s non-key columns relate to the primary key. In our table,
the non-key columns – or to use slightly different terminology, non-key attributes – are
Item_Quantity and Player_Rating. They are columns (also called attributes), that don’t belong
to the primary key. As we saw earlier, the primary key is the combination of Player and Item Type.
Now we’re in a position to give a definition of Second Normal Form.
The definition we’re going to give is an informal one which leaves out some
nuances – but for most practical purposes, that shouldn’t matter.
Informally, what Second Normal Form says is that each non-key attribute in the table
must be dependent on the entire primary key. How does our table measure up to this definition?
Let’s examine our non-key attributes, which are the attributes Item_Quantity and Player_Rating.
Does Item_Quantity depend on the entire primary key? Yes, because an Item_Quantity is about a
specific Item_Type owned by specific Player. We can express it like this.
The arrow signifies a dependency – or to give it its proper name, a functional dependency.
This simply means that each value of the thing on the left side of the arrow is associated with
exactly one value of the thing on the right side of the arrow. Each combination of Player_ID and
Item_Type is associated with a specific value of Item_Quantity – for example the combination
of Player_ID jdog21 / Item_Type “amulets” is associated with an Item_Quantity of 2.
As far as Second Normal Form is concerned, this dependency is fine,
because it’s a dependency on the entire primary key. But what about the other dependency?
Does Player_Rating depend on the entire primary key? No, it doesn’t. Player_Rating is a property
of the Player only. In other words, for any given Player, there’s one Player_Rating.
This dependency on Player is the problem. It’s a problem because Player isn’t the
primary key – Player is part of the primary key, but it’s not the whole key.
That’s why the table isn’t in Second Normal Form, and that’s why it’s vulnerable to problems.
At what point did our design go wrong, and how can we fix it? The design went wrong
when we chose to add a Player_Rating column to a table where it didn’t really belong.
The fact that a Player_Rating is a property of a Player should have helped us to realise
that a Player is an important concept in its own right – so surely Player deserves its own table:
Nothing could be simpler than that. A Player table will contain one row per Player,
and in it we can include as columns the ID of the player, the rating of the player, as well
as all sorts of other properties of the player – maybe the player’s date of birth, for example,
maybe the player’s email address. Our other table, Player_Inventory, can stay as it was.
For both tables, we can say that there are no part-key dependencies.
In other words, it’s always the case that every attribute depends on the whole primary key,
not just part of it. And so our tables are in Second Normal Form.
Now let’s move on to Third Normal Form. Suppose we decide to enhance the Player table.
We decide to add a new column called Player_Skill_Level.
Imagine that in this particular multiplayer game, there’s a nine-point scale for skill level.
At one extreme, a player with skill level 1 is an absolute beginner;
at the opposite extreme, a player with skill level 9 is as skilful as it’s possible to be.
And let’s say that we’ve defined exactly how Player Skill Levels relate to Player Ratings.
“Beginner” means a skill level between 1 and 3. “Intermediate” means a skill
level between 4 and 6. And “Advanced” means a skill level between 7 and 9.
But now that both the Player_Rating and the Player_Skill_Level exist in the Player table,
a problem can arise. Let’s say that tomorrow, player gila19’s skill level increases from 3
to 4. If that happens, we’ll update her row in the Player table to reflect this new skill level.
By rights, we should also update her Player_Rating to Intermediate – but suppose something goes
wrong, and we fail to update the Player_Rating. Now we’ve got a data inconsistency. gila19’s
Player_Rating says she’s a Beginner, but her Player_Skill_Level implies she’s Intermediate.
How did the design allow this happen? Second Normal Form didn’t flag up any problems. There’s
no attribute here that depends only partially on the primary key – as a matter of fact,
the primary key doesn’t have any parts; it’s just a single attribute. And both Player_Rating
and Player_Skill_Level are dependent on it. But in what way are they dependent on it? Let’s
look more closely. Player_Skill_Level is dependent on Player_ID.
Player_Rating is dependent on Player ID too, but only indirectly – like this.
A dependency of this kind is called a transitive dependency. Player Rating depends on Player Skill
Level which in turn depends on the primary key: Player ID. The problem is located just
here – because what Third Normal Form forbids is exactly this type of dependency: the dependency of
a non-key attribute on another non-key attribute. Because Player Rating depends on Player Skill
Level – which is a non-key attribute – this table is not in Third Normal Form.
There’s a very simple way of repairing the design to get it into Third Normal Form.
We remove Player Rating from the Player table; so now the Player table looks like this.
And we introduce a new table called Player_Skill_Levels.
The Player Skill Levels table tells us everything we need to know about how to translate a player
skill level into a player rating. Third Normal Form is the culmination of everything
we’ve covered about database normalization so far. It can be summarised in this way: Every
non-key attribute in a table should depend on the key, the whole key, and nothing but the key.
If you commit this to memory, and keep it constantly in mind while you’re designing a
database, then 99% of the time you will end up with fully normalized tables.
It’s even possible to shorten this guideline slightly by knocking out the phrase
“non-key” – giving us the revised guideline: every attribute in a table should depend on the key, the
whole key, and nothing but the key. And this new guideline represents a slightly stronger flavor of
Third Normal Form known as Boyce-Codd Normal Form. In practice, the difference between Third Normal
Form and Boyce-Codd Normal Form is extremely small, and the chances of you ever encountering
a real-life Third Normal Form table that doesn’t meet Boyce-Codd Normal Form are almost zero.
Any such table would have to have what we call multiple overlapping candidate keys – which gets
us into realms of obscurity and theoretical rigor that are a little bit beyond the scope
of this video. So as a practical matter, just follow the guideline that every attribute in a
table should depend on the key, the whole key, and nothing but the key, and you can
be confident that the table will be in both Third Normal Form and Boyce-Codd Normal Form.
In almost all cases, once you’ve normalized a table this far, you’ve fully normalized
it. There are some instances where this level of normalization isn’t enough.
These rare instances are dealt with by Fourth and Fifth Normal Form.
So let’s move on to Fourth Normal Form. We’ll look at an example of a situation where Third
Normal Form isn’t quite good enough and something a bit stronger is needed. In our example, there’s
a website called DesignMyBirdhouse.com – the world’s leading supplier of customized birdhouses.
On DesignMyBirdhouse.com, customers can choose from different birdhouse models,
and, for the model they’ve selected, they can choose both a custom color
and a custom style. Each model has its own range of available colors and styles.
One way of capturing this information is to put it all the possible
combinations in a single table, like this. This table is in Third Normal Form. The primary
key consists of all three columns: {Model, Color, Style}. Everything depends on the key,
the whole key, and nothing but the key. And yet this table is still vulnerable
to problems. Let’s look at the rows for the birdhouse model “Prairie”:
The available colors for the “Prairie” birdhouse model are brown and beige.
Now suppose DesignMyBirdhouse.com decides to introduce a third available color for
the “Prairie” model: green. This will mean we’ll have to add two extra “Prairie” rows to the table:
one for green bungalow, and one for green schoolhouse.
If by mistake we only add a row for green bungalow, and fail to add the row for green
schoolhouse, then we have a data inconsistency. Available colors are supposed to be completely
independent of available styles. But our table is saying that a customer can choose
green only for the bungalow style, not for the schoolhouse style. That makes no sense.
The prairie birdhouse model is available in green, so all its styles should be available in green.
Something about the way the table is designed has allowed us to represent an impossible situation.
To see what’s gone wrong, let’s have a closer look at the dependencies among Models,
Colors, and styles. Can we say that Color has a functional dependency on Model?
Actually no, because a specific Model isn’t associated with just one Color.
And yet it does feel as though Color has some relationship to Model. How can we express it?
We can say that each Model has a specific set of available Colors. This kind of dependency is
called a multivalued dependency, and we express it with a double-headed arrow, like this:
And it’s equally true that each Model has a specific set of available Styles.
What Fourth Normal Form says is that the only kinds of multivalued dependency we’re allowed
to have in a table are multivalued dependencies on the key. Model is not the key; so the table
Model_Colors_And_Styles_Available is not in Fourth Normal Form.
As always, the fix is to split things out into multiple tables.
Now, if DesignMyBirdhouse.com expands the range of Prairie-Model colors to include green, we simply
add a row to the Model_Colors_Available table: And no anomalies are possible.
We’re now ready for Fifth Normal Form, the last normal form covered in this video.
For our Fifth Normal Form example, we imagine that there are three different brands of ice
cream available: Frosty’s, Alpine, and Ice Queen. Each of the three brands of ice cream
offers a different range of flavors: Frosty’s offers vanilla, chocolate,
strawberry, and mint chocolate chip Alpine offers vanilla and rum raisin
Ice Queen offers vanilla, strawberry, and mint chocolate chip
Now we ask our friend Jason what types of ice cream he likes.
Jason says: I only like vanilla and chocolate. And I only like the brands Frosty and Alpine.
We ask our other friend, Suzy, what types of ice cream she likes. Suzy says: I only like
rum raisin, mint chocolate chip, and strawberry. And I only like the brands Alpine and Ice Queen.
So, after a little bit of brainwork, we deduce exactly which ice cream products
Jason and Suzy are willing to eat; and we express this in a table:
But time passes, tastes change, and at some point Suzy announces that she now likes Frosty’s brand
ice cream too. So we need to update our table. It won’t come as any surprise that we might get
this update wrong. We might successfully add a row for Person Suzy – Brand Frosty’s – Flavor
Strawberry, but fail to add a row for Person Suzy – Brand Frosty’s – Flavor Mint Chocolate Chip.
And this outcome wouldn’t just be wrong – it would be logically inconsistent – because we’ve
already established that Suzy likes Frosty’s brand, and likes Mint Chocolate Chip flavor,
and therefore there’s no way she can dislike Frosty’s Mint Chocolate Chip.
In this example, we went wrong right at the beginning. At the beginning, we were given
three pieces of information. First, we were told which brands offered which flavors. Second, we
were told which people liked which brands. Third, we were told which people liked which flavors.
From those three pieces of information, we should have simply created three tables.
And that’s all we needed to do. All the facts of the situation have been represented.
If we ever want to know what specific products everyone likes,
we can simply ask the database platform, expressing our question in the form of a
piece of SQL that logically deduces the answer by joining the tables together.
To sum things up: if we want to ensure that a table that’s in Fourth Normal
Form is also in Fifth Normal Form, we need to ask ourselves whether the table can be
logically thought of as being the result of joining some other tables together.
If it can be thought of that way, then it’s not in Fifth Normal Form.
If it can’t be thought of that way, then it is in Fifth Normal Form.
We’ve now covered all the normal forms from First Normal Form to Fifth Normal Form. Let’s review,
keeping in mind that for a table to comply with a particular normal form, it must comply with
all the lower normal forms as well. The rules for first normal form are:
1. using row order to convey information is not permitted
2. mixing data types within the same column is not permitted
3. having a table without a primary key is not permitted
4. repeating groups are not permitted The rule for second normal form is:
Each non-key attribute in the table must be dependent on the entire primary key.
The rule for third normal form is: Each non-key attribute in a table must depend on the key,
the whole key, and nothing but the key. If we prefer to drop the phrase “non-key”, we end up
with an even simpler and even stronger version of third normal form called “Boyce-Codd Normal Form”:
Each attribute in a table must depend on the key, the whole key, and nothing but the key.
The rule for fourth normal form is that the only kinds of multivalued dependency
we’re allowed to have in a table are multivalued dependencies on the key.
Finally, the rule for Fifth Normal Form is: it must not be possible to describe
the table as being the logical result of joining some other tables together.
I hope you’ve found this video helpful. If you have any comments or questions
on what you’ve just seen, by all means put them in the comments section below.
And if you have any suggestions for other complex topics that you’d like to see explained
on Decomplexify, again let me know in the comments. So long, and thanks for watching!
浏览更多相关视频
Hunting Crypto Trading Bots Using Volume Seasonality
Data Analytics in Supply Chain Management Canada🍁 Tools to use🎒 Pay Range🚛Roles & Titles & Companies
WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...
CÓMO VOLAR DRONES FPV EN CIUDAD EN 2024
Python从入门到精通 第13集 数据类型 整数类型
The Supply Chain Sustainability Forum at Sustainability Live 2023
5.0 / 5 (0 votes)