Database Normalization 1NF 2NF 3NF

Jesper Lowgren
2 Jul 2021, 10:25

Summary

TL;DR: In this video, Jesper dives into data normalization, a key concept in data architecture and digital transformation. He explains how normalization organizes structured data to enhance automation, analytics, and AI, while also contrasting it with unstructured data. Jesper introduces normalization’s core rules, focusing on the first three normal forms, and demonstrates how they transform messy data into structured tables. By breaking down complex concepts like primary keys, foreign keys, and cardinality, this video offers a practical introduction to deeper data understanding. Perfect for viewers interested in data modeling and relational databases.

Takeaways

  • 📊 Data normalization is a crucial aspect of understanding structured data, making it essential for automation, analytics, and AI.
  • 🔍 Data normalization helps connect structured data and provides insights into unstructured data, like information in spreadsheets or online sources.
  • 📐 The relational model introduced by Edgar Codd in 1970 is a systematic way to organize and maintain data using mathematical rules.
  • 🗃️ Normalization consists of five forms, with third normal form being the most commonly used in practice.
  • 🔑 First normal form (1NF) ensures that each cell contains a single value, each row is unique, and there are no repeating groups in a dataset.
  • 🔑 Second normal form (2NF) states that all data must depend on the primary key and any columns not depending on the primary key should be split into separate tables.
  • 🔑 Third normal form (3NF) requires that non-key columns must be fully dependent on the primary key and not on any other column.
  • 🔀 Foreign keys are created to link tables, ensuring relationships between entities, like employee IDs and skill IDs.
  • 🧮 Data normalization simplifies complex datasets into organized, relational tables, allowing for clearer relationships and data integrity.
  • 🎓 The video focuses on normalizing data up to third normal form, transforming an unnormalized table into four well-structured normalized tables.

Q & A

  • What is data normalization?

    -Data normalization is the process of organizing data in a database by reducing redundancy and ensuring that data relationships are maintained. It typically involves structuring data into forms that allow for efficient storage and retrieval.

  • How does normalization relate to structured data?

    -Normalization is a way to manage and organize structured data, making it easier to connect and analyze. Structured data, often stored in tables, can be normalized to ensure that relationships between data points are preserved and can be used for automation, analytics, and artificial intelligence.

  • What is the role of 'cardinality' in data normalization?

    -Cardinality in data normalization refers to the nature of relationships between different data sets, such as one-to-one, one-to-many, or many-to-many. Understanding cardinality is crucial in the process of connecting data in a meaningful and efficient way.

  • What are the five rules of normalization mentioned in the video?

    -The five rules of normalization, as proposed by Dr. Edgar Codd, start with the first normal form (1NF) and end with the fifth normal form (5NF). Each form introduces stricter rules for data organization, with third normal form (3NF) being the most commonly used in practice.

  • What is the focus of third normal form (3NF)?

    -Third normal form (3NF) ensures that all non-primary key columns are fully dependent on the primary key. It eliminates transitive dependencies, meaning that non-key attributes cannot depend on other non-key attributes.
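
    To make the transitive-dependency idea concrete, here is a minimal Python sketch. The table layout and names (`employee_id`, `job_name`, `job_id`) are invented for illustration, not taken from the video:

    ```python
    # 2NF employee table: job_name is a non-key column that really describes
    # the job, not the employee -- a transitive dependency that 3NF removes.
    employees_2nf = [
        {"employee_id": 1, "name": "Ann", "job_name": "Analyst"},
        {"employee_id": 2, "name": "Bob", "job_name": "Analyst"},
        {"employee_id": 3, "name": "Cat", "job_name": "Engineer"},
    ]

    def to_third_normal_form(rows):
        """Split the transitively dependent column into a job table with a
        generated job_id, and leave a foreign key behind in employee."""
        job_ids = {}
        jobs, employees = [], []
        for row in rows:
            if row["job_name"] not in job_ids:
                job_ids[row["job_name"]] = len(job_ids) + 1
                jobs.append({"job_id": job_ids[row["job_name"]],
                             "job_name": row["job_name"]})
            employees.append({"employee_id": row["employee_id"],
                              "name": row["name"],
                              "job_id": job_ids[row["job_name"]]})  # foreign key
        return employees, jobs
    ```

    After the split, "Analyst" is stored once in the job table instead of twice in the employee table, which is exactly the redundancy 3NF is designed to remove.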

  • What is an example of first normal form (1NF)?

    -In first normal form (1NF), each cell in a table must contain only one value, and each row must be unique. For example, if a table lists employees and their skills, the skills should be split into separate columns to comply with 1NF.
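
    As a hedged sketch of the 1NF step in Python (the sample data and column names are invented for the example, not taken from the video):

    ```python
    # Unnormalized: one row per employee, several skills packed into one cell.
    unnormalized = [
        {"name": "Ann", "skills": "SQL, Python"},
        {"name": "Bob", "skills": "Excel"},
    ]

    def to_first_normal_form(rows):
        """Split multi-valued cells so every cell holds exactly one value,
        and add a generated primary key so every row is unique."""
        result = []
        pk = 1
        for row in rows:
            for skill in (s.strip() for s in row["skills"].split(",")):
                result.append({"employee_skill_id": pk,   # generated primary key
                               "name": row["name"],
                               "skill": skill})           # atomic value
                pk += 1
        return result
    ```

    Each resulting row now holds a single atomic skill value and carries its own generated key, which is the starting point for the later normal forms.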

  • How is second normal form (2NF) different from 1NF?

    -Second normal form (2NF) builds on 1NF by ensuring that all non-primary key attributes depend entirely on the primary key. If any attributes are only partially dependent on the primary key, they need to be moved into a separate table.
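
    A small Python sketch of the 2NF split, assuming a 1NF table with a composite key of (`employee_id`, `skill_id`); the data and names are invented for illustration:

    ```python
    # 1NF table with composite key (employee_id, skill_id): skill_name depends
    # only on skill_id, a partial dependency that 2NF forbids.
    employee_skills_1nf = [
        {"employee_id": 1, "skill_id": 10, "skill_name": "SQL"},
        {"employee_id": 1, "skill_id": 11, "skill_name": "Python"},
        {"employee_id": 2, "skill_id": 10, "skill_name": "SQL"},
    ]

    def to_second_normal_form(rows):
        """Move the partially dependent column into its own skill table,
        keeping only the key columns in the linking table."""
        skills = {}   # skill_id -> skill_name, deduplicated
        links = []    # pure (employee_id, skill_id) pairs
        for row in rows:
            skills[row["skill_id"]] = row["skill_name"]
            links.append({"employee_id": row["employee_id"],
                          "skill_id": row["skill_id"]})
        skill_table = [{"skill_id": k, "skill_name": v}
                       for k, v in sorted(skills.items())]
        return skill_table, links
    ```

    Note that "SQL" appeared twice in the 1NF table but only once in the new skill table: removing the partial dependency also removes the duplicated value.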

  • What is the significance of a primary key in normalization?

    -A primary key uniquely identifies each row in a table and plays a crucial role in normalization. It ensures that data is organized in a way that maintains uniqueness and facilitates relationships between different tables through foreign keys.

  • What is a foreign key, and how is it used in data normalization?

    -A foreign key is a column or set of columns in one table that refers to the primary key in another table. In normalization, foreign keys establish relationships between tables, allowing data to be connected across multiple normalized tables.
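
    A sketch of foreign keys in action using Python's built-in sqlite3 module; the schema mirrors the video's employee/skill example, but the exact table and column names are assumptions:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill_name TEXT)")
    conn.execute("""CREATE TABLE employee_skill (
        employee_id INTEGER REFERENCES employee(employee_id),
        skill_id    INTEGER REFERENCES skill(skill_id),
        PRIMARY KEY (employee_id, skill_id))""")

    conn.execute("INSERT INTO employee VALUES (1, 'Ann')")
    conn.execute("INSERT INTO skill VALUES (10, 'SQL')")
    conn.execute("INSERT INTO employee_skill VALUES (1, 10)")  # ok: both parents exist
    try:
        conn.execute("INSERT INTO employee_skill VALUES (2, 10)")  # no employee 2
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
    ```

    The database accepts the link row only when both referenced rows exist; the dangling row is rejected, which is how foreign keys maintain the relationships normalization creates.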

  • Why is normalization typically focused on up to third normal form (3NF)?

    -Normalization up to third normal form (3NF) is sufficient for most practical applications, as it ensures that the data is well-organized and free of redundancy. The latter two forms (4NF and 5NF) handle more complex exceptions, but are rarely needed in everyday database management.

Outlines

00:00

🔍 Introduction to Data Normalization

Jesper introduces the concept of data normalization, emphasizing its significance in data architecture and digital transformation. He explains that data normalization is both mathematical and philosophical in nature. The focus is on structured data and its relationship with automation, analytics, and artificial intelligence. Jesper contrasts structured and unstructured data, which includes spreadsheets and online-generated data. He emphasizes that data normalization helps deepen our understanding of data connections, using data modeling to explain relationships. He also teases the concept of 'cardinality,' to be covered in a future video.

05:00

🔑 First Normal Form: Breaking Down Data

Jesper dives into the first normal form of data normalization. He explains the need to split data in spreadsheets or tables into atomic values and ensure each row is uniquely identified by a primary key. This is crucial for avoiding redundant or repeated data. He provides an example using employee skills data, where columns are separated into skill ID and skill name, ensuring each column is unique. Furthermore, repeating groups are moved into new tables. By following these rules, Jesper demonstrates how data achieves the first normal form, which is the foundation for further normalization.

10:04

📊 Second and Third Normal Forms: Refining Data Relationships

Jesper explains how second normal form extends the first by ensuring all data depends on the primary key. He shows how skill name and skill ID, while related to each other, do not depend on the employee ID, prompting the creation of a new table. Foreign keys are introduced to link related data across multiple tables. The third normal form, which further refines the structure, ensures no column depends on any key other than the primary key. He explains that in some cases, such as job name not depending on employee ID, additional tables are needed. Ultimately, third normal form breaks one unnormalized spreadsheet into four well-structured tables.
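
    The end state can be sketched with sqlite3: four normalized tables joined back into the original spreadsheet view. The schema and sample data are invented for illustration and not taken from the video:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    # The four tables that result from normalizing to 3NF.
    conn.executescript("""
    CREATE TABLE job (job_id INTEGER PRIMARY KEY, job_name TEXT);
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT,
                           job_id INTEGER REFERENCES job(job_id));
    CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill_name TEXT);
    CREATE TABLE employee_skill (
        employee_id INTEGER REFERENCES employee(employee_id),
        skill_id INTEGER REFERENCES skill(skill_id),
        PRIMARY KEY (employee_id, skill_id));
    INSERT INTO job VALUES (1, 'Analyst');
    INSERT INTO employee VALUES (1, 'Ann', 1);
    INSERT INTO skill VALUES (10, 'SQL'), (11, 'Python');
    INSERT INTO employee_skill VALUES (1, 10), (1, 11);
    """)

    # Join the four tables back into the denormalized view we started from.
    rows = conn.execute("""
        SELECT e.name, j.job_name, s.skill_name
        FROM employee e
        JOIN job j             ON j.job_id = e.job_id
        JOIN employee_skill es ON es.employee_id = e.employee_id
        JOIN skill s           ON s.skill_id = es.skill_id
        ORDER BY s.skill_name""").fetchall()
    print(rows)
    ```

    The join reproduces the original rows, showing that the decomposition is lossless: normalization restructures the data without losing any of it.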

👍 Conclusion and Call to Action

Jesper concludes by summarizing the process of normalizing data to the third normal form. He emphasizes the importance of creating well-structured, normalized tables that prevent redundancy and ensure data integrity. The video closes with an invitation to viewers to like and subscribe for more content on data architecture and transformation.

Keywords

💡Data Normalization

Data normalization refers to organizing data in a structured format, reducing redundancy, and ensuring data integrity. In the video, it is described as a systematic approach for connecting data, which is essential for analytics, automation, and AI. The process moves from unstructured data (e.g., spreadsheets) to structured data by applying rules, resulting in cleaner, more usable data models.

💡Structured Data

Structured data refers to data that is organized in a predefined manner, usually within relational databases. It consists of rows and columns, making it easier to analyze and connect. In the video, the speaker explains how normalization is applied to structured data, such as tables or spreadsheets, to establish meaningful relationships between datasets.

💡Unstructured Data

Unstructured data is information that doesn't follow a specific format or structure, such as data found in spreadsheets, social media, or other internet-generated content. The video contrasts unstructured data with structured data, emphasizing that normalization is more applicable to the latter.

💡Relational Model

The relational model, introduced by Edgar Codd in 1970, is a mathematical approach to managing and connecting data based on predefined rules. It forms the basis of modern relational databases like MySQL, Oracle, and others. In the video, the relational model is highlighted as a key historical breakthrough in data management, facilitating data normalization.

💡Primary Key

A primary key is a unique identifier for each row in a table, ensuring that no two rows are identical. It plays a crucial role in all stages of data normalization. In the video, the speaker demonstrates how primary keys are used to uniquely identify data, such as employee IDs in a table of employee information.

💡Foreign Key

A foreign key is a column or group of columns in a table that links to the primary key of another table, establishing a relationship between the two tables. In the video, foreign keys are introduced when the speaker discusses the relationships between employees, skills, and jobs, where employee ID links different tables.

💡First Normal Form (1NF)

First Normal Form (1NF) ensures that each column contains atomic values, meaning no cell can have more than one value, and each row must be unique. In the video, the speaker explains this by referencing a spreadsheet where cells containing both skill ID and skill name are split into separate columns.

💡Second Normal Form (2NF)

Second Normal Form (2NF) builds on 1NF and requires that all data in a table must depend on the primary key. This ensures that there are no partial dependencies. In the video, the speaker explains how columns that don’t depend on the primary key are moved to their own table, such as skill name being separated from the employee table.

💡Third Normal Form (3NF)

Third Normal Form (3NF) ensures that all non-key columns in a table are only dependent on the primary key and not on any other non-key columns. The video walks through how the job name in the employee table violates this rule and is split into a new table to comply with 3NF.

💡Cardinality

Cardinality refers to the type and nature of relationships between data sets. In the video, the speaker refers to cardinality as the 'alphabet' of data, explaining that understanding the relationships between datasets is fundamental to proper data normalization and relational database design.

Highlights

Introduction to data normalization in the context of digital transformation and structured data.

Data normalization gives a deeper understanding of structured data and how to connect it for automation, analytics, and AI.

Normalization helps in understanding relationships between different types of data, including structured and unstructured data.

Data modeling serves as a tool for understanding how data connects and how these connections tell a story.

Introduction to the concept of cardinality, which will be covered in detail in a future video.

Differentiating between process-based thinking (workflows) and data, which describes 'who we are' versus 'what we do.'

Edgar Codd's contribution to the relational model in 1970, revolutionizing how data is connected based on mathematical rules.

Normalization is a process that follows five rules to structure data, with a focus on relationships between data sets.

The first normal form (1NF) focuses on atomic values, unique identifiers, and eliminating repeating groups of data.

The second normal form (2NF) builds on 1NF by ensuring that all data depends on the primary key.

In 2NF, any column that doesn't depend on the primary key must be moved to a new table.

The third normal form (3NF) ensures that all non-key columns are dependent only on the primary key and not on any other keys.

In 3NF, data that violates the form must be split into additional tables for proper normalization.

By the end of normalization to 3NF, one unnormalized table has been transformed into four normalized tables.

Summary of the normalization process, emphasizing the importance of primary keys and relationships between tables in structured data.

Transcripts

play00:00

Hi, it's Jesper here. I make data architecture and digital transformation videos on YouTube.

play00:15

Today I'm going to unpack data normalization, the language of data. Data normalization is

play00:22

both mathematics and philosophy, and I think you will get a sense for this as the video progresses. It

play00:29

doesn't explain everything, but it gives us a deeper understanding of one particular kind of

play00:35

data, which is called structured data, and how to connect structured data and do more with it,

play00:41

such as automation, analytics, prediction, artificial intelligence, all those fun and good things. And

play00:48

coincidentally, or perhaps not, it also gives us an insight into the other side of data: unstructured

play00:55

data, the things that sit in spreadsheets, the things that are being generated on the internet.

play01:00

Put simply, it's a perfect starting point for a greater and deeper understanding of data: how

play01:06

data works, how data connects, and what you can potentially do with it. It uses a language,

play01:14

data modeling, to show how data is connected and the nature of these connections, or relationships,

play01:21

to tell a story. This language is radically different from the language we normally use.

play01:28

It even has its own alphabet, called cardinality, but this will be covered in a separate video.

play01:35

We're used to process-based thinking, such as planning: we use processes and process flows,

play01:44

workflows, arrows, and so on to depict things in business, how we do things, the steps and the

play01:49

sequences of achieving something. In life, we often describe ourselves as a process:

play01:55

if we're asked at a party to describe ourselves, we often describe what we do, not who we are.

play02:03

Now it's getting philosophical. A process describes what we do; data describes who we are. Data

play02:11

can exist without the process, whereas the process must have data to exist. You could say that data

play02:19

is persistent, whereas the process is not. That raises the question: without the process,

play02:26

who are we? Very philosophical, and certainly worthy of a serious deep-and-meaningful conversation.

play02:36

But Edgar Codd entered the scene, and he wanted more than great conversations.

play02:42

He reduced data and data relationships into mathematics, and in 1970 he released the relational

play02:49

model, which is a systematic approach to connecting and maintaining data based on mathematical rules.

play02:57

Technology companies like Oracle, IBM, Microsoft, Amazon, and Google used his relational model to

play03:05

create their own relational databases. Popular open-source databases like MySQL are also based on it.

play03:14

But that's all technology; let's forget about the technology for now. Dr. Codd provides five

play03:21

rules to normalize data, where each rule builds on the other, starting with first normal form

play03:28

and ending with fifth normal form. Normalization is a gateway into deeper data understanding,

play03:36

because it addresses the thing that gives data the most meaning, which is its relationships with other data.

play03:43

The magic of data lies in its relationships, and the types of relationships, called cardinality.

play03:51

Put simply, normalization is about connecting data in the right way. The first three rules of

play04:00

normalization are about core basics, whereas the latter two deal with exceptions. Hence,

play04:07

for practical reasons, normalization typically refers to third normal form.

play04:13

And remember: to be in third normal form, it must also be in first and second normal form.

play04:21

So the focus today is normalization up to third normal form.

play04:30

First normal form is about atomic values and unique identifiers.

play04:35

Let's say we want to model employees and their skills,

play04:39

and we have been handed this spreadsheet of data with the task of normalizing it.

play04:45

I've used spreadsheets as an example to make it easier to understand, but the correct term is

play04:51

either table or entity. First normal form specifies that the following actions need to be

play05:00

taken on the data. Number one: each cell may never contain more than one value. For example, a cell

play05:10

cannot contain both skill ID and skill name; as a result, we need to split them into separate columns.

play05:25

Number two: each row must be unique. That is, one column or a combination of columns

play05:33

must be able to uniquely identify the row. This is called the primary key. In this example, name and

play05:42

address would be a potential primary key, yet often the primary key is system-generated. In our case, we

play05:50

will add a computer-generated primary key. The primary key is of great importance and features

play05:57

prominently in all other normalization rules. Number three: each column name must be unique,

play06:07

and in this case we need to rename our skill columns to make them unique. And number four: there

play06:15

must be no repeating groups. Repeating groups are removed and put into a new spreadsheet or table.

play06:27

Now we have two spreadsheets, or tables, with nice rows of data. Each row is uniquely identified,

play06:35

each cell has no more than one value, and there are no repeating groups. Yay, welcome to first

play06:41

normal form! But the fun doesn't stop here. Second normal form enforces new rules and states that

play06:52

all data must depend on the primary key. So let's first examine spreadsheet one: name, address, and

play07:02

job name are all related to employee ID, so it's already in second normal form. Yay, that's great!

play07:10

But what about the second spreadsheet? Skill name relates to skill ID, but not to employee ID.

play07:21

Second normal form stipulates that any column that doesn't depend on the whole primary key must

play07:27

be split into its own spreadsheet or table, so we need to create one more, called employee skill.

play07:37

A primary key that links to other spreadsheets or tables is also called a foreign key.

play07:44

So in this case, employee ID is a foreign key of employee, and skill ID is a foreign key of skill.

play07:55

Now we have three spreadsheets, or tables, with nice rows of data,

play07:59

and each column depends on the whole primary key. Yay again, welcome to second normal form!

play08:10

But Dr. Codd still wasn't happy. He introduced a further set of rules, called third normal form.

play08:19

Third normal form also focuses on the primary key, and states that the primary key must fully define

play08:27

all columns, and columns may not depend on any other key. So let's examine our spreadsheets again.

play08:36

In skill, skill ID defines skill name, and skill name does not relate to any other key,

play08:43

so it satisfies third normal form. In employee skill, employee ID and skill ID have no other

play08:53

columns, and hence it satisfies third normal form. In employee, employee ID defines name and address,

play09:02

and name and address do not relate to any other key, and hence they satisfy third normal form.

play09:10

But employee ID does not define job name, hence violating third normal form.

play09:18

This means that job name needs to be split into its own spreadsheet or table, and for

play09:24

consistency we have created a computer-generated job ID. Because the job ID links employee and job,

play09:33

we need to create a new column, job ID, in employee. As we discussed in second normal form,

play09:41

any primary key that links spreadsheets or tables also becomes a foreign key.

play09:48

Now we have four spreadsheets, or tables, with nice rows of data, and a primary key defines

play09:54

each non-key column. Welcome to third normal form! In summary, third normal form has transformed

play10:04

one unnormalized spreadsheet, or table, into four normalized spreadsheets, or tables.

play10:11

I hope this has explained normalization and how to normalize data to third normal form,

play10:17

and if you enjoyed this video, please hit like and subscribe. Hope to see you in my next video!

Related Tags
Data Normalization, Relational Model, Third Normal Form, Structured Data, Database Design, Data Modeling, AI Integration, Automation, Data Architecture, Digital Transformation