Data Structures: Crash Course Computer Science #14

CrashCourse
31 May 201710:06

Summary

TLDR这段视频脚本介绍了计算机科学中使用的基本数据结构,包括数组、字符串、矩阵、结构体、链表、队列、栈、树和图。数组是存储在内存中的一系列值,可以通过索引访问。字符串是字符的数组,以空字符结尾。矩阵是数组的数组,用于存储二维数据。结构体允许将多个变量捆绑在一起存储。链表是一种灵活的数据结构,可以通过改变节点的指针来动态地添加或删除节点。队列遵循先进先出(FIFO)的原则,而栈遵循后进先出(LIFO)的原则。树是一种层级结构,其中每个节点可能有多个子节点。图是一种更为通用的数据结构,允许任意节点之间的连接。不同的数据结构适用于不同的计算任务,选择正确的数据结构可以大大简化编程工作。大多数编程语言都提供了内置的数据结构库,如C++的标准模板库和Java的类库,这使得程序员可以不必从头开始实现这些结构,而是利用它们进行更高层次的抽象和更有趣的编程工作。

Takeaways

  • 📚 数据结构对于算法的执行至关重要,它们决定了数据在计算机内存中的存储方式。
  • 🔍 数组是基本的数据结构,可以存储一系列值,并通过索引来访问特定的值。
  • 🔑 字符串实际上是字符的数组,以null字符(二进制0)结束,以标识字符串在内存中的结束。
  • 🖇️ 矩阵可以视为数组的数组,用于存储二维数据,如电子表格中的网格或屏幕上的像素。
  • 🏢 结构体(Struct)允许将相关的变量组合在一起存储,如银行账户号和余额。
  • 🔗 指针是一种特殊变量,指向内存中的一个位置,用于创建链表等灵活的数据结构。
  • 🔄 链表是一种可以动态扩展或缩短的数据结构,通过改变节点的指针来添加或删除元素。
  • 📈 队列(Queue)和栈(Stack)是基于链表的更复杂的数据结构,分别遵循FIFO和LIFO原则。
  • 🌳 树是一种数据结构,由节点组成,每个节点有零个或多个子节点,用于表示层次关系。
  • 🔍 二叉树是特殊的树,每个节点最多有两个子节点,常用于各种算法中。
  • 🔗 图(Graph)是一种更为通用的数据结构,允许节点之间有任意的连接,包括循环。
  • 🛠️ 编程语言通常提供内置的或库中的数据结构,如C++的STL和Java的JCL,以简化实现复杂数据结构的工作。

Q & A

  • 数据结构在计算机科学中的作用是什么?

    -数据结构在计算机科学中用于组织和存储数据,以便数据可以被高效地检索和读取。它们允许算法以一种结构化的方式运行,从而提高程序的性能和效率。

  • 数组(Array)在内存中是如何存储的?

    -数组在内存中是连续存储的。它们是一系列值,按照内存地址的连续性存放。例如,一个包含7个数字的数组可能被存储在内存位置1000开始的地方,这些数字一个接一个地存储在连续的内存地址中。

  • 如何通过索引访问数组中的特定值?

    -通过指定索引来访问数组中的特定值。几乎所有的编程语言都从索引0开始计数,使用方括号语法来表示数组访问。例如,要访问数组中索引为5的元素,程序会跳转到内存位置1000加上5的偏移量。

  • 字符串(String)是如何存储在内存中的?

    -字符串实际上是字符的数组。它们在内存中存储时,每个字符占用一个位置,字符串以一个特殊的null字符(二进制值0)结束,这表示字符串的结束。

  • 矩阵(Matrix)可以如何表示?

    -矩阵可以被看作是数组的数组。例如,一个3x3的矩阵实际上是一个包含3个元素的数组,每个元素本身也是一个包含3个元素的数组。

  • 如何通过索引访问矩阵中的特定值?

    -访问矩阵中的特定值需要指定两个索引。例如,要访问矩阵中索引为(2,1)的元素,意味着要查找子数组2中索引为1的元素。

  • 结构体(Struct)在数据存储中有什么作用?

    -结构体允许将相关的变量组合在一起存储。它们可以创建复合数据结构,能够一次性存储多个数据片段,如银行账户的账号和余额。

  • 链表(Linked List)与数组相比有什么优势?

    -链表是一种灵活的数据结构,可以通过改变节点的指针来动态地扩展或缩短。与数组相比,链表可以轻松地插入、删除、排序、分割和反转,非常适合需要频繁变动的数据结构。

  • 队列(Queue)和栈(Stack)是如何基于链表实现的?

    -队列遵循先进先出(FIFO)的原则,像邮局排队一样,通过改变链表的头指针来实现入队(enqueue)和出队(dequeue)。栈遵循后进先出(LIFO)的原则,像堆叠的煎饼,通过压栈(push)和弹栈(pop)操作来管理数据。

  • 树(Tree)和图(Graph)数据结构的主要区别是什么?

    -树是一种有层级结构的数据结构,其中每个节点除了有一个父节点外,还可以有多个子节点,但树中的节点没有循环。而图是一种更为通用的数据结构,允许任意两个节点之间的连接,包括循环。

  • 为什么选择合适的数据结构对于编程很重要?

    -选择合适的数据结构可以显著提高程序的性能和开发效率。不同的数据结构有不同的特性,适用于不同的计算需求。大多数编程语言都提供了丰富的数据结构库,如C++的Standard Template Library和Java的Java Class Library,使得程序员可以不必从零开始实现数据结构。

Outlines

00:00

📚 数据结构基础

Carrie Anne 在这一段落中介绍了计算机科学中数据结构的重要性和基本概念。她首先提到了算法运行时数据存储的问题,并强调了数据组织的必要性。接着,她回顾了数组(或称为列表、向量)这一基本数据结构,解释了数组是如何在内存中存储一系列值的,并介绍了如何通过索引来访问数组中的特定值。此外,她还提到了字符串、矩阵、结构体(Struct)等数据结构,以及它们在内存中的存储方式和使用场景。最后,她讨论了如何使用结构体构建更复杂的数据结构,如链表,以及链表的灵活性和如何通过改变指针来动态地扩展或缩短链表。

05:03

🔗 链表、队列、栈和树

在第二段落中,Carrie Anne 深入探讨了链表的工作原理,包括它们是如何通过指针连接各个节点来形成一个灵活的数据结构。她解释了链表的动态特性,如如何通过改变指针来插入新节点或重新排序链表。此外,她还介绍了队列和栈这两种基于链表的更复杂的数据结构,包括它们的FIFO(先进先出)和LIFO(后进先出)原则。最后,她讨论了树结构,包括二叉树的概念,以及如何通过增加指针来构建更复杂的树结构。她还提到了图结构,用于表示任意连接的数据。最后,她强调了选择合适的数据结构对于简化编程工作的重要性,并提到了大多数编程语言都提供了内置的数据结构库,如C++的Standard Template Library和Java的Java Class Library。

Mindmap

Keywords

💡数据结构

数据结构是计算机科学中用于组织和存储数据的一种方式,以便可以有效地访问和修改。在视频中,数据结构被描述为一种避免数据像约翰·格林大学宿舍那样混乱无序的方法,而是以一种有序的方式组织数据,从而方便数据的检索和读取。数据结构是编程和算法实现的基础,因为它们提供了数据存储和访问的有效机制。

💡数组

数组是一种基本的数据结构,由一系列存储在连续内存位置的值组成。在视频中,数组被提及为一种可以存储一系列数值的变量,而不是单个值。数组通过索引来访问特定的元素,几乎所有的编程语言都从索引0开始。例如,视频脚本中提到,要将数组中第一个和第三个位置的数值相加,可以使用特定的语法来实现。

💡字符串

字符串是字符的数组,可以包括字母、数字、标点符号和其他书写符号。在视频中,字符串被描述为存储在内存中的字符序列,通常用引号括起来。字符串在内存中以null字符(二进制值0)结束,这表示字符串的结束。字符串在计算机中非常常见,因此存在许多专门处理字符串的函数,如字符串连接函数(strcat)。

💡矩阵

矩阵可以被视为数组的数组,用于存储二维数据,如电子表格中的数字网格或计算机屏幕上的像素。在视频中,一个3x3的矩阵实际上是一个包含3个大小为3的数组的数组。在内存中,矩阵的元素是按顺序打包的。访问矩阵中的值需要指定两个索引,例如'J index of 2, then index, of 1',这告诉计算机你正在查找子数组2中位置1的项目。

💡结构体

结构体是一种数据结构,允许将多个相关的变量捆绑在一起存储。在视频中,结构体被用来存储如银行账户号码和余额等相关信息。结构体可以创建不是单个数字的变量,而是复合数据结构,能够一次性存储多块数据。结构体可以用于构建更复杂的数据结构,如链表,它们避免了数组的一些限制,如固定大小和必须在内存中按顺序存储。

💡链表

链表是一种灵活的数据结构,可以通过指针将多个节点存储在一起。每个节点包含一个变量和一个指向内存中下一个节点位置的指针。在视频中,链表被描述为可以通过改变指针来动态扩展或缩短的数据结构。链表可以轻松地重新排序、裁剪、分割和反转,这使得它们对于排序等算法非常有用。

💡队列

队列是一种遵循先进先出(FIFO)原则的数据结构,就像邮局排队一样,等待时间最长的人首先得到服务。在视频中,队列被解释为使用指针指向链表中的第一个节点,当服务完成后,通过读取下一个节点的指针并更新队列指针来实现出队操作。添加新元素到队列被称为入队。

💡

栈是一种后进先出(LIFO)的数据结构,类似于堆叠的煎饼,你将煎饼添加到栈顶,并且当你想吃一个时,你从栈顶取一个。在视频中,栈通过将数据压入(push)和弹出(pop)来操作。与队列不同,栈的操作是从栈顶进行的。

💡

树是一种数据结构,由一系列节点组成,每个节点有零个或多个子节点,并且有一个父节点,除了根节点。在视频中,树被描述为具有层次结构的节点集合,其中根节点位于顶部,叶节点位于底部,没有任何子节点。树结构在许多算法中使用,如二叉树,它允许节点有两个孩子,并且可以通过链接列表来存储所有节点。

💡

图是一种数据结构,用于表示复杂的关系,其中节点可以指向任何其他节点,与树不同,图中没有根节点、叶节点或父节点的概念。在视频中,图被提及为可以存储为具有多个指针的节点集合,类似于树,但是图中的节点可以自由地指向任何其他节点,包括循环。图结构适用于表示道路连接城市等场景。

💡标准库

标准库是大多数编程语言提供的预先构建的数据结构集合。在视频中,提到了C++的标准模板库(STL)和Java的Java类库(JCL),这些库允许程序员不必从头开始实现数据结构,而是可以直接使用这些结构来执行更有趣的任务。标准库的使用提高了编程效率,并允许开发者在更高的抽象层次上工作。

Highlights

计算机科学中,数据结构用于存储算法运行所需的数据,以保持数据的有序性,便于检索和读取。

数组(Array)是一种基本的数据结构,它允许存储一系列值,并通过索引来访问特定的值。

几乎所有编程语言都从索引0开始数组,并使用方括号语法来表示数组访问。

数组在内存中的存储是连续的,可以通过内存地址和偏移量来访问数组中的元素。

字符串(String)实际上是字符的数组,以null字符(二进制值0)结束,表示字符串在内存中的结束。

矩阵(Matrix)可以视为数组的数组,用于存储二维数据,如电子表格中的数字网格或计算机屏幕上的像素。

结构体(Struct)允许将相关的变量组合在一起存储,例如将银行账户号码和余额捆绑在一起。

链表(Linked List)是一种灵活的数据结构,可以通过改变节点的指针来动态地扩展或缩短。

队列(Queue)是一种遵循先进先出(FIFO)原则的数据结构,类似于邮局的排队系统。

栈(Stack)是一种后进先出(LIFO)的数据结构,数据被压入栈顶并从栈顶弹出。

树(Tree)是一种数据结构,其中每个节点可以有零个或多个子节点,顶层节点称为根节点,没有子节点的称为叶节点。

二叉树是一种特殊类型的树,其中每个节点最多有两个子节点。

图(Graph)是一种数据结构,用于表示复杂的关系,其中节点可以指向任何其他节点,没有根和叶的概念。

不同的数据结构具有不同的属性,适用于特定的计算任务,选择合适的数据结构可以大大简化工作。

大多数编程语言都带有包含现成数据结构的库,如C++的标准模板库(STL)和Java的类库(JCL)。

程序员可以利用数据结构的抽象能力,进行更高级别的操作,而不必从零开始实现数据结构。

Transcripts

play00:03

Hi, I'm Carrie Anne, and welcome to Crash Course Computer Science!

play00:05

Last episode, we discussed a few example classic algorithms, like sorting a list of numbers

play00:09

and finding the shortest path in a graph.

play00:11

What we didn’t talk much about, is how the data the algorithms ran on was stored in computer

play00:16

memory.

play00:16

You don’t want your data to be like John Green’s college dorm room, with food, clothing

play00:19

and papers strewn everywhere.

play00:21

Instead, we want our data to be structured, so that it’s organized, allowing things

play00:25

to be easily retrieved and read.

play00:27

For this, computer scientists use Data Structures!

play00:29

INTRO

play00:39

We already introduced one basic data structure last episode, Arrays, also called lists or

play00:43

Vectors in some languages.

play00:45

These are a series of values stored in memory.

play00:47

So instead of just a single value being saved into a variable, like ‘j equals 5’, we

play00:51

can define a whole series of numbers, and save that into an array variable.

play00:55

To be able to find a particular value in this array, we have to specify an index.

play00:59

Almost all programing languages start arrays at index 0, and use a square bracket syntax

play01:04

to denote array access.

play01:05

So, for example, if we want to add the values in the first and third spots of our array

play01:10

‘j’, and save that into a variable ‘a’, we would write a line of code like this.

play01:14

How an array is stored in memory is pretty straightforward.

play01:16

For simplicity, let’s say that the compiler chose to store ours at memory location 1,000.

play01:21

The array contains 7 numbers, and these are stored one after another in memory, as seen here.

play01:26

So when we write “j index of 0”, the computer goes to memory location 1,000, with an offset

play01:31

of 0, and we get the value 5.

play01:33

If we wanted to retrieve “j index of 5”, our program goes to memory location 1000,

play01:38

plus an offset of 5, which in this case, holds a value of 4.

play01:42

It’s easy to confuse the fifth number in the array with the number at index 5.

play01:45

They are not the same.

play01:46

Remember, the number at index 5 is the 6th number in the array because the first number

play01:50

is at index 0.

play01:51

Arrays are extremely versatile data structures, used all the time, and so there are many functions

play01:55

that can handle them to do useful things.

play01:57

For example, pretty much every programming language comes with a built-in sort function,

play02:01

where you just pass in your array, and it comes back sorted.

play02:04

So there’s no need to write that algorithm from scratch.

play02:06

Very closely related are Strings, which are just arrays of characters, like letters, numbers,

play02:11

punctuation and other written symbols.

play02:13

We talked about how computers store characters way back in Episode 4.

play02:16

Most often, to save a string into memory, you just put it in quotes, like so.

play02:20

Although it doesn’t look like an array, it is.

play02:23

Behind the scenes, the memory looks like this.

play02:24

Note that the string ends with a zero in memory.

play02:27

It’s not the character zero, but the binary value 0.

play02:31

This is called the null character, and denotes the end of the string in memory.

play02:34

This is important because if I call a function like “print quote”, which writes the string

play02:38

to the screen, it prints out each character in turn starting at the first memory location,

play02:42

but it needs to know when to stop!

play02:44

Otherwise, it would print out every single thing in memory as text.

play02:47

The zero tells string functions when to stop.

play02:50

Because computers work with text so often, there are many functions that specifically

play02:54

handle strings.

play02:55

For example, many programming languages have a string concatenation function, or “strcat”,

play02:59

which takes in two strings, and copies the second one to the end of the first.

play03:03

We can use arrays for making one dimensional lists, but sometimes you want to manipulate

play03:07

data that is two dimensional, like a grid of numbers in a spreadsheet, or the pixels

play03:11

on your computer screen.

play03:12

For this, we need a Matrix.

play03:14

You can think of a Matrix as an array of arrays!

play03:17

So a 3 by 3 matrix is really 2 an array of size 3, with each index storing an array of

play03:21

size 3.

play03:23

We can initialize a matrix like so.

play03:24

In memory, this is packed together in order like this.

play03:27

To access a value, you need to specify two indexes, like “J index of 2, then index

play03:32

of 1” - this tells the computer you’re looking for the item in subarray 2 at position 1.

play03:37

And this would give us the value 12.

play03:39

The cool thing about matrices is we’re not limited to 3 by 3 -- we can make them any

play03:42

size we want -- and we can also make them any number of dimensions we want.

play03:46

For example, we can create a five dimensional matrix and access it like this.

play03:50

That’s right, you now know how to access a five dimensional matrix -- tell your friends!

play03:54

So far, we’ve been storing individual numbers or letters into our arrays or matrices.

play03:59

But often it’s useful to store a block of related variables together.

play04:02

Like, you might want to store a bank account number along with its balance.

play04:05

Groups of variables like these can be bundled together into a Struct.

play04:09

Now we can create variables that aren’t just single numbers, but are compound data

play04:12

structures, able to store several pieces of data at once.

play04:16

We can even make arrays of structs that we define, which are automatically bundled together

play04:20

in memory.

play04:21

If we access, for example, J index of 0, we get back the whole struct stored there, and

play04:25

we can pull the specific account number and balance data we want.

play04:28

This array of structs, like any other array, gets created at a fixed size that can’t

play04:32

be enlarged to add more items.

play04:34

Also, arrays must be stored in order in memory, making it hard to add a new item to the middle.

play04:38

But, the struct data structure can be used for building more complicated data structures

play04:42

that avoid these restrictions.

play04:44

Let’s take a look at this struct that’s called a “node”.

play04:46

It stores a variable, like a number, and also a pointer.

play04:49

A pointer is a special variable that points, hence the name, to a location in memory.

play04:53

Using this struct, we can create a linked list, which is a flexible data structure that

play04:57

can store many nodes.

play04:59

It does this by having each node point to the next node in the list.

play05:02

Let’s imagine we have three node structs saved in memory, at locations 1000, 1002 and 1008.

play05:09

They might be spaced apart, because they were created at different times, and other data

play05:13

can sit between them.

play05:14

So, you see that the first node contains the value 7, and the location 1008 in its “next”

play05:19

pointer.

play05:20

This means that the next node in the linked list is located at memory location 1008.

play05:24

Looking down the linked list, to the next node, we see it stores the value 112 and points

play05:28

to another node at location 1002.

play05:31

If we follow that, we find a node that contains the value 14 and points back to the first

play05:36

node at location 1000.

play05:37

So this linked list happened to be circular, but it could also have been terminated by

play05:41

using a next pointer value of 0 -- the null value -- which would indicate we’ve reached

play05:45

the end of the list.

play05:47

When programmers use linked lists, they rarely look at the memory values stored in the next

play05:50

pointers.

play05:51

Instead, they can use an abstraction of a linked list, that looks like this, which is

play05:55

much easier to conceptualize.

play05:56

Unlike an array, whose size has to be pre-defined, linked lists can be dynamically extended or

play06:01

shortened.

play06:02

For example, we can allocate a new node in memory, and insert it into this list, just

play06:06

by changing the next pointers.

play06:07

Linked Lists can also easily be re-ordered, trimmed, split, reversed, and so on.

play06:12

Which is pretty nifty!

play06:13

And pretty useful for algorithms like sorting, which we talked about last week.

play06:16

Owing to this flexibility, many more-complex data structures are built on top of linked lists

play06:21

The most famous and universal are queues and stacks.

play06:23

A queue – like the line at your post office – goes in order of arrival.

play06:27

The person who has been waiting the longest, gets served first.

play06:30

No matter how frustrating it is that all you want to do is buy stamps and the person in

play06:33

front of you seems to be mailing 23 packages.

play06:36

But, regardless, this behavior is called First-In First-Out, or FIFO.

play06:39

That’s the first part.

play06:41

Not the 23 packages thing.

play06:42

Imagine we have a pointer, named “post office queue”, that points to the first node in

play06:46

our linked list.

play06:47

Once we’re done serving Hank, we can read Hank’s next pointer, and update our “post

play06:51

office queue” pointer to the next person in the line.

play06:54

We’ve successfully dequeued Hank -- he’s gone, done, finished.

play06:58

If we want to enqueue someone, that is, add them to the line, we have to traverse down

play07:02

the linked list until we hit the end, and then change that next pointer to point to

play07:05

the new person.

play07:06

With just a small change, we can use linked lists as stacks, which are LIFO…

play07:10

Last-In First-Out.

play07:11

You can think of this like a stack of pancakes... as you make them, you add them to the top

play07:15

of stack.

play07:16

And when you want to eat one, you take them from the top of the stack.

play07:19

Delicious!

play07:20

Instead of enqueueing and dequeuing, data is pushed onto the stack and popped from the stacks.

play07:25

Yep, those are the official terms!

play07:27

If we update our node struct to contain not just one, but two pointers, we can build trees,

play07:32

another data structure that’s used in many algorithms.

play07:34

Again, programmers rarely look at the values of these pointers, and instead conceptualize

play07:38

trees like this: The top most node is called the root.

play07:42

And any nodes that hang from other nodes are called children nodes.

play07:45

As you might expect, nodes above children are called parent nodes.

play07:48

Does this example imply that Thomas Jefferson is the parent of Aaron Burr?

play07:51

I’ll leave that to your fanfiction to decide.

play07:54

And finally, any nodes that have no children -- where the tree ends -- are called Leaf Nodes.

play07:58

In our example, nodes can have up to two children, and for that reason, this particular data

play08:03

structure is called a binary tree.

play08:04

But you could just as easily have trees with three, four or any number of children by modifying

play08:09

the data structure accordingly.

play08:10

You can even have tree nodes that use linked lists to store all the nodes they point to.

play08:15

An important property of trees – both in real life and in data structures – is that

play08:18

there’s a one-way path from roots to leaves.

play08:20

It’d be weird if roots connected to leaves, that connected to roots.

play08:23

For data that links arbitrarily, that include things like loops, we can use a graph data

play08:28

structure instead.

play08:29

Remember our graph from last episode of cities connected by roads?

play08:32

This can be stored as nodes with many pointers, very much like a tree, but there is no notion

play08:36

of roots and leaves, and children and parents…

play08:38

Anything can point to anything!

play08:40

So that’s a whirlwind overview of pretty much all of the fundamental data structures

play08:43

used in computer science.

play08:45

On top of these basic building blocks, programmers have built all sorts of clever variants, with

play08:49

slightly different properties -- data structures like red-black trees and heaps, which we don’t

play08:53

have time to cover.

play08:54

These different data structures have properties that are useful for particular computations.

play08:59

The right choice of data structure can make your job a lot easier, so it pays off to think

play09:02

about how you want to structure your data before you jump in.

play09:05

Fortunately, most programming languages come with libraries packed full of ready-made data structures.

play09:10

For example, C++ has its Standard Template Library, and Java has the Java Class Library.

play09:15

These mean programmers don’t have to waste time implementing things from scratch, and

play09:19

can instead wield the power of data structures to do more interesting things, once again

play09:24

allowing us to operate at a new level of abstraction!

play09:27

I’ll see you next week.

Rate This

5.0 / 5 (0 votes)

Related Tags
数据结构算法数组字符串矩阵链表队列二叉树
Do you need a summary in English?