Files & File Systems: Crash Course Computer Science #20

CrashCourse
12 Jul 2017
12:03

Summary

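TLDR: This video explains how computer files work and how they are organized. A file is a collection of data, such as text, music, pictures, or video. To be useful and interpretable, the data is usually organized in some way, called a file format. For example, text files use ASCII encoding, while WAV files contain audio metadata followed by sample data. BMP files store images made of pixels, each defined by red, green, and blue values. Computers use a file system to store and manage files. The file system uses a directory file to record where each file is located and how large it is, and it allows a file to be spread across multiple storage blocks so that files can grow and shrink dynamically. The video also covers defragmentation and hierarchical file systems, in which files and folders are organized in levels for easier management and access. It gives viewers an accessible, basic understanding of file systems.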

Takeaways

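  • 📚 A file system is how a computer organizes and stores data, letting that data exist in the form of files.
  • 📁 A file format is the way data is organized inside a file. Common examples include text files (TXT), audio files (WAV), and image files (BMP).
  • 🔢 Every computer file is ultimately a long list of numbers stored as binary. ASCII encoding is used to represent the characters in text files.
  • 🎵 WAV is an audio file format that stores metadata, such as the bit rate and whether the audio is mono or stereo, in a header at the front of the file.
  • 🖼️ BMP is an image file format made up of pixels, each a mix of red, green, and blue values.
  • 📝 The Directory File records the location and size of every file on the storage device, much like a book's table of contents.
  • 🔄 As the number of files grows, simply storing them back-to-back causes problems, so modern file systems store files in blocks and let a file span multiple blocks.
  • 🧩 File systems let file sizes change dynamically, growing and shrinking files by allocating and deallocating blocks.
  • 🗑️ Deleting a file only removes its record from the directory file. The actual data is not erased right away, which is one way deleted data can be recovered.
  • 🔄 Defragmentation rearranges a file's scattered blocks so that the file can be read more efficiently.
  • 📁 Modern file systems are hierarchical: files and folders can be nested to unlimited depth.
  • 📈 File systems abstract away the details of the underlying storage medium, making data management more intuitive and convenient for users.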

Q & A

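  • What is a file system, and how does it help a computer organize files?

    - A file system is the part of the operating system that manages and keeps track of stored files. It uses a Directory File to record where each file is located and how large it is, which lets the computer organize and access the files on a storage device.

  • Why are file formats needed, and what do they do?

    - A file format is the way data is organized inside a file. It lets the data be organized and interpreted in a useful, practical way. Using an existing standard format, such as JPEG or MP3, is usually the best choice, because the standard defines how the data is encoded and decoded.

  • How does the ASCII encoding standard help interpret a text file?

    - ASCII is a character encoding standard that maps numbers to characters. In a text file each character is stored as a number, and the ASCII table maps that number back to its character, which is how the text can be read.

  • What is the metadata in a WAV file, and where is it stored?

    - Metadata is data about data. It provides basic information about the file, such as the bit rate and whether the audio is a single track (mono) or stereo. In a WAV file the metadata is stored in the Header at the front of the file, ahead of the actual audio data.

  • How does a bitmap (BMP) file store a picture?

    - A bitmap file stores a picture as pixels, each of which is a combination of red, green, and blue. These are the additive primary colors, which can be mixed to create any other color on an electronic display. Like WAV files, BMP files start with metadata, including key values such as image width, image height, and color depth.

  • What is a directory file, and what role does it play in a file system?

    - A directory file is a special file that records where the other files on a storage device are located. It holds every file's name and file extension, along with metadata about the file, such as when it was created and last modified, who its owner is, and whether it can be read, written, or both. The directory file is essential to the file system because it tells the system exactly where each file begins in storage and how long it is.

  • How does a file system allow a file to span multiple storage blocks?

    - Modern file systems break a file into chunks and store those chunks in different storage blocks. To manage the scattered chunks, the directory file stores a list of the blocks that each file occupies, rather than a single block per file.

  • What is file fragmentation, and how does it affect a storage device?

    - Fragmentation is when a file ends up scattered across different, possibly out-of-order blocks on a storage device. It is the inevitable byproduct of files being created, deleted, and modified. For many storage technologies fragmentation hurts performance, because reading a file may require seeking back and forth across the device many times.

  • What is defragmentation, and how does it improve storage performance?

    - Defragmentation is the process in which the computer copies data around so that each file's blocks are located together in storage and in the right order. This reduces the seeking needed to read a file, so files open faster and the storage device performs better overall.

  • What is a hierarchical file system, and how does it differ from a flat file system?

    - A hierarchical file system lets related files be grouped into folders (directories), and folders can be nested to unlimited depth. A flat file system stores all files in a single directory at one level, whereas a hierarchical file system organizes them through multiple levels of directories.

  • How does a file system help ordinary users work with and manage data?

    - A file system provides a layer of abstraction that lets users think about and work with data as files and directories instead of the raw bits on a storage device. Users can open files and organize data without needing to understand the details of the underlying storage technology.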

Outlines

00:00

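📁 Introduction to file systems and file formats

This section introduces the basics of computer files. It first explains why file formats matter, using text files (TXT) and audio files (WAV) as examples, and describes how ASCII and metadata are used to decode the data in those files. It then introduces the bitmap (BMP) format and how RGB values express the colors of an image. Finally, it stresses what all file formats have in common: underneath, they are all sequences of numbers stored as binary.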

05:03

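🖥️ File systems in more depth

This section covers how files are stored, moving from simple back-to-back storage to flat file systems and block-based storage. It explains the role of the directory file and how it tracks where files are stored along with their metadata. It also discusses fragmentation and why defragmentation is needed, and how modern file systems let a file span multiple blocks and change size dynamically. Finally, it introduces hierarchical file systems, which organize files into folders so that they can be managed more efficiently and systematically.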

10:06

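🌐 Hierarchical structure and data management

This section looks at the hierarchical structure of a file system and how files and folders at different levels are managed and accessed. Using the "Root" Directory File as an example, it shows how files and subdirectories are tracked. It describes how a file can be moved between directories without rearranging any blocks of data, which highlights the abstraction and convenience that file systems bring to modern computing. Finally, it stresses that a file system is not only a technical way to manage storage but also gives users a more intuitive, easier way to organize their data.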

Keywords

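💡Data storage

Data storage is the process of keeping data on some medium, such as magnetic tape or a hard disk. In the video, storage technologies can hold data for long durations, even without power, which makes them well suited to recording large blobs of related data, that is, computer files.

💡File system

A file system is a way of storing and organizing computer files and their data. It lets the computer keep track of where files are stored and how large they are. The video explains that a file system uses a Directory File to record where other files are located, much like the table of contents in a book.

💡File format

A file format is the way data is organized inside a file. The video covers formats such as TXT, WAV, and BMP, which organize data according to different standards. Text files use ASCII encoding, for example, while WAV files include metadata such as the audio's bit rate and whether it is mono or stereo.

💡Metadata

Metadata is data about data. The video describes two kinds. A file format's metadata is stored in the Header at the front of the file and tells us how to interpret the data that follows, such as the bit rate and the mono or stereo setting of a WAV file. File-system metadata, such as when a file was created and last modified, who owns it, and whether it can be read or written, is stored in the directory file.

💡Pixel

A pixel is the smallest unit of an image on an electronic display, and each pixel is represented as a combination of red, green, and blue. The video uses a low-resolution 4x4 pixel image to show how an image is built from the RGB values of its pixels.

💡Directory file

A directory file is a special file that records the location of, and information about, the other files on a storage device. The video explains that the directory file is most often kept at the very front of storage and contains each file's name, file extension, and metadata, along with where the file begins in storage and how long it is.

💡Block storage

Block storage means storing files in fixed-size blocks, with a large file broken into chunks that are stored across several blocks. The video explains that modern file systems use blocks to simplify management: all file data is aligned to a common size, and files can easily be expanded and shrunk by allocating and deallocating blocks.

💡Fragmentation

Fragmentation is when a file is broken into pieces stored at different locations on a storage device, so that reading it requires seeking across the device several times. The video explains how fragmentation arises and how defragmentation rearranges a file's blocks to speed up access.

💡Hierarchical file system

A hierarchical file system organizes files and directories into a tree, where each directory can contain files and other subdirectories. The video points out that this approach not only supports hierarchies of unlimited depth but also lets files be moved easily, simply by modifying entries in the directory files.

💡Root directory

The root directory is the top-most directory in a hierarchical file system and the starting point for all files and subdirectories. The video explains that the root Directory File lists the files and subdirectories at the top level, and that all other files and folders lie beneath the root directory.

💡ASCII encoding

ASCII is a character encoding standard used to represent the characters in text files. The video uses ASCII to show how the binary values in a TXT file are converted into readable text. For example, the value 72 maps to the capital letter 'H'.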

Highlights

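Storage technologies such as magnetic tape and hard disks can store millions, billions, and even trillions of bits of data for long durations, even without power.

Computer files are "big blobs" of related data, such as text files, music files, photos, and videos.

A file format is the way the data inside a file is organized. JPEG and MP3 are examples of widely used standard formats.

TXT files use the ASCII character encoding standard, which maps numbers to readable text.

WAV files store audio data. Metadata such as the bit rate and whether the audio is a single track or stereo is needed to read them correctly.

Metadata is data about data. It is stored in a header at the front of the file, ahead of the actual data.

A digital microphone samples the sound pressure thousands of times, and each sample can be represented as a number.

BMP files store pictures made of pixels, each a combination of red, green, and blue.

A BMP file's metadata includes key values such as image width, image height, and color depth.

File systems store files in blocks, which leaves a little extra space for changes, called slack space.

Modern file systems allow a file to be broken into chunks and stored across many blocks, so that files can be expanded and shrunk dynamically.

The Directory File records the location and size of the other files, much like a book's table of contents.

Deleting a file removes its entry from the directory file; it does not actually erase the data in storage.

Fragmentation is the inevitable byproduct of files being created, deleted, and modified, and it slows down reads on many storage technologies.

Defragmentation rearranges a file's blocks so that the file can be read faster.

Hierarchical file systems allow file and folder hierarchies of unlimited depth, which makes files easy to manage and move.

The file system is the part of the operating system that manages and keeps track of stored files. It provides a layer of abstraction that hides the raw bits on the storage medium.

File systems let users, not just programmers, manipulate data by opening and organizing files, which foreshadows topics later in the series.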

Transcripts

play00:03

Hi, I'm Carrie Anne, and welcome to Crash Course Computer Science!

play00:05

Last episode we talked about data storage, how technologies like magnetic tape and hard

play00:10

disks can store millions, billions and trillions of bits of data, for long durations, even

play00:13

without power.

play00:14

Which is perfect for recording “big blobs” of related data, what are more commonly called

play00:18

computer files.

play00:19

You’ve no doubt encountered many types, like text files, music files, photos and videos.

play00:24

Today, we’re going to talk about how files work, and how computers keep them all organized

play00:28

with File Systems.

play00:29

INTRO

play00:38

It’s perfectly legal for a file to contain arbitrary, unformatted data, but it’s most

play00:43

useful and practical if the data inside the file is organized somehow.

play00:47

This is called a file format.

play00:48

You can invent your own, and programmers do that from time to time, but it’s usually

play00:52

best and easiest to use an existing standard, like JPEG and MP3.

play00:56

Let’s look at some simple file formats.

play00:58

The most straightforward are T-X-T files, which contain, surprise, text.

play01:04

Like all computer files, this is just a huge list of numbers, stored as binary.

play01:08

If we look at the raw values of a T-X-T file in storage, it would look something like this:

play01:13

We can view this as decimal numbers instead of binary, but that still doesn’t help us

play01:16

read the text.

play01:17

The key to interpreting this data is knowing that T-X-T files use ASCII, a character encoding

play01:22

standard we discussed way back in Episode 4.

play01:24

So, in ASCII, our first value, 72, maps to the capital letter H. And in this way, we

play01:29

decode the whole file.
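
As a minimal sketch of the decoding step just described, the following Python snippet turns a list of stored numbers into text using ASCII. Only the first value, 72, comes from the episode; the remaining values and the function name `decode_ascii` are made up for illustration.

```python
# Raw byte values as they might sit in storage. Only the first value (72)
# is taken from the episode; the rest are illustrative.
raw_values = [72, 101, 108, 108, 111]


def decode_ascii(values):
    """Map each number to its ASCII character and join the characters."""
    return bytes(values).decode("ascii")


text = decode_ascii(raw_values)
print(text)
```

The same bytes viewed as decimal numbers are unreadable; it is the agreed-upon encoding that turns them back into letters.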

play01:31

Let’s look at a more complicated example: a WAVE File – also called a WAV – which

play01:35

stores audio.

play01:36

Before we can correctly read the data, we need to know some information, like the bit

play01:40

rate and whether it’s a single track or stereo.

play01:42

Data about data is called metadata.

play01:45

This metadata is stored at the front of the file, ahead of any actual data, in what’s

play01:49

known as a Header.

play01:50

Here’s what the first 44 bytes of a WAV file look like.

play01:53

Some parts are always the same, like where it spells out W-A-V-E.

play01:58

Other parts contain numbers that change depending on the data contained within.
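
To make the header idea concrete, here is a rough sketch in Python that builds a tiny WAV file in memory with the standard `wave` module and then unpacks its first 44 bytes with `struct`. It assumes the simplest PCM layout, where the header is exactly 44 bytes; real-world WAV files can carry extra chunks, so this is not a general parser. The helper names `make_tiny_wav` and `parse_wav_header` and all sample values are hypothetical.

```python
import io
import struct
import wave


def make_tiny_wav():
    """Build a small mono WAV in memory so there is a header to inspect."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # single track (mono)
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(8000)   # 8000 samples per second
        w.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))  # four amplitudes
    return buf.getvalue()


def parse_wav_header(data):
    """Unpack a 44-byte PCM WAV header (assumed layout) into a dictionary."""
    fields = struct.unpack("<4sI4s4sIHHIIHH4sI", data[:44])
    return {
        "riff_tag": fields[0],         # always b"RIFF"
        "wave_tag": fields[2],         # always b"WAVE"
        "channels": fields[6],         # 1 = mono, 2 = stereo
        "sample_rate": fields[7],      # samples per second
        "bits_per_sample": fields[10],
        "data_bytes": fields[12],      # size of the audio data that follows
    }


header = parse_wav_header(make_tiny_wav())
print(header)
```

The fixed tags correspond to the parts that are "always the same", while the channel count, sample rate, and data size are the numbers that change from file to file.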

play02:01

The audio data comes right behind the metadata, and it’s stored as a long list of numbers.

play02:06

These values represent the amplitude of sound captured many times per second, and if you

play02:10

want a primer on sound, check out our video all about it in Crash Course Physics.

play02:14

Link in the dobblydoo.

play02:16

As an example, let’s look at a waveform of me saying: "hello!" Hello!

play02:19

Now that we’ve captured some sound, let’s zoom into a little snippet.

play02:23

A digital microphone, like the one in your computer or smartphone, samples the sound

play02:27

pressure thousands of times.

play02:28

Each sample can be represented as a number.

play02:31

Larger numbers mean higher sound pressure, what’s called amplitude.

play02:34

And these numbers are exactly what gets stored in a WAVE file!

play02:37

Thousands of amplitudes for every single second of audio!
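
The sampling described here can be sketched in a few lines of Python. Instead of a real microphone, this example samples a pure tone; the sample rate, frequency, and 16-bit amplitude range are assumptions chosen for illustration, not values from the episode.

```python
import math

SAMPLE_RATE = 8000      # samples per second (assumed)
FREQUENCY = 440         # tone in hertz (assumed)
MAX_AMPLITUDE = 32767   # largest 16-bit signed value


def sample_tone(duration_ms):
    """Return one amplitude per sample for a sine tone of the given length."""
    count = SAMPLE_RATE * duration_ms // 1000
    return [
        round(MAX_AMPLITUDE * math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE))
        for n in range(count)
    ]


samples = sample_tone(10)   # ten milliseconds of audio
print(len(samples), samples[:5])
```

Even ten milliseconds of sound yields dozens of numbers, which is why audio files grow large so quickly.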

play02:40

When it’s time to play this file, an audio program needs to actuate the computer's speakers

play02:44

such that the original waveform is emitted.

play02:47

“Hello!”

play02:47

So, now that you’re getting the hang of file formats, let’s talk about bitmaps or

play02:50

B-M-Ps, which store pictures.

play02:53

On a computer, PICtures are made up of little tiny square ELements called pixels.

play02:56

Each pixel is a combination of three colors: red, green and blue.

play03:00

These are called additive primary colors, and they can be mixed together to create any

play03:03

other color on our electronic displays.

play03:05

Now, just like WAV files, BMPs start with metadata, including key values like image

play03:10

width, image height, and color depth.

play03:12

As an example, let’s say the metadata specified an image 4 pixels wide, by 4 pixels tall,

play03:17

with a 24-bit color depth - that’s 8-bits for red, 8-bits for green, and 8-bits for blue.

play03:22

As a reminder, 8 bits is the same as one byte.

play03:25

The smallest number a byte can store is 0, and the largest is 255.

play03:28

Our image data is going to look something like this:

play03:31

Let’s look at the color of our first pixel.

play03:34

It has 255 for its red value, 255 for green and 255 for blue.

play03:39

This equates to full intensity red, full intensity green and full intensity blue.

play03:43

These colors blend together on your computer monitor to become white.

play03:46

So our first pixel is white!

play03:48

The next pixel has a Red-Green-Blue, or RGB value of 255, 255, 0.

play03:54

That’s the color yellow!

play03:56

The pixel after that has a RGB value of 0,0,0 - that’s zero intensity everything, which is black.

play04:02

And the next one is yellow.

play04:04

Because the metadata specified this was a 4 by 4 image, we know that we’ve reached

play04:08

the end of our first row of pixels.

play04:10

So, we need to drop down a row.

play04:12

The next RGB value is 255,255,0 – yellow again.

play04:16

Okay, let’s go ahead and read all the pixels in our 4x4 image… tada!

play04:21

A very low resolution pac-man!

play04:22

Obviously this is a simple example of a small image, but we could just as easily store this

play04:27

image in a BMP.
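
The pixel walk-through above can be sketched as follows. The snippet groups a flat list of byte values into rows of RGB triples, using only the first row of the 4x4 example, whose colors are read out in the episode. It follows the episode's simplified layout (red, green, blue, top row first) rather than the exact byte layout of a real BMP file, and the names `rows_of_pixels` and `COLOR_NAMES` are hypothetical.

```python
def rows_of_pixels(data, width):
    """Group a flat list of byte values into rows of (R, G, B) tuples."""
    pixels = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Lookup table for the three colors mentioned in the episode.
COLOR_NAMES = {
    (255, 255, 255): "white",
    (255, 255, 0): "yellow",
    (0, 0, 0): "black",
}

# First row of the 4x4 example: white, yellow, black, yellow.
first_row = [255, 255, 255,  255, 255, 0,  0, 0, 0,  255, 255, 0]

rows = rows_of_pixels(first_row, width=4)
names = [COLOR_NAMES[pixel] for pixel in rows[0]]
print(names)
```

Without the width from the metadata there would be no way to tell where one row ends and the next begins.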

play04:28

I want to emphasize again that it doesn’t matter if it’s a text file, WAV, BMP, or

play04:32

fancier formats we don’t have time to discuss, like ZIPs and PPTs.

play04:35

Under the hood, they’re all the same: long lists of numbers, stored as binary, on a storage device.

play04:40

File formats are the key to reading and understanding the data inside.

play04:43

Now that you understand files a little better, let’s move on to how computers go about

play04:47

storing them.

play04:48

Even though the underlying storage medium might be a strip of tape, a drum, a disk,

play04:52

or integrated circuits... hardware and software abstractions let us think of storage as a

play04:56

long line of little buckets that store values.

play04:58

In the early days, when computers only performed one computation like calculating artillery

play05:03

range tables – the entire storage operated like one big file.

play05:07

Data started at the beginning of storage, and then filled it up in order as output was

play05:11

produced, up to the storage capacity.

play05:13

However, as computational power and storage capacity improved, it became possible, and

play05:17

useful, to store more than one file at a time.

play05:20

The simplest option is to store files back-to-back.

play05:23

This can work... but how does the computer know where files begin and end?

play05:27

Storage devices have no notion of files – they’re just a mechanism for storing lots of bits.

play05:31

So, for this to work, we need to have a special file that records where other ones are located.

play05:37

This goes by many names, but a good general term is Directory File.

play05:40

Most often, it’s kept right at the front of storage, so we always know where to access it.

play05:45

Location zero!

play05:47

Inside the Directory File are the names of all the other files in storage.

play05:50

In our example, they each have a name, followed by a period, and end with what’s called

play05:54

a File Extension, like “BMP” or “WAV”.

play05:57

Those further assist programs in identifying file types.

play06:01

The Directory File also stores metadata about these files, like when they were created and

play06:05

last modified, who the owner is, and if it can be read, written or both.

play06:09

But most importantly, the directory file contains where these files begin in storage, and how

play06:14

long they are.

play06:15

If we want to add a file, remove a file, change a filename, or similar, we have to update

play06:19

the information in the Directory File.

play06:21

It’s like the Table of Contents in a book, if you make a chapter shorter, or move it

play06:25

somewhere else, you have to update the table of contents, otherwise the page numbers won’t match!
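
A flat file system of this kind can be sketched in Python under a few simplifying assumptions: storage is a small byte array, and the directory is a dictionary held separately rather than at location zero of storage. The file contents, the storage size, and the function names are invented for illustration.

```python
storage = bytearray(64)   # hypothetical tiny storage device
directory = {}            # file name -> (start, length)


def add_file(name, data):
    """Store data right after the last file and record it in the directory."""
    start = max((s + n for s, n in directory.values()), default=0)
    if start + len(data) > len(storage):
        raise OSError("storage is full")
    storage[start:start + len(data)] = data
    directory[name] = (start, len(data))


def read_file(name):
    """Use the directory entry to find where the file begins and ends."""
    start, length = directory[name]
    return bytes(storage[start:start + length])


add_file("todo.txt", b"buy milk")
add_file("carrie.bmp", b"\xff\xff\x00")
print(directory)
print(read_file("todo.txt"))
```

Because the storage itself has no notion of files, `read_file` depends entirely on the directory entry being kept up to date.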

play06:30

The Directory File, and the maintenance of it, is an example of a very basic File System,

play06:35

the part of an Operating System that manages and keeps track of stored files.

play06:39

This particular example is called a Flat File System, because the files are all stored at one level.

play06:44

It’s flat!

play06:45

Of course, packing files together, back-to-back, is a bit of a problem, because if we want

play06:48

to add some data to let’s say “todo.txt”, there’s no room to do it without overwriting

play06:53

part of “carrie.bmp”.

play06:55

So modern File Systems do two things.

play06:57

First, they store files in blocks.

play06:59

This leaves a little extra space for changes, called slack space.

play07:02

It also means that all file data is aligned to a common size, which simplifies management.

play07:07

In a scheme like this, our Directory File needs to keep track of what block each one

play07:11

is stored in.

play07:12

The second thing File Systems do, is allow files to be broken up into chunks and stored

play07:16

across many blocks.

play07:17

So let’s say we open “todo.txt”, and we add a few more items, then the file becomes

play07:22

too big to be saved in its one block.

play07:24

We don’t want to overwrite the neighboring one, so instead, the File System allocates

play07:28

an unused block, which can accommodate extra data.

play07:30

With a File System scheme like this, the Directory File needs to store not just one block per

play07:35

file, but rather a list of blocks per file.

play07:37

In this way, we can have files of variable sizes that can be easily expanded and shrunk,

play07:42

simply by allocating and deallocating blocks.
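
The block scheme can be sketched like this. The block size, the number of blocks, and the choice to always hand out the lowest-numbered free block are assumptions for illustration; the episode does not specify an allocation policy. Each directory entry stores a list of blocks plus the file's length, so the padding at the end of the last block (the slack space) can be ignored when reading.

```python
BLOCK_SIZE = 8    # hypothetical block size in bytes
NUM_BLOCKS = 6    # hypothetical number of blocks on the device

blocks = [bytes(BLOCK_SIZE)] * NUM_BLOCKS
free_blocks = list(range(NUM_BLOCKS))
directory = {}    # file name -> {"blocks": [...], "length": int}


def write_file(name, data):
    """(Re)write a file, allocating or releasing blocks as its size changes."""
    entry = directory.get(name) or {"blocks": [], "length": 0}
    owned = entry["blocks"]
    needed = max(1, -(-len(data) // BLOCK_SIZE))   # ceiling division
    if needed - len(owned) > len(free_blocks):
        raise OSError("not enough free blocks")
    while len(owned) < needed:
        owned.append(free_blocks.pop(0))           # allocate lowest free block
    while len(owned) > needed:
        free_blocks.append(owned.pop())            # deallocate the last block
    free_blocks.sort()
    for i, block_no in enumerate(owned):
        chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        blocks[block_no] = chunk.ljust(BLOCK_SIZE, b"\x00")  # padding is slack
    entry["length"] = len(data)
    directory[name] = entry


def read_file(name):
    """Join the file's blocks in order and trim the slack space."""
    entry = directory[name]
    return b"".join(blocks[b] for b in entry["blocks"])[:entry["length"]]


write_file("todo.txt", b"buy milk")
write_file("carrie.bmp", b"\xff\xff\x00")
write_file("todo.txt", b"buy milk, eggs")   # grows past one block
print(directory["todo.txt"]["blocks"])
```

After the third call, "todo.txt" occupies two non-adjacent blocks, and "carrie.bmp" is untouched in between.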

play07:44

If you watched our episode on Operating Systems, this should sound a lot like Virtual Memory.

play07:49

Conceptually it’s very similar!

play07:51

Now let’s say we want to delete “carrie.bmp”.

play07:53

To do that, we can simply remove the entry from the Directory File.

play07:56

This, in turn, causes one block to become free.

play07:59

Note that we didn’t actually erase the file’s data in storage, we just deleted the record of it.

play08:04

At some point, that block will be overwritten with new data, but until then, it just sits there.

play08:08

This is one way that computer forensic teams can “recover” data from computers even

play08:12

though people think it has been deleted. Crafty!
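
A small sketch shows why deleted data lingers. The block contents and block numbers here are invented; the point is only that deletion edits the directory and the free list, not the blocks themselves.

```python
# Hypothetical block contents and directory for a three-block device.
blocks = [b"todo....", b"PICTURE!", b"\x00" * 8]
directory = {"todo.txt": [0], "carrie.bmp": [1]}
free_blocks = [2]


def delete_file(name):
    """Forget the file: drop its directory entry and mark its blocks free."""
    free_blocks.extend(directory.pop(name))


delete_file("carrie.bmp")
print(directory)     # the record is gone
print(blocks[1])     # the bytes are still sitting in the block
```

Until block 1 is reused for another file, anyone who reads the raw block can still see the old contents.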

play08:15

Ok, let’s say we add even more items to our todo list, which causes the File System

play08:20

to allocate yet another block to the file, in this case, recycling the block freed from

play08:24

carrie.bmp.

play08:25

Now our “todo.txt” is stored across 3 blocks, spaced apart, and also out of order.

play08:30

Files getting broken up across storage like this is called fragmentation.

play08:34

It’s the inevitable byproduct of files being created, deleted and modified.

play08:38

For many storage technologies, this is bad news.

play08:41

On magnetic tape, reading todo.txt into memory would require seeking to block 1, then fast

play08:46

forwarding to block 5, and then rewinding to block 3 – that’s a lot of back and forth!

play08:50

In real world File Systems, large files might be stored across hundreds of blocks, and you

play08:54

don’t want to have to wait five minutes for your files to open.

play08:57

The answer is defragmentation!

play08:59

That might sound like technobabble, but the process is really simple, and once upon a

play09:03

time it was really fun to watch!

play09:05

The computer copies around data so that files have blocks located together in storage and

play09:10

in the right order.

play09:11

After we’ve defragged, we can read our todo file, now located in blocks 1 through 3, in

play09:16

a single, quick read pass.
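
Defragmentation can be sketched as copying each file's blocks into consecutive positions. A real defragmenter shuffles data in place on the device; this simplified version builds a fresh copy instead. The layout of "todo.txt" in blocks 1, 5, and 3 follows the episode, while the other file names and block contents are made up.

```python
def defragment(blocks, directory):
    """Return new (blocks, directory) with every file contiguous and in order."""
    new_blocks = []
    new_directory = {}
    for name, block_list in directory.items():
        start = len(new_blocks)
        new_blocks.extend(blocks[b] for b in block_list)
        new_directory[name] = list(range(start, start + len(block_list)))
    # Pad with empty blocks so the device keeps its original size.
    new_blocks.extend([b""] * (len(blocks) - len(new_blocks)))
    return new_blocks, new_directory


# "todo.txt" is fragmented across blocks 1, 5 and 3, as in the episode.
blocks = [b"notes", b"todo-A", b"", b"todo-C", b"theme", b"todo-B"]
directory = {"notes.txt": [0], "todo.txt": [1, 5, 3], "theme.wav": [4]}

blocks, directory = defragment(blocks, directory)
print(directory["todo.txt"])
print(blocks)
```

After the pass, "todo.txt" sits in blocks 1 through 3 and can be read front to back without any seeking.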

play09:17

So far, we’ve only been talking about Flat File Systems, where they’re all stored in

play09:21

one directory.

play09:23

This worked ok when computers only had a little bit of storage, and you might only have a

play09:27

dozen or so files.

play09:28

But as storage capacity exploded, like we discussed last episode, so did the number

play09:33

of files on computers.

play09:34

Very quickly, it became impractical to store all files together at one level.

play09:38

Just like documents in the real world, it’s handy to store related files together in folders.

play09:42

Then we can put connected folders into folders, and so on.

play09:46

This is a Hierarchical File System, and it’s what your computer uses. There are a variety

play09:51

of ways to implement this, but let’s stick with the File System example we’ve been

play09:54

using to convey the main idea.

play09:56

The biggest change is that our Directory File needs to be able to point not just to files,

play10:00

but also other directories.

play10:01

To keep track of what’s a file and what’s a directory, we need some extra metadata.

play10:05

This Directory File is the top-most one, known as the Root Directory.

play10:09

All other files and folders lie beneath this directory along various file paths.

play10:14

We can see inside of our “Root” Directory File that we have 3 files and 2 subdirectories:

play10:19

music and photos.

play10:21

If we want to see what’s stored in our music directory, we have to go to that block and

play10:25

read the Directory File located there; the format is the same as our root directory.

play10:29

There’s a lot of great songs in there!

play10:31

In addition to being able to create hierarchies of unlimited depth, this method also allows

play10:35

us to easily move around files.

play10:37

So, if we wanted to move “theme.wav” from our root directory to the music directory,

play10:42

we don’t have to re-arrange any blocks of data.

play10:44

We can simply modify the two Directory Files, removing an entry from one and adding it to another.

play10:50

Importantly, the theme.wav file stays in block 5.
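
The move operation can be sketched with nested dictionaries standing in for directory files. The "kind" field plays the role of the extra metadata that tells files and directories apart. Only "theme.wav" in block 5 comes from the episode; the song entry, its block number, and the function name `move` are hypothetical.

```python
# Each directory maps a name to an entry. Block numbers other than 5 are made up.
music = {"song.mp3": {"kind": "file", "blocks": [7]}}
root = {
    "theme.wav": {"kind": "file", "blocks": [5]},
    "music": {"kind": "dir", "entries": music},
}


def move(name, source_dir, target_dir):
    """Move a file by editing two directories; no data blocks are touched."""
    target_dir[name] = source_dir.pop(name)


move("theme.wav", root, root["music"]["entries"])
print(sorted(root))
print(music["theme.wav"])
```

The entry changes hands between the two directories, but its block list still says block 5, so no audio data has to be copied.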

play10:53

So that’s a quick overview of the key principles of File Systems.

play10:56

They provide yet another way to move up a new level of abstraction.

play11:06

File systems allow us to hide the raw bits stored on magnetic tape, spinning disks and

play11:11

the like, and they let us think of data as neatly organized and easily accessible files.

play11:15

We even started talking about users, not programmers, manipulating data, like opening files and

play11:20

organizing them, foreshadowing where the series will be going in a few episodes.

play11:24

I’ll see you next week.

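Related Tags
File systems, Data storage, Text files, Audio formats, Image formats, ASCII encoding, Metadata, BMP images, WAV files, Directory file, Data recovery, Operating systems, File management, Computer science, Data organization, Technical principles, Information science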