Files & File Systems: Crash Course Computer Science #20

CrashCourse
12 Jul 2017 · 12:03

Summary

TLDR: In this episode of Crash Course Computer Science, Carrie Anne explores the intricacies of file systems. She explains how files are organized and stored, introducing the concept of file formats like JPEG and MP3, and their metadata. The video delves into the structure of simple files, such as text files using ASCII and audio files like WAV, which contain headers and amplitude data. It also covers bitmaps, or BMP files, and how they store images as pixels with RGB values. The script further explains how files are managed through directory files and the evolution from flat to hierarchical file systems, which allow for efficient storage and retrieval of data, ultimately providing a foundational understanding of how computers keep files organized.

Takeaways

  • 💾 Data storage technologies like magnetic tape and hard disks can store vast amounts of data for long durations without power.
  • 📁 Computer files are 'big blobs' of related data, including text files, music files, photos, and videos.
  • 📑 File formats are essential for organizing data within files, with standard formats like JPEG and MP3 being widely used.
  • 🔢 Text files, such as TXT, store data as binary numbers, which are interpreted using character encoding standards like ASCII.
  • 🎵 WAV files store audio data with metadata in a header, which includes information like bit rate and whether the audio is single-track or stereo.
  • 🖼️ BMP files store image data as pixels, each defined by red, green, and blue (RGB) values, with metadata detailing image dimensions and color depth.
  • 🗂️ File Systems are part of the Operating System that manage and keep track of stored files, abstracting the complexity of physical storage.
  • 📂 Flat File Systems store files in a single directory, while Hierarchical File Systems allow for a tree-like structure of directories and subdirectories.
  • 🔄 File fragmentation occurs when files are broken up and stored across multiple blocks, which can be mitigated through defragmentation.
  • 🔧 Directory Files act as tables of contents for storage, recording the location and size of files, and are crucial for file management.

Q & A

  • What is the purpose of a file system?

    -A file system is part of an operating system that manages and keeps track of stored files, providing a way to organize and access data on storage devices.

  • What is a file format and why is it important?

    -A file format is the way data is organized within a file. It is important because it allows for the data to be interpreted correctly by programs and is often standardized for consistency and ease of use.

  • How does a computer interpret the data in a text file?

    -Computers interpret the data in a text file using character encoding standards like ASCII, which maps numbers to characters, allowing the text to be decoded and read.

  • What is metadata and where is it typically stored in a file?

    -Metadata is data about data, such as bit rate, number of audio tracks, or image dimensions. It is typically stored at the beginning of a file, in a header, before the actual data content.

  • How does a WAVE file store audio data?

    -A WAVE file stores audio data as a long list of numbers representing the amplitude of sound captured many times per second, which are then played back through speakers to reproduce the sound.

  • What is a bitmap and how does it store image data?

    -A bitmap, or BMP, stores image data by representing each pixel as a combination of red, green, and blue values, which are combined to create the full range of colors displayed on a screen.

  • How does a directory file help in managing files on a storage device?

    -A directory file acts as a table of contents, storing the names, metadata, and locations of other files on the storage device, allowing the computer to know where each file begins and ends.

  • What is the difference between a flat file system and a hierarchical file system?

    -A flat file system stores all files at one level, while a hierarchical file system organizes files into directories and subdirectories, allowing for a more complex and scalable organization of data.

  • Why is file fragmentation a problem and how is it resolved?

    -File fragmentation occurs when a file is stored across multiple, non-sequential blocks, which can slow down access times. It is resolved through defragmentation, where the computer reorganizes data so that files are stored in contiguous blocks.

  • How does the concept of a file system relate to virtual memory in operating systems?

    -Both file systems and virtual memory involve managing data in blocks and handling the allocation and deallocation of these blocks to accommodate changes in data size, making them conceptually similar.

  • What happens when a file is deleted in a file system?

    -When a file is deleted, its entry is removed from the directory file, marking the space it occupied as free. The actual data remains until it is overwritten, which allows for the possibility of data recovery.

Outlines

00:00

📄 Understanding File Formats and Systems

Carrie Anne introduces the concept of file systems, explaining how they organize data within computer files. She discusses the importance of file formats, such as JPEG and MP3, and provides examples of simple formats like TXT and WAV. The TXT format uses ASCII for character encoding, while WAV files store audio data with metadata in a header. The segment also covers how bitmaps (BMPs) store image data in pixels with color depth information. The key takeaway is that all files, regardless of format, are stored as binary on a device, and file formats are crucial for interpreting this data.

05:03

🗂 The Evolution of File Systems

This segment delves into the evolution of file systems, starting with the simple back-to-back storage method and the need for a directory file to track file locations. It explains the concept of flat file systems and how they manage files in a single level. The discussion then moves to modern file systems that store files in blocks, allowing for easier expansion and management. The idea of file fragmentation and its impact on access speed is introduced, along with the solution of defragmentation. The segment concludes with an overview of hierarchical file systems, which organize files and directories in a tree-like structure, making it easier to manage large numbers of files.

10:06

🌐 Hierarchical File Systems and Data Management

The final paragraph focuses on hierarchical file systems, which allow for the organization of files into directories and subdirectories. It describes how the root directory serves as the starting point for file paths and how directory files within these directories manage file and folder listings. The benefits of this system include the ability to create deep hierarchies and easily move files by updating directory entries. The paragraph emphasizes the role of file systems in abstracting the complexity of raw data storage, enabling users to interact with files as organized and accessible entities, setting the stage for future discussions on user data manipulation.

Keywords

💡File Systems

File systems are the methods and data structures that a computer's operating system uses to control how data is stored and retrieved. They manage files and directories, allowing users to organize and access data. In the video, file systems are discussed as a way to keep files organized and to manage how data is stored on various hardware devices, such as magnetic tape or hard disks. The script mentions that file systems abstract the complexity of raw data storage, allowing users to interact with files as if they were simple, organized entities.

💡ASCII

ASCII (American Standard Code for Information Interchange) is a character encoding standard for electronic communication, which represents text in computers and other devices. It's a way to convert characters into numbers for storage and retrieval. In the script, ASCII is used as an example to explain how text files (.TXT) are stored as a series of numbers, where each number corresponds to a character via the ASCII standard.

💡File Formats

File formats are standardized ways of organizing data for storage in a computer file. They define the structure and layout of the data, making it interpretable by software. The script discusses file formats like JPEG, MP3, and TXT, emphasizing that while files can contain arbitrary data, it's more useful when the data is organized according to a specific format. This organization allows programs to correctly interpret and display the data.

💡Metadata

Metadata refers to data that provides information about other data. It's used to describe the characteristics of a file, such as its size, type, and when it was created or modified. In the script, metadata is described as data about data, particularly in the context of WAV files, where it's stored in a header at the beginning of the file and includes information like bit rate and whether the audio is single-track or stereo, which are essential for interpreting the audio data.

💡Header

A header in the context of computer files is a section at the beginning of a file that contains metadata. It provides essential information about the file's content and structure. The script uses the WAV file as an example, explaining that the header contains metadata such as the file type and audio characteristics, which precede the actual audio data in the file.

💡Bitmap (BMP)

A bitmap, or BMP file, is a type of image file format used to store digital images, particularly in Windows operating systems. It's a raster graphics format that represents images as a grid of pixels, with each pixel's color defined by a set of values. In the script, BMPs are discussed as a way to store pictures, where each pixel is a combination of red, green, and blue values, and the metadata in the file includes the image's dimensions and color depth.

💡Pixel

A pixel, short for 'picture element,' is the smallest addressable element in a display device; it's a single point in a graphic image. Pixels are the building blocks of digital images, with each pixel representing a specific color and intensity. The script explains that in BMP files, images are made up of pixels, each defined by a combination of red, green, and blue values.

💡Directory File

A directory file is a special type of file that contains information about other files, such as their names, types, and locations on storage media. It's a crucial component of file systems, acting as a map to locate and manage files. The script describes how directory files work, noting that they are typically stored at the beginning of a storage device and contain metadata about other files, including their size and location.

💡Flat File System

A flat file system is a simple file system structure where all files are stored in a single directory, without the use of subdirectories or folders. This structure is straightforward but can become cumbersome with a large number of files. The script introduces the flat file system as an early method of file storage, where a directory file at the beginning of storage keeps track of all files, illustrating the evolution of file systems.

💡Fragmentation

In the context of file systems, fragmentation refers to the condition where a file's data is stored across multiple, non-contiguous blocks on a storage device. This can occur as files are added, deleted, or resized, leading to inefficiencies in data retrieval. The script discusses fragmentation as an inevitable byproduct of file system operations and contrasts it with defragmentation, which is the process of consolidating fragmented files into contiguous blocks for more efficient access.

💡Defragmentation

Defragmentation is the process of reorganizing the data on a storage device so that the parts of a file scattered across different locations are stored contiguously and in order. This improves the performance of the file system by reducing the time it takes to read or write files. The script explains defragmentation as a solution to the problem of file fragmentation, making files easier and faster to access by consolidating their data into contiguous blocks in the correct order.

💡Hierarchical File System

A hierarchical file system is a type of file system that organizes files and directories into a tree-like structure, with directories containing files and other directories. This structure allows for a more manageable and scalable way to organize large numbers of files. The script contrasts hierarchical file systems with flat file systems, highlighting how they enable the creation of folders and subfolders to logically group related files, making data management more efficient.

Highlights

Introduction to file systems and their role in organizing computer files.

Explanation of file formats and the importance of organizing data within files.

ASCII encoding standard's role in interpreting text files.

WAV file structure and how metadata is stored in a file header.

The process of digital audio sampling and its representation in WAVE files.

Bitmap (BMP) files and the concept of pixels in digital images.

The additive primary colors (red, green, blue) and their role in creating colors on electronic displays.

The metadata included in BMP files and its significance for image data interpretation.

The evolution from single-file storage to multiple files and the need for directory files.

The concept of flat file systems and their limitations in handling large storage capacities.

Modern file systems' use of blocks and slack space to manage file storage.

The process of file fragmentation and its impact on access speed.

Defragmentation techniques to improve file access speeds.

Introduction to hierarchical file systems and their advantages over flat systems.

The role of the root directory in hierarchical file systems and file path management.

The ability to easily move files within hierarchical file systems without rearranging data blocks.

File systems as a form of abstraction that simplifies data organization and accessibility.

Transcripts

play00:03

Hi, I'm Carrie Anne, and welcome to Crash Course Computer Science!

play00:05

Last episode we talked about data storage, how technologies like magnetic tape and hard

play00:10

disks can store millions, billions and trillions of bits of data, for long durations, even

play00:13

without power.

play00:14

Which is perfect for recording “big blobs” of related data, what are more commonly called

play00:18

computer files.

play00:19

You’ve no doubt encountered many types, like text files, music files, photos and videos.

play00:24

Today, we’re going to talk about how files work, and how computers keep them all organized

play00:28

with File Systems.

play00:29

INTRO

play00:38

It’s perfectly legal for a file to contain arbitrary, unformatted data, but it’s most

play00:43

useful and practical if the data inside the file is organized somehow.

play00:47

This is called a file format.

play00:48

You can invent your own, and programmers do that from time to time, but it’s usually

play00:52

best and easiest to use an existing standard, like JPEG and MP3.

play00:56

Let’s look at some simple file formats.

play00:58

The most straightforward are T-X-T files, which contain, surprise, text.

play01:04

Like all computer files, this is just a huge list of numbers, stored as binary.

play01:08

If we look at the raw values of a T-X-T file in storage, it would look something like this:

play01:13

We can view this as decimal numbers instead of binary, but that still doesn’t help us

play01:16

read the text.

play01:17

The key to interpreting this data is knowing that T-X-T files use ASCII, a character encoding

play01:22

standard we discussed way back in Episode 4.

play01:24

So, in ASCII, our first value, 72, maps to the capital letter H. And in this way, we

play01:29

decode the whole file.
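
To make the ASCII step concrete, here is a minimal Python sketch that decodes a list of raw byte values the way the episode describes. The values are a hypothetical stand-in for the on-screen file, chosen so that the first one is 72, the capital H mentioned above; `decode_ascii` is an illustrative helper name.

```python
# Hypothetical raw contents of a tiny .txt file, shown as decimal byte values.
raw_values = [72, 101, 108, 108, 111, 33]

def decode_ascii(values: list[int]) -> str:
    """Map each number to its ASCII character (valid ASCII codes are 0-127)."""
    return bytes(values).decode("ascii")

# The same data viewed three ways: binary, decimal, and decoded text.
print([format(v, "08b") for v in raw_values])
print(raw_values)
print(decode_ascii(raw_values))  # Hello!
```

Without knowing the encoding, the middle line is all a program could show; the encoding is what turns the numbers into letters.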

play01:31

Let’s look at a more complicated example: a WAVE File – also called a WAV – which

play01:35

stores audio.

play01:36

Before we can correctly read the data, we need to know some information, like the bit

play01:40

rate and whether it’s a single track or stereo.

play01:42

Data about data is called metadata.

play01:45

This metadata is stored at the front of the file, ahead of any actual data, in what’s

play01:49

known as a Header.

play01:50

Here’s what the first 44 bytes of a WAV file look like.

play01:53

Some parts are always the same, like where it spells out W-A-V-E.

play01:58

Other parts contain numbers that change depending on the data contained within.

play02:01

The audio data comes right behind the metadata, and it’s stored as a long list of numbers.

play02:06

These values represent the amplitude of sound captured many times per second, and if you

play02:10

want a primer on sound, check out our video all about it in Crash Course Physics.

play02:14

Link in the doobly doo.
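
One way to see the header layout described above is the Python sketch below. It uses the standard-library `wave` module to write a tiny, made-up mono recording into memory, then unpacks the canonical 44-byte PCM header with `struct`. The four sample values, the 8,000 Hz rate and the variable names are arbitrary choices for illustration. The header itself stores sample rate, byte rate and bits per sample; the "bit rate" mentioned above works out to byte rate × 8.

```python
import io
import struct
import wave

# Build a tiny, hypothetical mono WAV in memory: 4 samples, 16 bits each.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # single track (mono)
    w.setsampwidth(2)        # 2 bytes = 16 bits per sample
    w.setframerate(8000)     # 8,000 samples per second
    # "=" packs in native byte order; wave converts to the file's little-endian order.
    w.writeframes(struct.pack("=4h", 0, 1200, -1200, 300))

data = buf.getvalue()

# Field layout of the canonical 44-byte PCM header ("<" = little-endian, no padding).
HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")
(riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
 sample_rate, byte_rate, block_align, bits_per_sample,
 data_id, data_size) = HEADER.unpack(data[:HEADER.size])

print(riff, wave_id, fmt_id, data_id)    # the fixed, "spelled out" parts
print(channels, sample_rate, bits_per_sample, byte_rate, data_size)

# The audio data (amplitudes) comes right after the header.
samples = struct.unpack(f"<{data_size // 2}h", data[HEADER.size:HEADER.size + data_size])
print(samples)
```

The fixed parts (`RIFF`, `WAVE`, `fmt `, `data`) never change, while fields such as `channels` and `data_size` depend on the recording, which matches the split the episode describes.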

play02:16

As an example, let’s look at a waveform of me saying: "hello!" Hello!

play02:19

Now that we’ve captured some sound, let’s zoom into a little snippet.

play02:23

A digital microphone, like the one in your computer or smartphone, samples the sound

play02:27

pressure thousands of times per second.

play02:28

Each sample can be represented as a number.

play02:31

Larger numbers mean higher sound pressure, what’s called amplitude.

play02:34

And these numbers are exactly what gets stored in a WAVE file!

play02:37

Thousands of amplitudes for every single second of audio!
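
As a rough illustration of "thousands of amplitudes for every single second," the Python sketch below samples a pure 440 Hz sine tone 8,000 times per second and rounds each sample to a signed 16-bit number. A real microphone captures a far messier waveform; the sine wave, the sample rate, the 16-bit range and the helper name `sample_tone` are all assumptions made for the example.

```python
import math

def sample_tone(freq_hz: float, sample_rate: int, seconds: float,
                volume: float = 0.5) -> list[int]:
    """Sample a pure sine tone and round each sample to a signed 16-bit amplitude."""
    n_samples = int(sample_rate * seconds)
    peak = 32767 * volume                      # largest 16-bit value, scaled down
    return [round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
            for n in range(n_samples)]

# Hypothetical: one second of a 440 Hz tone, sampled 8,000 times per second.
amplitudes = sample_tone(440, 8000, 1.0)
print(len(amplitudes))      # 8000 numbers for a single second of audio
print(amplitudes[:5])
```

Larger magnitudes mean higher sound pressure; these integers are the kind of values a WAV file stores after its header.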

play02:40

When it’s time to play this file, an audio program needs to actuate the computer's speakers

play02:44

such that the original waveform is emitted.

play02:47

“Hello!”

play02:47

So, now that you’re getting the hang of file formats, let’s talk about bitmaps or

play02:50

B-M-Ps, which store pictures.

play02:53

On a computer, PICtures are made up of little tiny square ELements called pixels.

play02:56

Each pixel is a combination of three colors: red, green and blue.

play03:00

These are called additive primary colors, and they can be mixed together to create any

play03:03

other color on our electronic displays.

play03:05

Now, just like WAV files, BMPs start with metadata, including key values like image

play03:10

width, image height, and color depth.

play03:12

As an example, let’s say the metadata specified an image 4 pixels wide, by 4 pixels tall,

play03:17

with a 24-bit color depth: that’s 8 bits for red, 8 bits for green, and 8 bits for blue.

play03:22

As a reminder, 8 bits is the same as one byte.

play03:25

The smallest number a byte can store is 0, and the largest is 255.

play03:28

Our image data is going to look something like this:

play03:31

Let’s look at the color of our first pixel.

play03:34

It has 255 for its red value, 255 for green and 255 for blue.

play03:39

This equates to full intensity red, full intensity green and full intensity blue.

play03:43

These colors blend together on your computer monitor to become white.

play03:46

So our first pixel is white!

play03:48

The next pixel has a Red-Green-Blue, or RGB value of 255, 255, 0.

play03:54

That’s the color yellow!

play03:56

The pixel after that has an RGB value of 0, 0, 0: that’s zero intensity everything, which is black.

play04:02

And the next one is yellow.

play04:04

Because the metadata specified this was a 4 by 4 image, we know that we’ve reached

play04:08

the end of our first row of pixels.

play04:10

So, we need to drop down a row.

play04:12

The next RGB value is 255,255,0 – yellow again.

play04:16

Okay, let’s go ahead and read all the pixels in our 4x4 image… tada!

play04:21

A very low resolution pac-man!

play04:22

Obviously this is a simple example of a small image, but we could just as easily store this

play04:27

image in a BMP.
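
The pixel walk-through above can be sketched in Python as grouping a flat list of bytes into (R, G, B) triples and cutting a new row every `width` pixels, which is exactly what the width metadata tells the reader to do. This follows the episode's simplified model. Real .bmp files differ in ways this sketch ignores: pixel bytes are stored in blue, green, red order, each row is padded to a multiple of 4 bytes, and rows are usually stored bottom row first. Only the first row below comes from the episode; the other three rows are hypothetical filler, and `decode_pixels` and `COLOR_NAMES` are illustrative names.

```python
def decode_pixels(data: list[int], width: int, height: int) -> list[list[tuple[int, ...]]]:
    """Group a flat list of 24-bit color values into rows of (R, G, B) pixels,
    using the simplified top-down, RGB-ordered model from the episode."""
    bytes_per_pixel = 3                        # 24-bit depth = 8 bits each for R, G, B
    expected = width * height * bytes_per_pixel
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    pixels = [tuple(data[i:i + bytes_per_pixel]) for i in range(0, expected, bytes_per_pixel)]
    # The width from the metadata tells us when to "drop down a row".
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

COLOR_NAMES = {(255, 255, 255): "white", (255, 255, 0): "yellow", (0, 0, 0): "black"}

# First row from the episode: white, yellow, black, yellow.
row0 = [255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 0]
# Hypothetical filler for rows 2-4 (all yellow), not the on-screen image.
image = decode_pixels(row0 + [255, 255, 0] * 12, width=4, height=4)
print([COLOR_NAMES.get(p, "?") for p in image[0]])  # ['white', 'yellow', 'black', 'yellow']
```

A 4 by 4 image at 24-bit depth needs 4 × 4 × 3 = 48 bytes of pixel data, which is why `decode_pixels` checks the length before grouping.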

play04:28

I want to emphasize again that it doesn’t matter if it’s a text file, WAV, BMP, or

play04:32

fancier formats we don’t have time to discuss, like ZIPs and PPTs.

play04:35

Under the hood, they’re all the same: long lists of numbers, stored as binary, on a storage device.

play04:40

File formats are the key to reading and understanding the data inside.

play04:43

Now that you understand files a little better, let’s move on to how computers go about

play04:47

storing them.

play04:48

Even though the underlying storage medium might be a strip of tape, a drum, a disk,

play04:52

or integrated circuits... hardware and software abstractions let us think of storage as a

play04:56

long line of little buckets that store values.

play04:58

In the early days, when computers only performed one computation, like calculating artillery

play05:03

range tables, the entire storage operated like one big file.

play05:07

Data started at the beginning of storage, and then filled it up in order as output was

play05:11

produced, up to the storage capacity.

play05:13

However, as computational power and storage capacity improved, it became possible, and

play05:17

useful, to store more than one file at a time.

play05:20

The simplest option is to store files back-to-back.

play05:23

This can work... but how does the computer know where files begin and end?

play05:27

Storage devices have no notion of files – they’re just a mechanism for storing lots of bits.

play05:31

So, for this to work, we need to have a special file that records where other ones are located.

play05:37

This goes by many names, but a good general term is Directory File.

play05:40

Most often, it’s kept right at the front of storage, so we always know where to access it.

play05:45

Location zero!

play05:47

Inside the Directory File are the names of all the other files in storage.

play05:50

In our example, they each have a name, followed by a period, and end with what’s called

play05:54

a File Extension, like “BMP” or “WAV”.

play05:57

Those further assist programs in identifying file types.

play06:01

The Directory File also stores metadata about these files, like when they were created and

play06:05

last modified, who the owner is, and whether it can be read, written, or both.

play06:09

But most importantly, the directory file contains where these files begin in storage, and how

play06:14

long they are.
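
Here is a minimal Python sketch of the flat directory file described above: each entry records a name (with its extension), a start position and a length, plus a bit of metadata, and looking a file up means finding its entry and slicing those bytes out of raw storage. As a simplification, the directory is kept in a Python list here rather than as bytes at location zero; the class, field and function names, the owner value and the storage contents are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    name: str              # e.g. "todo.txt"; the part after the period is the extension
    start: int             # where the file begins in storage
    length: int            # how many bytes long it is
    owner: str = "carrie"  # hypothetical metadata field

def read_file(storage: bytes, directory: list[DirectoryEntry], name: str) -> bytes:
    """Look the file up in the directory, then slice its bytes out of raw storage."""
    for entry in directory:
        if entry.name == name:
            return storage[entry.start:entry.start + entry.length]
    raise FileNotFoundError(name)

# Hypothetical storage: a reserved directory area, then files packed back-to-back.
storage = b"<dir>" + b"buy milk" + b"WAVEDATA"
directory = [DirectoryEntry("todo.txt", start=5, length=8),
             DirectoryEntry("theme.wav", start=13, length=8)]
print(read_file(storage, directory, "todo.txt"))   # b'buy milk'
```

The storage itself has no idea where `todo.txt` ends and `theme.wav` begins; only the start and length in the directory make that boundary recoverable.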

play06:15

If we want to add a file, remove a file, change a filename, or similar, we have to update

play06:19

the information in the Directory File.

play06:21

It’s like the Table of Contents in a book: if you make a chapter shorter, or move it

play06:25

somewhere else, you have to update the table of contents, otherwise the page numbers won’t match!

play06:30

The Directory File, and the maintenance of it, is an example of a very basic File System,

play06:35

the part of an Operating System that manages and keeps track of stored files.

play06:39

This particular example is called a Flat File System, because all the files are stored at one level.

play06:44

It’s flat!

play06:45

Of course, packing files together, back-to-back, is a bit of a problem, because if we want

play06:48

to add some data to let’s say “todo.txt”, there’s no room to do it without overwriting

play06:53

part of “carrie.bmp”.

play06:55

So modern File Systems do two things.

play06:57

First, they store files in blocks.

play06:59

This leaves a little extra space for changes, called slack space.

play07:02

It also means that all file data is aligned to a common size, which simplifies management.

play07:07

In a scheme like this, our Directory File needs to keep track of what block each one

play07:11

is stored in.
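
The block arithmetic behind slack space is easy to sketch in Python: round a file's size up to a whole number of blocks, and whatever is left over in the last block is room to grow. The 512-byte block size and the helper names are assumptions for illustration; real file systems pick their own block sizes.

```python
BLOCK_SIZE = 512  # hypothetical block size in bytes

def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Round a file's size up to a whole number of blocks (at least one)."""
    return max(1, (file_size + block_size - 1) // block_size)

def slack_space(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Unused bytes at the end of the file's last block, available for growth."""
    return blocks_needed(file_size, block_size) * block_size - file_size

for size in (100, 512, 1000):
    print(size, blocks_needed(size), slack_space(size))
# 100 1 412
# 512 1 0
# 1000 2 24
```

Because every file occupies whole blocks, the directory only needs to record block numbers rather than exact byte offsets.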

play07:12

The second thing File Systems do is allow files to be broken up into chunks and stored

play07:16

across many blocks.

play07:17

So let’s say we open “todo.txt” and add a few more items, making the file

play07:22

too big to be saved in its one block.

play07:24

We don’t want to overwrite the neighboring one, so instead, the File System allocates

play07:28

an unused block, which can accommodate extra data.

play07:30

With a File System scheme like this, the Directory File needs to store not just one block per

play07:35

file, but rather a list of blocks per file.

play07:37

In this way, we can have files of variable sizes that can be easily expanded and shrunk,

play07:42

simply by allocating and deallocating blocks.
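
The scheme described above, a directory that stores a list of blocks per file, can be sketched as a toy Python class. Writing a file splits it into block-sized chunks, allocates unused blocks when it grows, and hands trailing blocks back when it shrinks. The class name, the tiny 4-byte blocks, the "lowest free block first" policy and the reserved block 0 are all assumptions for the example, not how any particular file system does it.

```python
class BlockFileSystem:
    """Toy flat file system: the directory maps each file name to a list of block numbers."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.blocks: list[bytes] = [b""] * num_blocks
        self.directory: dict[str, list[int]] = {}
        self.free = list(range(1, num_blocks))   # block 0 is reserved for the directory

    def _allocate(self) -> int:
        if not self.free:
            raise OSError("storage full")
        return self.free.pop(0)                  # hand out the lowest-numbered free block

    def write(self, name: str, data: bytes) -> None:
        """Store data, allocating or deallocating blocks as the file grows or shrinks."""
        chunks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)] or [b""]
        block_list = self.directory.setdefault(name, [])
        while len(block_list) < len(chunks):     # file grew: allocate unused blocks
            block_list.append(self._allocate())
        while len(block_list) > len(chunks):     # file shrank: release trailing blocks
            self.free.append(block_list.pop())
        self.free.sort()
        for block, chunk in zip(block_list, chunks):
            self.blocks[block] = chunk

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[b] for b in self.directory[name])

fs = BlockFileSystem(num_blocks=8, block_size=4)   # hypothetical tiny blocks
fs.write("todo.txt", b"milk")          # fits in one block
fs.write("carrie.bmp", b"PIXL")        # gets the next block
fs.write("todo.txt", b"milk eggs")     # grows: now needs 3 blocks
print(fs.directory)          # {'todo.txt': [1, 3, 4], 'carrie.bmp': [2]}
print(fs.read("todo.txt"))   # b'milk eggs'
```

As with virtual memory pages, the file's logical order lives in the block list, so its blocks do not have to be next to each other in storage.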

play07:44

If you watched our episode on Operating Systems, this should sound a lot like Virtual Memory.

play07:49

Conceptually it’s very similar!

play07:51

Now let’s say we want to delete “carrie.bmp”.

play07:53

To do that, we can simply remove the entry from the Directory File.

play07:56

This, in turn, causes one block to become free.

play07:59

Note that we didn’t actually erase the file’s data in storage, we just deleted the record of it.

play08:04

At some point, that block will be overwritten with new data, but until then, it just sits there.

play08:08

This is one way that computer forensic teams can “recover” data from computers even

play08:12

though people think it has been deleted. Crafty!
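
A small Python sketch of the deletion just described: removing a file only drops its directory entry and adds its blocks to a free list, so the bytes stay put until something reuses the block. The storage contents, the file-to-block layout and the `delete` helper are hypothetical.

```python
# Hypothetical state: block 0 holds the directory; data blocks follow.
blocks = [b"<dir>", b"todo", b"PIXL", b"more"]
directory = {"todo.txt": [1, 3], "carrie.bmp": [2]}
free_blocks: list[int] = []

def delete(name: str) -> None:
    """Remove the directory entry and mark its blocks free; the bytes are left untouched."""
    free_blocks.extend(directory.pop(name))

delete("carrie.bmp")
print(directory)      # {'todo.txt': [1, 3]}
print(free_blocks)    # [2]

# A forensic-style scan: free blocks may still hold "deleted" data.
recoverable = {b: blocks[b] for b in free_blocks}
print(recoverable)    # {2: b'PIXL'}
```

Actually erasing the data would require an extra write to the freed blocks, which a plain delete like this does not do.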

play08:15

Ok, let’s say we add even more items to our todo list, which causes the File System

play08:20

to allocate yet another block to the file, in this case, recycling the block freed from

play08:24

carrie.bmp.

play08:25

Now our “todo.txt” is stored across 3 blocks, spaced apart, and also out of order.

play08:30

Files getting broken up across storage like this is called fragmentation.

play08:34

It’s the inevitable byproduct of files being created, deleted and modified.

play08:38

For many storage technologies, this is bad news.

play08:41

On magnetic tape, reading todo.txt into memory would require seeking to block 1, then fast

play08:46

forwarding to block 5, and then rewinding to block 3 – that’s a lot of back and forth!

play08:50

In real world File Systems, large files might be stored across hundreds of blocks, and you

play08:54

don’t want to have to wait five minutes for your files to open.
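
To put a number on the back-and-forth, the Python sketch below checks whether a block list is fragmented and estimates how far a tape head travels to read the chunks in order, starting from block 0. The travel count is a deliberately crude cost model (blocks wound past, forward or backward), assumed for illustration only; real seek times are not simply linear in distance.

```python
def is_fragmented(block_list: list[int]) -> bool:
    """A file is fragmented if its blocks are not consecutive and in order."""
    return any(b != a + 1 for a, b in zip(block_list, block_list[1:]))

def tape_travel(block_list: list[int], start: int = 0) -> int:
    """Blocks of tape wound past (forward or back) to visit the chunks in order."""
    travel, position = 0, start
    for block in block_list:
        travel += abs(block - position)
        position = block
    return travel

todo_blocks = [1, 5, 3]            # the fragmented todo.txt from the episode
print(is_fragmented(todo_blocks), tape_travel(todo_blocks))   # True 7
print(is_fragmented([1, 2, 3]), tape_travel([1, 2, 3]))       # False 3
```

Even in this tiny case, the fragmented layout more than doubles the travel, and the gap grows with files spread over hundreds of blocks.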

play08:57

The answer is defragmentation!

play08:59

That might sound like technobabble, but the process is really simple, and once upon a

play09:03

time it was really fun to watch!

play09:05

The computer copies around data so that files have blocks located together in storage and

play09:10

in the right order.

play09:11

After we’ve defragged, we can read our todo file, now located in blocks 1 through 3, in

play09:16

a single, quick read pass.
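
The defragmentation step can be sketched in Python as copying each file's chunks, in logical order, into consecutive blocks and rewriting the directory to match. To keep it short, this version builds a fresh block list instead of shuffling blocks in place, which real defragmenters must do carefully with limited spare space. The `defragment` name, the block contents and the second file `notes.txt` are hypothetical.

```python
def defragment(blocks: list, directory: dict[str, list[int]], first_data_block: int = 1):
    """Return new storage and directory with every file's blocks contiguous and in order."""
    new_blocks = [None] * len(blocks)
    new_blocks[:first_data_block] = blocks[:first_data_block]   # keep the directory area
    new_directory: dict[str, list[int]] = {}
    cursor = first_data_block
    for name, block_list in directory.items():
        new_directory[name] = []
        for old in block_list:             # walk the file's chunks in logical order
            new_blocks[cursor] = blocks[old]
            new_directory[name].append(cursor)
            cursor += 1
    return new_blocks, new_directory

# todo.txt is split across blocks 1 -> 5 -> 3, as in the episode; block 4 is free.
storage = ["<dir>", "todo#1", "notes#1", "todo#3", None, "todo#2"]
directory = {"todo.txt": [1, 5, 3], "notes.txt": [2]}
storage, directory = defragment(storage, directory)
print(directory)  # {'todo.txt': [1, 2, 3], 'notes.txt': [4]}
print(storage)    # ['<dir>', 'todo#1', 'todo#2', 'todo#3', 'notes#1', None]
```

After this pass, reading `todo.txt` is one sweep over blocks 1 through 3, matching the "single, quick read pass" above.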

play09:17

So far, we’ve only been talking about Flat File Systems, where all files are stored in

play09:21

one directory.

play09:23

This worked ok when computers only had a little bit of storage, and you might only have a

play09:27

dozen or so files.

play09:28

But as storage capacity exploded, like we discussed last episode, so did the number

play09:33

of files on computers.

play09:34

Very quickly, it became impractical to store all files together at one level.

play09:38

Just like documents in the real world, it’s handy to store related files together in folders.

play09:42

Then we can put connected folders into folders, and so on.

play09:46

This is a Hierarchical File System, and it’s what your computer uses. There are a variety

play09:51

of ways to implement this, but let’s stick with the File System example we’ve been

play09:54

using to convey the main idea.

play09:56

The biggest change is that our Directory File needs to be able to point not just to files,

play10:00

but also other directories.

play10:01

To keep track of what’s a file and what’s a directory, we need some extra metadata.

play10:05

This Directory File is the top-most one, known as the Root Directory.

play10:09

All other files and folders lie beneath this directory along various file paths.

play10:14

We can see inside of our “Root” Directory File that we have 3 files and 2 subdirectories:

play10:19

music and photos.

play10:21

If we want to see what’s stored in our music directory, we have to go to that block and

play10:25

read the Directory File located there; the format is the same as our root directory.

play10:29

There are a lot of great songs in there!
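
The hierarchical version can be sketched in Python with one extra piece of metadata per entry, an "is this a directory?" flag. Resolving a path then means starting at the root directory file and, for each path component, reading the directory file stored in the matching entry's block. The block numbers, the file names (apart from theme.wav in block 5), the `Entry` class and the `resolve` helper are hypothetical; the root holds 3 files and 2 subdirectories to mirror the episode.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    is_directory: bool   # the extra metadata that tells files and folders apart
    block: int           # where the file's data, or the subdirectory's directory file, lives

# Hypothetical storage: block number -> directory file (a list of entries).
directory_files: dict[int, list[Entry]] = {
    0: [Entry("todo.txt", False, 1), Entry("carrie.bmp", False, 2),   # root directory
        Entry("theme.wav", False, 5),
        Entry("music", True, 3), Entry("photos", True, 4)],
    3: [Entry("song1.wav", False, 6)],
    4: [],
}
ROOT_BLOCK = 0

def resolve(path: str) -> Entry:
    """Walk from the root directory file, one path component at a time."""
    current = Entry("/", True, ROOT_BLOCK)
    for part in [p for p in path.split("/") if p]:
        if not current.is_directory:
            raise NotADirectoryError(current.name)
        matches = [e for e in directory_files[current.block] if e.name == part]
        if not matches:
            raise FileNotFoundError(path)
        current = matches[0]
    return current

print(resolve("/music/song1.wav"))   # Entry(name='song1.wav', is_directory=False, block=6)
```

Every directory file has the same format as the root, so the same loop works at any depth.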

play10:31

In addition to being able to create hierarchies of unlimited depth, this method also allows

play10:35

us to easily move around files.

play10:37

So, if we wanted to move “theme.wav” from our root directory to the music directory,

play10:42

we don’t have to re-arrange any blocks of data.

play10:44

We can simply modify the two Directory Files, removing an entry from one and adding it to another.

play10:50

Importantly, the theme.wav file stays in block 5.
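
Moving a file, as described above, touches only the two directory files. In the Python sketch below each directory is reduced to a name-to-block mapping (dropping the is-directory flag for brevity); the layout and the `move` helper are hypothetical apart from theme.wav living in block 5.

```python
# Hypothetical directory files: each maps a name to the block where that item lives.
root = {"todo.txt": 1, "carrie.bmp": 2, "theme.wav": 5, "music": 3, "photos": 4}
music = {"song1.wav": 6}

def move(name: str, source: dict[str, int], destination: dict[str, int]) -> None:
    """Move a file by editing two directory files; no data blocks are copied."""
    if name in destination:
        raise FileExistsError(name)
    destination[name] = source.pop(name)

move("theme.wav", root, music)
print(root)    # {'todo.txt': 1, 'carrie.bmp': 2, 'music': 3, 'photos': 4}
print(music)   # {'song1.wav': 6, 'theme.wav': 5}
```

The block number travels with the entry, so the audio data itself never moves.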

play10:53

So that’s a quick overview of the key principles of File Systems.

play10:56

They provide yet another way to move up to a new level of abstraction.

play11:06

File systems allow us to hide the raw bits stored on magnetic tape, spinning disks and

play11:11

the like, and they let us think of data as neatly organized and easily accessible files.

play11:15

We even started talking about users, not programmers, manipulating data, like opening files and

play11:20

organizing them, foreshadowing where the series will be going in a few episodes.

play11:24

I’ll see you next week.
