Compression: Crash Course Computer Science #21
Summary
TLDR: This video, produced by Crash Course Computer Science and hosted by Carrie Anne, takes a deep dive into data compression. It first explains the concept of files, then points out the limitation of basic formats like text, wave, and bitmap: they are not very efficient. It then introduces compression, both lossless and lossy, using a Pac-Man image to show how run-length encoding and Huffman coding reduce data size. It also covers audio and video compression, explaining how removing details humans barely perceive can dramatically reduce data while preserving acceptable perceptual quality. The video closes by stressing the importance of compression for storing and transmitting large amounts of data, especially for streaming services.
Takeaways
- 🗜️ File compression is the technique of encoding data in fewer bits to reduce file size, making storage and transmission more efficient.
- 🔴 Run-Length Encoding is a simple compression technique that works by reducing the amount of repeated or redundant information in a file.
- 📈 Huffman coding is a lossless compression technique that exploits data frequencies to generate more compact representations, built by constructing a Huffman tree.
- 🎨 Lossless compression loses no information; the decompressed data is identical to the data before compression.
- 🧩 Lossy compression reduces file size by removing or reducing information that humans barely perceive, as in JPEG and MP3.
- 👂 Lossy audio compression exploits the ear's differing sensitivity to different frequencies, removing or reducing the precision of frequencies we hear poorly.
- 👀 Lossy image compression exploits the human visual system's insensitivity to fine detail, discarding some detail to reduce data.
- 📹 Video compression exploits temporal redundancy between frames, copying and reusing patches of data to reduce the amount of data transmitted.
- 🚫 Over-aggressive lossy compression can corrupt data, producing glitchy effects during video playback.
- 📈 Compression lets users store pictures, music, and video efficiently, and is critical for data transmission and storage.
- 🌐 Compression makes it possible for online streaming services (like YouTube) to transmit huge amounts of data at reasonable cost.
- 🎬 CuriosityStream is a streaming service for documentaries and nonfiction titles; the episode recommends "Miniverse", a tour of the wonders of the Solar System.
Q & A
Why do we need to compress files?
-Compression reduces file size, letting us store more files without filling up our hard drives and transmit them faster, avoiding annoyances like waiting for an email attachment to download.
What is lossless compression, and what characterizes it?
-Lossless compression is a technique that allows data to be restored exactly to its original state after compression and decompression, with no loss of information. The decompressed data is identical to the original, bit for bit.
How does Run-Length Encoding work?
-Run-Length Encoding is a simple compression scheme that identifies runs of identical values in a file and replaces each run with the value plus an extra byte specifying the run's length, reducing repeated or redundant information.
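To make the scheme concrete, here is a minimal sketch in Python (not from the video; the single-character pixel labels and the `(count, value)` pair layout are illustrative assumptions):

```python
def rle_encode(values):
    """Run-length encode a list of values as (count, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, v])       # start a new run
    return [(count, v) for count, v in runs]

def rle_decode(pairs):
    """Expand (count, value) pairs back to the original list -- losslessly."""
    out = []
    for count, v in pairs:
        out.extend([v] * count)
    return out

# One row in the spirit of the video: 7 yellow pixels become a single pair.
row = ["Y"] * 7 + ["W"]
pairs = rle_encode(row)               # [(7, 'Y'), (1, 'W')]
assert rle_decode(pairs) == row       # round-trip is bit-for-bit identical
```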
How does Huffman Coding generate efficient codes?
-Huffman coding generates efficient codes by building a Huffman tree. It first lists all possible data blocks and their frequencies, then in each round selects the two with the lowest frequencies and merges them into a new node, recording the combined frequency. The process repeats until everything is merged into a single tree. The frequency-sorted tree then yields the codes by labeling each branch with a 0 or a 1.
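The procedure above can be sketched in Python with a priority queue (a toy illustration, not the video's code; the block names and frequencies come from the Pac-Man example):

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman tree from {block: frequency}; return {block: bitstring}."""
    # Heap entries are (frequency, tiebreaker, tree); a tree is either a
    # block name or a (left, right) pair of subtrees.
    heap = [(f, i, block) for i, (block, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))  # ...merged
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: label branches 0/1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Pixel-pair frequencies from the Pac-Man example.
codes = huffman_codes({"yellow-yellow": 4, "white-yellow": 2,
                       "black-yellow": 1, "white-white": 1})
# The most frequent block gets the shortest code: codes["yellow-yellow"] == "0"
```

Encoding the 8 pixel pairs with these code lengths gives 4×1 + 2×2 + 1×3 + 1×3 = 14 bits, matching the figure in the video.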
Why are Huffman codes prefix-free?
-Huffman codes are prefix-free because each path from the root of the tree to a leaf is unique, which means no code starts with another complete code, avoiding any ambiguity between codes.
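Prefix-freedom is exactly what lets a decoder consume the bitstream greedily, with no separators between codes. A small sketch (the `decode` helper is hypothetical; the dictionary is the one derived in the video):

```python
def decode(bits, codes):
    """Greedily decode a bitstring using a prefix-free code dictionary."""
    inverse = {code: block for block, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # safe: no code is a prefix of another
            out.append(inverse[buf])
            buf = ""
    return out

# The code dictionary from the Pac-Man example.
codes = {"yellow-yellow": "0", "white-yellow": "10",
         "black-yellow": "110", "white-white": "111"}
# "10", "110", "0" decode unambiguously even when run together.
assert decode("101100", codes) == ["white-yellow", "black-yellow", "yellow-yellow"]
```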
What is the main difference between lossy and lossless compression?
-Lossy compression allows some information to be discarded during compression, typically information that human vision or hearing barely notices. Lossless compression guarantees that the data after compression and decompression is identical to the original, with no information lost.
Why can we discard some data in lossy compression without noticeably hurting the user experience?
-Lossy compression exploits the limits of human perception. In audio and image compression, for example, the ear is insensitive to certain frequencies and the eye to subtle color variations. Discarding, or reducing the precision of, these hard-to-perceive details dramatically shrinks files without a significant impact on the experience.
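As a toy illustration of "reducing precision" (an assumption-laden sketch, not any real codec's algorithm), samples can be quantized onto a coarser grid:

```python
def quantize(samples, bits):
    """Snap samples (assumed in [-1.0, 1.0]) to a grid of 2**bits levels.

    This shows only the reduce-precision idea in isolation: a real
    perceptual coder first splits audio into frequency bands, then spends
    more bits on bands we hear well and fewer on bands we barely notice.
    """
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]

original = [0.104, -0.518, 0.733]
coarse = quantize(original, 3)    # 8 levels: [0.0, -0.5, 0.75] -- audibly rough
fine = quantize(original, 12)     # 4096 levels: error is imperceptibly small
```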
How does JPEG image compression work?
-JPEG compresses images by splitting them into 8x8 pixel blocks and discarding much of the high-frequency spatial data. This preserves the visual essence while using only a fraction of the original data, achieving high compression ratios.
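The real JPEG pipeline applies a discrete cosine transform to each 8x8 block and quantizes away high-frequency coefficients. The sketch below (a hypothetical helper, not JPEG itself) shows only the limiting case, where each block keeps nothing but its average and every finer detail is discarded:

```python
def average_blocks(image, block=8):
    """Replace each block x block patch of a grayscale image with its mean.

    The block mean is the lowest-frequency component of the patch; keeping
    only it is the extreme version of throwing away high-frequency detail.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            patch = [image[y][x]
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            avg = sum(patch) // len(patch)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = avg
    return out

# An 8x8 gradient patch: every pixel differs from its neighbors...
patch = [[y * 8 + x for x in range(8)] for y in range(8)]
flat = average_blocks(patch)      # ...yet the block collapses to one value
```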
What is temporal redundancy in video compression?
-Temporal redundancy refers to the fact that many pixels are identical between consecutive frames of a video, so they don't need to be retransmitted every frame. Video formats exploit this by encoding only the differences between frames instead of retransmitting every pixel, improving compression efficiency.
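A minimal sketch of difference coding (frames here are flat lists of pixel values; real codecs work on 2D patches with motion vectors, which this ignores):

```python
def frame_delta(prev, curr):
    """Encode a frame as only the pixels that changed since the previous one."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, curr)) if p != v]

def apply_delta(prev, delta):
    """Rebuild the current frame: copy the previous frame, patch differences."""
    frame = list(prev)
    for i, v in delta:
        frame[i] = v
    return frame

prev = [5, 5, 5, 5, 5, 5]         # e.g. a static background
curr = [5, 5, 9, 5, 5, 5]         # one pixel changed between frames
delta = frame_delta(prev, curr)   # [(2, 9)] -- far smaller than a full frame
assert apply_delta(prev, delta) == curr
```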
Why is compression so important for data storage and transmission?
-Compression lets users store pictures, music, and videos efficiently. Without it, streaming your favorite Carpool Karaoke videos on YouTube would be nearly impossible, given the bandwidth and economics of transmitting that volume of data.
Why do Skype calls sometimes sound like robots talking?
-When signal quality or bandwidth gets worse, compression algorithms remove more data, further reducing precision. The detail lost in that process is why Skype calls can sound like robots talking.
Outlines
📚 File Compression Basics
This section introduces the importance and basic concepts of file compression. Carrie Anne first reviews files and basic file formats, then notes their limitation: they are not very efficient. To address this, she introduces compression, explaining how encoding data with fewer bits reduces file size. Using a 4x4 pixel Pac-Man image as an example, she shows how run-length encoding (RLE) reduces redundant information, and how building a Huffman tree yields a more compact representation, both forms of lossless compression.
🔊 Lossy Compression and Perceptual Coding
The second section digs into lossy compression, particularly how the limits of human perception are exploited to reduce data. First, using sound as an example, it explains that human hearing is more sensitive to some frequencies than others, so certain frequencies can be discarded or encoded at lower precision without noticeably affecting the listening experience. Then, using JPEG, it shows how discarding high-frequency spatial data within 8x8 pixel blocks achieves lossy image compression. It also covers video compression techniques, which exploit temporal redundancy between frames by copying and transforming patches of data to reduce the amount required.
🎥 Video Compression and Applications
The final section discusses the importance of video compression, citing the MPEG-4 standard as an example of how compression can drastically shrink video files while keeping acceptable image quality. It also mentions the artifacts caused by over-compression, such as the trippy effects of wrongly applied motion. Finally, it emphasizes how essential compression is for data storage and transmission, noting that streaming services would be infeasible without it, and closes by mentioning the CuriosityStream streaming service and recommending a documentary about the Solar System.
Keywords
💡Compression
💡Lossless compression
💡Lossy compression
💡Perceptual coding
💡Huffman tree
💡Run-Length Encoding
💡Bitmap
💡Metadata
💡Psychophysics
💡JPEG
💡MP3
Highlights
File compression is a key technique for reducing data storage and transmission time.
Compression encodes data using fewer bits than the original representation.
Run-Length Encoding (RLE) compresses data by reducing repeated information.
With RLE, the Pac-Man image's data drops from 48 bytes to 24 bytes, a 50% reduction.
Lossless compression allows data to be fully recovered after compression, with no information lost.
Huffman coding generates efficient codes based on how frequently data blocks occur.
Building a Huffman tree repeatedly selects the two lowest-frequency blocks and merges them into a new node.
Using the Huffman tree, the Pac-Man image data is compressed to just 14 bits, far smaller than the original 48 bytes.
Compressed file formats like GIF, PNG, PDF, and ZIP typically combine removing redundancy with more compact representations.
Lossy compression reduces file size by removing, or reducing the precision of, information humans barely perceive.
Audio compression exploits the ear's differing sensitivity to different frequencies, discarding inaudible ones such as ultrasound.
Compressed audio files like MP3s are far smaller than uncompressed audio formats such as WAV or FLAC.
JPEG image compression reduces file size by discarding high-frequency spatial data within 8x8 pixel blocks.
Video compression improves efficiency by exploiting temporal redundancy and pixel differences between frames.
Advanced video compression formats like MPEG-4 can shrink video files to between 1/20th and 1/200th of their original size.
Compression is essential for storing and transmitting large amounts of data; it makes online video streaming services like YouTube possible.
Crash Course Computer Science is sponsored by CuriosityStream, a streaming service with a large catalog of documentaries and nonfiction titles.
Transcripts
This episode is brought to you by Curiosity Stream.
Hi, I'm Carrie Anne, and welcome to Crash Course Computer Science!
Last episode we talked about Files, bundles of data, stored on a computer, that are formatted
and arranged to encode information, like text, sound or images.
We even discussed some basic file formats, like text, wave, and bitmap.
While these formats are perfectly fine and still used today, their simplicity also means
they’re not very efficient.
Ideally, we want files to be as small as possible, so we can store lots of them without filling
up our hard drives, and also transmit them more quickly.
Nothing is more frustrating than waiting for an email attachment to download. Ugh!
The answer is compression, which literally squeezes data into a smaller size.
To do this, we have to encode data using fewer bits than the original representation.
That might sound like magic, but it’s actually computer science!
INTRO
Let's return to our old friend from last episode, Mr. Pac-Man!
This image is 4 pixels by 4 pixels.
As we discussed, image data is typically stored as a list of pixel values.
To know where rows end, image files have metadata, which defines properties like dimensions.
But, to keep it simple today, we’re not going to worry about it.
Each pixel’s color is a combination of three additive primary colors: red, green and blue.
We store each of those values in one byte, giving us a range of 0 to 255 for each color.
If you mix full intensity red, green and blue - that’s 255 for all three values - you
get the color white.
If you mix full intensity red and green, but no blue (it’s 0), you get yellow.
We have 16 pixels in our image, and each of those needs 3 bytes of color data.
That means this image’s data will consume 48 bytes of storage.
But, we can compress the data and pack it into a smaller number of bytes than 48!
One way to compress data is to reduce repeated or redundant information.
The most straightforward way to do this is called Run-Length Encoding.
This takes advantage of the fact that there are often runs of identical values in files.
For example, in our pac-man image, there are 7 yellow pixels in a row.
Instead of encoding redundant data: yellow pixel, yellow pixel, yellow pixel, and so
on, we can just say “there’s 7 yellow pixels in a row” by inserting an extra byte
that specifies the length of the run, like so:
And then we can eliminate the redundant data behind it.
To ensure that computers don’t get confused with which bytes are run lengths and which
bytes represent color, we have to be consistent in how we apply this scheme.
So, we need to preface all pixels with their run-length.
In some cases, this actually adds data, but on the whole, we’ve dramatically reduced
the number of bytes we need to encode this image.
We’re now at 24 bytes, down from 48.
That’s 50% smaller!
A huge saving!
Also note that we haven’t lost any data.
We can easily expand this back to the original form without any degradation.
A compression technique that has this characteristic is called lossless compression, because we
don’t lose anything.
The decompressed data is identical to the original before compression, bit for bit.
Let's take a look at another type of lossless compression, where blocks of data are replaced
by more compact representations.
This is sort of like “don’t forget to be awesome” being replaced by DFTBA.
To do this, we need a dictionary that stores the mapping from codes to data.
Let's see how this works for our example.
We can view our image as not just a string of individual pixels, but as little blocks
of data.
For simplicity, we’re going to use pixel pairs, which are 6 bytes long, but blocks
can be any size.
In our example, there are only four pairings: White-yellow, black-yellow, yellow-yellow
and white-white.
Those are the data blocks in our dictionary we want to generate compact codes for.
What’s interesting, is that these blocks occur at different frequencies.
There are 4 yellow-yellow pairs, 2 white-yellow pairs, and 1 each of black-yellow and white-white.
Because yellow-yellow is the most common block, we want that to be substituted for the most
compact representation.
On the other hand, black-yellow and white-white, can be substituted for something longer because
those blocks are infrequent.
One method for generating efficient codes is building a Huffman Tree, invented by David
Huffman while he was a student at MIT in the 1950s.
His algorithm goes like this.
First, you layout all the possible blocks and their frequencies.
At every round, you select the two with the lowest frequencies.
Here, that’s Black-Yellow and White-White, each with a frequency of 1.
You combine these into a little tree, which has a combined frequency of 2, so we record
that.
And now one step of the algorithm is done.
Now we repeat the process.
This time we have three things to choose from.
Just like before, we select the two with the lowest frequency, put them into a little tree,
and record the new total frequency of all the sub items.
Ok, we’re almost done.
This time it’s easy to select the two items with the lowest frequency because there are
only two things left to pick.
We combine these into a tree, and now we’re done!
Our tree looks like this, and it has a very cool property: it’s arranged by frequency,
with less common items lower down.
So, now we have a tree, but you may be wondering how this gets us to a dictionary.
Well, we use our frequency-sorted tree to generate the codes we need by labeling each
branch with a 0 or a 1, like so:
With this, we can write out our code dictionary.
Yellow-yellow is encoded as just a single 0.
White-yellow is encoded as 1 0 (“one zero”)
Black-Yellow is 1 1 0
and finally white-white is 1 1 1.
The really cool thing about these codewords is that there’s no way to have conflicting
codes, because each path down the tree is unique.
This means our codes are prefix-free, that is no code starts with another complete code.
Now, let’s return to our image data and compress it!
Our first pixel pair, white-yellow, is substituted for the bits “1 0”.
The next pair is black-yellow, which is substituted for “1 1 0”.
Next is yellow-yellow with the incredibly compact substitution of just “0”.
And this process repeats for the rest of the image:
So instead of 48 bytes of image data ...this process has encoded it into 14 bits -- NOT
BYTES -- BITS!!
That’s less than 2 bytes of data!
But, don’t break out the champagne quite yet!
This data is meaningless unless we also save our code dictionary.
So, we’ll need to append it to the front of the image data, like this.
Now, including the dictionary, our image data is 30 bytes long.
That’s still a significant improvement over 48 bytes.
The two approaches we discussed, removing redundancies and using more compact representations,
are often combined, and underlie almost all lossless compressed file formats, like GIF,
PNG, PDF and ZIP files.
Both run-length encoding and dictionary coders are lossless compression techniques.
No information is lost; when you decompress, you get the original file.
That’s really important for many types of files.
Like, it’d be very odd if I zipped up a word document to send to you, and when you
decompressed it on your computer, the text was different.
But, there are other types of files where we can get away with little changes, perhaps
by removing unnecessary or less important information, especially information that human
perception is not good at detecting.
And this trick underlies most lossy compression techniques.
These tend to be pretty complicated, so we’re going to attack this at a conceptual level.
Let’s take sound as an example.
Your hearing is not perfect.
We can hear some frequencies of sound better than others.
And there are some we can’t hear at all, like ultrasound.
Unless you’re a bat.
Basically, if we make a recording of music, and there’s data in the ultrasonic frequency
range, we can discard it, because we know that humans can’t hear it.
On the other hand, humans are very sensitive to frequencies in the vocal range, like people
singing, so it’s best to preserve quality there as much as possible.
Deep bass is somewhere in between.
Humans can hear it, but we’re less attuned to it.
We mostly sense it.
Lossy audio compressors take advantage of this, and encode different frequency bands
at different precisions.
Even if the result is rougher, it’s likely that users won’t perceive the difference.
Or at least it doesn’t dramatically affect the experience.
And here comes the hate mail from the audiophiles!
You encounter this type of audio compression all the time.
It’s one of the reasons you sound different on a cellphone versus in person.
The audio data is being compressed, allowing more people to take calls at once.
As the signal quality or bandwidth gets worse, compression algorithms remove more data, further
reducing precision, which is why Skype calls sometimes sound like robots talking.
Compared to an uncompressed audio format, like a WAV or FLAC (there we go, got the audiophiles back)
compressed audio files, like MP3s, are often 10 times smaller.
That’s a huge saving!
And it’s why I’ve got a killer music collection on my retro iPod.
Don’t judge.
This idea of discarding or reducing precision in a manner that aligns with human perception
is called perceptual coding, and it relies on models of human perception,
which come from a field of study called Psychophysics.
This same idea is the basis of lossy compressed image formats, most famously JPEGs.
Like hearing, the human visual system is imperfect.
We’re really good at detecting sharp contrasts, like the edges of objects, but our perceptual
system isn’t so hot with subtle color variations.
JPEG takes advantage of this by breaking images up into blocks of 8x8 pixels, then throwing
away a lot of the high-frequency spatial data.
For example, take this photo of our director's dog - Noodle.
So cute!
Let’s look at a patch of 8x8 pixels.
Pretty much every pixel is different from its neighbor, making it hard to compress with
lossless techniques because there’s just a lot going on.
Lots of little details.
But human perception doesn’t register all those details.
So, we can discard a lot of that detail, and replace it with a simplified patch like this.
This maintains the visual essence, but might only use 10% of the data.
We can do this for all the patches in the image and get this result.
You can still see it’s a dog, but the image is rougher.
So, that’s an extreme example, going from a slightly compressed JPEG to a highly compressed
one, one-eighth the original file size.
Often, you can get away with a quality somewhere in between, and perceptually, it’s basically
the same as the original.
The one on the left is one-third the file size of the one on the right.
That’s a big savings for essentially the same thing.
Can you tell the difference between the two?
Probably not, but I should mention that video compression plays a role in that too, since
I’m literally being compressed in a video right now.
Videos are really just long sequences of images, so a lot of what I said about them applies
here too.
But videos can do some extra clever stuff, because between frames, a lot of pixels are
going to be the same.
Like this whole background behind me!
This is called temporal redundancy.
We don’t need to re-transmit those pixels every frame of the video.
We can just copy patches of data forward.
When there are small pixel differences, like the readout on this frequency generator behind
me, most video formats send data that encodes just the difference between patches, which
is more efficient than re-transmitting all the pixels afresh, again taking advantage
of inter-frame similarity.
The fanciest video compression formats go one step further.
They find patches that are similar between frames, and not only copy them forward, with
or without differences, but also can apply simple effects to them, like a shift or rotation.
They can also lighten or darken a patch between frames.
So, if I move my hand side to side like this the video compressor will identify the similarity,
capture my hand in one or more patches, then just move these patches around between frames.
You’re actually seeing my hand from the past… kinda freaky, but it uses a lot less data.
MPEG-4 videos, a common standard, are often 20 to 200 times smaller than the original,
uncompressed file.
However, encoding frames as translations and rotations of patches from previous frames
can go horribly wrong when you compress too heavily, and there isn’t enough space to
update pixel data inside of the patches.
The video player will forge ahead, applying the right motions, even if the patch data
is wrong.
And this leads to some hilarious and trippy effects, which I’m sure you’ve seen.
Overall, it’s extremely useful to have compression techniques for all the types of data I discussed today.
(I guess our imperfect vision and hearing are “useful,” too.)
And it’s important to know about compression because it allows users to store pictures,
music, and videos in efficient ways.
Without it, streaming your favorite Carpool Karaoke videos on YouTube would be nearly
impossible, due to bandwidth and the economics of transmitting that volume of data for free.
And now when your Skype calls sound like they’re being taken over by demons, you’ll know
what’s really going on.
I’ll see you next week.
Hey guys, this week’s episode was brought to you by CuriosityStream which is a streaming
service full of documentaries and nonfiction titles from some really great filmmakers,
including exclusive originals.
Now I normally give computer science recommendations since this is Crash Course Computer Science and all
and Curiosity Stream has a ton of great ones. But you absolutely have to check
out “Miniverse” starring everyone’s favorite space-station-singing-Canadian astronaut,
Chris Hadfield, as he takes a road trip across the Solar System scaled down to the size
of the United States.
It’s basically 50 minutes of Chris and his passengers geeking out about our amazing planetary
neighbors and you don’t want to miss it.
So get unlimited access today, and your first two months are free if you sign up at curiositystream.com/crashcourse
and use the promo code "crashcourse" during the sign up process.