Representing Numbers and Letters with Binary: Crash Course Computer Science #4

CrashCourse
15 Mar 2017 · 10:45

Summary

TLDR: This video explores how computers store and represent numerical data, and the role math plays in it. It introduces the binary system, explaining how binary digits (bits) represent numbers, and how larger values are built by weighting each column (1, 2, 4, and so on). It covers the byte (8 bits) and larger units of data such as the kilobyte, megabyte, and gigabyte. It also discusses how computers use the first bit to mark a number's sign, leaving the remaining bits for the value itself, then introduces floating-point numbers and the IEEE 754 standard, and how numbers can represent text, from ASCII encoding to the development of Unicode. Finally, it stresses that all computer data, whether text, video, or the operating system itself, ultimately comes down to long sequences of 1s and 0s.

Takeaways

  • 💡 Computers store and represent all data in binary, i.e. 0s and 1s; each binary digit is called a bit.
  • 🔢 Binary is a base-2 number system, analogous to our familiar base-10 decimal system, but with only two possible digit values (0 and 1).
  • 🎲 Adding more binary digits lets you represent larger numbers. For example, an 8-bit number can represent 256 different values, from 0 to 255.
  • 🖥️ Older 8-bit computers and 8-bit graphics were limited because they could only work with 256 different values.
  • 🔍 The byte, equal to 8 bits, is a common unit of data. Larger units include the kilobyte (KB), megabyte (MB), and gigabyte (GB).
  • 🔣 Computers represent text with binary numbers, originally via ASCII, a 7-bit code that can represent 128 different symbols.
  • 🌍 Unicode was created to meet global, multilingual needs; its most common version uses 16 bits, with space for over a million code points covering international characters and emoji.
  • 🧮 Computers can handle both positive and negative numbers, typically using the first binary digit to indicate the sign (positive or negative) and the remaining bits to store the value itself.
  • 📊 Floating-point numbers represent non-integer values and are stored using the IEEE 754 standard, which works much like scientific notation.
  • 🌐 In computing, not just numbers and text but also sound and images are encoded and processed as binary sequences.

Q & A

  • How do computers use binary to represent numerical data?

    -Computers represent numerical data by converting values into binary form, using only 1s and 0s instead of the decimal digits 0 through 9. Each binary digit (bit) carries a place value; for example, 101 in binary is 5 in decimal. Computers represent larger values by adding more bits, just as decimal numbers grow by adding more digits.
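This place-value conversion can be sketched in a few lines of Python (the function names are illustrative, not from the video):

```python
# Convert between binary strings and decimal integers by hand,
# mirroring the walkthrough above (101 -> 5, 10110111 -> 183).
def binary_to_decimal(bits: str) -> int:
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # shift the value left, add the new bit
    return value

def decimal_to_binary(n: int) -> str:
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the lowest bit
        n //= 2
    return "".join(reversed(digits))

print(binary_to_decimal("101"))       # 5
print(binary_to_decimal("10110111"))  # 183
print(decimal_to_binary(202))         # 11001010
```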

  • What is a byte, and how does it relate to binary numbers?

    -A byte is a basic unit for measuring storage in computing; 1 byte equals 8 binary digits (bits). Since each bit can hold one of two states (1 or 0), an 8-bit number can represent 2 to the 8th power, or 256, different values, giving computers a convenient chunk for storing all kinds of information.
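The counts in this answer are easy to check directly in Python:

```python
# Each additional bit doubles the number of representable values,
# so a byte (8 bits) has 2**8 = 256 possible states.
BITS_PER_BYTE = 8
states_per_byte = 2 ** BITS_PER_BYTE
print(states_per_byte)  # 256

# The two definitions of a kilobyte the video mentions, in bits:
decimal_kilobyte_bits = 1000 * BITS_PER_BYTE     # 8000 bits
binary_kilobyte_bits = 2 ** 10 * BITS_PER_BYTE   # 1024 bytes = 8192 bits
print(decimal_kilobyte_bits, binary_kilobyte_bits)
```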

  • How do computers represent positive and negative numbers?

    -Computers use the highest-order (leftmost) bit to indicate a number's sign: 1 means negative, 0 means positive. For a 32-bit integer, the remaining 31 bits store the value itself, giving a range of roughly plus or minus two billion.
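A sketch of the sign-bit layout this answer describes. Note that the video describes a simple sign-and-magnitude scheme; most real CPUs actually use two's complement, though the range is similar:

```python
# Sign-and-magnitude layout: the top bit holds the sign,
# the remaining 31 bits hold the magnitude.
SIGN_BIT = 31

def encode_sign_magnitude(n: int) -> int:
    assert abs(n) < 2 ** SIGN_BIT  # must fit in 31 bits
    sign = 1 if n < 0 else 0
    return (sign << SIGN_BIT) | abs(n)

def decode_sign_magnitude(word: int) -> int:
    magnitude = word & (2 ** SIGN_BIT - 1)
    return -magnitude if (word >> SIGN_BIT) & 1 else magnitude

print(decode_sign_magnitude(encode_sign_magnitude(-5)))  # -5
print(2 ** 31 - 1)  # largest magnitude: 2147483647, roughly two billion
```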

  • What is a floating-point number, and how is it represented in a computer?

    -A floating-point number represents a decimal or fractional value, such as 12.7 or 3.14. Computers use the IEEE 754 standard, which works much like scientific notation, splitting a value into a significand and an exponent. In a 32-bit float, the first bit stores the sign, the next 8 bits the exponent, and the remaining 23 bits the significand.
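Python's standard `struct` module can expose these 1/8/23-bit fields of a 32-bit float; a small sketch:

```python
import struct

def float32_fields(x: float):
    """Split a value, stored as an IEEE 754 single, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # the raw 32 bits
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    significand = bits & 0x7FFFFF     # 23 bits, the fraction part
    return sign, exponent, significand

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-12.7))  # sign field is 1
print(float32_fields(3.14))   # exponent field is 128, i.e. 2 to the power 1
```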

  • How does ASCII encoding work?

    -ASCII (the American Standard Code for Information Interchange) is a 7-bit code that can represent 128 different values. It encodes uppercase letters, lowercase letters, the digits 0 through 9, and various punctuation marks and special characters. For example, a lowercase 'a' is represented by the number 97, while a capital 'A' is 65. ASCII allowed different computer systems to exchange data, enabling interoperability.
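Python's built-in `ord` and `chr` expose these same code numbers:

```python
# The ASCII code numbers quoted above, checked with ord()/chr().
print(ord("a"))  # 97
print(ord("A"))  # 65
print(ord(":"))  # 58
print(ord(")"))  # 41
print(chr(97), chr(65))  # the reverse mapping: a A
print(2 ** 7)    # 7 bits -> 128 possible values
```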

  • Why was Unicode needed?

    -Because ASCII was designed mainly for English, it cannot effectively represent characters from other languages, especially those with thousands of characters, such as Chinese and Japanese. Unicode, devised in 1992, solves this with a much larger code space; its most common version uses 16 bits, with room for over a million code points, covering nearly every character of every language plus mathematical symbols and emoji.
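A short sketch of Unicode code points and, for contrast, the "mojibake" effect of mismatched legacy encodings (the particular encoding pair below is an illustrative choice, not one from the video):

```python
# Every character gets a unique Unicode code point, far beyond ASCII's 128.
for ch in "Aé中😀":
    print(ch, hex(ord(ch)))

# Mojibake: bytes written in one single-byte legacy encoding,
# read back under another, incompatible one.
garbled = "café".encode("latin-1").decode("cp1251")
print(garbled)  # the é comes back as a Cyrillic letter
```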

  • How do computers store and process text?

    -Computers represent text with numbers. ASCII, for example, assigns a number to each letter, digit, and symbol. The computer converts those numbers into binary form, then stores and processes the resulting binary sequences.

  • What is a bit?

    -A bit is a single binary digit, the smallest unit of data storage in a computer. Each bit can hold one of two states: 1 or 0. By combining many bits, computers can represent more complex data and instructions.

  • Why do computer memory addresses need 64 bits?

    -As computer memory has grown to gigabytes (GB) and terabytes (TB), more bits are needed to uniquely label every location in memory. A 64-bit memory address lets a computer reach 2 to the 64th power distinct locations, enough for current and foreseeable memory sizes.
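The arithmetic behind this answer, checked in Python (whose integers are arbitrary-precision, so 2**64 is exact):

```python
# A 64-bit address can label this many distinct byte locations:
addressable = 2 ** 64
print(addressable)  # 18446744073709551616

# Expressed in tebibytes (2**40 bytes each), that is 2**24 TiB:
print(addressable // 2 ** 40)  # 16777216
```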

  • Why is all computer data ultimately made of 1s and 0s?

    -All computer data, whether text, images, audio, or video, is stored in binary form, because a computer's logic circuits only understand two states: on (1) and off (0). Every piece of data is therefore converted into a sequence of 1s and 0s that the processor can interpret and execute.

  • What is "mojibake"?

    -"Mojibake" is a Japanese word meaning "scrambled text." It describes the garbled output that results from incompatible character encodings, for example, when text written under one encoding system is opened under another, incompatible one.

  • How do computers represent color?

    -Computers represent colors with binary numbers, typically dedicating a fixed number of bits to the red, green, and blue (RGB) components. For example, 32-bit color graphics use 8 bits for red, 8 for green, 8 for blue, and the remaining 8 bits for transparency (the alpha channel).
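A sketch of the 8+8+8+8 packing described above. The RGBA channel order used here is one common convention and an assumption; the video does not specify a layout:

```python
# Pack 8-bit red, green, blue, and alpha channels into one 32-bit word.
def pack_rgba(r, g, b, a=255):
    return (r << 24) | (g << 16) | (b << 8) | a

def unpack_rgba(word):
    # Mask out each 8-bit field again.
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF

orange = pack_rgba(255, 165, 0)
print(hex(orange))          # 0xffa500ff
print(unpack_rgba(orange))  # (255, 165, 0, 255)
```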

Outlines

00:00

📚 Storing and Representing Numbers

In this section, Carrie Anne introduces how computers store and represent numerical data. She explains how binary numbers work and how adding binary digits represents larger values, much like the decimal system. She walks through converting binary numbers to decimal, explains the concepts of the bit and the byte and the storage units built on them (kilobytes, megabytes, gigabytes), and discusses how computers represent positive and negative integers, as well as floating-point numbers and the IEEE 754 standard.

05:00

🌐 Representing Text and Encoding Systems

The second segment covers how computers use numbers to represent text. It introduces ASCII, a 7-bit code that can represent 128 different characters, including upper- and lowercase English letters, digits, and symbols. It then notes ASCII's main limitation, that it was designed mostly for English, and how extending it to 8 bits allowed additional characters, including "national" characters for different countries' languages. With the rise of computing in Asia, however, languages with thousands of characters demanded a larger scheme. Unicode was therefore proposed; it uses 16-bit encoding to give every character of every language a unique code, including mathematical symbols and emoji. Finally, it stresses that all digital information, including text messages, videos, and operating systems, comes down to long sequences of 1s and 0s.

10:04

🔍 Next Week: The Start of Computation

At the end of the video, Carrie Anne previews next week's topic: how computers begin manipulating binary sequences, which is the true start of computation. She thanks viewers for watching and says she'll see them next week.


Keywords

💡Logic Gates

Logic gates are the basic electronic devices that implement Boolean algebra; they operate on the two binary values, true and false. In the video, logic gates form the foundation of computer science, because they are the key components from which more complex computing systems are built. Combinations of logic gates can perform more complex mathematical operations and store data.

💡Binary

Binary is a number system that uses only two digits: 1 and 0. In the video, binary is used to represent numbers and data, because computers store and process information in binary. Its core idea is that adding more digits represents larger values, just as in the decimal system.

💡Bit

A bit is a fundamental concept in computer science: a single binary digit, either 1 or 0. The video notes that each binary digit is called a "bit," and bits are the basis of all data storage and processing in a computer. For example, an 8-bit number can represent 256 different values, from 0 to 255.

💡Byte

A byte is a basic unit of computer storage, made up of 8 bits. The video explains the concept and notes that 1 kilobyte equals 1000 bytes, or 8000 bits. Bytes underpin the common units for measuring data size: kilobytes (KB), megabytes (MB), gigabytes (GB), and terabytes (TB).

💡Floating-Point Number

A floating-point number represents a decimal or fractional value; its defining feature is that the decimal point can "float." The video notes that floating-point numbers matter because they let computers handle non-integers such as 12.7 and 3.14. The IEEE 754 standard is the most common representation, and it works much like scientific notation.

💡ASCII

ASCII (the American Standard Code for Information Interchange) is a character-encoding standard that maps letters, digits, and symbols to 7-bit binary numbers. The video notes that ASCII can encode 128 different characters, including uppercase letters, lowercase letters, digits, and some symbols, and that its adoption greatly advanced data exchange between different computer systems.

💡Unicode

Unicode is an international standard that gives a unique code to most characters in all of the world's writing systems. The video explains that Unicode's most common version uses 16 bits, with space for over a million code points, covering the characters of every language plus mathematical symbols and emoji. Unicode was created to resolve the incompatible character encodings used across countries and languages.

💡Memory Address

A memory address is a numeric label identifying a specific location in a computer's memory. The video points out that as computer memory grew, 64-bit memory addresses became necessary to label and access trillions of bytes of data. Memory addresses are essential for storing and retrieving data.

💡Signed and Unsigned Numbers

Signed and unsigned numbers are the two ways computers represent integers. A signed number reserves part of the bit pattern for the sign, while an unsigned number uses every bit for the value. The video notes that most computers use the highest-order (first) bit for the sign: 1 for negative, 0 for positive.

💡32-bit and 64-bit Computers

"32-bit" and "64-bit" refer to how many bits a computer's processor handles at a time. The video explains that the largest value a 32-bit computer can represent is just under 4.3 billion, while a 64-bit computer can represent about 9.2 quintillion. 64-bit computers emerged to handle larger quantities of data and more demanding computations.

💡Binary Addition

Binary addition is one of the basic arithmetic operations a computer performs. It follows the same rules as decimal addition but uses only the two digits 1 and 0. The video demonstrates the process by adding the binary numbers 10110111 (183 in decimal) and 10011 (19 in decimal) to get 11001010 (202 in decimal), carrying from column to column just as in decimal.
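The column-by-column carry procedure from the video can be sketched as a toy ripple-carry adder on bit strings:

```python
# Add two binary strings one column at a time, carrying as needed,
# mirroring the 183 + 19 = 202 walkthrough from the video.
def add_binary(a: str, b: str) -> str:
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        bit_a = int(a[-1 - i]) if i < len(a) else 0
        bit_b = int(b[-1 - i]) if i < len(b) else 0
        total = bit_a + bit_b + carry
        result.append(str(total % 2))  # sum bit for this column
        carry = total // 2             # carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))

print(add_binary("10110111", "10011"))  # 11001010, i.e. 183 + 19 = 202
```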

Highlights

How computers store and represent numerical data, which brings math into the picture.

Logic gates use transistors to evaluate Boolean statements; Boolean algebra has only two binary values, true and false.

Larger values are represented by adding more binary digits, much like the familiar decimal system.

In decimal, each digit has 10 possible values; binary is base-2, so each digit has only two possible values, 1 and 0.

The binary number 101 means 1 four, 0 twos, and 1 one, which adds up to 5.

The binary number 10110111 converts to 183 in decimal.

Binary addition works like decimal addition: add column by column and carry.

Each binary digit, 1 or 0, is called a "bit."

An 8-bit number ranges from 0 to 255 and is called a byte.

Units of data size such as the kilobyte (KB), megabyte (MB), and gigabyte (GB) are built on multiples of bytes.

A 32-bit computer can represent values up to just under 4.3 billion, while 64 bits can represent about 9.2 quintillion.

Computers use the first bit for a number's sign: 0 for positive, 1 for negative.

Computers must label locations in their memory, known as addresses, in order to store and retrieve values.

Floating-point numbers can represent decimals such as 12.7 and 3.14; the IEEE 754 standard is the most common representation.

ASCII is a 7-bit code that can store 128 different values, including upper- and lowercase letters, digits, and symbols.

Unicode, created in 1992 to unify the various international encoding schemes, uses 16 bits with space for over a million character codes.

File formats such as MP3 and GIF use binary numbers to encode sounds or pixel colors.

All text messages, videos, web pages, and operating systems are, under the hood, long sequences of 1s and 0s.

Transcripts

play00:03

Hi I’m Carrie Anne, this is Crash Course Computer Science

play00:06

and today we’re going to talk about how computers store and represent numerical data.

play00:10

Which means we’ve got to talk about Math!

play00:12

But don’t worry.

play00:13

Every single one of you already knows exactly what you need to know to follow along.

play00:16

So, last episode we talked about how transistors can be used to build logic gates, which can

play00:20

evaluate boolean statements.

play00:22

And in boolean algebra, there are only two, binary values: true and false.

play00:26

But if we only have two values, how in the world do we represent information beyond just

play00:29

these two values?

play00:30

That’s where the Math comes in.

play00:32

INTRO

play00:41

So, as we mentioned last episode, a single binary value can be used to represent a number.

play00:46

Instead of true and false, we can call these two states 1 and 0 which is actually incredibly useful.

play00:52

And if we want to represent larger things we just need to add more binary digits.

play00:55

This works exactly the same way as the decimal numbers that we’re all familiar with.

play00:59

With decimal numbers there are "only" 10 possible values a single digit can be; 0 through 9,

play01:04

and to get numbers larger than 9 we just start adding more digits to the front.

play01:07

We can do the same with binary.

play01:09

For example, let’s take the number two hundred and sixty three.

play01:12

What does this number actually represent?

play01:14

Well, it means we’ve got 2 one-hundreds, 6 tens, and 3 ones.

play01:18

If you add those all together, we’ve got 263.

play01:21

Notice how each column has a different multiplier.

play01:24

In this case, it’s 100, 10, and 1.

play01:27

Each multiplier is ten times larger than the one to the right.

play01:30

That's because each column has ten possible digits to work with, 0 through 9, after which

play01:35

you have to carry one to the next column.

play01:37

For this reason, it’s called base-ten notation, also called decimal since deci means ten.

play01:42

AND Binary works exactly the same way, it’s just base-two.

play01:45

That’s because there are only two possible digits in binary – 1 and 0.

play01:49

This means that each multiplier has to be two times larger than the column to its right.

play01:53

Instead of hundreds, tens, and ones, we now have fours, twos and ones.

play01:57

Take for example the binary number: 101.

play02:00

This means we have 1 four, 0 twos, and 1 one.

play02:04

Add those all together and we’ve got the number 5 in base ten.

play02:07

But to represent larger numbers, binary needs a lot more digits.

play02:10

Take this number in binary 10110111.

play02:12

We can convert it to decimal in the same way.

play02:14

We have 1 x 128, 0 x 64, 1 x 32, 1 x 16, 0 x 8, 1 x 4, 1 x 2, and 1 x 1.

play02:24

Which all adds up to 183.

play02:26

Math with binary numbers isn’t hard either.

play02:29

Take for example decimal addition of 183 plus 19.

play02:32

First we add 3 + 9, that’s 12, so we put 2 as the sum and carry 1 to the ten’s column.

play02:37

Now we add 8 plus 1 plus the 1 we carried, that's 10, so the sum is 0, carry 1.

play02:43

Finally we add 1 plus the 1 we carried, which equals 2.

play02:46

So the total sum is 202.

play02:48

Here’s the same sum but in binary.

play02:50

Just as before, we start with the ones column.

play02:52

Adding 1+1 results in 2, even in binary.

play02:55

But, there is no symbol "2" so we use 10 and put 0 as our sum and carry the 1.

play03:00

Just like in our decimal example.

play03:02

1 plus 1, plus the 1 carried, equals 3 or 11 in binary, so we put the sum as 1 and we

play03:08

carry 1 again, and so on.

play03:09

We end up with 11001010, which is the same as the number 202 in base ten.

play03:14

Each of these binary digits, 1 or 0, is called a “bit”.

play03:17

So in these last few examples, we were using 8-bit numbers, whose lowest value is zero

play03:21

and highest value is 255, which requires all 8 bits to be set to 1.

play03:26

That's 256 different values, or 2 to the 8th power.

play03:30

You might have heard of 8-bit computers, or 8-bit graphics or audio.

play03:34

These were computers that did most of their operations in chunks of 8 bits.

play03:38

But 256 different values isn’t a lot to work with, so it meant things like 8-bit games

play03:42

were limited to 256 different colors for their graphics.

play03:46

And 8-bits is such a common size in computing, it has a special word: a byte.

play03:51

A byte is 8 bits.

play03:53

If you’ve got 10 bytes, it means you’ve really got 80 bits.

play03:55

You’ve heard of kilobytes, megabytes, gigabytes and so on.

play03:58

These prefixes denote different scales of data.

play04:01

Just like one kilogram is a thousand grams, 1 kilobyte is a thousand bytes…. or really

play04:06

8000 bits.

play04:07

Mega is a million bytes (MB), and giga is a billion bytes (GB).

play04:11

Today you might even have a hard drive that has 1 terabyte (TB) of storage.

play04:15

That's 8 trillion ones and zeros.

play04:17

But hold on!

play04:18

That’s not always true.

play04:19

In binary, a kilobyte has two to the power of 10 bytes, or 1024.

play04:24

1000 is also right when talking about kilobytes, but we should acknowledge it isn’t the only

play04:28

correct definition.

play04:29

You’ve probably also heard the term 32-bit or 64-bit computers – you’re almost certainly

play04:34

using one right now.

play04:35

What this means is that they operate in chunks of 32 or 64 bits.

play04:39

That’s a lot of bits!

play04:40

The largest number you can represent with 32 bits is just under 4.3 billion.

play04:45

Which is thirty-two 1's in binary.

play04:47

This is why our Instagram photos are so smooth and pretty – they are composed of millions

play04:52

of colors, because computers today use 32-bit color graphics.

play04:56

Of course, not everything is a positive number - like my bank account in college.

play05:00

So we need a way to represent positive and negative numbers.

play05:03

Most computers use the first bit for the sign: 1 for negative, 0 for positive numbers, and

play05:08

then use the remaining 31 bits for the number itself.

play05:11

That gives us a range of roughly plus or minus two billion.

play05:14

While this is a pretty big range of numbers, it’s not enough for many tasks.

play05:17

There are 7 billion people on the earth, and the US national debt is almost 20 trillion dollars after all.

play05:23

This is why 64-bit numbers are useful.

play05:25

The largest value a 64-bit number can represent is around 9.2 quintillion!

play05:30

That’s a lot of possible numbers and will hopefully stay above the US national debt for a while!

play05:34

Most importantly, as we’ll discuss in a later episode, computers must label locations

play05:38

in their memory, known as addresses, in order to store and retrieve values.

play05:43

As computer memory has grown to gigabytes and terabytes – that’s trillions of bytes

play05:47

– it was necessary to have 64-bit memory addresses as well.

play05:50

In addition to negative and positive numbers, computers must deal with numbers that are

play05:53

not whole numbers, like 12.7 and 3.14, or maybe even stardate: 43989.1.

play06:00

These are called “floating point” numbers, because the decimal point can float around

play06:04

in the middle of the number.

play06:05

Several methods have been developed to represent floating point numbers.

play06:08

The most common of which is the IEEE 754 standard.

play06:11

And you thought historians were the only people bad at naming things!

play06:14

In essence, this standard stores decimal values sort of like scientific notation.

play06:19

For example, 625.9 can be written as 0.6259 x 10^3.

play06:25

There are two important numbers here: the .6259 is called the significand.

play06:30

And 3 is the exponent.

play06:31

In a 32-bit floating point number, the first bit is used for the sign of the number -- positive

play06:35

or negative.

play06:36

The next 8 bits are used to store the exponent and the remaining 23 bits are used to store

play06:41

the significand.

play06:42

Ok, we’ve talked a lot about numbers, but your name is probably composed of letters,

play06:46

so it’s really useful for computers to also have a way to represent text.

play06:49

However, rather than have a special form of storage for letters,

play06:53

computers simply use numbers to represent letters.

play06:56

The most straightforward approach might be to simply number the letters of the alphabet:

play06:59

A being 1, B being 2, C 3, and so on.

play07:02

In fact, Francis Bacon, the famous English writer, used five-bit sequences to encode

play07:07

all 26 letters of the English alphabet to send secret messages back in the 1600s.

play07:11

And five bits can store 32 possible values – so that’s enough for the 26 letters,

play07:16

but not enough for punctuation, digits, and upper and lower case letters.

play07:20

Enter ASCII, the American Standard Code for Information Interchange.

play07:24

Invented in 1963, ASCII was a 7-bit code, enough to store 128 different values.

play07:29

With this expanded range, it could encode capital letters, lowercase letters, digits

play07:33

0 through 9, and symbols like the @ sign and punctuation marks.

play07:37

For example, a lowercase ‘a’ is represented by the number 97, while a capital ‘A’ is 65.

play07:42

A colon is 58 and a closed parenthesis is 41.

play07:45

ASCII even had a selection of special command codes, such as a newline character to tell

play07:49

the computer where to wrap a line to the next row.

play07:52

In older computer systems, the line of text would literally continue off the edge of the

play07:56

screen if you didn’t include a new line character!

play07:59

Because ASCII was such an early standard, it became widely used, and critically, allowed

play08:02

different computers built by different companies to exchange data.

play08:06

This ability to universally exchange information is called “interoperability”.

play08:10

However, it did have a major limitation: it was really only designed for English.

play08:15

Fortunately, there are 8 bits in a byte, not 7, and it soon became popular to use codes

play08:19

128 through 255, previously unused, for "national" characters.

play08:25

In the US, those extra numbers were largely used to encode additional symbols, like mathematical

play08:30

notation, graphical elements, and common accented characters.

play08:33

On the other hand, while the Latin characters were used universally, Russian computers used

play08:37

the extra codes to encode Cyrillic characters, and Greek computers, Greek letters, and so on.

play08:42

And national character codes worked pretty well for most countries.

play08:45

The problem was, if you opened an email written in Latvian on a Turkish computer, the result

play08:49

was completely incomprehensible.

play08:51

And things totally broke with the rise of computing in Asia, as languages like Chinese and Japanese

play08:56

have thousands of characters.

play08:58

There was no way to encode all those characters in 8-bits!

play09:00

In response, each country invented multi-byte encoding schemes, all of which were mutually incompatible.

play09:06

The Japanese were so familiar with this encoding problem that they had a special name for it:

play09:11

"mojibake", which means "scrambled text".

play09:13

And so it was born – Unicode – one format to rule them all.

play09:17

Devised in 1992 to finally do away with all of the different international schemes

play09:21

it replaced them with one universal encoding scheme.

play09:23

The most common version of Unicode uses 16 bits with space for over a million codes -

play09:28

enough for every single character from every language ever used –

play09:32

more than 120,000 of them in over 100 types of script

play09:36

plus space for mathematical symbols and even graphical characters like Emoji.

play09:40

And in the same way that ASCII defines a scheme for encoding letters as binary numbers,

play09:43

other file formats – like MP3s or GIFs – use

play09:46

binary numbers to encode sounds or colors of a pixel in our photos, movies, and music.

play09:50

Most importantly, under the hood it all comes down to long sequences of bits.

play09:55

Text messages, this YouTube video, every webpage on the internet, and even your computer’s

play10:00

operating system, are nothing but long sequences of 1s and 0s.

play10:03

So next week, we’ll start talking about how your computer starts manipulating those

play10:07

binary sequences, for our first true taste of computation.

play10:10

Thanks for watching. See you next week.


Related Tags
Computer Science, Data Storage, Binary, Bytes, Floating Point, Text Encoding, ASCII, Unicode, Crash Course, Basic Math, Information Representation