Representing Numbers and Letters with Binary: Crash Course Computer Science #4
Summary
TLDR This video explores how computers store and represent numerical data, and the role math plays in that. It first introduces the binary system, explaining how binary digits (bits) represent numbers, with larger values built by multiplying each digit by a different place value (1, 2, 4, and so on). The video also explains the byte, an 8-bit binary number, and larger units of data such as the kilobyte, megabyte, and gigabyte. It then discusses how computers use the first bit to represent a number's sign, using the remaining bits to store the value itself. Floating-point numbers and the IEEE 754 standard are introduced, along with how numbers represent text, from ASCII encoding to the development of Unicode. Finally, it emphasizes that all computer data, whether text, video, or an operating system, ultimately comes down to long sequences of 1s and 0s.
Takeaways
- 💡 Computers use binary (0s and 1s) to store and represent all types of data; each binary digit is called a bit.
- 🔢 Binary is a base-2 number system, analogous to the familiar base-10 decimal system, but each digit has only two possible values (0 and 1).
- 🎲 Adding more binary digits allows larger numbers to be represented. For example, an 8-bit binary number can represent 256 different values, from 0 to 255.
- 🖥️ Older systems such as 8-bit computers and 8-bit graphics were limited because they could only work with 256 different values.
- 🔍 The byte, equal to 8 bits, is a common unit of data. Larger units include the kilobyte (KB), megabyte (MB), and gigabyte (GB).
- 🔣 Computers represent text with binary numbers, originally via ASCII, a 7-bit encoding that can represent 128 different symbols.
- 🌍 Unicode was created to meet global multilingual needs; its most common version uses 16 bits, with room for over a million characters, covering all international characters and emoji.
- 🧮 Computers can handle both positive and negative numbers, typically using the first binary digit to indicate the sign.
- 📊 Floating-point numbers represent non-integer values and are stored using the IEEE 754 standard, which works much like scientific notation.
- 🌐 In computer science, not just numbers and text but also sound, images, and other media are encoded and processed as binary sequences.
Q & A
How do computers use binary to represent numerical data?
- Computers represent numerical data in binary, using only 1s and 0s in place of the decimal digits 0 through 9. Each binary digit (bit) contributes to the value; for example, binary 101 represents decimal 5. To represent larger values, computers add more binary digits, just as decimal numbers grow by adding more digits.
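To make that place-value conversion concrete, here is a short Python sketch (my own illustration, not from the video; `binary_to_decimal` is a hypothetical helper):

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its place value (1, 2, 4, 8, ...), rightmost first."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * (2 ** position)
    return total

print(binary_to_decimal("101"))       # 5, as in the example above
print(binary_to_decimal("10110111"))  # 183
```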
What is a byte, and how does it relate to binary numbers?
- A byte is a basic unit of storage capacity, equal to 8 binary digits (bits). Since each bit can be in one of two states (1 or 0), 8 bits can represent 2 to the 8th power, or 256, different states, which gives a computer enough room to store a wide variety of information.
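That 256-state count can be verified directly, as a quick Python sketch:

```python
from itertools import product

# Enumerate every distinct pattern of 8 bits; there should be 2**8 of them.
patterns = list(product("01", repeat=8))
print(len(patterns))  # 256
print(2 ** 8)         # 256
```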
How do computers represent positive and negative numbers?
- Computers use the highest (leftmost) bit to indicate the sign: 1 for negative, 0 for positive. In a 32-bit integer, the remaining 31 bits store the value itself, giving a range of roughly plus or minus two billion.
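As a hedged sketch of that sign-bit idea (note: most real hardware uses two's complement, a refinement of plain sign-and-magnitude, but the leading bit still carries the sign; `int32_sign_and_magnitude` is my own illustrative helper):

```python
def int32_sign_and_magnitude(bits: str) -> int:
    """Interpret a 32-bit string as sign-and-magnitude: the top bit is the sign."""
    assert len(bits) == 32
    sign = -1 if bits[0] == "1" else 1
    return sign * int(bits[1:], 2)  # remaining 31 bits hold the magnitude

print(int32_sign_and_magnitude("0" * 31 + "1"))        # 1
print(int32_sign_and_magnitude("1" + "0" * 30 + "1"))  # -1
```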
What are floating-point numbers, and how are they represented in a computer?
- Floating-point numbers represent fractional values such as 12.7 or 3.14. Computers use the IEEE 754 standard, which works like scientific notation, splitting a value into a significand and an exponent. In a 32-bit floating-point number, the first bit stores the sign, the next 8 bits store the exponent, and the remaining 23 bits store the significand.
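Python's struct module can expose those three fields of a 32-bit float; this sketch assumes the standard IEEE 754 single-precision layout (the `float32_fields` helper is my own):

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, fraction) bit fields of a 32-bit float."""
    raw = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = raw >> 31                  # 1 bit
    exponent = (raw >> 23) & 0xFF     # 8 bits, stored with a bias of 127
    fraction = raw & 0x7FFFFF         # 23-bit significand fraction
    return sign, exponent, fraction

print(float32_fields(-1.0))  # (1, 127, 0): negative sign, exponent of zero after bias
```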
How does ASCII encoding work?
- ASCII (the American Standard Code for Information Interchange) is a 7-bit encoding that can represent 128 different values. It can encode uppercase and lowercase letters, the digits 0 through 9, and various punctuation marks and special characters. For example, a lowercase 'a' is represented by the number 97, and a capital 'A' by 65. ASCII allowed different computer systems to exchange data, greatly improving interoperability.
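Those code points are easy to confirm in Python, where ord and chr map between characters and their numbers:

```python
# ASCII code points from the examples above.
print(ord("a"))  # 97
print(ord("A"))  # 65
print(ord(":"))  # 58
print(chr(41))   # ")", the closing parenthesis
```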
Why was Unicode needed?
- Because ASCII was designed primarily for English, it could not effectively represent characters from other languages, especially those with thousands of characters, such as Chinese and Japanese. To solve this, Unicode was devised in 1992. It uses an encoding space of at least 16 bits, with room for over a million characters, covering virtually every character in every language, plus mathematical symbols and emoji.
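Unicode code points extend far past ASCII's 128 values, as this short sketch shows:

```python
# Unicode assigns a unique code point to characters from every script.
print(ord("A"))              # 65: within the ASCII range
print(ord("中"))             # 20013: a Chinese character, beyond 8 bits
print(ord("😀"))             # 128512: an emoji, beyond even 16 bits
print("😀".encode("utf-8"))  # the same character as bytes on disk
```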
How do computers store and process text?
- Computers represent text with numbers. ASCII, for example, assigns a number to every letter, digit, and symbol. The computer converts those numbers to binary, then stores and processes the resulting binary sequences.
What is a bit?
- A bit is a single binary digit, the smallest unit of data storage in a computer. Each bit can be in one of two states: 1 or 0. By combining many bits, computers can represent more complex data and instructions.
Why do memory addresses in computers need to be 64 bits?
- As computer memory has grown to gigabyte (GB) and terabyte (TB) scale, more bits are needed to uniquely label each location in memory. A 64-bit memory address lets a computer access 2 to the 64th power distinct memory locations, enough for current and foreseeable memory sizes.
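The size of that address space is quick to compute, as a sketch:

```python
# A 64-bit address can label 2**64 distinct memory locations.
addressable = 2 ** 64
print(addressable)            # 18446744073709551616
print(addressable > 10 ** 12) # True: far beyond terabyte-scale memories
```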
Why is all computer data ultimately made of 1s and 0s?
- All computer data, whether text, images, audio, or video, is stored in binary form, because a computer's logic circuits understand only two states: on (1) and off (0). Everything is therefore converted into sequences of 1s and 0s, which the processor then interprets and executes.
What is "mojibake"?
- "Mojibake" is a Japanese word meaning "scrambled text". It describes the garbled output caused by incompatible character encodings, such as when text written under one encoding scheme is opened under another, incompatible one.
How do computers represent color?
- Computers represent colors as binary numbers, typically devoting a fixed number of bits to the red, green, and blue (RGB) components. In 32-bit color graphics, for example, 8 bits each encode red, green, and blue, and the remaining 8 bits can encode transparency (the alpha channel).
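Packing the four 8-bit channels into one 32-bit value can be sketched as follows (the `pack_rgba` helper is my own illustration, not a standard API):

```python
def pack_rgba(r: int, g: int, b: int, a: int) -> int:
    """Pack four 8-bit channels into a single 32-bit color value."""
    for channel in (r, g, b, a):
        assert 0 <= channel <= 255, "each channel must fit in 8 bits"
    return (r << 24) | (g << 16) | (b << 8) | a

opaque_orange = pack_rgba(255, 165, 0, 255)
print(hex(opaque_orange))  # 0xffa500ff
```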
Outlines
📚 The Basics of Storing and Representing Numbers
In this section, Carrie Anne introduces how computers store and represent numerical data. She explains how binary numbers work and how larger numbers are represented by adding more binary digits, much as in the decimal system. She works through converting binary numbers to decimal and explains the concepts of the bit and the byte, along with their use in storage units such as the kilobyte, megabyte, and gigabyte. She also discusses how computers use binary to represent positive and negative integers, and introduces floating-point numbers and the IEEE 754 standard.
🌐 Representing Text and Encoding Systems
The second segment discusses how computers use numbers to represent text. It first introduces ASCII, a 7-bit encoding capable of representing 128 different characters, including upper- and lowercase English letters, digits, and some symbols. It then notes ASCII's limitation: it was designed mainly for English, although extending it to 8 bits allowed additional characters for other national languages. As computing spread in Asia, however, encoding systems capable of representing thousands of characters were needed. Unicode was proposed in response: a 16-bit encoding that gives every character in every language a unique code, including mathematical symbols and emoji. Finally, the segment stresses that all forms of digital information, including text messages, videos, and operating systems, come down to long sequences of 1s and 0s.
🔍 Next Week's Preview: The Start of Computation
In the closing section, Carrie Anne previews next week's topic: how computers begin to manipulate binary sequences, which is where computation truly begins. She thanks viewers for watching and says she will see them next week.
Keywords
💡 Logic gates
💡 Binary
💡 Bit
💡 Byte
💡 Floating-point numbers
💡 ASCII
💡 Unicode
💡 Memory addresses
💡 Signed and unsigned numbers
💡 32-bit and 64-bit computers
💡 Binary addition
Highlights
How computers store and represent numerical data, which involves some math.
Logic gates use transistors to evaluate boolean statements; boolean algebra has only two binary values, true and false.
Adding more binary digits represents larger values, just as in the familiar decimal system.
In decimal, each digit has 10 possible values, while binary is base-2, so each digit has only two possible values, 1 and 0.
The binary number 101 means 1 four, 0 twos, and 1 one, which sums to 5.
The binary number 10110111 converts to 183 in decimal.
Binary addition works like decimal addition: add column by column and carry.
Each binary digit, 1 or 0, is called a "bit".
An 8-bit number ranges from 0 to 255 and is called a byte.
Data units such as the kilobyte (KB), megabyte (MB), and gigabyte (GB) are built on multiples of bytes.
A 32-bit computer can handle values up to just under 4.3 billion, while a 64-bit computer can represent about 9.2 quintillion.
Computers use the first bit to represent a number's sign: 0 for positive, 1 for negative.
Computers must label the locations (addresses) in their memory in order to store and retrieve values.
Floating-point numbers can represent fractions such as 12.7 and 3.14; the IEEE 754 standard is the most common representation.
ASCII is a 7-bit encoding that can store 128 different values, including upper- and lowercase letters, digits, and symbols.
Unicode, created in 1992, unified the various international encoding schemes, using 16 bits to encode over a million characters.
File formats such as MP3 or GIF use binary numbers to encode sounds or pixel colors.
All text messages, videos, webpages, and operating systems are, at bottom, long sequences of 1s and 0s.
Transcripts
Hi I’m Carrie Anne, this is Crash Course Computer Science
and today we’re going to talk about how computers store and represent numerical data.
Which means we’ve got to talk about Math!
But don’t worry.
Every single one of you already knows exactly what you need to know to follow along.
So, last episode we talked about how transistors can be used to build logic gates, which can
evaluate boolean statements.
And in boolean algebra, there are only two, binary values: true and false.
But if we only have two values, how in the world do we represent information beyond just
these two values?
That’s where the Math comes in.
INTRO
So, as we mentioned last episode, a single binary value can be used to represent a number.
Instead of true and false, we can call these two states 1 and 0 which is actually incredibly useful.
And if we want to represent larger things we just need to add more binary digits.
This works exactly the same way as the decimal numbers that we’re all familiar with.
With decimal numbers there are "only" 10 possible values a single digit can be; 0 through 9,
and to get numbers larger than 9 we just start adding more digits to the front.
We can do the same with binary.
For example, let’s take the number two hundred and sixty three.
What does this number actually represent?
Well, it means we’ve got 2 one-hundreds, 6 tens, and 3 ones.
If you add those all together, we’ve got 263.
Notice how each column has a different multiplier.
In this case, it’s 100, 10, and 1.
Each multiplier is ten times larger than the one to the right.
That's because each column has ten possible digits to work with, 0 through 9, after which
you have to carry one to the next column.
For this reason, it’s called base-ten notation, also called decimal since deci means ten.
AND Binary works exactly the same way, it’s just base-two.
That’s because there are only two possible digits in binary – 1 and 0.
This means that each multiplier has to be two times larger than the column to its right.
Instead of hundreds, tens, and ones, we now have fours, twos and ones.
Take for example the binary number: 101.
This means we have 1 four, 0 twos, and 1 one.
Add those all together and we’ve got the number 5 in base ten.
But to represent larger numbers, binary needs a lot more digits.
Take this number in binary 10110111.
We can convert it to decimal in the same way.
We have 1 x 128, 0 x 64, 1 x 32, 1 x 16, 0 x 8, 1 x 4, 1 x 2, and 1 x 1.
Which all adds up to 183.
Math with binary numbers isn’t hard either.
Take for example decimal addition of 183 plus 19.
First we add 3 + 9, that’s 12, so we put 2 as the sum and carry 1 to the ten’s column.
Now we add 8 plus 1 plus the 1 we carried, that's 10, so the sum is 0, carry 1.
Finally we add 1 plus the 1 we carried, which equals 2.
So the total sum is 202.
Here’s the same sum but in binary.
Just as before, we start with the ones column.
Adding 1+1 results in 2, even in binary.
But, there is no symbol "2" so we use 10 and put 0 as our sum and carry the 1.
Just like in our decimal example.
1 plus 1, plus the 1 carried, equals 3 or 11 in binary, so we put the sum as 1 and we
carry 1 again, and so on.
We end up with 11001010, which is the same as the number 202 in base ten.
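That column-by-column carrying can be checked with a short Python sketch (my own illustration; `binary_add` is a hypothetical helper, not from the video):

```python
def binary_add(a: str, b: str) -> str:
    """Add two binary strings column by column, carrying 1s as in decimal addition."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    result, carry = [], 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))  # the sum digit for this column
        carry = total // 2             # the carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))

print(binary_add("10110111", "10011"))  # 11001010, which is 202 in decimal
```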
Each of these binary digits, 1 or 0, is called a “bit”.
So in these last few examples, we were using 8-bit numbers, with a lowest value of zero
and a highest value of 255, which requires all 8 bits to be set to 1.
That's 256 different values, or 2 to the 8th power.
You might have heard of 8-bit computers, or 8-bit graphics or audio.
These were computers that did most of their operations in chunks of 8 bits.
But 256 different values isn’t a lot to work with, so it meant things like 8-bit games
were limited to 256 different colors for their graphics.
And 8-bits is such a common size in computing, it has a special word: a byte.
A byte is 8 bits.
If you’ve got 10 bytes, it means you’ve really got 80 bits.
You’ve heard of kilobytes, megabytes, gigabytes and so on.
These prefixes denote different scales of data.
Just like one kilogram is a thousand grams, 1 kilobyte is a thousand bytes…. or really
8000 bits.
Mega is a million bytes (MB), and giga is a billion bytes (GB).
Today you might even have a hard drive that has 1 terabyte (TB) of storage.
That's 8 trillion ones and zeros.
But hold on!
That’s not always true.
In binary, a kilobyte has two to the power of 10 bytes, or 1024.
1000 is also right when talking about kilobytes, but we should acknowledge it isn’t the only
correct definition.
You’ve probably also heard the term 32-bit or 64-bit computers – you’re almost certainly
using one right now.
What this means is that they operate in chunks of 32 or 64 bits.
That’s a lot of bits!
The largest number you can represent with 32 bits is just under 4.3 billion.
Which is thirty-two 1's in binary.
This is why our Instagram photos are so smooth and pretty – they are composed of millions
of colors, because computers today use 32-bit color graphics.
Of course, not everything is a positive number - like my bank account in college.
So we need a way to represent positive and negative numbers.
Most computers use the first bit for the sign: 1 for negative, 0 for positive numbers, and
then use the remaining 31 bits for the number itself.
That gives us a range of roughly plus or minus two billion.
While this is a pretty big range of numbers, it’s not enough for many tasks.
There are 7 billion people on the earth, and the US national debt is almost 20 trillion dollars after all.
This is why 64-bit numbers are useful.
The largest value a 64-bit number can represent is around 9.2 quintillion!
That’s a lot of possible numbers and will hopefully stay above the US national debt for a while!
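Both limits mentioned here are quick to verify in Python (assuming one sign bit for the signed cases):

```python
# Unsigned: all bits hold magnitude.  Signed: one bit holds the sign.
print(2 ** 32 - 1)  # 4294967295, just under 4.3 billion (unsigned 32-bit max)
print(2 ** 31 - 1)  # 2147483647, roughly +2 billion (signed 32-bit max)
print(2 ** 63 - 1)  # 9223372036854775807, about 9.2 quintillion (signed 64-bit max)
```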
Most importantly, as we’ll discuss in a later episode, computers must label locations
in their memory, known as addresses, in order to store and retrieve values.
As computer memory has grown to gigabytes and terabytes – that’s trillions of bytes
– it was necessary to have 64-bit memory addresses as well.
In addition to negative and positive numbers, computers must deal with numbers that are
not whole numbers, like 12.7 and 3.14, or maybe even stardate: 43989.1.
These are called “floating point” numbers, because the decimal point can float around
in the middle of number.
Several methods have been developed to represent floating point numbers.
The most common of which is the IEEE 754 standard.
And you thought historians were the only people bad at naming things!
In essence, this standard stores decimal values sort of like scientific notation.
For example, 625.9 can be written as 0.6259 x 10^3.
There are two important numbers here: the .6259 is called the significand.
And 3 is the exponent.
In a 32-bit floating point number, the first bit is used for the sign of the number -- positive
or negative.
The next 8 bits are used to store the exponent and the remaining 23 bits are used to store
the significand.
Ok, we’ve talked a lot about numbers, but your name is probably composed of letters,
so it’s really useful for computers to also have a way to represent text.
However, rather than have a special form of storage for letters,
computers simply use numbers to represent letters.
The most straightforward approach might be to simply number the letters of the alphabet:
A being 1, B being 2, C 3, and so on.
In fact, Francis Bacon, the famous English writer, used five-bit sequences to encode
all 26 letters of the English alphabet to send secret messages back in the 1600s.
And five bits can store 32 possible values – so that’s enough for the 26 letters,
but not enough for punctuation, digits, and upper and lower case letters.
Enter ASCII, the American Standard Code for Information Interchange.
Invented in 1963, ASCII was a 7-bit code, enough to store 128 different values.
With this expanded range, it could encode capital letters, lowercase letters, digits
0 through 9, and symbols like the @ sign and punctuation marks.
For example, a lowercase ‘a’ is represented by the number 97, while a capital ‘A’ is 65.
A colon is 58 and a closed parenthesis is 41.
ASCII even had a selection of special command codes, such as a newline character to tell
the computer where to wrap a line to the next row.
In older computer systems, the line of text would literally continue off the edge of the
screen if you didn’t include a new line character!
Because ASCII was such an early standard, it became widely used, and critically, allowed
different computers built by different companies to exchange data.
This ability to universally exchange information is called “interoperability”.
However, it did have a major limitation: it was really only designed for English.
Fortunately, there are 8 bits in a byte, not 7, and it soon became popular to use codes
128 through 255, previously unused, for "national" characters.
In the US, those extra numbers were largely used to encode additional symbols, like mathematical
notation, graphical elements, and common accented characters.
On the other hand, while the Latin characters were used universally, Russian computers used
the extra codes to encode Cyrillic characters, and Greek computers, Greek letters, and so on.
And national character codes worked pretty well for most countries.
The problem was, if you opened an email written in Latvian on a Turkish computer, the result
was completely incomprehensible.
And things totally broke with the rise of computing in Asia, as languages like Chinese and Japanese
have thousands of characters.
There was no way to encode all those characters in 8-bits!
In response, each country invented multi-byte encoding schemes, all of which were mutually incompatible.
The Japanese were so familiar with this encoding problem that they had a special name for it:
"mojibake", which means "scrambled text".
And so it was born – Unicode – one format to rule them all.
Devised in 1992 to finally do away with all of the different international schemes
it replaced them with one universal encoding scheme.
The most common version of Unicode uses 16 bits with space for over a million codes -
enough for every single character from every language ever used –
more than 120,000 of them in over 100 types of script
plus space for mathematical symbols and even graphical characters like Emoji.
And in the same way that ASCII defines a scheme for encoding letters as binary numbers,
other file formats – like MP3s or GIFs – use
binary numbers to encode sounds or colors of a pixel in our photos, movies, and music.
Most importantly, under the hood it all comes down to long sequences of bits.
Text messages, this YouTube video, every webpage on the internet, and even your computer’s
operating system, are nothing but long sequences of 1s and 0s.
So next week, we’ll start talking about how your computer starts manipulating those
binary sequences, for our first true taste of computation.
Thanks for watching. See you next week.