Representing Numbers and Letters with Binary: Crash Course Computer Science #4

CrashCourse
15 Mar 2017 · 10:45

Summary

TL;DR: This Crash Course Computer Science episode delves into how computers store and represent numerical data, introducing binary and its application in computing. It explains the concept of bits and bytes, the significance of 8-bit and 64-bit computing, and how binary is used to represent both positive and negative numbers. The video also covers floating-point numbers, the ASCII and Unicode systems for encoding text, and the importance of binary in various file formats. It sets the stage for understanding computation and the manipulation of binary data.

Takeaways

  • 😲 Computers use binary digits (bits) to represent data, with two states: 1 and 0, similar to true and false in boolean algebra.
  • 📚 Binary numbers work on the base-2 system, where each digit represents a power of 2, allowing for the representation of larger numbers with more digits.
  • 🔢 The script explains how to convert binary numbers to decimal and vice versa, demonstrating basic binary addition.
  • 💾 A byte is 8 bits, and in binary terms a kilobyte is 1024 bytes (2^10) rather than the round 1000; larger units like megabytes and gigabytes scale the same way.
  • 🔄 In a simple signed representation, the first bit of a binary number records the sign, with 0 for positive and 1 for negative, letting the system handle both positive and negative integers.
  • 🌐 32-bit and 64-bit computers operate in chunks of 32 or 64 bits, respectively, which significantly increases the range of numbers they can represent.
  • 🌈 Computers use 32-bit color graphics to display a wide range of colors, which is why images can be so smooth and detailed.
  • 📈 Floating point numbers, like 12.7 or 3.14, are represented using methods such as the IEEE 754 standard, which is similar to scientific notation.
  • 🔤 ASCII is a 7-bit code that can represent 128 different values, including letters, digits, and symbols, enabling basic text representation in computing.
  • 🌍 Unicode was created in 1992 to provide a universal encoding scheme for text in any language; its most common version uses 16-bit units, with space for over a million code points.
  • 🎥 All digital media, including text, images, audio, and video, are ultimately composed of long sequences of binary 1s and 0s.

Q & A

  • How do computers store and represent numerical data?

    -Computers store and represent numerical data using binary digits, or bits, which can be either 1 or 0. They use a base-two system, similar to how the decimal system uses base-ten, but with powers of two instead of ten.
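
A minimal Python sketch (added here for illustration, not code from the video) makes the base-two idea concrete using built-in conversions:

    # Binary <-> decimal with Python's built-in conversions.
    print(int("101", 2))       # 5   (1 four, 0 twos, 1 one)
    print(int("10110111", 2))  # 183
    print(format(263, "b"))    # 100000111, i.e. 263 in base two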

  • What is a binary value and how is it useful?

    -A binary value is a single digit in the binary number system, which can either be 1 or 0. It is useful because it allows computers to represent numbers and information in a simple form that can be easily manipulated by electronic components.

  • How does the binary number system represent larger numbers than just 1 and 0?

    -To represent larger numbers, binary uses multiple binary digits. Each digit represents a power of two, with each position to the left representing a larger power, similar to how the decimal system uses powers of ten.
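
The place-value rule can also be spelled out by hand; this hypothetical helper mirrors the "each position is a power of two" explanation:

    def binary_to_decimal(bits: str) -> int:
        """Sum digit * 2^position, right to left, per the place-value rule."""
        total = 0
        for position, digit in enumerate(reversed(bits)):
            total += int(digit) * 2 ** position
        return total

    assert binary_to_decimal("101") == 5
    assert binary_to_decimal("10110111") == 183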

  • What is the significance of the number 256 in the context of binary numbers?

    -The number 256 is significant because it is the total number of different values that can be represented using 8 bits, which is 2 to the power of 8 (2^8). Since a byte is 8 bits, a single byte can hold 256 distinct values.

  • What is a bit and how does it relate to bytes and kilobytes?

    -A bit is a binary digit, which can be either 1 or 0. A byte is 8 bits, and a kilobyte is 1000 bytes or, in binary terms, 1024 bytes, which is 2 to the power of 10 (2^10) bytes.
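
The unit arithmetic is easy to check (a quick sketch, assuming the video's figures):

    BITS_PER_BYTE = 8
    kilobyte_binary = 2 ** 10   # 1024 bytes
    kilobyte_decimal = 10 ** 3  # 1000 bytes
    print(kilobyte_binary * BITS_PER_BYTE)   # 8192 bits
    print(kilobyte_decimal * BITS_PER_BYTE)  # 8000 bits, the figure quoted in the video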

  • How do computers represent positive and negative numbers?

    -The video describes a sign-and-magnitude scheme: the first bit encodes the sign, with 1 for negative and 0 for positive, and the remaining bits store the magnitude of the number. (Most real computers use a closely related encoding called two's complement, in which the top bit also signals a negative value.)
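
A short sketch contrasting the two encodings for 8-bit values (function names are invented for this illustration):

    def sign_magnitude(n: int, bits: int = 8) -> str:
        """First bit is the sign (1 = negative); the rest hold the magnitude."""
        sign = "1" if n < 0 else "0"
        return sign + format(abs(n), f"0{bits - 1}b")

    def twos_complement(n: int, bits: int = 8) -> str:
        """The encoding most hardware actually uses for signed integers."""
        return format(n & (2 ** bits - 1), f"0{bits}b")

    print(sign_magnitude(-5))   # 10000101
    print(twos_complement(-5))  # 11111011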

  • What is the range of numbers that can be represented with 32-bit and 64-bit systems?

    -With 32 bits, the largest number that can be represented is just under 4.3 billion, while a 64-bit system can represent numbers up to around 9.2 quintillion.
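
Both limits fall straight out of powers of two (unsigned 32-bit maximum, signed 64-bit maximum):

    print(2 ** 32 - 1)  # 4294967295 -- "just under 4.3 billion"
    print(2 ** 63 - 1)  # 9223372036854775807 -- around 9.2 quintillion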

  • What is the IEEE 754 standard and why is it used?

    -The IEEE 754 standard is a method used to represent floating-point numbers in computers. It stores decimal values in a format similar to scientific notation, with a sign bit, exponent, and significand (or mantissa), allowing for efficient representation of both very large and very small numbers.
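
As an illustrative sketch (using Python's standard struct module; the helper name is invented), the three fields of a 32-bit float can be pulled apart like this:

    import struct

    def float32_fields(x: float):
        """Split a 32-bit float into sign (1 bit), exponent (8 bits),
        and significand (23 bits), per the IEEE 754 layout."""
        (word,) = struct.unpack(">I", struct.pack(">f", x))
        sign = word >> 31
        exponent = (word >> 23) & 0xFF
        significand = word & 0x7FFFFF
        return sign, exponent, significand

    # 625.9 lies between 2^9 and 2^10, so its biased exponent is 9 + 127 = 136.
    print(float32_fields(625.9))  # (0, 136, <the 23 significand bits>)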

  • How does ASCII represent text?

    -ASCII (American Standard Code for Information Interchange) represents text by assigning a unique 7-bit binary number to each character, including letters, digits, and symbols. This allows for the encoding of 128 different values.
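
Python's ord and chr expose these same code points, matching the values quoted in the episode:

    print(ord("a"), ord("A"))       # 97 65
    print(ord(":"), ord(")"))       # 58 41
    print(format(ord("A"), "07b"))  # 1000001 -- 'A' as a 7-bit ASCII code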

  • What is the purpose of Unicode and how does it differ from ASCII?

    -Unicode was devised to create a universal encoding scheme that could represent characters from all languages. Unlike ASCII, which is limited to 128 characters, Unicode's most common version uses 16-bit units, with space for over a million code points covering every written language, along with symbols and emoji.
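
A small demonstration (encodings chosen for illustration): every character gets one code point, and the number of bytes varies by encoding:

    for ch in ["A", "é", "汉", "😀"]:
        print(ch, hex(ord(ch)), ch.encode("utf-8"))
    # A  0x41     b'A'                 (1 byte in UTF-8)
    # é  0xe9     b'\xc3\xa9'          (2 bytes)
    # 汉 0x6c49   b'\xe6\xb1\x89'      (3 bytes)
    # 😀 0x1f600  b'\xf0\x9f\x98\x80'  (4 bytes -- a code point beyond 16 bits)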

  • How do computers manipulate binary sequences for computation?

    -Computers manipulate binary sequences through logic gates and arithmetic operations, such as addition, subtraction, multiplication, and division. These operations are performed on bits, which are the fundamental units of data in computing.
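
For instance, binary addition can be built from nothing but boolean operations, echoing the logic-gate idea (a hypothetical ripple-carry sketch):

    def add_bits(a: str, b: str) -> str:
        """Add two equal-length bit strings using only XOR, AND, and OR,
        like a chain of full adders built from logic gates."""
        result, carry = [], 0
        for x, y in zip(reversed(a), reversed(b)):
            x, y = int(x), int(y)
            result.append(str(x ^ y ^ carry))            # sum bit: two XORs
            carry = (x & y) | (x & carry) | (y & carry)  # carry-out: ANDs + ORs
        if carry:
            result.append("1")
        return "".join(reversed(result))

    assert add_bits("10110111", "00010011") == "11001010"  # 183 + 19 = 202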

Outlines

00:00

🔢 Binary Numbers and Data Representation

The video introduces the concept of how computers use binary numbers to store and represent numerical data. It explains the basics of binary, comparing it to the decimal system, and how numbers like 263 in decimal are represented in binary. The script delves into the idea of bits and bytes, explaining the significance of 8-bit numbers and their range, and how larger numbers are represented in 32-bit and 64-bit systems. It also touches on the representation of positive and negative numbers using the sign bit and the range of values these systems can represent.

05:00

🌐 Floating Point Numbers and Character Encoding

This paragraph discusses the representation of floating point numbers using the IEEE 754 standard, which is akin to scientific notation. It explains how 32-bit floating point numbers use the first bit for the sign, 8 bits for the exponent, and 23 bits for the significand. The script then transitions to the representation of text in computers, starting with the simple numbering of letters and moving to ASCII, which is a 7-bit code that can encode 128 different values. ASCII's limitations for non-English languages led to the use of 8-bit codes for national characters. The rise of Unicode in 1992 is highlighted as a universal encoding scheme with space for over a million codes, covering characters from all languages.

10:04

🖥️ Computation and Data Manipulation

The final paragraph teases the upcoming discussion about computation, hinting at how computers will start manipulating binary sequences for computation. It provides a brief overview of how data, such as text messages, videos, web pages, and operating systems, are fundamentally long sequences of bits, setting the stage for the next episode where the actual process of computation will be explored.

Keywords

💡Binary

Binary refers to a numeral system that uses two symbols, typically 0 and 1, to represent all possible values. It's the fundamental language of computers and is central to the video's theme of how computers store and represent data. In the script, binary is used to explain the representation of numbers, with examples like the binary number 101 equating to 5 in decimal.

💡Boolean Algebra

Boolean Algebra is a branch of algebra that deals with binary values (true and false) and logical operations such as AND, OR, and NOT. It is foundational in computer science for evaluating logical statements and is mentioned in the script in the context of how transistors can be used to build logic gates.

💡Bit

A bit is the basic unit of information in computing, represented as either a 0 or a 1. It is the smallest piece of data that a computer can process. The script explains that each binary digit in operations like addition is called a bit, and that 8 bits make up a byte.

💡Byte

A byte is a unit of digital information that most commonly consists of eight bits. It is used to quantify storage and transmission capacity in computing. The script introduces the byte as a common size in computing, with 1 kilobyte being 1024 bytes.

💡Floating Point Numbers

Floating Point Numbers are representations of real numbers in computer systems, allowing for the storage of non-whole numbers like decimals or fractions. The script discusses how computers deal with such numbers using standards like IEEE 754, which uses significand and exponent to represent them.

💡Significand

In the context of floating point numbers, the significand is the part of the number that represents the significant digits. It is used in conjunction with an exponent to express the number's value in scientific notation, as illustrated in the script with the example of 625.9 being represented as 0.6259 x 10^3.

💡Exponent

In mathematics and computer science, an exponent indicates the power to which a number, called the base, is raised. In floating point representation, it scales the significand, as explained in the script with the IEEE 754 standard.

💡ASCII

ASCII stands for American Standard Code for Information Interchange. It is a character encoding standard used to represent text in computers, assigning a unique binary number to each character. The script explains ASCII as a 7-bit code that can encode 128 different values, including letters, digits, and symbols.

💡Unicode

Unicode is a computing industry standard for consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The script introduces Unicode as a response to the limitations of ASCII and other encoding schemes, offering a universal encoding with space for over a million codes.

💡Interoperability

Interoperability, in the context of the script, refers to the ability of different computers to exchange and understand data encoded in a standard format like ASCII. It is a critical aspect of data communication and is mentioned in the script in relation to the universal exchange of information.

💡Memory Address

A memory address is a reference to a specific location in a computer's memory where data or instructions can be stored or retrieved. The script discusses the importance of memory addresses in storing and retrieving values, and how the size of these addresses has grown with the increase in computer memory capacity.

Highlights

Computers store and represent numerical data using binary values, which are the foundation for understanding computer science.

Transistors can be used to build logic gates, which evaluate boolean statements with only two binary values: true and false.

Binary numbers are an extension of boolean algebra, allowing representation of larger values by adding more binary digits.

Binary operates in base-two, with each column's multiplier being two times larger than the one to its right, unlike decimal which is base-ten.

Binary numbers can be converted to decimal and vice versa, and binary addition carries between columns exactly as decimal addition does.

Each binary digit, or 'bit', is the smallest unit of data in computing, with 8 bits making up a byte.

A byte, consisting of 8 bits, is a standard unit of data size in computing, with larger units like kilobytes, megabytes, and gigabytes.

Computers use the first bit in a binary number to represent the sign of a number, with 1 for negative and 0 for positive.

64-bit computers can represent a vast range of numbers, up to around 9.2 quintillion, which is essential for large-scale data processing.

Floating point numbers, represented by standards like IEEE 754, allow computers to handle non-whole numbers.

ASCII, the American Standard Code for Information Interchange, was invented in 1963 to encode 128 different values, including letters, digits, and symbols.

ASCII's limitation to English led to the use of the 8th bit for encoding 'national' characters in various languages.

The rise of Unicode in 1992 provided a universal encoding scheme for characters from all languages; its most common version uses 16-bit units, with space for over a million codes.

Unicode supports more than 120,000 characters from over 100 types of script, including mathematical symbols and graphical characters like Emoji.

All digital content, including text messages, videos, and operating systems, are composed of long sequences of binary numbers.

Understanding binary sequences is crucial for the manipulation of data, which will be explored in the next episode on computation.

Transcripts

play00:03

Hi I’m Carrie Anne, this is Crash Course Computer Science

play00:06

and today we’re going to talk about how computers store and represent numerical data.

play00:10

Which means we’ve got to talk about Math!

play00:12

But don’t worry.

play00:13

Every single one of you already knows exactly what you need to know to follow along.

play00:16

So, last episode we talked about how transistors can be used to build logic gates, which can

play00:20

evaluate boolean statements.

play00:22

And in boolean algebra, there are only two, binary values: true and false.

play00:26

But if we only have two values, how in the world do we represent information beyond just

play00:29

these two values?

play00:30

That’s where the Math comes in.

play00:32

INTRO

play00:41

So, as we mentioned last episode, a single binary value can be used to represent a number.

play00:46

Instead of true and false, we can call these two states 1 and 0 which is actually incredibly useful.

play00:52

And if we want to represent larger things we just need to add more binary digits.

play00:55

This works exactly the same way as the decimal numbers that we’re all familiar with.

play00:59

With decimal numbers there are "only" 10 possible values a single digit can be; 0 through 9,

play01:04

and to get numbers larger than 9 we just start adding more digits to the front.

play01:07

We can do the same with binary.

play01:09

For example, let’s take the number two hundred and sixty three.

play01:12

What does this number actually represent?

play01:14

Well, it means we’ve got 2 one-hundreds, 6 tens, and 3 ones.

play01:18

If you add those all together, we’ve got 263.

play01:21

Notice how each column has a different multiplier.

play01:24

In this case, it’s 100, 10, and 1.

play01:27

Each multiplier is ten times larger than the one to the right.

play01:30

That's because each column has ten possible digits to work with, 0 through 9, after which

play01:35

you have to carry one to the next column.

play01:37

For this reason, it’s called base-ten notation, also called decimal since deci means ten.

play01:42

AND Binary works exactly the same way, it’s just base-two.

play01:45

That’s because there are only two possible digits in binary – 1 and 0.

play01:49

This means that each multiplier has to be two times larger than the column to its right.

play01:53

Instead of hundreds, tens, and ones, we now have fours, twos and ones.

play01:57

Take for example the binary number: 101.

play02:00

This means we have 1 four, 0 twos, and 1 one.

play02:04

Add those all together and we’ve got the number 5 in base ten.

play02:07

But to represent larger numbers, binary needs a lot more digits.

play02:10

Take this number in binary 10110111.

play02:12

We can convert it to decimal in the same way.

play02:14

We have 1 x 128, 0 x 64, 1 x 32, 1 x 16, 0 x 8, 1 x 4, 1 x 2, and 1 x 1.

play02:24

Which all adds up to 183.

play02:26

Math with binary numbers isn’t hard either.

play02:29

Take for example decimal addition of 183 plus 19.

play02:32

First we add 3 + 9, that’s 12, so we put 2 as the sum and carry 1 to the ten’s column.

play02:37

Now we add 8 plus 1 plus the 1 we carried, that's 10, so the sum is 0 carry 1.

play02:43

Finally we add 1 plus the 1 we carried, which equals 2.

play02:46

So the total sum is 202.

play02:48

Here’s the same sum but in binary.

play02:50

Just as before, we start with the ones column.

play02:52

Adding 1+1 results in 2, even in binary.

play02:55

But, there is no symbol "2" so we use 10 and put 0 as our sum and carry the 1.

play03:00

Just like in our decimal example.

play03:02

1 plus 1, plus the 1 carried, equals 3 or 11 in binary, so we put the sum as 1 and we

play03:08

carry 1 again, and so on.

play03:09

We end up with 11001010, which is the same as the number 202 in base ten.

play03:14

Each of these binary digits, 1 or 0, is called a “bit”.

play03:17

So in these last few examples, we were using 8-bit numbers with their lowest value of zero

play03:21

and highest value of 255, which requires all 8 bits to be set to 1.

play03:26

That's 256 different values, or 2 to the 8th power.

play03:30

You might have heard of 8-bit computers, or 8-bit graphics or audio.

play03:34

These were computers that did most of their operations in chunks of 8 bits.

play03:38

But 256 different values isn’t a lot to work with, so it meant things like 8-bit games

play03:42

were limited to 256 different colors for their graphics.

play03:46

And 8 bits is such a common size in computing, it has a special word: a byte.

play03:51

A byte is 8 bits.

play03:53

If you’ve got 10 bytes, it means you’ve really got 80 bits.

play03:55

You’ve heard of kilobytes, megabytes, gigabytes and so on.

play03:58

These prefixes denote different scales of data.

play04:01

Just like one kilogram is a thousand grams, 1 kilobyte is a thousand bytes…. or really

play04:06

8000 bits.

play04:07

Mega is a million bytes (MB), and giga is a billion bytes (GB).

play04:11

Today you might even have a hard drive that has 1 terabyte (TB) of storage.

play04:15

That's 8 trillion ones and zeros.

play04:17

But hold on!

play04:18

That’s not always true.

play04:19

In binary, a kilobyte has two to the power of 10 bytes, or 1024.

play04:24

1000 is also right when talking about kilobytes, but we should acknowledge it isn’t the only

play04:28

correct definition.

play04:29

You’ve probably also heard the term 32-bit or 64-bit computers – you’re almost certainly

play04:34

using one right now.

play04:35

What this means is that they operate in chunks of 32 or 64 bits.

play04:39

That’s a lot of bits!

play04:40

The largest number you can represent with 32 bits is just under 4.3 billion.

play04:45

Which is thirty-two 1's in binary.

play04:47

This is why our Instagram photos are so smooth and pretty – they are composed of millions

play04:52

of colors, because computers today use 32-bit color graphics.

play04:56

Of course, not everything is a positive number - like my bank account in college.

play05:00

So we need a way to represent positive and negative numbers.

play05:03

Most computers use the first bit for the sign: 1 for negative, 0 for positive numbers, and

play05:08

then use the remaining 31 bits for the number itself.

play05:11

That gives us a range of roughly plus or minus two billion.

play05:14

While this is a pretty big range of numbers, it’s not enough for many tasks.

play05:17

There are 7 billion people on the earth, and the US national debt is almost 20 trillion dollars after all.

play05:23

This is why 64-bit numbers are useful.

play05:25

The largest value a 64-bit number can represent is around 9.2 quintillion!

play05:30

That’s a lot of possible numbers and will hopefully stay above the US national debt for a while!

play05:34

Most importantly, as we’ll discuss in a later episode, computers must label locations

play05:38

in their memory, known as addresses, in order to store and retrieve values.

play05:43

As computer memory has grown to gigabytes and terabytes – that’s trillions of bytes

play05:47

– it was necessary to have 64-bit memory addresses as well.

play05:50

In addition to negative and positive numbers, computers must deal with numbers that are

play05:53

not whole numbers, like 12.7 and 3.14, or maybe even stardate: 43989.1.

play06:00

These are called “floating point” numbers, because the decimal point can float around

play06:04

in the middle of the number.

play06:05

Several methods have been developed to represent floating point numbers.

play06:08

The most common of which is the IEEE 754 standard.

play06:11

And you thought historians were the only people bad at naming things!

play06:14

In essence, this standard stores decimal values sort of like scientific notation.

play06:19

For example, 625.9 can be written as 0.6259 x 10^3.

play06:25

There are two important numbers here: the .6259 is called the significand.

play06:30

And 3 is the exponent.

play06:31

In a 32-bit floating point number, the first bit is used for the sign of the number -- positive

play06:35

or negative.

play06:36

The next 8 bits are used to store the exponent and the remaining 23 bits are used to store

play06:41

the significand.

play06:42

Ok, we’ve talked a lot about numbers, but your name is probably composed of letters,

play06:46

so it’s really useful for computers to also have a way to represent text.

play06:49

However, rather than have a special form of storage for letters,

play06:53

computers simply use numbers to represent letters.

play06:56

The most straightforward approach might be to simply number the letters of the alphabet:

play06:59

A being 1, B being 2, C 3, and so on.

play07:02

In fact, Francis Bacon, the famous English writer, used five-bit sequences to encode

play07:07

all 26 letters of the English alphabet to send secret messages back in the 1600s.

play07:11

And five bits can store 32 possible values – so that’s enough for the 26 letters,

play07:16

but not enough for punctuation, digits, and upper and lower case letters.

play07:20

Enter ASCII, the American Standard Code for Information Interchange.

play07:24

Invented in 1963, ASCII was a 7-bit code, enough to store 128 different values.

play07:29

With this expanded range, it could encode capital letters, lowercase letters, digits

play07:33

0 through 9, and symbols like the @ sign and punctuation marks.

play07:37

For example, a lowercase ‘a’ is represented by the number 97, while a capital ‘A’ is 65.

play07:42

A colon is 58 and a closed parenthesis is 41.

play07:45

ASCII even had a selection of special command codes, such as a newline character to tell

play07:49

the computer where to wrap a line to the next row.

play07:52

In older computer systems, the line of text would literally continue off the edge of the

play07:56

screen if you didn’t include a new line character!

play07:59

Because ASCII was such an early standard, it became widely used, and critically, allowed

play08:02

different computers built by different companies to exchange data.

play08:06

This ability to universally exchange information is called “interoperability”.

play08:10

However, it did have a major limitation: it was really only designed for English.

play08:15

Fortunately, there are 8 bits in a byte, not 7, and it soon became popular to use codes

play08:19

128 through 255, previously unused, for "national" characters.

play08:25

In the US, those extra numbers were largely used to encode additional symbols, like mathematical

play08:30

notation, graphical elements, and common accented characters.

play08:33

On the other hand, while the Latin characters were used universally, Russian computers used

play08:37

the extra codes to encode Cyrillic characters, and Greek computers, Greek letters, and so on.

play08:42

And national character codes worked pretty well for most countries.

play08:45

The problem was, if you opened an email written in Latvian on a Turkish computer, the result

play08:49

was completely incomprehensible.

play08:51

And things totally broke with the rise of computing in Asia, as languages like Chinese and Japanese

play08:56

have thousands of characters.

play08:58

There was no way to encode all those characters in 8-bits!

play09:00

In response, each country invented multi-byte encoding schemes, all of which were mutually incompatible.

play09:06

The Japanese were so familiar with this encoding problem that they had a special name for it:

play09:11

"mojibake", which means "scrambled text".

play09:13

And so it was born – Unicode – one format to rule them all.

play09:17

Devised in 1992 to finally do away with all of the different international schemes

play09:21

it replaced them with one universal encoding scheme.

play09:23

The most common version of Unicode uses 16 bits with space for over a million codes -

play09:28

enough for every single character from every language ever used –

play09:32

more than 120,000 of them in over 100 types of script

play09:36

plus space for mathematical symbols and even graphical characters like Emoji.

play09:40

And in the same way that ASCII defines a scheme for encoding letters as binary numbers,

play09:43

other file formats – like MP3s or GIFs – use

play09:46

binary numbers to encode sounds or colors of a pixel in our photos, movies, and music.

play09:50

Most importantly, under the hood it all comes down to long sequences of bits.

play09:55

Text messages, this YouTube video, every webpage on the internet, and even your computer’s

play10:00

operating system, are nothing but long sequences of 1s and 0s.

play10:03

So next week, we’ll start talking about how your computer starts manipulating those

play10:07

binary sequences, for our first true taste of computation.

play10:10

Thanks for watching. See you next week.

Related Tags
Binary Numbers, Computer Science, Data Representation, Decimal System, Floating Point, IEEE 754, ASCII Encoding, Unicode Standard, Numerical Computation, Digital Storage