Floating Point Numbers: IEEE 754 Standard | Single Precision and Double Precision Format

ALL ABOUT ELECTRONICS
5 Dec 2023 · 22:01

Summary

TL;DR: This video from ALL ABOUT ELECTRONICS dives into the IEEE 754 standard for floating-point numbers. It explains the 32-bit single precision and 64-bit double precision formats, detailing how numbers are stored with one bit for the sign, variable bits for the exponent, and the rest for the mantissa. The video covers biased representation for exponent storage, the trade-off between range and precision, and the significance of all zeros or ones in the exponent field. It also touches on the conversion between decimal and IEEE format, emphasizing the importance of understanding floating-point representation in electronics and computing.

Takeaways

  • 😀 IEEE 754 is a standard for representing floating-point numbers in computer memory.
  • 🔑 In IEEE 754, the floating-point number is divided into three parts: sign bit, exponent, and mantissa.
  • 💡 The standard includes five different formats ranging from 16 bits (half precision) to 256 bits for storing floating-point numbers.
  • 🎯 Single precision format uses 32 bits, with 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.
  • 🌟 The exponent is stored using a biased representation, where a bias value is added to the actual exponent value to handle negative exponents.
  • 📉 The bias for single precision is 127, allowing an exponent range from -126 to +127 after bias subtraction.
  • 🔢 The mantissa stores the fractional part of the normalized binary number, excluding the leading '1' that is implicit in normalized numbers.
  • ⚖️ The largest number representable in single precision is approximately 3.4 x 10^38, and the smallest is about 1.1 x 10^-38.
  • 🔄 Floating-point numbers offer a wider range compared to fixed-point numbers but at the cost of precision, with single precision offering about 7 decimal digits of precision.
  • 📚 The video also covers the conversion process between decimal numbers and their IEEE 754 single precision binary representation (a short field-extraction sketch follows this list).
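
As a minimal illustration of the layout described in these takeaways, here is a short Python sketch (not from the video) that pulls the sign, biased exponent, and mantissa fields out of a value's 32-bit pattern. The test value 12.625 is the one worked through later in the video:

```python
import struct

def fields_of(x: float):
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer,
    # then mask out the three fields described above.
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign     = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF         # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF             # 23 bits, implicit leading 1 dropped
    return sign, exponent, mantissa

print(fields_of(12.625))  # (0, 130, 4849664): actual exponent is 130 - 127 = 3
```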

Q & A

  • What is the IEEE standard used for?

    -The IEEE standard, specifically IEEE 754, is used for the storage of floating-point numbers in computer memory.

  • How many bits are reserved for the sign bit in a floating-point number stored in single precision format?

    -In the single precision format, 1 bit is reserved for the sign bit.

  • What is the purpose of the exponent in floating-point representation?

    -The exponent in floating-point representation is used to store the power to which the base (usually 2) is raised, allowing the representation of both large and small numbers efficiently.

  • How many bits are used to store the mantissa in single precision floating-point format?

    -In the single precision floating-point format, 23 bits are used to store the mantissa.

  • What is the biased representation used for in the IEEE 754 standard?

    -The biased representation is used to store the exponent in the IEEE 754 standard, allowing for a continuous range of exponent values including both positive and negative numbers.

  • What is the value of the bias used for the 8-bit exponent in single precision format?

    -The value of the bias used for the 8-bit exponent in single precision format is 127.

  • How is the mantissa stored in the IEEE 754 standard?

    -In the IEEE 754 standard, the mantissa is stored as the fractional part of the normalized binary number, excluding the leading 1 that is implicit in the normalized form.

  • What is the significance of all zeroes and all ones in the exponent field in IEEE 754?

    -In IEEE 754, exponent fields of all zeroes and all ones are reserved for special purposes, such as representing special values like infinity and NaN (Not a Number).

  • What is the range of the exponent in the single precision format after subtracting the bias?

    -After subtracting the bias, the range of the exponent in the single precision format is from -126 to +127.

  • How does the IEEE 754 standard handle the representation of negative zero?

    -The IEEE 754 standard represents negative zero as a special case, where the sign bit is set to 1 and the exponent and mantissa fields are all zeroes.

  • What is the largest number that can be represented in the single precision format?

    -The largest number that can be represented in the single precision format is approximately 3.4 x 10^38.

  • What is the smallest positive normalized number that can be represented in the single precision format?

    -The smallest positive normalized number that can be represented in the single precision format is approximately 1.1 x 10^-38 (a quick numeric check of these limits follows this Q&A).
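
The figures quoted in the last few answers can be checked directly. A quick numeric sketch, assuming ordinary Python floats for the arithmetic:

```python
BIAS = 2**(8 - 1) - 1               # 127 for the 8-bit exponent field
largest  = (2 - 2**-23) * 2.0**127  # all-ones mantissa, exponent +127
smallest = 2.0**-126                # zero mantissa, exponent -126
print(BIAS)      # 127
print(largest)   # 3.4028234663852886e+38, i.e. about 3.4 x 10^38
print(smallest)  # 1.1754943508222875e-38, i.e. about 1.1 x 10^-38
```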

Outlines

00:00

🔢 Introduction to IEEE 754 Standard for Floating Point Numbers

The paragraph introduces the IEEE 754 standard used for representing floating point numbers in computer systems. It explains the concept of normalization of binary numbers and how they are stored in memory with a focus on the sign bit, exponent, and mantissa. The IEEE 754 standard defines the bit allocation for these components and introduces different formats like half precision (16 bits), single precision (32 bits), double precision (64 bits), and extended precisions (128 and 256 bits). The video focuses on single and double precision formats, detailing the single precision format which uses 32 bits with 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. The significance of the biased representation for the exponent is also discussed, highlighting its advantages over other representations like 2's complement and sign magnitude.

05:00

🔑 Understanding Biased Representation in IEEE 754

This section delves into the biased representation used for the exponent in the IEEE 754 standard. It clarifies how the bias allows for a continuous range of exponent values, simplifying the comparison of floating point numbers. The paragraph also discusses the special cases where all bits in the exponent are zero or one, which are reserved for special purposes. A step-by-step example is provided to demonstrate how to decode a 32-bit number in single precision format to its actual decimal value, including subtracting the bias from the stored exponent value and converting the mantissa part to the true binary number.
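
A minimal Python sketch of the decoding procedure this section describes, covering normalized values only (the reserved all-zeros and all-ones exponents are out of scope here); the test word 0x414A0000 is the one derived later in the video:

```python
def decode_float32(word: int) -> float:
    # Split the 32-bit word into its three fields, subtract the bias of 127,
    # and restore the implicit leading 1 of the normalized significand.
    sign     = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF       # biased exponent
    fraction =  word        & 0x7FFFFF   # 23 stored mantissa bits
    value = (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
    return -value if sign else value

print(decode_float32(0x414A0000))  # 12.625
```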

10:02

📉 Conversion from Decimal to IEEE Single Precision Format

The paragraph explains how to convert a decimal number into its IEEE single precision binary representation. It walks through the process of converting a decimal number to binary, normalizing the binary number, and then encoding it into the 32-bit format with attention to the sign bit, biased exponent, and mantissa. An example is given to convert the decimal number 12.625 into its IEEE format, detailing each step from binary conversion to normalization and then to the final 32-bit representation, including the hexadecimal representation of the number.
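
The same steps can be sketched in Python; `encode_float32` is a hypothetical helper that handles only positive normalized values, but it reproduces the 414A0000 result derived in the video:

```python
import math

def encode_float32(x: float) -> int:
    # Normalize to 1.f * 2**e, add the bias of 127, and keep only the
    # 23 fraction bits, as in the conversion steps described above.
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= m < 1
    significand, exponent = m * 2, e - 1  # rewrite as 1.f * 2**(e - 1)
    fraction = round((significand - 1) * 2**23)
    return ((exponent + 127) << 23) | fraction

print(f"{encode_float32(12.625):08X}")  # 414A0000
```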

15:05

🔍 Range and Precision of Single Precision Format

This section discusses the range and precision of numbers that can be represented in the IEEE single precision format. It contrasts the range of floating point numbers with fixed point numbers, highlighting that floating point provides a much broader range but at the cost of precision. The paragraph also explains the distribution of representable numbers in the floating point format, noting that they are not uniformly distributed like in fixed point. It provides the largest and smallest numbers that can be represented in single precision, emphasizing the trade-off between range and precision.
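
One way to see the non-uniform spacing described here is to step between adjacent bit patterns; a small sketch, relying on the fact that consecutive float32 bit patterns encode consecutive representable magnitudes:

```python
import struct

def next_float32(x: float) -> float:
    # Incrementing the integer bit pattern steps to the next representable
    # positive float32 value.
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    return struct.unpack('>f', struct.pack('>I', bits + 1))[0]

for x in (1.0, 1024.0, 1e30):
    print(x, next_float32(x) - x)  # the gap widens as the magnitude grows
```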

20:11

🌐 Overview of IEEE Double Precision Format

The final paragraph provides an overview of the IEEE double precision format, which uses 64 bits for representing floating point numbers. It outlines the bit allocation with 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. The bias for the exponent in double precision is calculated, and the range of representable numbers is discussed, including the smallest and largest values. The paragraph concludes by emphasizing the increased precision offered by the double precision format due to the larger mantissa size, equating to 16 decimal digits of precision.
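
Since Python's built-in float is an IEEE 754 double, its published limits line up with these figures; note that `sys.float_info.dig` reports 15 guaranteed decimal digits, consistent with the roughly 16 quoted here:

```python
import sys

print(sys.float_info.max)  # 1.7976931348623157e+308
print(sys.float_info.min)  # 2.2250738585072014e-308 (smallest normalized)
print(sys.float_info.dig)  # 15 decimal digits always survive a round trip
```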

Keywords

💡IEEE Standard

The IEEE Standard refers to a set of guidelines and procedures developed by the Institute of Electrical and Electronics Engineers (IEEE). In the context of the video, it specifically refers to the IEEE 754 standard used for representing floating-point numbers in computers. This standard is crucial for ensuring consistency in how floating-point numbers are stored and processed across different systems and platforms.

💡Floating Point Numbers

Floating point numbers are a way of representing real numbers in computers, allowing for the representation of both very large and very small numbers. The video discusses how these numbers are normalized and stored according to the IEEE 754 standard, emphasizing the importance of understanding their representation for accurate data processing in electronics and computer science.

💡Sign Bit

The sign bit is the most significant bit in the representation of a floating-point number and indicates whether the number is positive or negative. The video explains that in the IEEE 754 standard, a sign bit of '0' represents a positive number, while a '1' represents a negative number, which is fundamental to understanding the binary representation of floating-point numbers.

💡Exponent

In the context of floating-point numbers, the exponent represents the power to which the base (usually 2 in computer science) is raised. The video discusses how the exponent is stored using a biased representation in IEEE 754, which allows for a range of values that includes both positive and negative numbers, essential for the accurate representation of scaled values.

💡Mantissa

The mantissa, also known as the significand, is the significant digits of a floating-point number, excluding the exponent. The video explains that in the IEEE 754 standard, the mantissa is stored as the fractional part of the normalized binary number, which contributes to the precision of the floating-point representation.

💡Biased Representation

Biased representation is a method of encoding signed numbers such that the negative values are represented in a way that they can be directly used in arithmetic operations. In the video, it is mentioned that the exponent in IEEE 754 is stored using biased representation, where a bias (e.g., 127 for single precision) is added to the actual exponent value to allow for a continuous range of exponent values.

💡Single Precision

Single precision is a format for floating-point numbers defined by the IEEE 754 standard, using 32 bits in total, with 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. The video provides examples of how numbers are represented in single precision, highlighting its use in computer memory and calculations.

💡Double Precision

Double precision is another format for floating-point numbers as per IEEE 754, using 64 bits in total. It offers higher precision than single precision with 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. The video contrasts double precision with single precision, emphasizing its use when greater precision is required.

💡Normalization

Normalization in the context of floating-point numbers refers to the process of expressing a number in its simplest form, where there is only one significant digit (usually 1) before the binary point. The video explains that normalization is a prerequisite for storing floating-point numbers in IEEE 754 format, ensuring consistency in representation.

💡Precision

Precision in the context of floating-point numbers refers to the number of digits accurately representable in the number's significand or mantissa. The video discusses how the precision of floating-point numbers is determined by the number of bits allocated to the mantissa, with single precision offering up to 7 significant decimal digits and double precision offering up to 16.

Highlights

Introduction to IEEE standard for floating point numbers.

Explanation of floating point number storage in memory with sign bit, exponent, and mantissa.

IEEE 754 standard for storing floating point numbers in different formats like half precision, single precision, and double precision.

Single precision format uses 32 bits with 1 bit for sign, 8 bits for exponent, and 23 bits for mantissa.

Biased representation for storing exponent values in IEEE 754 standard.

Calculation of bias for 8-bit biased representation and its significance.

Advantages of biased representation for continuity and ease of comparison in floating point numbers.

Method of comparing two floating point numbers based on sign bit, exponent, and mantissa.

Example of finding the actual value of a 32-bit number in single precision format.

Conversion of normalized binary number to true binary number in floating point representation.

Example of representing a decimal number in IEEE 32-bit format with steps for normalization and bit allocation.

Conversion of 32-bit number to hexadecimal format for storage.

Range of numbers representable in single precision format from 1.1 x 10^-38 to 3.4 x 10^38.

Comparison between floating point and fixed point representation in terms of range and precision.

Double precision format explained with 64 bits allocation and its benefits in terms of range and precision.

IEEE 754 standard's handling of special cases with all zeros or all ones in the exponent field.

Conclusion summarizing the basics of IEEE 754 standard and floating point number storage.

Transcripts

play00:06

Hey friends, welcome to the YouTube channel ALL ABOUT ELECTRONICS.

play00:10

So in this video, we will learn about the IEEE standard which is used for the floating

play00:15

point numbers.

play00:17

So in the previous video, we have seen how to normalize any binary number and how

play00:22

to represent it in the floating point format.

play00:25

And then after, we have also briefly seen how this floating point number is stored

play00:29

in the memory.

play00:31

So we have seen that while storing this floating point number, one bit is reserved for the

play00:35

sign bit, and the few bits are reserved for the exponent.

play00:39

And then the remaining bits are reserved for the mantissa.

play00:43

Now for storing this floating point number, a certain standard has been defined.

play00:48

So this standard defines, like in how many bits this floating point number will be stored.

play00:53

And out of the total number of reserved bits, how many bits will be used for the exponent

play00:57

as well as the mantissa.

play00:59

And apart from that, it also defines, like in which format this mantissa and the exponent

play01:04

will be stored.

play01:06

Because if you see this exponent, then it can be either positive or the negative.

play01:11

That means this exponent should be stored in such a way that both positive as well as

play01:14

the negative numbers will get covered.

play01:17

So this IEEE 754 is one such IEEE standard, which is commonly used for storing the floating

play01:23

point numbers.

play01:25

Now in this IEEE standard, depending on how many bits are used for storing this floating

play01:29

point number, we have total 5 different formats.

play01:33

For example, in this half precision, 16 bits are used for storing the floating point number.

play01:38

Similarly, when the floating point number is stored in the 32 bits, then that format

play01:43

is known as the single precision format.

play01:46

Likewise, in the double precision format, 64 bits are used for storing the floating

play01:50

point number.

play01:52

And likewise, this floating point number can also be represented in 128 or 256 bits.

play01:59

So out of these 5 different formats, the single precision and the double precision formats

play02:03

are the commonly used formats.

play02:05

And in this video also, we will talk about these two formats.

play02:09

So first, let's see this single precision format.

play02:13

So in this single precision format, the floating point number is stored in the 32 bits.

play02:19

So out of the 32 bits, the 1 bit is reserved to indicate the sign of the number.

play02:24

So if this bit is 0, then it means that the number is positive.

play02:29

And for the negative numbers, this sign bit will be equal to 1.

play02:33

Then after, the next 8 bits are reserved to store the exponent value.

play02:37

And then after, the remaining 23 bits are used to store the mantissa.

play02:42

So we know that when we normalize the binary number, then the significant digit just before

play02:47

the binary point will always remain the 1.

play02:50

Therefore, this 1 is not stored, and only this fractional part is stored.

play02:56

That means in this single precision format, once the binary number is normalized, then

play03:01

the first 23 bits of this fractional part will be stored in this mantissa.

play03:06

But now let's see how this exponent part is stored.

play03:09

So as you can see, in this 8-bit, we can represent any unsigned integer between the 0 and the

play03:15

255.

play03:17

But here as you can see, this exponent part can be either positive or negative.

play03:22

So the question is how to represent the negative values of the exponent.

play03:26

So we know that there are different ways for representing the negative numbers.

play03:31

Like it can be represented in the 2's complement form, or even it can be represented in the

play03:36

sign magnitude form.

play03:38

And similarly, it can also be represented in the biased representation.

play03:43

So here, in this IEEE 754 standard, the exponent value is represented using the biased representation.

play03:50

So first, let's understand what is the biased representation, and here why the exponent

play03:56

is stored in this format.

play03:58

So in this biased representation, the fixed offset or the bias is added to the number

play04:03

in such a way that the negative numbers become positive.

play04:07

For example, if we have an 8-bit number, then the value of the bias should be equal to

play04:12

2^(n-1) -1, where n represents the number of bits.

play04:18

For example, for the 8 bits, if you calculate the value of the bias, then that will come

play04:23

out as 127.

play04:25

Now in this 8-bit biased representation, the bias is added in such a way that the negative

play04:31

number will get shifted towards the positive side.

play04:34

And for the 8 bits, if we see the range of the exponent, then it will be from 0 to 255.

play04:40

That means if we want to see the actual range of the exponent, which can be represented

play04:44

in the 8 bits, then we need to subtract this bias.

play04:48

So after subtracting the bias, if we see the actual range, then it will be from minus 127

play04:53

to plus 128.
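
A minimal sketch of this shift (not from the video): adding the bias of 127 maps the signed exponents onto ascending unsigned codes, with the all-zeros and all-ones ends being the reserved patterns mentioned next.

```python
BIAS = 127  # 2**(8 - 1) - 1 for an 8-bit field

for actual in (-127, -126, 0, 127, 128):
    stored = actual + BIAS
    print(f"actual {actual:+4d} -> stored {stored:3d} ({stored:08b})")
```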

play04:55

So this table shows the actual numbers as well as the value of the number after adding

play05:00

the bias.

play05:01

And as you can see, the last column shows the corresponding bias representation.

play05:07

Now in this IEEE single-precision format, the exponent value of all zeroes and all ones

play05:12

is reserved for the special purpose.

play05:15

That means if we see the available range for the exponent, then it will be from minus 126

play05:20

to plus 127.

play05:22

Now the advantage of this bias representation is that the numbers from the negative to positive

play05:27

are changing in the specific order.

play05:30

That means here we have a continuity.

play05:32

And here as you can see, the numbers are changing in the ascending order.

play05:37

On the other hand, if you see the other representations, which is used for the negative numbers, that

play05:42

is 2's complement and the sign magnitude form, then that is not the case.

play05:47

That means as we go from the negative to positive numbers, then there is no continuity in the

play05:51

representation.

play05:52

Moreover, in the sign magnitude representation, we have two different representations for

play05:57

the zero.

play05:59

So if we use any of the two representations for the exponent, then there is a discontinuity

play06:04

in the number representation.

play06:06

And because of that, comparing the two floating point numbers becomes difficult.

play06:11

So first let's understand how we compare the two floating point numbers.

play06:15

So during the comparison, first we compare the sign bit.

play06:19

So if these sign bits are equal for the two numbers, then we will compare the exponent

play06:23

part.

play06:24

And if that is also equal, then we will compare the mantissa part.

play06:29

And from that, we can get to know which number is greater than the other one.

play06:33

Now for the exponent value, if there is a continuity in the number representation, then

play06:38

comparing the two numbers becomes much easier.

play06:41

For example, let's say we want to compare these two floating point numbers.

play06:46

So here, just by comparing the exponent of the two numbers, we can say that the number

play06:50

2 is greater than the number 1.

play06:53

That means when we want to compare the two floating point numbers, then this biased representation

play06:57

is very useful.

play06:59

And that is why the exponent is represented in this biased representation.

play07:03

Alright, so now let's take a couple of examples and let's see that if any number is stored

play07:09

in this IEEE single precision format, then how to find its actual value.

play07:14

So let's say, this is a 32-bit number which is stored in this single precision format.

play07:20

So we know that in this single precision format, the MSB is the sign bit.

play07:25

Then the next 8-bit represents the exponent value, and the remaining 23-bit represents

play07:30

the Mantissa value.

play07:31

So here, since the MSB is 0, so we can say that the given number is the positive number.

play07:38

So now, let's find the actual value of this exponent.

play07:42

So here, the value of the exponent is equal to 10000101.

play07:48

And in the decimal, that is equivalent to 133.

play07:52

So here, this value which is stored in the exponent is along with the bias.

play07:57

So if we want to know the actual value of this exponent, then we need to subtract the

play08:00

bias from this 133.

play08:03

And we know that for this single precision format, the value of the bias is equal to

play08:07

127.

play08:08

That means if we see the actual value of the exponent for the given number, then that is

play08:12

equal to 6.

play08:14

That means here, this exponential part is equal to 2 to the power 6.

play08:20

Now we know that in the normalized binary form, before this Mantissa, there will be a

play08:24

binary point.

play08:26

And the digit before the binary point will always remain 1.

play08:30

So here, in this fractional part, we can remove all these zeroes.

play08:34

That means this will be the significand of the given number.

play08:38

And along with the exponent value, this is the normalized binary number.

play08:42

So now, let's convert this normalized binary number into the true binary number.

play08:47

So here, since the exponent is equal to 6, so we will shift this radix point towards

play08:52

the right side by 6 bits.

play08:55

That means now, in a true binary format, this number is equal to 1001111.

play09:01

And in the decimal, that is equal to 79.

play09:05

So we can say that the given 32-bit number in a single precision format corresponds to

play09:10

79.

play09:11

So, similarly, let's take another example.

play09:15

So here, for the given 32-bit floating point number, let's find out the equivalent decimal

play09:20

number.

play09:21

So as we know, in this 32-bit format, the MSB represents the sign bit.

play09:26

And here, since it is 1, so the given number is the negative number.

play09:31

So now, let's find out the actual value of the exponent.

play09:35

So here, the exponent is equal to 10000011.

play09:40

And in the decimal, that corresponds to 131.

play09:45

So like I said, while storing the value of the exponent, the bias of 127 is added to

play09:50

the actual exponent.

play09:52

So now, in this exponent value, if we subtract the bias, then we will get the actual value

play09:57

of the exponent.

play09:58

So here, that is equal to 4.

play10:01

So we can say that here the exponential part is equal to 2 to the power 4.

play10:06

So similarly, now let's see the Mantissa part.

play10:10

So here, this is our Mantissa part.

play10:13

And we know that in the normalized binary form, just before this Mantissa, there is a

play10:17

binary point.

play10:19

And the digit before this binary point will always remain 1.

play10:23

So here, in this fractional part, we can remove all these zeroes.

play10:27

That means now, this will be our significand.

play10:31

And along with the exponent, this is our normalized binary number corresponding to this 32-bit

play10:36

number.

play10:37

So here, since the exponent is equal to 4, so we will shift this radix point, or this

play10:42

binary point towards the right side by 4 bits.

play10:46

And after shifting this binary point, this will be our binary number.

play10:50

That is 11100.11.

play10:53

So now, if we just see the integer part, then this 11100 in the decimal corresponds to 28.

play11:00

And similarly, this 0.11 in the decimal corresponds to 0.75.

play11:06

That means the overall number will be equal to 28.75.

play11:10

But here, since the sign bit is equal to 1, so the given number is the negative.

play11:15

So we can say that for the given 32-bit number, the equivalent decimal number is equal to

play11:20

minus 28.75.
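
Both worked examples can be checked with Python's struct module. The exact on-screen bit patterns are not in the transcript, so the hex words below are reconstructions from the stated fields (exponents 10000101 and 10000011, results 79 and -28.75) and should be treated as assumptions:

```python
import struct

# Reconstructed words: sign | biased exponent | mantissa (an assumption,
# since the transcript does not spell out the full 32 bits).
for hex_word in ('429E0000', 'C1E60000'):
    (value,) = struct.unpack('>f', bytes.fromhex(hex_word))
    print(hex_word, '->', value)  # 79.0 and -28.75
```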

play11:23

So in this way, if we have been given any 32-bit number in a single precision format,

play11:28

then we can easily find the equivalent decimal number.

play11:32

So now, let's see the other way around, and let's find out how to represent any decimal

play11:37

number in this IEEE 32-bit format.

play11:40

So let's say, we want to represent this 12.625 in this IEEE format.

play11:46

And for that, first of all, let's find the equivalent binary number.

play11:50

So here, this 12 in the binary corresponds to 1100.

play11:55

And this 0.625 corresponds to 0.101.

play11:59

That means if we see the equivalent binary number, then that is equal to 1100.101.

play12:06

So now, as a second step, let's normalize this binary number, and let's write it in

play12:10

this floating-point representation.

play12:13

So in a normalized number, we should have only one significant digit before the binary

play12:17

point.

play12:18

And we know that, that too should be equal to 1.

play12:22

So here, for that, we need to shift this binary point towards the left side by 3 bits.

play12:28

And here, since we are shifting the binary point towards the left side, so the exponent

play12:32

value will increase.

play12:34

So in this case, since we are shifting it by 3 bits, so the value of the exponent will

play12:39

increase by 3.

play12:41

And after shifting, this will be our normalized binary number.

play12:45

So now, let's see how to represent this normalized binary number in the 32-bit format.

play12:51

So here, since the number is positive, so this sign bit will remain 0.

play12:57

Now here, we know that this 1 just before the binary point is not stored in this 32-bit

play13:03

format.

play13:04

And here, only this fractional part is stored.

play13:07

So here, this fractional part is 100101.

play13:12

So first, let's copy this, and let's write it in the mantissa part.

play13:17

And then after, let's fill the next 17 bits by 0.

play13:22

So in this way, we got the 23 bits of the mantissa.

play13:26

So now, the only remaining part is the exponent.

play13:30

So here, as you can see, the exponent is equal to 3.

play13:34

So here, before storing this exponent value, first we need to add the bias.

play13:39

That means here, the stored value of the exponent will be 3 plus 127, that is equal to 130.

play13:47

And in the binary, that is equal to 10000010.

play13:53

So in this way, we got the sign, exponent and the mantissa part of this 32-bit number.

play13:58

So typically, these 32 or the 64-bit long numbers are stored in the hex format, that

play14:04

is the hexadecimal format.

play14:07

So here, to find the equivalent hexadecimal number for the given 32-bit number, let's

play14:11

make the group of 4 bits.

play14:14

So here, the first group will be equal to 0100.

play14:19

Then after, the next group will be equal to 0001.

play14:24

Similarly, then after, the next group is equal to 0100.

play14:29

And likewise, the next group will be equal to 1010.

play14:34

And after that, we will have 4 groups of zeros.

play14:39

So here, this 0100 corresponds to 4 in the hexadecimal.

play14:44

Likewise, this 0001 corresponds to 1.

play14:48

Similarly, this 0100 corresponds to 4, while the 1010 corresponds to A.

play14:56

And then after, we will have the 4 zeros.

play15:00

So we can say that, for the given decimal number, the equivalent 32-bit number in the

play15:05

IEEE format is equal to 414A0000.

play15:09

And as you can see over here, this number is shown in the hex format.

play15:13

Alright.
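
The whole worked example can be confirmed in one line with Python's struct module:

```python
import struct

# 12.625 packed as a big-endian float32 gives the hex word derived above.
print(struct.pack('>f', 12.625).hex().upper())  # 414A0000
```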

play15:15

So now, let's see the largest and the smallest number that can be represented in this single

play15:20

precision format.

play15:22

So in this IEEE single precision format, the largest value of the exponent will be

play15:26

equal to 127, while the minimum value will be equal to –126.

play15:32

So with the largest and the smallest values of the exponent, if we see the floating point

play15:36

number in a normalized form, then this is how it will look like.

play15:40

So in this format, for the largest number, in the mantissa part, all the bits should be

play15:45

equal to 1.

play15:47

And similarly, for the smallest number, this mantissa part should be equal to 0.

play15:53

So if you see the mantissa part of this largest number, then it is slightly less than 2.

play15:58

And it can be given as (2 - 2^-23).

play16:03

And further, it will get multiplied by this term.

play16:07

So if we calculate the value of this term, then it is roughly equal to 3.4 x 10^38.

play16:14

And similarly, for the smallest number, this mantissa part is equal to 0.

play16:19

That means the smallest representable number is equal to 2^-126.

play16:24

And that is roughly equal to 1.1 x 10^-38.

play16:30

That means in this IEEE single precision format, if you see the largest and the smallest

play16:34

numbers, then they are in the order of 10^38 and 10^-38 respectively.

play16:41

On the other hand, if you see the 32-bit fixed point representation, then for the signed integers,

play16:47

the maximum positive number is equal to (2^31 - 1), which is roughly equal to 2.1 x 10^9.

play16:57

So if we compare this fixed point with the floating point numbers with the same bits,

play17:01

then this floating point number covers much more range.

play17:05

And just by looking at it, the question arises as to why this floating point number covers a greater

play17:09

range.

play17:11

So the thing is, this floating point number covers the greater range but at the cost of

play17:15

precision.

play17:17

So for example, in this 32-bit floating point representation, the 23 bits are reserved for

play17:22

the mantissa.

play17:24

That means in the 32 bits, the precision that we can achieve is up to 23 bits.

play17:29

Or in the decimal, that is equivalent to about 7 significant digits.

play17:35

So if we want to represent any number beyond this 7-digit precision, then it cannot be

play17:40

represented accurately in this 32-bit format.

play17:43

For example, if we want to represent this number, then in 32-bit, it cannot be represented

play17:49

accurately.

play17:51

Because if we normalize this number, then it will be in this form.

play17:55

That means here, after the decimal point, we will have the 9 significant digits.

play18:00

But like I said, in this 32-bit floating point format, we can only represent up to the 7

play18:06

significant digits.

play18:07

So here, the last two digits will get rounded to the nearby decimal number, and then it

play18:12

will be stored in the 32-bit format.

play18:15

That means in the floating point numbers, we are achieving the greater range but at

play18:19

the cost of precision.
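
A short demonstration of this rounding, using a hypothetical 9-significant-digit value since the on-screen number is not in the transcript:

```python
import struct

x = 123456789.0  # 9 significant digits; float32 keeps only about 7
(as_float32,) = struct.unpack('>f', struct.pack('>f', x))
print(as_float32)  # 123456792.0 -- the trailing digits were rounded away
```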

play18:21

Now the thing is, with the 32 bits, the total number of distinctly representable numbers

play18:26

is equal to 2 to the power 32.

play18:29

So in this floating point number, or specifically in this single precision format, all these

play18:34

distinctly representable numbers are spread over the entire range.

play18:38

So in these 32 bits, these are the smallest representable non-zero numbers in the normalized

play18:43

form.

play18:44

That means here, we cannot represent any normalized number smaller than these two numbers.

play18:50

Similarly, these are the largest positive and negative numbers which can be represented

play18:54

in this 32-bit format.

play18:57

And beyond this range, we cannot represent any number.

play19:00

So as you can see, in this floating point number, the numbers are spread over the entire

play19:05

range.

play19:06

That means here, they are not distributed uniformly.

play19:10

On the other hand, in the fixed-point representation, all the numbers are distributed uniformly.

play19:15

That means the spacing between the numbers is equal.

play19:18

And that is why, this fixed-point number covers the smaller range.

play19:23

So in short, the floating point number covers the greater range at the cost of precision.

play19:28

And if we want more precision, then we can go for the double precision format.

play19:33

So in this double precision format, the floating point number is stored in the 64 bits.

play19:38

So out of the 64 bits, the 1-bit is reserved for the sign bit, while the next 11 bits represents

play19:44

the exponent.

play19:46

And then, the remaining 52 bits are reserved for the mantissa part.

play19:51

So here, since the 11 bits are reserved for the exponent, so the value of the bias will

play19:56

be equal to (2^10 -1).

play19:59

That means for the double precision format, the value of the bias will be equal to 1023.

play20:05

So once again, in this 11-bit exponent, all the 0s and all the 1s are reserved for the

play20:10

special purpose.

play20:12

So excluding that, if you see the range of this exponent, then it will be between the

play20:16

1 and the 2046.

play20:19

So here, to get the actual value of the exponent, we need to subtract the bias from this value.

play20:25

That means in this 64-bit format, the maximum value of the exponent which we can represent

play20:30

is equal to 1023.

play20:32

That is equal to 2 to the power 1023.

play20:35

And similarly, the minimum value of the exponent is equal to minus 1022.

play20:41

So correspondingly, if we see the smallest representable number, then that is around

play20:45

2.2 x 10^ (-308).

play20:49

And similarly, the largest representable number is around 1.79 x 10^308.

play20:57

So as you can see, this double precision format covers the huge range.

play21:01

And here, it also provides the better precision.

play21:05

Because here, the mantissa has 52 bits.

play21:07

Or in the decimal, that is equivalent to the 16 decimal digits.

play21:12

That means if we require the greater precision, then we can go for this double precision format.

play21:17

So that is all about the IEEE single precision and the double precision formats.

play21:21

Now, so far we have seen that in this floating point format, the exponent values of all the

play21:26

zeros and all the ones are reserved for the special purpose.

play21:30

So in the next video, we will see that when the value of the exponent is equal to all

play21:34

zeros or all ones, then what it signifies and how to interpret it.

play21:40

But I hope in this video, you understood the basics of this IEEE 754 standard and how the

play21:45

floating point numbers are stored in this IEEE standard.

play21:48

So if you have any question or suggestion, then do let me know here in the comment section

play21:53

below.

play21:54

If you like this video, hit the like button and subscribe to the channel for more such

play21:57

videos.
