Floating Point Numbers

Robert P. Webber, Scott McElfresh, Longwood University
 
Why are we working on this?

Once we are not restricted to integer values, there are not only infinitely many numbers, but infinitely many within a given range. Because a set number of bits, such as 8 or 32 or 64, can't represent infinitely many values, we have to do a bit more representational work to handle non-integer values. Furthermore, our techniques in this section give rise to some interesting new tradeoffs to think about.

Skills in this section:
- Represent non-integers in scientific notation
- Convert numbers to/from IEEE normalized form
- Translate IEEE normalized form to/from bit patterns

Concepts: Data representation, Limitations

Introduction

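We have seen how decimal fractions can be converted to binary. For instance, we can write \(6.25_{10}\) as:

\[
\begin{aligned}
6.25_{10} &= 4 + 2 + \tfrac{1}{4} = 2^2 + 2^1 + 2^{-2} \\
&= 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 + 0 \cdot 2^{-1} + 1 \cdot 2^{-2} \\
&= 110.01_2
\end{aligned}
\]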
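The same conversion can be sketched in Python: the integer part uses the built-in `bin()`, and the fractional part is found by repeated doubling, where each doubling pushes the next binary digit to the left of the point. This is a minimal illustration, not a prescribed algorithm; `to_binary` and its `max_frac_bits` cutoff are hypothetical names, and the sketch assumes a non-negative input.

```python
from fractions import Fraction

def to_binary(value, max_frac_bits=16):
    """Return the binary expansion of a non-negative number as a string.

    Hypothetical helper. Fractions that do not terminate in binary
    (such as 0.1) are cut off after max_frac_bits fractional bits.
    """
    x = Fraction(value)          # exact arithmetic, so no rounding sneaks in
    if x < 0:
        raise ValueError("this sketch assumes a non-negative value")
    whole = int(x)               # integer part
    frac = x - whole             # fractional part, 0 <= frac < 1
    digits = []
    while frac and len(digits) < max_frac_bits:
        frac *= 2                # move the next bit left of the binary point
        bit = int(frac)          # that bit is 0 or 1
        digits.append(str(bit))
        frac -= bit
    result = bin(whole)[2:]      # strip the "0b" prefix
    if digits:
        result += "." + "".join(digits)
    return result

print(to_binary(6.25))       # 110.01
print(to_binary("0.1", 8))   # 0.00011001 (cut off; 0.1 never terminates in binary)
```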

Teaching a computer to do arithmetic directly with such binary fractions would be difficult. One problem is that the binary point is not fixed; it needs to float. If you multiply a two-place fraction by another two-place fraction, for instance, the result has four fractional places, not two.
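A minimal Python sketch of the problem: if each binary fraction is stored as an integer plus a count of fractional places (a fixed-point scheme assumed here only for illustration; `fixed_mul` and `show` are hypothetical helpers), then multiplying two two-place values yields a four-place result, so the binary point has to move.

```python
def fixed_mul(a, a_places, b, b_places):
    """Multiply two binary fixed-point numbers (hypothetical helper).

    Each number is an integer plus a count of fractional places, so
    110.01 (binary) is (0b11001, 2). The integer product is exact,
    but the fractional places add up.
    """
    return a * b, a_places + b_places

def show(bits, places):
    """Format an integer with `places` fractional bits as a binary string."""
    s = format(bits, "b").rjust(places + 1, "0")
    return s[:-places] + "." + s[-places:] if places else s

# 110.01 (6.25) times 0.11 (0.75): two places times two places
product, places = fixed_mul(0b11001, 2, 0b11, 2)
print(show(product, places), places)   # 100.1011 4
print(product / 2**places)             # 4.6875
```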

Computer scientists realized that it would be easier to do floating point arithmetic if the numbers were written in scientific notation.  You may recall this notation from your science classes, where very large numbers and numbers that are very close to zero are written using powers of ten.  For example,

    \(1{,}234{,}000{,}000 = 1.234 \times 10^{9}\),

    \(0.0000567 = 5.67 \times 10^{-5}\).
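Python's `e` format specifier writes numbers in this same decimal scientific notation, which can be handy for checking conversions by hand. A short sketch:

```python
# The "e" format specifier writes a number in decimal scientific notation;
# the precision (.3 or .2) is the number of digits after the decimal point.
big = 1_234_000_000
small = 0.0000567

print(f"{big:.3e}")    # 1.234e+09, i.e. 1.234 * 10**9
print(f"{small:.2e}")  # 5.67e-05,  i.e. 5.67 * 10**-5

# The same notation is accepted when reading numbers back in.
assert float("1.234e9") == big
assert float("5.67e-5") == small
```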

Computers use binary notation and powers of two, of course.  The resulting format is called floating point notation.

A floating point number has three parts: its sign, a fractional part, and an exponent:

    \(\pm\ \text{fractional\_part} \times 2^{\text{exponent}}\)
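In Python, `math.frexp` splits a float into a fraction and a power of two, with the fraction in the range [0.5, 1). The hypothetical helper `normalize` below rescales that result into the 1.xxx form used in this section; it is a sketch that assumes a finite, nonzero input.

```python
import math

def normalize(x):
    """Return (sign, significand, exponent) such that
    x == sign * significand * 2**exponent and 1 <= significand < 2.
    Hypothetical helper; x must be finite and nonzero."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("only finite, nonzero values can be normalized")
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    sign = -1 if x < 0 else 1
    return sign, m * 2, e - 1   # doubling m and lowering e keeps the value

print(normalize(6.25))    # (1, 1.5625, 2): 1.1001 (binary) * 2**2
print(normalize(-0.375))  # (-1, 1.5, -2): -1.1 (binary) * 2**-2
print(float.hex(6.25))    # 0x1.9000000000000p+2, the same form in hexadecimal
```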

For instance, the decimal floating point number \(5.16 \times 2^{13}\) has a positive sign, a fractional part of 5.16, and an exponent of 13. It is equivalent to

    \(5.16 \times 2^{13} = 5.16 \times 8192 = 42{,}270.72\)

in ordinary signed decimal form.
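The arithmetic can be checked in Python with `math.ldexp(m, e)`, which computes \(m \times 2^{e}\); the product works out to 42,270.72.

```python
import math

# math.ldexp(m, e) computes m * 2**e: a fractional part times a power of two.
value = math.ldexp(5.16, 13)
print(2**13)   # 8192
print(value)   # 42270.72
```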

The Basic Ideas

The basic idea of the process for converting numbers to floating point representation is described in an accompanying PDF.

Standardizations

Standardized formats for floating point numbers are described in an accompanying PDF.

Magnitude and Precision

Magnitude refers to the raw size of a number, that is, how large or small it can be. Precision refers to the number of digits of accuracy in a number. The two measure different things: magnitude describes how many digits a number can have, while precision describes how many of those digits are accurate.

Often we don’t care much about precision in very large numbers. For instance, the 2007 United States population was 302 million people. We don’t really mean exactly 302,000,000, of course; the number is accurate to only three digits. Indeed, it would not be possible to be much more accurate, since the exact population is constantly changing.

When a computer stores a number in integer format, every digit is accurate. The largest integer that can be stored in 32 bits using two’s complement notation is \(2^{31} - 1\), which is 2,147,483,647. Say an integer has value 15,431. We can be sure that each digit is correct. However, an integer such as 3,500,630,119 cannot be stored in 32 bits. It is too large.
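These 32-bit limits can be checked with Python's `struct` module, which refuses to pack an integer outside the signed 32-bit range of the `i` format. `fits_int32` is a hypothetical helper written for this sketch.

```python
import struct

INT32_MAX = 2**31 - 1   # largest 32-bit two's complement value
INT32_MIN = -2**31      # smallest 32-bit two's complement value

print(INT32_MAX)        # 2147483647
print(INT32_MIN)        # -2147483648

def fits_int32(n):
    """Return True if n can be packed as a signed 32-bit integer (hypothetical helper)."""
    try:
        struct.pack("<i", n)   # "<" selects the standard 4-byte size for "i"
        return True
    except struct.error:
        return False

print(fits_int32(15_431))          # True
print(fits_int32(3_500_630_119))   # False: larger than 2,147,483,647
```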

Floating point numbers generally do not have this property. The magnitude is determined by the exponent, while the precision is determined by the fractional part. Not all digits of a displayed floating point number are necessarily accurate.

In standard IEEE (single-precision) format, the precision is 24 bits, including the hidden bit. This translates to about seven decimal digits of accuracy. The magnitude, however, is much larger. The biggest exponent in excess 127 notation that can be stored in eight bits is 127 (the all-1’s exponent pattern is reserved for special values such as infinity), so the largest number that can be represented has all 1’s in the fractional part and \(127_{10}\) in the exponent: \(1.11\ldots1_2 \times 2^{127}\). This number is approximately \(3.4 \times 10^{38}\).
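A sketch of how a value's 32-bit pattern splits into the sign bit, the excess-127 exponent, and the 23 stored fraction bits (the hidden bit is not stored). Python's `struct` module packs the `f` format as IEEE 754 binary32, so it provides the bit pattern; `float32_fields` is a hypothetical helper.

```python
import struct

def float32_fields(x):
    """Return (sign, stored_exponent, fraction) of x as an IEEE single.

    Hypothetical helper. The 32 bits are 1 sign bit, 8 exponent bits in
    excess-127 form, and 23 fraction bits; the hidden leading 1 is not stored.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # stored value = true exponent + 127
    fraction = bits & 0x7FFFFF         # the 23 bits after the binary point
    return sign, exponent, fraction

# 6.25 = +1.1001 (binary) * 2**2, so the stored exponent is 2 + 127 = 129
s, e, f = float32_fields(6.25)
print(s, e, e - 127, format(f, "023b"))
# 0 129 2 10010000000000000000000

# Largest finite single: all 1s in the fraction, stored exponent 254 (true 127).
largest = (2 - 2**-23) * 2**127
print(float32_fields(largest))   # (0, 254, 8388607)
print(f"{largest:.7e}")          # 3.4028235e+38
```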

A number that is larger than \(3.4 \times 10^{38}\) cannot be stored in a computer using standard IEEE format. This is a huge number, but it is possible to exceed it. For example, a chess player has on average about 35 legal choices for each move, but the number of possible sequences of moves grows exponentially to produce far more than \(10^{50}\) possible games, a number too large for standard IEEE format to hold.
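Python makes the contrast easy to see: its integers have arbitrary precision and its 64-bit floats reach far past \(10^{50}\), but packing such a value into the 32-bit single-precision format with `struct` raises `OverflowError`. `fits_float32` is a hypothetical helper for this sketch.

```python
import struct

def fits_float32(x):
    """Return True if the finite value x is within single-precision range
    (hypothetical helper)."""
    try:
        struct.pack("<f", x)
        return True
    except OverflowError:   # raised when x is too large for a 32-bit single
        return False

big = 10**50                      # Python ints are arbitrary precision: exact
print(float(big))                 # 1e+50, fine as a 64-bit double
print(fits_float32(float(big)))   # False: beyond about 3.4e38
print(fits_float32(3.0e38))       # True
```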

It is important to realize that while the magnitude allows us to store a number, the precision may mean that not all of its digits are accurate. For instance, suppose the budget for a large corporation is $632,785,417.25. This number can be stored in standard floating point, because its magnitude is much less than \(3.4 \times 10^{38}\), but only about seven digits of accuracy will be preserved. The stored value will be approximately $632,785,400; the last few digits will be lost.
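The effect on the budget figure can be reproduced by rounding the value to single precision through a 4-byte round trip with `struct`; `to_float32` is a hypothetical helper. Near \(6.3 \times 10^{8}\) adjacent single-precision values are 64 apart, so the nearest representable value is 632,785,408.

```python
import struct

def to_float32(x):
    """Round x to the nearest IEEE single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

budget = 632_785_417.25
stored = to_float32(budget)
print(stored)            # 632785408.0
print(budget - stored)   # 9.25: the cents and the last dollar digits are gone
```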

Other floating point formats are available that increase the precision. IEEE double precision, for example, uses 64 bits to give about 15 to 16 decimal digits of accuracy, and it also extends the magnitude to about \(1.8 \times 10^{308}\). Regardless, you should always remember that a computer-generated number is accurate only to a maximum number of digits (about seven for standard IEEE format). Any digits beyond that maximum will not be reliable.
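Python's built-in `float` is, on essentially all platforms, an IEEE double, and `sys.float_info` reports its limits. A short sketch comparing them with the single-precision figures discussed in this section:

```python
import sys

# Python's float is normally an IEEE 754 double (64 bits).
print(sys.float_info.mant_dig)  # 53 bits of precision, including the hidden bit
print(sys.float_info.dig)       # 15 decimal digits are always preserved
print(sys.float_info.max)       # 1.7976931348623157e+308

budget = 632_785_417.25
print(budget)   # 632785417.25: stored exactly in double precision
```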

Exercises

  1. For each, write the decimal number in normalized form; that is, in the form \(1.xx\ldots x \times 2^{\text{exponent}}\).

    1. 562
    2. 961
    3. 1055
    4. 2050
    5. -69
    6. -120
    7. 28.125
    8. -106.25
  2. For each, find the excess 127 form of the base 10 number.

    1. 7
    2. 38
    3. -8
    4. -19

  3. For each, write the decimal number in IEEE floating point format.

    1. 42.5
    2. 105.375
    3. -26 1/4
    4. -145.625
    5. 11/16
    6. -15/32
  4. For each, can the quantity be stored as a 32-bit integer? As a standard IEEE floating point number? In each case, if it can be stored, will the stored number be accurate? Explain your answers.
    1. The number of seconds in an hour.
    2. The number of seconds in a day.
    3. The number of seconds in a week.
    4. The number of seconds in the month of March.
    5. The number of seconds in a non-leap year.
    6. The number of seconds in a century.
    7. The number 3.141592653 (the first ten digits of the number π, pi).
    8. The number 2.718281828 (the first ten digits of the number e).
    9. The distance from the Sun to the Earth, expressed in miles.

Credits and licensing

This article is by Robert P. Webber and Scott McElfresh, licensed under a Creative Commons BY-SA 3.0 license.

Version 2016-Mar-14 10:00