In this section we review some general aspects of floating-point numbers. See the presentation by J. D. Darcy and other FP references for more in-depth discussions.
A floating-point number is represented in binary as

    ±b_0.b_1b_2b_3...b_{n-1} × 2^{exponent}

where b_i represents the i-th bit of the n bits in the significand (also called the mantissa). In addition, there is a bit to indicate the sign. A floating-point value is calculated as

    (-1)^s · (b_0 + b_1·2^{-1} + b_2·2^{-2} + b_3·2^{-3} + ... + b_{n-1}·2^{-(n-1)}) · 2^{exponent}

where s is the sign bit.
For fractional numbers and for very large or very
small numbers, advanced processors provide floating point representations.
In the bit representation for the Java float type:

    1 bit | 8 bits   | 23 bits
    Sign  | Exponent | Significand
and for the double type:

    1 bit | 11 bits  | 52 bits
    Sign  | Exponent | Significand
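As a rough sketch of how these fields can be examined (the class name and example values here are my own, not from the original text), the library methods Float.floatToIntBits and Double.doubleToLongBits return the raw bit patterns, from which the sign, exponent, and significand can be masked out using the widths in the tables above:

public class BitLayoutDemo {
    public static void main(String[] args) {
        float f = -6.25f;
        int bits = Float.floatToIntBits(f);        // raw 32-bit pattern

        int sign        = bits >>> 31;             // 1 bit
        int exponent    = (bits >>> 23) & 0xFF;    // 8 bits, stored with a bias of 127
        int significand = bits & 0x7FFFFF;         // 23 stored significand bits

        System.out.println("sign        = " + sign);
        System.out.println("exponent    = " + exponent + " (unbiased: " + (exponent - 127) + ")");
        System.out.println("significand = " + Integer.toBinaryString(significand));

        // The same idea for double: 1 sign bit, 11 exponent bits (bias 1023), 52 significand bits.
        long dbits = Double.doubleToLongBits(-6.25);
        System.out.println("double sign/exponent/significand = "
                + (dbits >>> 63) + " / "
                + ((dbits >>> 52) & 0x7FF) + " / "
                + Long.toBinaryString(dbits & 0xFFFFFFFFFFFFFL));
    }
}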
Floating point numbers on computers involve a number of complications with which processor and language designers must deal:
 Approximations
The limited number of places in the significand means that only a finite number of fractional values can be represented exactly. Similarly, the finite width of the exponent limits how large or small the represented numbers can be.
 Roundoff
Arithmetic operations will often result in the need to round
off the fractional values. A roundoff (or truncation)
algorithm must be chosen by the designer of the language. Roundoffs
can have a significant impact on a long calculation as the errors
accumulate.
 Overflows/Underflows
Similarly, a calculation may result in a value that is smaller or larger than the floating point type can represent. Again, the language designer must select a strategy for how to handle such situations.
 Decimal-Binary Conversion
The computer represents numbers in base 2. This can result in
loss of precision since often a binary fraction cannot exactly
represent a given finite decimal fraction (0.1 for example).
All finite binary fractions, however, can be converted to finite
decimal fractions.
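As a small sketch of this decimal-binary point (the class name and values are illustrative, not from the text), the BigDecimal(double) constructor converts the binary fraction actually stored in a double exactly into a finite decimal fraction, making the hidden extra digits visible:

import java.math.BigDecimal;

public class DecimalBinaryDemo {
    public static void main(String[] args) {
        // 0.1 has no exact binary representation, so the double stores only
        // the nearest representable binary fraction.
        // new BigDecimal(double) converts that binary fraction exactly into a
        // finite decimal fraction, so the extra digits become visible.
        System.out.println(new BigDecimal(0.1));   // prints a value slightly above 0.1
        System.out.println(new BigDecimal(0.5));   // 0.5 is a power of 2, so it is exact
    }
}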
Java & Floating Point
To handle these FP issues, Java follows the IEEE 754 standard in most cases. In this standard:
 Roundoff takes the binary
value nearest to the exact (or higher precision intermediate)
value. If two binary values are equally close, then choose the
even value; that is, the one with its last bit equal to 0.
 Overflows are represented by positive or negative infinity values, while underflows round gradually toward zero. Similarly, undefined results, such as 0/0, are represented by a Not-a-Number (NaN) value.
No exceptions or error messages are produced in any of these cases.
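A minimal sketch (my own example, not from the text) of how these rules appear in Java; none of these operations throws an exception:

public class OverflowNanDemo {
    public static void main(String[] args) {
        double overflow  = Double.MAX_VALUE * 2.0;   // too large for double: becomes Infinity
        double divByZero = 1.0 / 0.0;                // positive infinity
        double undefined = 0.0 / 0.0;                // undefined result: NaN

        System.out.println(overflow);                // Infinity
        System.out.println(divByZero);               // Infinity
        System.out.println(undefined);               // NaN
        System.out.println(Double.isNaN(undefined)); // true
        System.out.println(undefined == undefined);  // false: NaN never compares equal, even to itself
    }
}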
Note that even simple calculations with FP can provide
surprising results. For example, the following code
float f = 0.0f;
for (int i = 1; i <= 10; i++) {
    f += 0.1f;
}
does not result in exactly f
= 1.0 (even if double
is chosen for f) because, as mentioned above, 0.1
is not exact in binary format.
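To see this directly, one can print the accumulated value with extra digits and compare it to 1.0; this is only a sketch, and the exact digits shown may vary with the formatting used:

float f = 0.0f;
for (int i = 1; i <= 10; i++) {
    f += 0.1f;
}
System.out.printf("f = %.9f%n", f);   // prints a value slightly different from 1.000000000
System.out.println(f == 1.0f);        // false: the accumulated sum is not exactly 1.0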
For similar reasons,
 Avoid equality (a == b) tests between two floating-point variables.
 Instead, test with <, <=, >=, >, or compare within a tolerance as in the sketch after this list.
 However, in some situations it may be sensible to test for equality to 0.0 to avoid divide-by-zero errors.
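One common way to carry out such comparisons, shown only as a sketch (the helper name approxEqual and the tolerance 1e-9 are arbitrary choices), is to test whether two values differ by less than a small tolerance suited to the problem:

public class ToleranceCompare {
    // Treat a and b as equal if they differ by less than eps.
    // The appropriate eps depends on the scale of the values involved.
    static boolean approxEqual(double a, double b, double eps) {
        return Math.abs(a - b) < eps;
    }

    public static void main(String[] args) {
        double sum = 0.0;
        for (int i = 1; i <= 10; i++) {
            sum += 0.1;
        }
        System.out.println(sum == 1.0);                  // false
        System.out.println(approxEqual(sum, 1.0, 1e-9)); // true
    }
}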
In Java the float representation has a 24-bit significand (23 bits stored plus an implied leading bit) and double has a 53-bit significand (52 bits stored). This means that float gives 6 to 9 digits of decimal precision while double gives 15 to 17 digits of decimal precision.
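A small sketch (my own example) of this difference in decimal precision; the default printing of each type shows roughly the number of significant digits quoted above:

public class PrecisionDemo {
    public static void main(String[] args) {
        float  oneThirdF = 1.0f / 3.0f;
        double oneThirdD = 1.0  / 3.0;

        System.out.println(oneThirdF);       // roughly 8 significant digits for float
        System.out.println(oneThirdD);       // roughly 16 significant digits for double

        System.out.println((float) Math.PI); // pi rounded to float precision
        System.out.println(Math.PI);         // pi at double precision
    }
}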
In general, it is far safer to do calculations in the double type. This helps to reduce the roundoff errors that lower the precision of intermediate calculations. (You can always cast the final value to float if that is a more convenient size, such as for I/O or storage.)
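One way to apply this advice, sketched here with made-up data, is to accumulate intermediate results in a double and cast only the final answer down to float:

public class DoubleAccumulatorDemo {
    public static void main(String[] args) {
        float[] measurements = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f};  // hypothetical data

        double sum = 0.0;              // do the intermediate arithmetic in double
        for (float m : measurements) {
            sum += m;
        }

        float result = (float) sum;    // cast only the final value down to float
        System.out.println(result);
    }
}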
Remember the difference between precision and accuracy:
 Precision: how fine a distinction can be made between two close values.
 Accuracy: how close the value is to the correct value.