Floating-Point - Part 2
Floating-point in Java

This reference page gives various technical details of floating-point (FP) numbers in Java. This information is quite useful if you plan on doing extensive numerical calculations with Java. We recommend that newcomers to Java just scan the material and come back to it later as needed.

Floating-Point Representations

Floating-point values in Java, which follows most of the standard IEEE 754 floating-point specification, are represented by two types: float and double. As shown previously, the bit representation for float is

1 bit   8 bits     23 bits
Sign    exponent   significand

and for the double type

1 bit   11 bits    52 bits
Sign    exponent   significand

For float the 8 bits of the exponent give values in the range of 0-255. However, 0 and 255 are special values (discussed below), so the allowed values range from 1 to 254. A bias of 127 is subtracted to give an unbiased exponent range of -126 to 127.

Similarly, for double the 11 bits of the exponent give values in the range of 0-2047. In this case, 0 and 2047 are special values (discussed below), so the allowed values range from 1 to 2046. A bias of 1023 is subtracted to give an unbiased exponent range of -1022 to 1023.

The float representation gives 6 to 9 digits of decimal precision while double gives 15 to 17 digits of decimal precision.
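
The field layout described above can be inspected directly from Java. The following is a minimal sketch for the float type; the class name FloatFields and its helper methods are our own names for illustration, while Float.floatToIntBits is the standard library call that exposes the raw 32 bits.

```java
public class FloatFields {

    /** Sign bit: 0 for positive values, 1 for negative values. */
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    /** Raw 8-bit exponent field, 0 to 255, before the bias is removed. */
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    /** The 23 stored significand bits. */
    static int significandBits(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -2.5f, 0.1f};
        for (float f : samples) {
            System.out.println(f
                + ": sign=" + sign(f)
                + " biased exponent=" + biasedExponent(f)
                + " unbiased exponent=" + (biasedExponent(f) - 127)
                + " significand=0x" + Integer.toHexString(significandBits(f)));
        }
    }
}
```

The same approach should work for double with Double.doubleToLongBits, an 11-bit mask (0x7FF) after shifting right by 52, and a bias of 1023.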

When the exponent values are in their allowed unbiased ranges, the representations are said to be normalized. In the normalized mode, the b_0 value in

 (-1)^s · (b_0 + b_1·2^-1 + b_2·2^-2 + b_3·2^-3 + ... + b_(n-1)·2^-(n-1)) · 2^exponent

is taken as 1, so the effective number of significand bits n is increased to 24 for float and 53 for double.

When the biased exponent is zero (i.e. all exponent bits are zero), the value is denormalized and the b_0 value is taken as 0. The exponent is taken to be -126 for float and -1022 for double. The denormalized mode allows for a "smoother approach to zero" at the smallest value range.
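
As a check on the formula, the sketch below rebuilds a finite float value from its three bit fields, choosing b_0 and the exponent according to the normalized or denormalized rule. The class name FloatDecode and the method decode are hypothetical names for this example. Math.scalb is used for the power-of-two scaling because it is exact when the result is representable.

```java
public class FloatDecode {

    /** Rebuilds the value of a finite float from its sign, exponent and significand fields. */
    static double decode(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = bits >>> 31;
        int biased = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;

        if (biased == 255) {
            throw new IllegalArgumentException("Infinity and NaN have no finite value");
        }

        // b_0 is 1 for normalized values and 0 for denormalized values.
        int b0 = (biased == 0) ? 0 : 1;
        int exponent = (biased == 0) ? -126 : biased - 127;

        // fraction / 2^23 equals b_1*2^-1 + ... + b_23*2^-23.
        double significand = b0 + fraction / (double) (1 << 23);
        double magnitude = Math.scalb(significand, exponent);
        return (sign == 0) ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -2.5f, 0.1f, Float.MIN_VALUE, Float.MAX_VALUE};
        for (float f : samples) {
            System.out.println(f + " -> " + decode(f)
                + " (matches: " + (decode(f) == (double) f) + ")");
        }
    }
}
```

The result is returned as a double only because every float value, including the denormalized ones, is exactly representable in double.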

The following shows the minimum and maximum positive values possible with these types in the two different modes (significands are written in binary):

  • float

    • Normalized

      -127 < exponent < +128

      min = 2^-126 * 1.00000000000000000000000 = 1.17549435E-38
      max = 2^+127 * 1.11111111111111111111111 = 3.4028235E+38

    • Denormalized

      exponent = -126

      min = 2^-126 * 0.00000000000000000000001 = 1.4012985E-45
      max = 2^-126 * 0.11111111111111111111111 = 1.1754942E-38

  • double

    • Normalized

      -1023 < exponent < +1024

      min = 2^-1022 * 1.0000000000000000000000000000000000000000000000000000
          = 2.2250738585072014E-308

      max = 2^+1023 * 1.1111111111111111111111111111111111111111111111111111
          = 1.7976931348623157E+308

    • Denormalized

      exponent = -1022

      min = 2^-1022 * 0.0000000000000000000000000000000000000000000000000001
          = 4.9E-324

      max = 2^-1022 * 0.1111111111111111111111111111111111111111111111111111
          = 2.225073858507201E-308

Programmers rarely have to deal with the normalized/denormalized modes directly, but they can matter in numerical computing.
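
Most of these limits are available as constants in the wrapper classes (Float.MIN_NORMAL and Double.MIN_NORMAL were added in Java 6). The sketch below prints them; the largest denormalized values have no named constant, so we build them from their bit patterns. The class name FloatLimits and the two helper methods are our own.

```java
public class FloatLimits {

    /** Largest denormalized float: zero exponent field, all 23 significand bits set. */
    static float maxDenormalFloat() {
        return Float.intBitsToFloat(0x007FFFFF);
    }

    /** Largest denormalized double: zero exponent field, all 52 significand bits set. */
    static double maxDenormalDouble() {
        return Double.longBitsToDouble(0x000F_FFFF_FFFF_FFFFL);
    }

    public static void main(String[] args) {
        System.out.println("float  normalized   min = " + Float.MIN_NORMAL);
        System.out.println("float  normalized   max = " + Float.MAX_VALUE);
        System.out.println("float  denormalized min = " + Float.MIN_VALUE);
        System.out.println("float  denormalized max = " + maxDenormalFloat());

        System.out.println("double normalized   min = " + Double.MIN_NORMAL);
        System.out.println("double normalized   max = " + Double.MAX_VALUE);
        System.out.println("double denormalized min = " + Double.MIN_VALUE);
        System.out.println("double denormalized max = " + maxDenormalDouble());
    }
}
```

Java prints the shortest decimal string that identifies each value, so some outputs may show fewer digits than the listing above.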

Next we look at the other special floating-point values.

Floating-Point Special Values

Floating-point operations never throw an exception. (Exceptions are Java error conditions, to be discussed later.) For example, a floating-point division by zero produces no exception, whereas an integer division by zero does.

Instead of producing an error message for an abnormal operation, the floating-point result is set to one of several special floating-point values:

  • +/- Zero : if the bits in both the exponent and the significand all equal 0, then the FP value is -0 or +0 depending on the sign bit.

    • Positive zero is produced by underflow from the positive direction, e.g. for float values
        x = 2.0e-45f * 1.0e-10f

    • Negative zero is produced by underflow from the negative direction, e.g. for float values
        x = -2.0e-45f * 1.0e-10f

  • +/-Infinity : if all the bits in the exponent equal 1 and all the bits in the significand equal 0, then the FP value is -Infinity or +Infinity depending on the sign bit.

    • Positive infinity is produced by overflow of a positive value

    • Negative infinity is produced by overflow of a negative value

  • NaN : if all the bits in the exponent equal 1 and any of the bits in the significand equal 1, then the FP value is Not-a-Number and the sign bit is ignored. NaN is produced by operations such as dividing zero by zero or taking the square root of -1.
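
The bit-pattern rules in the list above can be turned into a small classifier. This is only a sketch for the float type; the class name SpecialBits and the method classify are hypothetical, and the returned labels are our own strings.

```java
public class SpecialBits {

    /** Names the category of a float by looking only at its bit fields. */
    static String classify(float f) {
        int bits = Float.floatToRawIntBits(f);
        boolean negative = (bits >>> 31) == 1;
        int biased = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;

        if (biased == 0 && fraction == 0) {
            return negative ? "-Zero" : "+Zero";
        }
        if (biased == 255 && fraction == 0) {
            return negative ? "-Infinity" : "+Infinity";
        }
        if (biased == 255) {
            return "NaN";          // the sign bit is ignored
        }
        return (biased == 0) ? "denormalized" : "normalized";
    }

    public static void main(String[] args) {
        System.out.println(classify(2.0e-45f * 1.0e-10f));    // underflow from above zero
        System.out.println(classify(-2.0e-45f * 1.0e-10f));   // underflow from below zero
        System.out.println(classify(Float.MAX_VALUE * 2.0f)); // overflow
        System.out.println(classify(0.0f / 0.0f));            // zero divided by zero
        System.out.println(classify((float) Math.sqrt(-1.0)));
    }
}
```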

Overflows, underflows and divisions by zero in Java do not lead to error states. A division by zero yields +/-Infinity unless the numerator is also zero, in which case the result is NaN. You can test for such values using methods from the floating-point wrapper classes (see Chapter 3: Java.) such as Double.isNaN(double x). The NaN value can also be detected with the test if (x != x), which is true only when x is NaN.
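
A short sketch of these tests follows. Double.isNaN and Double.isInfinite are standard wrapper-class methods; the class name NaNChecks and the helper isNaNBySelfComparison are our own names for the example.

```java
public class NaNChecks {

    /** True only for NaN, since NaN is the one value that is not equal to itself. */
    static boolean isNaNBySelfComparison(double x) {
        return x != x;
    }

    public static void main(String[] args) {
        double zero = 0.0;
        double infinite = 1.0 / zero;   // nonzero numerator
        double notANumber = zero / zero; // zero numerator

        System.out.println("1.0/0.0 infinite? " + Double.isInfinite(infinite));
        System.out.println("0.0/0.0 NaN?      " + Double.isNaN(notANumber));
        System.out.println("x != x for NaN:   " + isNaNBySelfComparison(notANumber));
        System.out.println("x != x for 1.0:   " + isNaNBySelfComparison(1.0));
    }
}
```

In ordinary code Double.isNaN is usually the clearer choice, since a reader may mistake x != x for a typo.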

Finite floating-point numbers and the special values are ordered from smallest to largest as follows:

  • NEGATIVE_INFINITY
  • negative finite values
  • -ZERO and +ZERO compare as equal
  • positive finite values
  • POSITIVE_INFINITY

The positive and negative zero values behave as follows:

  • Positive zero and negative zero compare as equal
  • 1.0 / (positive zero) ==> POSITIVE_INFINITY
  • 1.0 / (negative zero) ==> NEGATIVE_INFINITY
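
Because the two zeros compare as equal, the == operator cannot tell them apart, but the division rule can. The sketch below uses that rule; the class name SignedZero and the method isNegativeZero are hypothetical.

```java
public class SignedZero {

    /** Distinguishes -0.0 from +0.0 by the sign of the infinity that 1.0/x produces. */
    static boolean isNegativeZero(double x) {
        return x == 0.0 && 1.0 / x == Double.NEGATIVE_INFINITY;
    }

    public static void main(String[] args) {
        double positiveZero = 0.0;
        double negativeZero = -0.0;

        System.out.println("+0.0 == -0.0 : " + (positiveZero == negativeZero));
        System.out.println("1.0 / +0.0   : " + (1.0 / positiveZero));
        System.out.println("1.0 / -0.0   : " + (1.0 / negativeZero));
        System.out.println("isNegativeZero(-0.0): " + isNegativeZero(negativeZero));
        System.out.println("isNegativeZero(+0.0): " + isNegativeZero(positiveZero));
    }
}
```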

The NaN values are unordered. This means that:

  • Numerical comparisons and tests for numerical equality result in false if either or both operands are NaN.

  • A test for numerical equality of a value against itself results in false if and only if the value is NaN.

  • A test for numerical inequality results in true if either operand is NaN.
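
The three rules above can be seen by applying every comparison operator to a pair of operands. In this sketch the class name NaNOrdering and the method countTrue are our own; countTrue simply counts how many of <, <=, >, >= and == hold.

```java
public class NaNOrdering {

    /** Counts how many of the ordering and equality comparisons hold for a and b. */
    static int countTrue(double a, double b) {
        int count = 0;
        if (a < b)  count++;
        if (a <= b) count++;
        if (a > b)  count++;
        if (a >= b) count++;
        if (a == b) count++;
        return count;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println("comparisons true for (1.0, 2.0): " + countTrue(1.0, 2.0));
        System.out.println("comparisons true for (NaN, 1.0): " + countTrue(nan, 1.0));
        System.out.println("comparisons true for (NaN, NaN): " + countTrue(nan, nan));
        System.out.println("NaN != 1.0: " + (nan != 1.0));
    }
}
```

One practical consequence is that !(x < y) is not the same test as x >= y when either operand may be NaN.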

Extended Exponents

The JVM specifications after version 1.1 allow an implementation to use extended-exponent versions of either or both the float and double types during intermediate calculations, to avoid overflows and underflows.

Each format is described by four parameters:

  • N = number of bits in the significand (mantissa)
  • K = number of bits in the exponent
  • Emax = maximum value of the exponent
  • Emin = minimum value of the exponent

The table lists the floating-point parameters allowed for the four formats.

Parameter   float   float-extended-exponent   double   double-extended-exponent
N           24      24                        53       53
K           8       > 10                      11       > 14
Emax        +127    > +1022                   +1023    > +16382
Emin        -126    < -1021                   -1022    < -16381

The final accessible floating-point results will be in float or double types, but intermediate floating-point values can use the larger extended-exponent representations if the platform processor allows it. The Java programmer has no direct access to the extended-exponent types.

The JVM does not support either the official IEEE 754 single extended or double extended format since these extended formats require extended precision, i.e. longer significand, in addition to the extended exponent ranges shown in the above table.

The documentation for a particular JVM should indicate whether it allows for the extended exponent options.

The modifier strictfp in front of a method forces every intermediate value within that method to stay in the standard float or double format, so the extended exponent ranges are never used. This is useful if one wants to ensure exactly the same results regardless of the platform or JVM implementation.

(This is not related to the StrictMath class discussed in the Math class section.)
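
The sketch below shows where the modifier goes and a calculation whose intermediate product overflows the standard double range. The class name StrictDemo and the method scaledProduct are hypothetical. Note that Java 17 and later evaluate all floating-point expressions strictly, so on those releases the modifier is redundant and the compiler may warn about it.

```java
public class StrictDemo {

    // With strictfp, a * b must be rounded to a standard double before the
    // division, so an overflow in the product cannot be hidden by an
    // extended exponent range.
    static strictfp double scaledProduct(double a, double b, double c) {
        return a * b / c;
    }

    public static void main(String[] args) {
        // 1.0e200 * 1.0e200 exceeds the largest double, so the product
        // overflows to Infinity before the division takes place.
        System.out.println(scaledProduct(1.0e200, 1.0e200, 1.0e200));
        System.out.println(scaledProduct(2.0, 3.0, 4.0));
    }
}
```

Without strictfp, an older JVM that used an extended exponent for the intermediate product would have been permitted to return a finite result for the first call.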

Floating Point Literals and Rounding Rules

Some more notes about Java floating-point include:

Literals

Literals default to double unless appended with f or F:

    float x=1.0;  // compile time error
    float x=1.0f; // OK
    double x=1.0; // OK
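
The suffix matters for more than compilation, because the same decimal text can denote different values as a float and as a double. The following sketch illustrates this; the class name LiteralDemo and the helper widen are our own.

```java
public class LiteralDemo {

    /** Widens a float to double without changing its value. */
    static double widen(float f) {
        return f;
    }

    public static void main(String[] args) {
        float  a = 1.0f;         // float literal
        double b = 1.0;          // double literal (the default)
        double c = 1.0d;         // explicit double suffix, optional
        float  d = (float) 1.0;  // double literal narrowed by a cast

        System.out.println(a + " " + b + " " + c + " " + d);

        // 0.1 has no exact binary representation, so the float and
        // double literals round to different nearby values.
        System.out.println("0.1f widened equals 0.1? " + (widen(0.1f) == 0.1));
        System.out.println("1.0f widened equals 1.0? " + (widen(1.0f) == 1.0));
    }
}
```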

Floating-point rounding:

The JVM uses IEEE 754 round-to-nearest mode: inexact results are rounded to the nearest representable value, with ties going to the value with a zero least-significant bit.
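
The tie rule is easiest to see with float values just above 2^24 = 16777216, where neighboring floats are 2 apart, so adding an odd integer lands exactly halfway between two representable values. A minimal sketch (the class name RoundToNearest and the method add are hypothetical):

```java
public class RoundToNearest {

    /** Float addition; the exact sum is rounded to the nearest float. */
    static float add(float a, float b) {
        return a + b;
    }

    public static void main(String[] args) {
        float big = 16777216f; // 2^24

        // 16777217 is a tie between 16777216 and 16777218.
        System.out.println((long) add(big, 1f));
        // 16777218 is exactly representable.
        System.out.println((long) add(big, 2f));
        // 16777219 is a tie between 16777218 and 16777220.
        System.out.println((long) add(big, 3f));
    }
}
```

The first tie rounds down and the second rounds up, because in each case the neighbor whose least-significant significand bit is zero is chosen.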

Instructions that convert values of floating-point types to integer values will round towards zero.
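
A cast to int shows this behavior. The sketch also covers two cases defined by the Java Language Specification rather than by this page: NaN converts to 0, and values beyond the int range convert to Integer.MAX_VALUE or Integer.MIN_VALUE. The class name TruncateDemo and the method toInt are our own.

```java
public class TruncateDemo {

    /** Narrowing conversion from double to int; the fraction is discarded. */
    static int toInt(double x) {
        return (int) x;
    }

    public static void main(String[] args) {
        System.out.println("(int)  2.9 = " + toInt(2.9));
        System.out.println("(int) -2.9 = " + toInt(-2.9));
        System.out.println("(int)  NaN = " + toInt(Double.NaN));
        System.out.println("(int) 1e20 = " + toInt(1.0e20));

        // Math.round rounds to the nearest integer instead of truncating.
        System.out.println("Math.round(-2.9) = " + Math.round(-2.9));
    }
}
```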

Floating-Point Programming Notes

In general, it is safest to do floating-point calculations in double type. This helps to reduce round-off errors that can reduce precision during intermediate calculations. (You can always cast the final value to float if that is a more convenient size for I/O or storage.) There can be some performance tradeoff, since double operations involve more data transfer, but the size of the tradeoff depends on the JVM and the platform. (In Chapter 12 we discuss techniques for measuring code performance.)
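
To illustrate how round-off accumulates, the sketch below adds 0.1 repeatedly in each type and compares the totals with the ideal result. The class name AccumulateDemo, the method names, and the iteration count of one million are choices made for this example only.

```java
public class AccumulateDemo {

    /** Adds 0.1f to a float accumulator n times. */
    static float sumFloat(int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    /** Adds 0.1 to a double accumulator n times. */
    static double sumDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double ideal = n / 10.0;

        float f = sumFloat(n);
        double d = sumDouble(n);

        System.out.println("float  sum = " + f + ", error = " + Math.abs(f - ideal));
        System.out.println("double sum = " + d + ", error = " + Math.abs(d - ideal));
    }
}
```

Neither total is exact, since 0.1 itself is not exactly representable, but the error in the float total should be larger by many orders of magnitude.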

The representations of the primitives are the same on all machines to ensure the portability of the code. However, during calculations involving floating-point values, intermediate values can exceed the standard exponent ranges if allowed by the particular processor (see table above).

The strictfp modifier on classes or methods requires that the values remain within the range allowed by the Java specifications throughout the calculation, to ensure the same results on all platforms.

Floating-Point Demo

Here we use an applet to display results of several math expressions. To see outputs from the print statements run with an appletviewer or look in the browser's Java console. You can also run it as an application. Try to predict the results before looking at the output.

import java.applet.Applet;
import java.awt.*;

/** This applet tests various math expressions.
  * Run with appletviewer to see print out on
  * screen or with a browser Java console.
 **/
public class FPSpecialValues extends Applet {

  public void init() {
    // FP literals are double type by default.
    // Append F or f to make float or cast to float
    float x = 5.1f;
    float y = 0.0f;

    float div_by_zero = x/y;
    System.out.println ("Divide By Zero = x/y = " + div_by_zero + "\n");

    x = -1.0f;
    div_by_zero = x/y;
    System.out.println ("Divide negative by zero = x/y = " + div_by_zero +
                                    "\n");

    x = 2.0e-45f;
    y = 1.0e-10f;
    float positive_underflow = x*y;
    System.out.println ("Positive underflow = " + positive_underflow +
                                     "\n");

    x = -2.0e-45f;
    y = 1.0e-10f;
    float negative_underflow = x*y;
    System.out.println ("Negative underflow = " + negative_underflow +
                                      "\n");

    x = 1.0f;
    y = negative_underflow;
    float div_by_neg_zero = x/y;
    System.out.println ("Divide 1 by negative zero = " + div_by_neg_zero +
                                       "\n");

    x = 0.0f;
    y = 0.0f;
    float div_zero_by_zero = x/y;
    System.out.println ("Divide zero by zero = " + div_zero_by_zero + "\n");
  }

  public void paint (Graphics g) {
    g.drawString ("Math tests",20,20);
  }
}

 


Latest update: Oct. 15, 2004


Java is a trademark of Sun Microsystems, Inc.