Improving execution times in a Java program can be very important
part of making it a usable tool. There are various profiler
tools available commercially that give detailed information on the
time taken by various parts of a program. The Sun J2SE java program,
in fact, includes the options -Xprof
and -Xrunhprof
to produce time profiles.
There are some optimization steps that the programmer
can make to improve the performance. For example, if you are sure
that you will not be overriding a method then you should you use
the modifier final.
The compiler can then inline the method since the interpreter
will not have to search for possible overriding methods.
Inline puts the program code directly in the execution
flow and therefore avoids jumping to and back from another section
of memory.
For AWT components we saw that double buffering and
other techniques (see Chapter
11: Supplements: AWT Flicker ) reduce the number of calls to
the host windowing system and this speeds execution considerably.
(Swing components already employ double bufferiing.)
Avoiding multiple drawing calls reduces drawing times.
We mentioned in a Chapter 6:
Java : Drawing Polygons, a considerable speedup can be obtained
by using a single drawPolyline()
call instead of multiple drawLine()calls
when drawing a curve.
Timing with System.currentTimeMillis()
When you are unsure about the relative speed of different
coding techniques, you can test them with the static method System.currentTimeMillis().
This method returns a long value equal to the number of milliseconds
since a standard date in 1970. That number in itself is of little
value, but the difference between two millisecond time readings
can be meaningful. Thus, you can bracket a code section of interest
with this method and find the difference in milliseconds to get
an idea of how long that section of code takes to run.
On today's high performance machines, you often need
to loop over an operation a large number of times to produce a time
in the millisecond range. Such a timing measurement has come to
be known as a microbenchmark and is fraught with difficulties
for a variety of reasons including just-in-time and dynamic compilation
techniques, compilation warm-up periods, the use of separate compile
threads, background compilation, compiler optimizations, dead code
removal, and others.
Nevertheless, many people still rely on System.currentTimeMillis()
to measure execution time differences. We explain here how that
is normally done and attempt to ameliorate many of the concerns
that make microbenchmarks untrustworthy.
With multiprocessing happening in the operating system
and multithreading in the JVM, the measured times will vary from
run to run, so a statistical average is more meaningful than a single
run. It is wise to unload any unnecessary programs that might steal
time slices away from the program you are testing.
Demo Timing Application
Program
As an example, consider the java.util.Vector
class we discussed in Chapter
10 : Java. It is often very useful compared to a regular fixed-length
array because of Vector's
ability to remove and add elements. However, in situations where
high performance is required, you may find that an array is preferable
because Vector
offers noticeably slower performance than an array.
One reason that Vector
is slow is because it is synchronized
for thread safety. An alternative that can be used in most situations
is the ArrayList
class which does not have the synchronization overhead. In the test
code below we compare timings of object arrays, Vector,
and ArrayList
objects. For consistency, we populate each with Integer
objects.
In the main()
method shown below in the class TimeTest,
we read in some command line parameters and then call doTests()
several times. Inside doTests()
we call four methods that test fetching Integer
objects from four different container types - Vector,
ArrayList,
regular object arrays, and an ArrayList
collection object that takes advantage of the new generics feature
in Java 5.0 (see Chapter 10).
If you want to run this code on a pre-5.0 Java system, you'll need
to comment out the generics version.
The program uses the following methods to check the
four container types:
testv() - vector
testal() - ArrayList
testali() - ArrayList<Integer>
testa() - array
In each case, the fetches from the containers are
in a loop in order to make the total time large enough to measure.
We have to do something with the Integer
objects fetched from the containers or else the compiler will notice
that nothing is changing inside the loop and remove the loop altogether.
So for our timing loop we first cast the Object
type retrieved from the container to an Integer
and then we accumulate a total sum of all the values after converting
from Integer
type to int type using the intValue()
method on Integer.
TimeTest.java
|
import java.util.*;
/**
* Run timing tests on four types of code:
* 1 - Vector element access
* 2 - ArrayList element access
* 3 - ArrayList element access.
* 4 - Object array access (using Integer
objects
**/
public class TimeTest
{
private int fWarmup, fNum, fOuter, fSleep;
/** Fill timing parameters. **/
public TimeTest (int warmup, int sleep, int num,
int outer) {
fWarmup = warmup;
fSleep = sleep;
fNum = num;
fOuter = outer;
} // ctor
/** Run tests. **/
public static void main (String[] args) {
int warmup=0, sleep=0, num=0, outer=0,
total=0;
if (args.length == 5) {
warmup = Integer.parseInt
(args[0]);
sleep = Integer.parseInt
(args[1]);
num =
Integer.parseInt (args[2]);
outer = Integer.parseInt
(args[3]);
total = Integer.parseInt
(args[4]);
System.out.println ("Testing
with warmup = " + warmup +
", sleep
= " + sleep + ", num = " + num +
", outer
= " + outer + ", total = " + total);
}
else {
System.err.println
("Usage: java TimeTest warmup " +
"sleep loop
outer total");
System.exit
(1);
}
System.out.println ("V\tAL\tALI\tarray");
TimeTest tt = new TimeTest (warmup,
sleep, num, outer);
for (int i = 0; i < total; i++) {
tt.doTests ();
}
} // main
/** Carry out the different set of timing tests.**/
public void doTests () {
long vectime =
testv ();
long arlisttime = testal ();
long alitime =
testali ();
long arraytime = testa
();
System.out.println (vectime + "\t"
+ arlisttime +
"\t" + alitime + "\t"
+ arraytime);
} // doTests
/**
* Create a Vector and an array whose
elements reference
* Integer objects. A time comparison
is made between accessing
* the Integer objects with the Vector
and the array.
**/
public long testv () {
// Create Vector and fill elements
with Integer objects
Vector vec = new Vector (fNum);
for (int i=0; i < fNum; i++) {
vec.add (new
Integer (i));
}
// Now test access times by looping
through the Vector
// to access each Integer element.
// First warmup the hotspot compiler.
long sum = 0;
for (int i = 0; i < fWarmup; i++)
{
sum += ((Integer)(vec.elementAt
(i))).intValue ();
}
// And give it time to finish the
JIT compile
// in the background.
if (fSleep > 0) {
try {
Thread.sleep
(fSleep);
}
catch (InterruptedException
e)
{}
}
// Then do the loop for real and time
it.
long t1 = System.currentTimeMillis
();
for (int j = 0; j < fOuter; j++) {
for (int i
= 0; i < fNum; i++) {
sum
+= ((Integer)(vec.elementAt (i))).intValue ();
}
}
long t2 = System.currentTimeMillis
();
long elapsed = t2 - t1;
vec.clear ();
vec = null;
System.gc ();
return elapsed;
} // testv
/** Create Object array and fill elements with
Integer objects.**/
public long testa () {
Object[] array = new Object[fNum];
for (int i=0; i < fNum; i++){
array[i] =
new Integer(i);
}
// Now test access times by looping
through the array
// to access each Integer element.
// First warmup the hotspot compiler.
int sum = 0;
for (int i=0; i < fWarmup; i++) {
sum += ((Integer)
(array[i])).intValue();
}
// And give it time to finish the
JIT compile inthe background.
try {
Thread.sleep (fSleep);
}
catch (InterruptedException e) {}
// Then do the loop for real and time
it.
long t1 = System.currentTimeMillis();
for (int j=0; j < fOuter; j++) {
sum = 0;
for (int i=0;
i < fNum; i++) {
sum
+= ((Integer) (array[i])).intValue();
}
}
long t2 = System.currentTimeMillis();
long elapsed = t2 - t1;
array = null;
System.gc ();
return elapsed;
} // testa
/** Create ArrayList and fill elements with Integer
objects.**/
public long testal () {
ArrayList al = new ArrayList (fNum);
for (int i=0; i < fNum; i++){
al.add (new
Integer(i));
}
// Now test access times by looping
through the array list
// to access each Integer element.
// First warmup the hotspot compiler.
int sum = 0;
for (int i=0; i < fWarmup; i++) {
sum += ((Integer)(al.get
(i))).intValue();
}
// And give it time to finish the
JIT compile inthe background.
try {
Thread.sleep (fSleep);
}
catch (InterruptedException e) {}
// Then do the loop for real and time
it.
long t1 = System.currentTimeMillis();
for (int j=0; j < fOuter; j++) {
sum = 0;
for (int i=0;
i < fNum; i++) {
sum
+= ((Integer)(al.get (i))).intValue();
}
}
long t2=System.currentTimeMillis();
long elapsed = t2 - t1;
al.clear ();
al = null;
System.gc ();
return elapsed;
} // testal
/** Create ArrayList and fill elements
with Integer objects. **/
long testali () {
ArrayList ali = new ArrayList
(fNum);
for (int i=0; i < fNum; i++){
ali.add (i);
}
// Now test access times by looping
through the array list
// to access each Integer element.
// First warmup the hotspot compiler.
int sum = 0;
for (int i=0; i < fWarmup; i++) {
sum += ali.get
(i);
}
// And give it time to finish the
JIT compile in the background.
try {
Thread.sleep (fSleep);
}
catch (InterruptedException e) {}
// Then do the loop for real and time
it.
long t1 = System.currentTimeMillis
();
for (int j=0; j < fOuter; j++) {
sum = 0;
for (int i=0;
i < fNum; i++) {
sum
+= ali.get (i);
}
}
long t2 = System.currentTimeMillis();
long elapsed = t2 - t1;
ali.clear ();
ali = null;
System.gc ();
return elapsed;
} // testali
} // class TimeTest
|
In the testv()
method, for example, we first populate a Vector
with Integer
objects. Then we begin to fetch from the Vector
and accumulate the sum. We are using Sun's Hotspot(tm)
compiler, which uses adaptive optimization, so we want to give
it time to warm up, we first perform a short summation
loop without timing. The number of warm up loops is specified
by one of the command line parameters. After warming up we sleep
a short while.
The purpose of the sleep is to relinquish the processor
so that the Hotspot compiler can finish a background compile of
the loop if needed. Finally we begin the timing loop by calling
System.currentTimeMillis(),
followed by the loop itself (actually an inner and outer loop)
and then another call to System.currentTimeMillis()
so we can calculate the elapsed time. We then clear the Vector,
set the reference to null, and request garbage collection (System.gc
()). The elapsed time is returned to doTests()
where it is printed to the console.
The command line parameters adjust the number of
warm up loops, the sleep time, the number of inner and outer loops
to be timed, and the total number of times to run the tests. If
you download this code you can experiment with the various parameters
to see what effect they have on the timing results. The results
can vary widely based on the platform in use.
In a run of the program, via the command line we
set the five parameters as follows:
num warmup iterations
= 3000
sleep = 300 msecs
num inner loop interations = 3000
num outer loop iterations = 3000
num test runs
= 4
The following four runs show typical output variations:
C:\Java\...>java
TimeTest 3000 3000 3000 3000 4
Testing with warmup = 3000, sleep = 10, num = 3000, outer
= 3000, total = 4
V AL AL<I>
array
1032 203 187
47
1031 219 187
47
1062 218 203
110
1031 203 187
63
C:\Java\...>java TimeTest 3000 10 3000 3000 4
Testing with warmup = 3000, sleep = 10, num = 3000, outer
= 3000, total = 4
V AL AL<I>
array
1062 219 188
46
1406 328 343
109
1391 328 359
110
1407 328 360
109
C:\Java\...>java TimeTest 3000 10 3000 3000 4
Testing with warmup = 3000, sleep = 10, num = 3000, outer
= 3000, total = 4
V AL AL<I>
array
1047 204 360
110
1406 219
359 47
1406 344
343 109
1406 344
360 109
|
For this case we see that the Vector
is slowest, the ArrayList
is up to four to five times faster than the Vector,
the ArrayList<Integer>
is varies from faster to slower than the non-generic version, and
the simple object array is fastest of all; up to 14 times faster
than the Vector
and 2-4 times faster than the ArrayList.
While the exact times will vary considerably from
platform to platform, these general relavtive timings will hold.
So from these tests you would clearly not want to use a Vector
in a time critical code segment and use a plain array where time
is paramount. So cleary we see that such timing tests can be valuable
even if the absolute numbers are not especially meaningful.
Other performance tips
Here are some tips on performance coding. See the
reference section below for links to a number of sites with extensive
lists of performance coding recommendations.
-
Local variables run faster than instance variables.
- The long
and double
type variables typically require extra time to access and modify
since they contain additional bytes compared to int
and float.
However, significantly slower performance is not true of all JVMs
on all platforms since some take advantage of particular processor
capabilities.
- " The JVM bytecode is weighted towards int
operations so use int
type except where you specifically need to use one of the other
types. Math operations with the shorter integer types are widened
in the JVM to int.
For example, a sum of two byte values results in an int
value (that may be cast back to a byte
if the result is to be stored into a byte
variable).
- Use x += A;
assignment commands rather than x
= x + A; The first requires 1 instruction and the latter
4.
- If you are concatenating lots of strings, use StringBuffer
and its append()
method instead of the String
"+" operator. If a Java 5.0 JVM is available, use StringBuilder
instead.
- As demonstrated in the TimeTest
example above, the original Java utility classes like Vector,
Enumeration,
and Hashtable
are slow because they use synchronization for thread safety. Although
synchronization overhead has been significantly reduced in modern
versions of Java, if you do not require synchronization, then
the newer utilities from the Java Collections Framework are normally
superior. The new classes ArrayList
and HashMap
generally replace Vector
and Hashtable,
respectively, and Iterator
is preferred over Enumeration.
-
Avoid creating lots of objects if possible,
especially in loop in which speed is important. Creating an
object takes time and uses up memory resources.
-
Avoid method calls in a loop as much as possible.
That is, do your loops inside of methods rather than calling
methods inside of loops.
-
Performance can vary significantly among different
JVMs for the same platform. Do timing measurements for all the
JVMs that would potentially be used with your program
References & Web Resources
Latest update: Oct. 3, 2005
|