A histogram provides the frequency distribution of
values taken by a parameter of interest. These types of distributions
are often needed in science and engineering studies.
For example, suppose that an experiment counts the number
of cosmic rays passing through a detector every minute. We might
expect the distribution of "hits" per minute to have
a mean around 10. We record the number of hits every minute over
a period of a few hours and then make a histogram of the values.
We create a histogram with, say, 10 bins, in
which the first bin holds the number of times the hit count was
between 0 and 3, the second bin between 4 and 7, and so forth,
up to the last bin, which counts the number of times the hit count was
between 36 and 39. The resulting Poisson-type distribution would
peak near 10, fall quickly to zero on the low side,
and have a longer tail on the high side.
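The cosmic-ray example can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name CosmicRayCounts, the Poisson sampler (Knuth's multiplication method), the fixed seed, the 180 one-minute samples, and the layout of 10 bins of width 4 starting at 0 are all choices made here, not details of a real experiment.

```java
import java.util.Random;

class CosmicRayCounts {

    /** Draw one Poisson-distributed count with the given mean (Knuth's method). */
    static int nextPoisson(Random rng, double mean) {
        double limit = Math.exp(-mean);
        double product = rng.nextDouble();
        int count = 0;
        while (product > limit) {
            product *= rng.nextDouble();
            count++;
        }
        return count;
    }

    /**
     * Sort non-negative counts into numBins bins of equal integer width.
     * Counts past the last bin go into an extra overflow slot at index numBins.
     */
    static int[] histogram(int[] counts, int numBins, int binWidth) {
        int[] bins = new int[numBins + 1];
        for (int c : counts) {
            int bin = c / binWidth;          // integer division picks the bin
            if (bin >= numBins) bin = numBins;
            bins[bin]++;
        }
        return bins;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);        // fixed seed (assumption) for repeatability
        int minutes = 180;                   // three hours of one-minute samples
        int[] hits = new int[minutes];
        for (int i = 0; i < minutes; i++) {
            hits[i] = nextPoisson(rng, 10.0);
        }
        int[] bins = histogram(hits, 10, 4);
        for (int b = 0; b < 10; b++) {
            System.out.println((4 * b) + "-" + (4 * b + 3) + " : " + bins[b]);
        }
        System.out.println("Overflows : " + bins[10]);
    }
}
```

Because hit counts are never negative, this sketch needs no underflow slot; a general-purpose histogram for real-valued data needs both.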
The following BasicHist.java
class provides some essential histogram features. The class attributes
include instance variables for the number of bins, an integer array
of bins, overflow and underflow counts, and the minimum and maximum
values that the histogram spans. The constructor creates an instance
of the class for a given number of bins and for lower and upper
range limits.
The three methods provide for adding an entry to the
histogram, clearing the histogram, and obtaining the values
in the bins (including the overflows and underflows).
BasicHist.java
/**
 * A simple histogram class to count the frequency of
 * values of a parameter of interest.
 **/
public class BasicHist {
  int[] bins;
  int numBins;
  int underflows;
  int overflows;
  double lo;
  double hi;
  double range;

  /**
   * The constructor will create an array of a given
   * number of bins. The range of the histogram is given
   * by the upper and lower limit values.
   **/
  public BasicHist(int numBins, double lo, double hi) {
    this.numBins = numBins;
    bins = new int[numBins];
    this.lo = lo;
    this.hi = hi;
    range = hi - lo;
  }

  /**
   * Add an entry to a bin.
   * Include if value is in the range
   * lo <= x < hi
   **/
  public void add(double x) {
    if (x >= hi) overflows++;
    else if (x < lo) underflows++;
    else {
      double val = x - lo;
      // Casting to int will round off to lower
      // integer value.
      int bin = (int) (numBins * (val / range));
      // Guard against floating-point rounding pushing a
      // value just below hi past the last bin.
      if (bin >= numBins) bin = numBins - 1;
      // Increment the corresponding bin.
      bins[bin]++;
    }
  }

  /** Clear the histogram bins. **/
  public void clear() {
    for (int i = 0; i < numBins; i++) {
      bins[i] = 0;
    }
    // Reset the out-of-range counters once, outside the loop.
    overflows = 0;
    underflows = 0;
  }

  /** Provide access to the bin values. **/
  public int getValue(int bin) {
    if (bin < 0)
      return underflows;
    else if (bin >= numBins)
      return overflows;
    else
      return bins[bin];
  }
}
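To see how the add method maps a value to a bin, the index arithmetic can be pulled out into a stand-alone method. The class BinIndexDemo and the helper binIndex are hypothetical names introduced for this sketch. The helper mirrors the expression used in add and borrows the getValue convention of a negative index for underflows and an index of numBins for overflows.

```java
class BinIndexDemo {

    /**
     * Same arithmetic as the add method of BasicHist.
     * Returns -1 for an underflow and numBins for an overflow.
     */
    static int binIndex(double x, int numBins, double lo, double hi) {
        if (x >= hi) return numBins;
        if (x < lo) return -1;
        // The cast truncates toward zero; the value is non-negative here,
        // so this selects the bin whose lower edge is at or below x.
        return (int) (numBins * ((x - lo) / (hi - lo)));
    }

    public static void main(String[] args) {
        // 10 bins over [-2.0, 2.0): each bin is 0.4 wide.
        System.out.println(binIndex(-2.0, 10, -2.0, 2.0)); // 10 * 0.0   = 0.0  -> 0
        System.out.println(binIndex(0.5, 10, -2.0, 2.0));  // 10 * 0.625 = 6.25 -> 6
        System.out.println(binIndex(1.5, 10, -2.0, 2.0));  // 10 * 0.875 = 8.75 -> 8
        System.out.println(binIndex(2.0, 10, -2.0, 2.0));  // x >= hi -> 10 (overflow)
        System.out.println(binIndex(-2.5, 10, -2.0, 2.0)); // x < lo  -> -1 (underflow)
    }
}
```

Note that the lower limit is inclusive and the upper limit is exclusive, so a value exactly equal to hi counts as an overflow.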
The applet below creates an instance of BasicHist
and uses it to provide a histogram of the distribution of values
generated by a Gaussian random number generator. (We will discuss
details about random number generation in Java in Chapter
4: Tech.)
BasicHistApplet1.java
(Output goes to browser's Java console.)
public class BasicHistApplet1 extends java.applet.Applet {
  public void init() {
    // Create an instance of the Random class for
    // producing our random values.
    java.util.Random r = new java.util.Random();

    // The method nextGaussian in the class Random produces
    // values centered at 0.0 and with a standard deviation
    // of 1.0.

    // Create an instance of our basic histogram class.
    // Make it wide enough to include most of the
    // Gaussian values.
    BasicHist bh = new BasicHist(10, -2.0, 2.0);

    // Fill the histogram
    for (int i = 0; i < 100; i++) {
      double val = r.nextGaussian();
      bh.add(val);
    }

    // Print out the frequency values in each bin.
    for (int i = 0; i < 10; i++) {
      System.out.println("Bin " + i + " = " + bh.getValue(i));
    }

    // Negative bin values give the underflows.
    System.out.println("Underflows = " + bh.getValue(-1));

    // Bin values at or above the number of bins give the overflows.
    System.out.println("Overflows = " + bh.getValue(10));
    //------------------------------------------------
    // and this line.
  }
  ... standard code...
}
When we run the above applet, an output similar to
the following will be produced:
Bin 0 = 3
Bin 1 = 8
Bin 2 = 12
Bin 3 = 14
Bin 4 = 15
Bin 5 = 17
Bin 6 = 9
Bin 7 = 9
Bin 8 = 7
Bin 9 = 3
Underflows = 1
Overflows = 2
We see that the distribution does in fact roughly follow the general
shape of a Gaussian centered in the middle bins.
Object Oriented vs Procedural
The histogram example provides a nice illustration of the power
and utility of object-oriented program design.
If your program has 20 parameters to examine, you can simply create
20 instances of BasicHist,
each with its own number of bins and range limits relevant to that
parameter. If at some later point we add new methods and instance
variables to BasicHist,
you do not need to modify the code in your program as long as
the changes are internal and don't affect the argument lists of
the methods that you invoke.
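A minimal sketch of the one-instance-per-parameter idea follows. To keep the block self-contained it uses a trimmed-down stand-in class, MiniHist, rather than the full BasicHist listing. MiniHist, MultiHistDemo, and the parameter names energy and angle are hypothetical names invented for illustration.

```java
class MultiHistDemo {

    /** Trimmed-down stand-in for BasicHist so this example compiles by itself. */
    static class MiniHist {
        final int[] bins;
        final double lo;
        final double hi;
        int underflows;
        int overflows;

        MiniHist(int numBins, double lo, double hi) {
            bins = new int[numBins];
            this.lo = lo;
            this.hi = hi;
        }

        void add(double x) {
            if (x >= hi) overflows++;
            else if (x < lo) underflows++;
            else bins[(int) (bins.length * ((x - lo) / (hi - lo)))]++;
        }
    }

    public static void main(String[] args) {
        // Each parameter gets its own instance with its own binning and range.
        MiniHist energy = new MiniHist(20, 0.0, 100.0);
        MiniHist angle = new MiniHist(6, -90.0, 90.0);

        energy.add(42.0);  // 20 * 0.42 = 8.4 -> bin 8
        angle.add(-45.0);  // 6 * 0.25  = 1.5 -> bin 1
        angle.add(120.0);  // above hi -> overflow

        System.out.println("energy bin 8    = " + energy.bins[8]);   // 1
        System.out.println("angle bin 1     = " + angle.bins[1]);    // 1
        System.out.println("angle overflows = " + angle.overflows);  // 1
    }
}
```

The calling code never passes a histogram number around: the object on which add is invoked already determines which bins and limits are used.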
If you think about how to do histogramming in a purely procedural
manner, you will start to appreciate the elegance of the object-oriented
approach. In a procedural program, you would need to create
arrays to hold the histogram values. Most likely you would use a
2-D array with the first index for the given histogram, and the
second index corresponding to the bins.
Similarly, you could use arrays to hold the parameters of the histograms,
such as the number of bins, the lower and upper ranges, and so forth.
Functions to add entries to the histograms would require a lot of
bookkeeping to determine which histogram was needed. With classes,
we just create 20 instances of the BasicHist
type and each instance knows which histogram it is and you
don't have to worry about keeping track of the histogramming details
in your code.
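For comparison, a procedural version along the lines just described might look like the following sketch. It is written in Java with static arrays only to stay in one language. The names ProceduralHistDemo, NUM_HISTS, and MAX_BINS and the two sample histograms are assumptions made for illustration.

```java
class ProceduralHistDemo {
    static final int NUM_HISTS = 2;
    static final int MAX_BINS = 20;

    // First index selects the histogram, second index selects the bin.
    static int[][] bins = new int[NUM_HISTS][MAX_BINS];

    // Parallel arrays: entry h of each array describes histogram h.
    static int[] numBins = {20, 6};
    static double[] lo = {0.0, -90.0};
    static double[] hi = {100.0, 90.0};
    static int[] underflows = new int[NUM_HISTS];
    static int[] overflows = new int[NUM_HISTS];

    /** Add value x to histogram h; the caller must track which h is which. */
    static void add(int h, double x) {
        if (x >= hi[h]) overflows[h]++;
        else if (x < lo[h]) underflows[h]++;
        else bins[h][(int) (numBins[h] * ((x - lo[h]) / (hi[h] - lo[h])))]++;
    }

    public static void main(String[] args) {
        add(0, 42.0);   // histogram 0: 20 * 0.42 = 8.4 -> bin 8
        add(1, -45.0);  // histogram 1: 6 * 0.25  = 1.5 -> bin 1
        add(1, 120.0);  // histogram 1: above hi -> overflow

        System.out.println("hist 0 bin 8     = " + bins[0][8]);
        System.out.println("hist 1 bin 1     = " + bins[1][1]);
        System.out.println("hist 1 overflows = " + overflows[1]);
    }
}
```

Every call has to carry the histogram index, every array has to be sized for the largest histogram, and adding a new per-histogram property means adding another parallel array.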
Furthermore, if you wanted to use the histogram code in another
program, it would be messy to extract just that code from the program
and move it into the new one. The encapsulation aspect inherent
to the object approach makes reusability far easier than with procedural
code.
Latest update: Oct. 19, 2004