Implementing the OCR Program
We will now see how the OCR program was implemented. There are several classes that make up the OCR application. The UML diagram is shown in Figure 7.2.

Figure 7.2: The classes of the OCR application
The purpose of each file that makes up the OCR application is summarized below.
- Entry.java - The drawing area that allows the user to draw letters.
- KohonenNetwork.java - The Kohonen neural network (covered in Chapter 6).
- MainEntry.java - The main frame of the application; this class starts the OCR application.
- Network.java - Generic neural network functions not specific to the Kohonen neural network (covered in Chapter 6).
- NeuralReportable.java - The interface that specifies how a neural network reports its progress.
- Sample.java - Used to display a downsampled image.
- SampleData.java - Used to actually hold a downsampled image.
- TrainingSet.java - The training data for a neural network (covered in Chapter 6).
We will now examine each section of the program. We will begin by examining the process by which the user draws an image.
Drawing Images
Though not directly related to neural networks, the process by which the user draws characters is an important part of the OCR application, and we examine it in this section. The code that allows the user to draw an image is contained in the Entry.java file, which can be seen in Listing 7.1.
Listing 7.1: Drawing images (Entry.java)
import javax.swing.*;
import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
public class Entry extends JPanel {
    protected Image entryImage;
    protected Graphics entryGraphics;
    protected int lastX = -1;
    protected int lastY = -1;
    protected Sample sample;
    protected int downSampleLeft;
    protected int downSampleRight;
    protected int downSampleTop;
    protected int downSampleBottom;
    protected double ratioX;
    protected double ratioY;
    protected int pixelMap[];

    Entry()
    {
        enableEvents(AWTEvent.MOUSE_MOTION_EVENT_MASK|
                     AWTEvent.MOUSE_EVENT_MASK|
                     AWTEvent.COMPONENT_EVENT_MASK);
    }

    protected void initImage()
    {
        entryImage = createImage(getWidth(),getHeight());
        entryGraphics = entryImage.getGraphics();
        entryGraphics.setColor(Color.white);
        entryGraphics.fillRect(0,0,getWidth(),getHeight());
    }

    public void paint(Graphics g)
    {
        if ( entryImage==null )
            initImage();
        g.drawImage(entryImage,0,0,this);
        g.setColor(Color.black);
        g.drawRect(0,0,getWidth(),getHeight());
        g.setColor(Color.red);
        g.drawRect(downSampleLeft,
                   downSampleTop,
                   downSampleRight-downSampleLeft,
                   downSampleBottom-downSampleTop);
    }

    protected void processMouseEvent(MouseEvent e)
    {
        if ( e.getID()!=MouseEvent.MOUSE_PRESSED )
            return;
        lastX = e.getX();
        lastY = e.getY();
    }

    protected void processMouseMotionEvent(MouseEvent e)
    {
        if ( e.getID()!=MouseEvent.MOUSE_DRAGGED )
            return;
        entryGraphics.setColor(Color.black);
        entryGraphics.drawLine(lastX,lastY,e.getX(),e.getY());
        getGraphics().drawImage(entryImage,0,0,this);
        lastX = e.getX();
        lastY = e.getY();
    }

    public void setSample(Sample s)
    {
        sample = s;
    }

    public Sample getSample()
    {
        return sample;
    }

    protected boolean hLineClear(int y)
    {
        int w = entryImage.getWidth(this);
        for ( int i=0;i<w;i++ ) {
            if ( pixelMap[(y*w)+i] !=-1 )
                return false;
        }
        return true;
    }

    public void clear()
    {
        this.entryGraphics.setColor(Color.white);
        this.entryGraphics.fillRect(0,0,getWidth(),getHeight());
        this.downSampleBottom =
        this.downSampleTop =
        this.downSampleLeft =
        this.downSampleRight = 0;
        repaint();
    }
}
This class defines a number of properties. These properties are described here.
- downSampleBottom - The bottom of the cropping region, used during down sampling.
- downSampleLeft - The left side of the cropping region, used during down sampling.
- downSampleRight - The right side of the cropping region, used during down sampling.
- downSampleTop - The top side of the cropping region, used during down sampling.
- entryGraphics - A graphics object that allows drawing to the image that corresponds to the drawing area.
- entryImage - The image that holds the character that the user is drawing.
- lastX - The last x coordinate that the user was drawing at.
- lastY - The last y coordinate that the user was drawing at.
- pixelMap - The numeric pixel map that is actually downsampled. It is taken directly from the entryImage.
- ratioX - The downsample ratio for the x dimension.
- ratioY - The downsample ratio for the y dimension.
- sample - The object that will contain the downsampled image.
Most of the actual drawing is handled by the processMouseMotionEvent method. While the mouse is being dragged, a line is drawn from the last reported drag position to the current mouse position. It is not enough to simply draw a dot, because the mouse moves faster than the program can report every position. Drawing a line covers any missed pixels as well as we can. The line is drawn to the off-screen image, and the off-screen image is then copied to the user's screen. This is done with the following lines of code.
entryGraphics.setColor(Color.black);
entryGraphics.drawLine(lastX,lastY,e.getX(),e.getY());
getGraphics().drawImage(entryImage,0,0,this);
lastX = e.getX();
lastY = e.getY();
As the program runs, this method is called repeatedly. This causes whatever the user is drawing to be saved to the off-screen image. In the next section you will learn how to downsample an image. You will then see that the off-screen image from this section is accessed as an array of integers, allowing its "image data" to be worked with directly.
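The off-screen drawing step described above can be tried without opening a window. The following is a minimal sketch, not part of the OCR application: the class and method names (OffscreenLineSketch, newCanvas, drawSegment, countInk) are hypothetical, and it assumes a BufferedImage in place of the image that Entry obtains from createImage. It shows that one drawLine call between two reported mouse positions fills the pixels in between, and that white pixels read back as -1.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, used only to illustrate off-screen line drawing.
final class OffscreenLineSketch {

    /** Creates an off-screen image filled with white, like initImage() does. */
    static BufferedImage newCanvas(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.white);
        g.fillRect(0, 0, width, height);
        g.dispose();
        return img;
    }

    /** Draws one black segment between two consecutive mouse positions. */
    static void drawSegment(BufferedImage img, int lastX, int lastY, int x, int y) {
        Graphics2D g = img.createGraphics();
        g.setColor(Color.black);
        g.drawLine(lastX, lastY, x, y);
        g.dispose();
    }

    /** Counts pixels that are not white; opaque white reads back as -1. */
    static int countInk(BufferedImage img) {
        int count = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) != -1) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Even if the mouse is only reported at the two end points of a stroke, the pixels between them are filled, which is why the program draws lines rather than dots.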
Downsampling the Image
Every time a letter is drawn for either training or recognition, it must be downsampled. In this section we will examine the process by which this downsampling occurs. However, before we discuss the downsampling process, we should discuss how these downsampled images are stored.
Storing Downsampled Images
Downsampled images are stored in the SampleData class. The SampleData class is shown in Listing 7.2.
Listing 7.2: Downsampled image data (SampleData.java)
public class SampleData implements Comparable,Cloneable {
    protected boolean grid[][];
    protected char letter;

    public SampleData(char letter,int width,int height)
    {
        grid = new boolean[width][height];
        this.letter = letter;
    }

    public void setData(int x,int y,boolean v)
    {
        grid[x][y]=v;
    }

    public boolean getData(int x,int y)
    {
        return grid[x][y];
    }

    public void clear()
    {
        for ( int x=0;x<grid.length;x++ )
            for ( int y=0;y<grid[0].length;y++ )
                grid[x][y]=false;
    }

    public int getHeight()
    {
        return grid[0].length;
    }

    public int getWidth()
    {
        return grid.length;
    }

    public char getLetter()
    {
        return letter;
    }

    public void setLetter(char letter)
    {
        this.letter = letter;
    }

    public int compareTo(Object o)
    {
        SampleData obj = (SampleData)o;
        if ( this.getLetter()>obj.getLetter() )
            return 1;
        else
            return -1;
    }

    public String toString()
    {
        return ""+letter;
    }

    public Object clone()
    {
        SampleData obj = new SampleData(letter,getWidth(),getHeight());
        for ( int y=0;y<getHeight();y++ )
            for ( int x=0;x<getWidth();x++ )
                obj.setData(x,y,getData(x,y));
        return obj;
    }
}
As you can see, this class holds a grid of Boolean values whose width and height are passed to the constructor; the OCR application uses a grid of 5X7. All downsampled images are stored in this class. The SampleData class also includes methods to set and get the data associated with the downsampled grid, as well as a method named clone that creates an exact duplicate of the image.
Negating Size and Position
All images are downsampled before being used. This prevents the neural network from being confused by size and position. The drawing area is large enough that you could draw a letter at several different sizes. By downsampling the image down to a consistent size, it will not matter how large you draw the letter, as the downsampled image will always remain a consistent size. This section shows you how this is done.
When you draw an image, the program first draws a box around the boundary of your letter. This allows the program to eliminate all of the white space around your letter. This is done inside the "downSample" method of the Entry class. As you drew the character, it was also drawn onto the "entryImage" instance variable of the Entry object. In order to crop this image, and eventually downsample it, we must grab the bit pattern of the image. This is done using the "PixelGrabber" class, as shown here.
int w = entryImage.getWidth(this);
int h = entryImage.getHeight(this);
PixelGrabber grabber = new PixelGrabber(entryImage,
0,0,w,h,true);
grabber.grabPixels();
pixelMap = (int[])grabber.getPixels();
After this code completes, the pixelMap variable, which is an array of int values, contains the bit pattern of the image. The next step is to crop the image and remove any white space. Cropping is implemented by dragging four imaginary lines inward from the top, left, bottom and right sides of the image. Each line stops as soon as it crosses an actual pixel, so the four lines snap to the outer edges of the drawn character. The hLineClear and vLineClear methods each accept a parameter that indicates the line to scan and return true if that line is clear. The program calls hLineClear and vLineClear repeatedly until the lines reach the outer edges of the character. The horizontal line method (hLineClear) is shown here.
protected boolean hLineClear(int y)
{
    int w = entryImage.getWidth(this);
    for ( int i=0;i<w;i++ ) {
        if ( pixelMap[(y*w)+i] !=-1 )
            return false;
    }
    return true;
}
As you can see, the horizontal line method accepts a y coordinate that specifies the horizontal line to check. The program then loops through each x coordinate on that row, checking whether any pixel is filled. A value of -1 indicates white (the packed color 0xFFFFFFFF read as a signed int), so such pixels are ignored. The "findBounds" method uses "hLineClear" and "vLineClear" to calculate the four edges. The beginning of this method is shown here.
protected void findBounds(int w,int h)
{
    // top line
    for ( int y=0;y<h;y++ ) {
        if ( !hLineClear(y) ) {
            downSampleTop=y;
            break;
        }
    }
    // bottom line
    for ( int y=h-1;y>=0;y-- ) {
        if ( !hLineClear(y) ) {
            downSampleBottom=y;
            break;
        }
    }
Here you can see how the program calculates the top and bottom lines of the cropping rectangle. To calculate the top line of the cropping rectangle, the program starts at row 0 and works toward the bottom of the image. As soon as the first non-clear line is found, the program establishes it as the top of the cropping rectangle. The same process, only in reverse, is carried out to determine the bottom of the image. The left and right boundaries are determined in the same way, using vLineClear.
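The complete cropping step, including the vertical scan that is not listed above, can be sketched as a standalone class. This is an illustration under stated assumptions rather than the code from Entry.java: the class name CropBoundsSketch is hypothetical, the pixel array is passed in explicitly instead of being read from an instance variable, and returning the bounds as an array (or null for a blank image) is a choice made here for brevity.

```java
// Hypothetical standalone version of the cropping logic.
final class CropBoundsSketch {
    static final int WHITE = -1; // opaque white as a packed ARGB int

    /** Returns true if row y contains only white pixels. */
    static boolean hLineClear(int[] pixels, int w, int y) {
        for (int x = 0; x < w; x++) {
            if (pixels[y * w + x] != WHITE) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if column x contains only white pixels. */
    static boolean vLineClear(int[] pixels, int w, int h, int x) {
        for (int y = 0; y < h; y++) {
            if (pixels[y * w + x] != WHITE) {
                return false;
            }
        }
        return true;
    }

    /**
     * Finds the cropping rectangle as {left, top, right, bottom},
     * all inclusive, or null if the image is completely blank.
     */
    static int[] findBounds(int[] pixels, int w, int h) {
        int top = 0;
        while (top < h && hLineClear(pixels, w, top)) {
            top++;
        }
        if (top == h) {
            return null; // nothing was drawn
        }
        // At least one filled pixel exists, so the scans below must stop.
        int bottom = h - 1;
        while (hLineClear(pixels, w, bottom)) {
            bottom--;
        }
        int left = 0;
        while (vLineClear(pixels, w, h, left)) {
            left++;
        }
        int right = w - 1;
        while (vLineClear(pixels, w, h, right)) {
            right--;
        }
        return new int[] { left, top, right, bottom };
    }
}
```

For a 6 by 5 image with filled pixels at (2,1) and (4,3), the four lines stop at left 2, top 1, right 4 and bottom 3.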
Performing the Downsample
Now that the cropping has taken place, the image must actually be downsampled. This involves taking the image from a larger resolution to a 5X7 resolution. To see how to reduce an image to 5X7, think of an imaginary grid being drawn over the top of the high-resolution image. This divides the image into rectangular regions, five across and seven down. If any pixel in a region is filled, then the corresponding pixel in the 5X7 downsampled image is also filled. Most of the work done by this process is accomplished inside the "downSampleQuadrant" method. This method is shown here.
protected boolean downSampleQuadrant(int x,int y)
{
    int w = entryImage.getWidth(this);
    int startX = (int)(downSampleLeft+(x*ratioX));
    int startY = (int)(downSampleTop+(y*ratioY));
    int endX = (int)(startX + ratioX);
    int endY = (int)(startY + ratioY);
    for ( int yy=startY;yy<=endY;yy++ ) {
        for ( int xx=startX;xx<=endX;xx++ ) {
            int loc = xx+(yy*w);
            if ( pixelMap[ loc ]!= -1 )
                return true;
        }
    }
    return false;
}
The "downSampleQuadrant" method accepts the x and y position of the region that should be calculated. First the starting and ending x and y coordinates must be calculated. To calculate the first x coordinate for the specified region, the method starts from "downSampleLeft", which is the left side of the cropping rectangle. Then x is multiplied by "ratioX", which is the number of pixels that make up each region horizontally. This allows us to determine where to place "startX". The starting y position, "startY", is calculated by similar means. Next the program loops through every x and y covered by the specified region. If even one pixel is determined to be filled, then the method returns true, which indicates that this region should be considered filled. The "downSampleQuadrant" method is called in succession for each region in the image. This results in a sample of the image, stored in the "SampleData" class. The class is a wrapper class that contains a 5X7 array of Boolean values. It is this structure that forms the input to both training and character recognition.
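The whole downsampling pass can be sketched in one self-contained class. This is a simplified variant written for illustration, not the method from Entry.java: the names DownsampleSketch, downsample and regionFilled are hypothetical, the ratios are assumed to be the cropped size divided by the grid size (the source does not show how ratioX and ratioY are computed), and the regions here are made non-overlapping and clamped to the cropping rectangle.

```java
// Hypothetical standalone version of the downsampling pass.
final class DownsampleSketch {
    static final int WHITE = -1; // opaque white as a packed ARGB int

    /** Returns true if any pixel in the inclusive rectangle is not white. */
    static boolean regionFilled(int[] pixels, int imageWidth,
                                int startX, int startY, int endX, int endY) {
        for (int yy = startY; yy <= endY; yy++) {
            for (int xx = startX; xx <= endX; xx++) {
                if (pixels[xx + yy * imageWidth] != WHITE) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Reduces the cropped rectangle (inclusive bounds) to a cols x rows grid.
     * The result is indexed as grid[x][y], matching SampleData.
     */
    static boolean[][] downsample(int[] pixels, int imageWidth,
                                  int left, int top, int right, int bottom,
                                  int cols, int rows) {
        // Assumed definition: source pixels per grid cell in each direction.
        double ratioX = (right - left + 1) / (double) cols;
        double ratioY = (bottom - top + 1) / (double) rows;
        boolean[][] grid = new boolean[cols][rows];
        for (int gy = 0; gy < rows; gy++) {
            int startY = top + (int) (gy * ratioY);
            // Each region covers at least one row and never passes the bottom edge.
            int endY = Math.min(bottom,
                    Math.max(startY, top + (int) ((gy + 1) * ratioY) - 1));
            for (int gx = 0; gx < cols; gx++) {
                int startX = left + (int) (gx * ratioX);
                int endX = Math.min(right,
                        Math.max(startX, left + (int) ((gx + 1) * ratioX) - 1));
                grid[gx][gy] = regionFilled(pixels, imageWidth,
                        startX, startY, endX, endY);
            }
        }
        return grid;
    }
}
```

With a 10 by 14 cropped image, each region is 2 by 2 pixels, so a single filled pixel at (3,5) sets only grid cell (1,2).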
Using the Kohonen Neural Network
There are many different types of neural networks. Most are named after their creators. The neural network that will be used in this chapter is a Kohonen neural network. A Kohonen neural network is a two-level network, as seen in Figure 2. The downsampled character pattern that is drawn by the user is fed to the input neurons. There is one input neuron for every pixel in the downsampled image. Because the downsampled image is a 5X7 grid, there are 35 input neurons.
The output neurons are how the neural network communicates which letter it thinks the user drew. The number of output neurons always matches the number of unique letter samples that were provided. Since 26 letters were provided in the sample, there will be 26 output neurons. If this program were modified to support multiple samples per individual letter, there would still be 26 output neurons.
In addition to input and output neurons, there are also connections between the individual neurons. These connections are not all equal. Each connection is assigned a weight. The assignment of these weights is ultimately the only factor that will determine what the network will output for a given input pattern. In order to determine the total number of connections, you must multiply the number of input neurons by the number of output neurons. A neural network with 26 output neurons and 35 input neurons would have a total of 910 connection weights. The training process is dedicated to finding the correct values for these weights.
The recognition process begins when the user draws a character and then clicks the "Recognize" button. First the letter is downsampled to a 5X7 image. This downsampled image must be copied from its two-dimensional array to an array of doubles that will be fed to the input neurons.
entry.downSample();
double input[] = new double[5*7];
int idx=0;
SampleData ds = sample.getData();
for ( int y=0;y<ds.getHeight();y++ ) {
    for ( int x=0;x<ds.getWidth();x++ ) {
        input[idx++] = ds.getData(x,y)?.5:-.5;
    }
}
The above code does this conversion. Neurons require floating point input. As a result, the program feeds the network the value 0.5 for a filled (black) pixel and -0.5 for a white pixel. This array of 35 values is fed to the input neurons by passing the input array to the Kohonen network's "winner" method. The method returns the index of the output neuron that won, which is stored in the "best" integer.
int best = net.winner ( input , normfac , synth ) ;
char map[] = mapNeurons();
JOptionPane.showMessageDialog(this,
    " " + map[best] + " (Neuron #"
    + best + " fired)","That Letter Is",
    JOptionPane.PLAIN_MESSAGE);
Knowing the winning neuron is not too helpful, because it does not show you which letter was actually recognized. To line the neurons up to their recognized letters, each letter image that the network was trained from must be fed into the network and the winning neuron determined. For example, if you were to feed the training image for "J" into the neural network, and the winning neuron were neuron #4, you would know that neuron #4 is the neuron that had learned to recognize J’s pattern. This is done by calling the "mapNeurons" method. The mapNeurons method returns an array of characters. The index of each array element corresponds to the neuron number that recognizes that character.
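The mapping step described above can be sketched as follows. The source does not list mapNeurons, so this is only an assumed shape: the class NeuronMapSketch, its parameters, and the use of '?' for neurons that no training letter selects are all choices made here, and a ToIntFunction stands in for the network's winner method.

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;

// Hypothetical sketch of lining output neurons up with letters.
final class NeuronMapSketch {

    /**
     * Feeds every training pattern to the network and records which
     * output neuron wins for it. map[n] is the letter recognized by
     * neuron n, or '?' if no training letter selected that neuron.
     */
    static char[] mapNeurons(char[] letters, double[][] inputs,
                             int outputNeuronCount,
                             ToIntFunction<double[]> winner) {
        char[] map = new char[outputNeuronCount];
        Arrays.fill(map, '?');
        for (int i = 0; i < letters.length; i++) {
            int best = winner.applyAsInt(inputs[i]); // index of winning neuron
            map[best] = letters[i];
        }
        return map;
    }
}
```

Once the map is built, recognizing a drawn character is a single lookup: the letter at the index of the winning neuron.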
Most of the actual work performed by the neural network is done in the winner method. The winner method first normalizes the inputs and then calculates the output value of each output neuron. Whichever output neuron has the largest output value is considered the winner. First the "biggest" variable is set to a very large negative number to indicate that there is not yet a winner.
biggest = -1.E30;
for ( i=0 ; i<outputNeuronCount; i++ ) {
    optr = outputWeights[i];
    output[i] = dotProduct (input , optr ) * normfac[0]
                + synth[0] * optr[inputNeuronCount] ;
    // Remap to bipolar(-1,1 to 0,1)
    output[i] = 0.5 * (output[i] + 1.0) ;
    if ( output[i] > biggest ) {
        biggest = output[i] ;
        win = i ;
    }
}
Each output neuron's output value is calculated by taking the dot product of that neuron's weights and the input values. The dot product is calculated by multiplying each input neuron's value by the weight between that input neuron and the output neuron, and then summing the results. These weights were determined during training, which is discussed in the next section. The output is kept, and if it is the largest output so far, that neuron is set as the "winning" neuron.
As you can see, getting the results from a neural network is a very quick process. Actually determining the weights of the neurons is the complex portion of this process. Training the neural network is discussed in the following section.
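The core of the winner selection can be reduced to a few lines. The following sketch is a deliberate simplification and not the book's method: the class name WinnerSketch is hypothetical, and the normalization factor, the synthetic input term and the remapping to the range 0 to 1 that appear in the excerpt above are all left out, so only the dot product and the search for the largest output remain.

```java
// Hypothetical, simplified winner-take-all selection.
final class WinnerSketch {

    /** Sums the products of corresponding elements of a and b. */
    static double dotProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Returns the index of the output neuron whose weights give the
     * largest dot product with the input. outputWeights[i] holds the
     * weights from every input neuron to output neuron i.
     */
    static int winner(double[] input, double[][] outputWeights) {
        int win = 0;
        double biggest = Double.NEGATIVE_INFINITY; // no winner yet
        for (int i = 0; i < outputWeights.length; i++) {
            double output = dotProduct(input, outputWeights[i]);
            if (output > biggest) {
                biggest = output;
                win = i;
            }
        }
        return win;
    }
}
```

The neuron whose weight vector points most nearly in the same direction as the input produces the largest dot product, which is why it is treated as the best match.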
Training the Neural Network
Learning is the process of selecting a neuron weight matrix that will correctly recognize input patterns. A Kohonen neural network learns by constantly evaluating and optimizing a weight matrix. To do this a starting weight matrix must be determined. This starting weight matrix is chosen by selecting random numbers. Of course this is a terrible choice for a weight matrix, but it gives a starting point to optimize from.
Once the initial random weight matrix is created the training can begin. First the weight matrix is evaluated to determine what its current error level is. This error is determined by how well the training inputs (the letters that you created) map to the output neurons. The error is calculated by the "evaluateErrors" method of the KohonenNetwork class. If the error level is low, say below 10%, the process is complete.
The training process begins when the user clicks the "Begin Training" button. The first step is to calculate the number of input and output neurons. The number of input neurons is determined from the size of the downsampled image. Since the height is seven and the width is five, the number of input neurons will be 35. The number of output neurons matches how many characters the program has been given.
This is the part of the program that would be modified if you wanted the program to accept more than one sample per letter to train from. For example, if you wanted to accept 4 samples per letter, you would have to make sure that the output neuron count remained 26, even though 104 input samples were provided to train with (4 for each of the 26 letters). The neuron counts are calculated with the following code.
int inputNeuron = MainEntry.DOWNSAMPLE_HEIGHT*
                  MainEntry.DOWNSAMPLE_WIDTH;
int outputNeuron = letterListModel.size();
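The sizing rules can be checked with a small sketch. It is hypothetical and not taken from the application: the class NeuronCountSketch and its methods are invented for illustration, and counting distinct letters is one assumed way to keep the output neuron count at 26 if several samples per letter were allowed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helpers for sizing the network.
final class NeuronCountSketch {

    /** One input neuron per pixel of the downsampled image. */
    static int inputNeuronCount(int width, int height) {
        return width * height;
    }

    /** One output neuron per distinct letter, however many samples each has. */
    static int outputNeuronCount(char[] sampleLetters) {
        Set<Character> distinct = new HashSet<>();
        for (char c : sampleLetters) {
            distinct.add(c);
        }
        return distinct.size();
    }

    /** Every input neuron is connected to every output neuron. */
    static int connectionCount(int inputs, int outputs) {
        return inputs * outputs;
    }
}
```

For a 5X7 grid and 26 letters this gives 35 input neurons, 26 output neurons and 910 connection weights, matching the figures quoted earlier.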
Now that the size of the neural network has been determined, the training set and neural network must be constructed. The training set is constructed to hold the correct number of "samples". This will be the 26 letters provided.
TrainingSet set = new TrainingSet(inputNeuron,outputNeuron);
set.setTrainingSetCount(letterListModel.size());
Next, the downsampled input images are copied to the training set. This is repeated for all 26 input patterns.
for ( int t=0;t<letterListModel.size();t++ ) {
    int idx=0;
    SampleData ds = (SampleData)letterListModel.getElementAt(t);
    for ( int y=0;y<ds.getHeight();y++ ) {
        for ( int x=0;x<ds.getWidth();x++ ) {
            set.setInput(t,idx++,ds.getData(x,y)?.5:-.5);
        }
    }
}
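The same flattening is used for both training and recognition, so it can be isolated into one helper. This is a minimal sketch with hypothetical names (InputFlattenSketch, toInput); it assumes the grid is indexed as grid[x][y], as in SampleData, and it writes the values row by row.

```java
// Hypothetical helper that turns a downsampled grid into network input.
final class InputFlattenSketch {

    /**
     * Copies a grid indexed as grid[x][y] into a one-dimensional array,
     * row by row: 0.5 for a filled pixel and -0.5 for an empty one.
     */
    static double[] toInput(boolean[][] grid) {
        int width = grid.length;
        int height = grid[0].length;
        double[] input = new double[width * height];
        int idx = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                input[idx++] = grid[x][y] ? 0.5 : -0.5;
            }
        }
        return input;
    }
}
```

Because the order of the values must be identical in training and recognition, sharing one helper is a simple way to avoid the two loops drifting apart.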
Finally the neural network is constructed and the training set is assigned. With a training set assigned the "learn" method can be called. This will adjust the weight matrix until the network is trained.
net = new KohonenNetwork(inputNeuron,outputNeuron,this);
net.setTrainingSet(set);
net.learn();
The main loop of the learn method will now be discussed. The loop has no fixed iteration limit; it runs until one of the exit conditions described below is met. Because this program only has one sample per output neuron, it is unlikely that it will take more than one iteration. When the number of training samples matches the output neuron count, training happens very quickly.
n_retry = 0 ;
for ( iter=0 ; ; iter++ ) {
A method called "evaluateErrors" is then used to evaluate how well the current weights are working. This is determined by looking at how well the training data spreads across the output neurons. If many training patterns activate the same output neuron, then the weight set is not a good one. An error rate is calculated, which is based on how well the training sets are spreading across the output neurons.
evaluateErrors ( rate , learnMethod , won ,
bigerr , correc , work ) ;
Once the error is determined we must see if it is below the best error we’ve seen so far. If it is below that error, then this error is copied to the best error, and the neuron weights are also preserved.
totalError = bigerr[0] ;
if ( totalError < best_err ) {
    best_err = totalError ;
    copyWeights ( bestnet , this ) ;
}
The total number of winning neurons is then calculated. This will allow us to determine if no output neurons activated. Additionally, if the error is below the accepted quit error (10%), the training stops.
winners = 0 ;
for ( i=0;i<won.length;i++ )
    if ( won[i]!=0 )
        winners++;
if ( bigerr[0] < quitError )
    break ;
If there is not an acceptable number of winners, one neuron is forced to win.
if ( (winners < outputNeuronCount) &&
     (winners < train.getTrainingSetCount()) ) {
    forceWin ( won ) ;
    continue ;
}
Now that the first weight matrix has been evaluated, it is adjusted based on its error. The adjustment is slight, based on the correction that was calculated when the error was determined. This two-step process of calculating the error and adjusting the weight matrix is continued until the error falls below 10%.
adjustWeights ( rate , learnMethod , won , bigcorr, correc ) ;
This is the process by which a neural network is trained. The method for adjusting the weights and calculating the error is shown in the "KohonenNetwork.java" file.
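Since the weight adjustment itself is not listed here, the following is only a minimal sketch of one common competitive-learning update, not the code in KohonenNetwork.java. The class and method names (AdjustSketch, normalize, adjustToward) are hypothetical, and the rule shown, moving the winning neuron's weights a fraction of the way toward the input and then rescaling them to unit length, is an assumption about the general technique rather than a description of the book's method.

```java
// Hypothetical illustration of a single competitive-learning update.
final class AdjustSketch {

    /** Scales v in place so that its Euclidean length is 1 (if not zero). */
    static void normalize(double[] v) {
        double sumSquares = 0.0;
        for (double d : v) {
            sumSquares += d * d;
        }
        double length = Math.sqrt(sumSquares);
        if (length > 0.0) {
            for (int i = 0; i < v.length; i++) {
                v[i] /= length;
            }
        }
    }

    /**
     * Moves the winning neuron's weights toward the input by the given
     * learning rate (between 0 and 1), then renormalizes them.
     */
    static void adjustToward(double[] weights, double[] input, double rate) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] += rate * (input[i] - weights[i]);
        }
        normalize(weights);
    }
}
```

Starting from weights (1, 0) and an input of (0, 1), a rate of 0.5 moves the weights to the midpoint (0.5, 0.5), which is then rescaled to unit length. Repeating such small steps is what gradually lines each output neuron up with the patterns it wins.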




