Training Algorithm
Training occurs as the neuron connection weights are modified to produce more desirable results. There are several ways that this training can take place. In this chapter we will observe two simple methods for training the connection weights of the neural network. In Chapter 5, “Understanding Back Propagation,” we will see back propagation, which is a much more complex training algorithm. Back propagation is the most common form of neural network training.
Neuron connection weights are not modified in a single pass. Instead, they are modified over a series of iterations. The neural network is presented with training data, and then the results are observed. These results must in some way change the connection weights in order for the neural network to learn. The exact process by which this happens is determined by the learning algorithm.
These learning algorithms, which are commonly called learning rules, are almost always expressed as functions. What you want to know from this function is how the weight between two neurons should be changed. Consider a weight matrix between four neurons, such as we saw in Chapter 2. This is expressed as an array of doubles in Java.
double weights[][] = new double[4][4];
This gives us the weights between four neurons. Because Java array indexes begin with zero we shall refer to these neurons as neurons zero through three. Using the above array, the weight between neuron two and neuron three would be contained in the variable weights[2][3]. Therefore, we would like a learning function that would return the new weight between neurons i and j, such as
weights[i][j] += learningRule(...)
The hypothetical method learningRule calculates the change (delta) that must be applied to the weight between the two neurons in order for learning to take place. We never discard the previous weight value altogether; rather, we compute a delta value that is used to modify the original weight. It does not take just one modification for the neural network to learn. Once the weights of the neural network have been modified, the network is presented with the training data again, and the process continues. These iterations continue until the neural network’s error rate has dropped to an acceptable level.
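The iterative process described above can be sketched as a loop that keeps applying a delta until the error is acceptable. This is only an illustration under stated assumptions: the class TrainingLoopSketch, the single weight, the stopping threshold, and the placeholder rule inside learningRule are all hypothetical, chosen to keep the example small. The concrete learning rules are covered later in this chapter.

```java
class TrainingLoopSketch {
    // Placeholder learning rule (an assumption for this sketch):
    // the delta is proportional to the input and to the current error.
    static double learningRule(double rate, double input, double error) {
        return rate * input * error;
    }

    // Repeatedly adjusts weight[0] until the error is acceptable or the
    // iteration limit is reached. Returns the number of iterations used.
    static int train(double[] weight, double input, double target,
                     double rate, double acceptableError, int maxIterations) {
        int iterations = 0;
        double error = target - weight[0] * input;
        while (Math.abs(error) >= acceptableError && iterations < maxIterations) {
            // Never replace the old weight; only add a delta to it.
            weight[0] += learningRule(rate, input, error);
            iterations++;
            // Present the training data again and observe the new error.
            error = target - weight[0] * input;
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[] weight = {0.0};
        int used = train(weight, 1.0, 1.0, 0.5, 0.01, 1000);
        System.out.println("iterations=" + used + " weight=" + weight[0]);
    }
}
```

With an input and target of 1.0 and a rate of 0.5, the error halves on every pass, so the loop stops after a handful of iterations rather than in a single step.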
Another common input to the learning rule is the error. The error is the degree to which the actual output of the neural network differed from the anticipated output. If such an error is provided to the training function then the method is called supervised training. In supervised training the neural network is constantly adjusting the weights to attempt to better line up with the anticipated outputs that were provided.
Conversely, if no error is provided to the training function, then we are using an unsupervised training algorithm. In unsupervised training the neural network is not told what the “correct output” is. Unsupervised training leaves the neural network to determine this for itself. Unsupervised training is often used to allow the neural network to group the input data, and the programmer frequently does not know ahead of time exactly what these groups are. Figure 4.4 shows the flow chart of unsupervised training. Figure 4.5 shows the flow chart of supervised training.

Figure 4.4: Unsupervised Training

Figure 4.5: Supervised Training
We will now examine two common training algorithms. The first, Hebb’s rule, is used for unsupervised training and does not take into account network error. The second, the delta rule, is used with supervised training and adjusts the weights so that the input to the neural network will more accurately produce the anticipated output. We will begin with Hebb's rule.
Hebb's Rule
One of the most common learning algorithms is called Hebb’s Rule. This rule was developed by Donald Hebb to assist with unsupervised training. We previously examined a hypothetical learning rule given by the following expression.
weights[i][j] += learningRule(...)
Rules for training neural networks are almost always represented as algebraic formulas. Hebb's rule is expressed as:

$$\Delta w_{ij} = \mu \, a_i \, a_j$$

In the above equation, $\Delta w_{ij}$ is the needed change (delta) in the weight of the connection from neuron i to neuron j. The Greek letter mu ($\mu$) represents the learning rate. The activations of the two neurons, when provided with the training pattern, are given as $a_i$ and $a_j$. This equation can easily be translated into the following Java method.
double learningRule(double rate,double act_i,double act_j)
{
return(rate*act_i*act_j);
}
We will now examine how this training algorithm actually works. To see this we will consider a simple neural network with only two neurons. In this neural network these two neurons make up both the input and output layer. There is no hidden layer. Table 4.1 summarizes some of the possible scenarios using Hebbian training. Assume that the learning rate is one.
Table 4.1: Using Hebb’s Rule
| Neuron I Output (activation) | Neuron J Output (activation) | Hebb's Rule (R*I*J) | Weight Modification |
| --- | --- | --- | --- |
| +1 | -1 | 1*1*-1 | -1 |
| -1 | +1 | 1*-1*1 | -1 |
| +1 | +1 | 1*1*1 | +1 |
As you can see from the above table, if the activation of neuron I is +1 and the activation of neuron J is -1, the connection weight between neuron I and neuron J is decreased by one. The weight is increased only when both neurons have the same activation.
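The rows of Table 4.1 can be reproduced with a few lines of Java. The following is a minimal sketch, assuming a learning rate of one as in the table. The class name HebbTable is hypothetical, and the final pair (-1, -1) does not appear in the table; it is added here only to show that two negative activations also strengthen the connection.

```java
class HebbTable {
    // Hebb's rule as given in the text: delta = rate * act_i * act_j
    static double learningRule(double rate, double actI, double actJ) {
        return rate * actI * actJ;
    }

    public static void main(String[] args) {
        double rate = 1.0; // learning rate of one, as assumed for Table 4.1

        // Activation pairs (neuron I, neuron J). The first three come from
        // Table 4.1; the last pair is an extra case added for this sketch.
        double[][] activations = { {+1, -1}, {-1, +1}, {+1, +1}, {-1, -1} };

        for (double[] pair : activations) {
            double delta = learningRule(rate, pair[0], pair[1]);
            System.out.println("I=" + pair[0] + " J=" + pair[1]
                + " weight modification=" + delta);
        }
    }
}
```

Because the rule is a simple product, the sign of the weight change depends only on whether the two activations agree.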
Delta Rule
The delta rule is also known as the least mean squared error rule (LMS). Using this rule, the actual output of a neural network is compared against the anticipated output. Because the anticipated output is specified, using the delta rule is considered supervised training. Algebraically the delta rule is written as follows.

$$\Delta w_{ij} = \mu \, x_i \, (d - a)$$

In the above equation, $\Delta w_{ij}$ is the needed change (delta) in the weight of the connection from neuron i to neuron j. The Greek letter mu ($\mu$) represents the learning rate. The variable $d$ represents the desired output of the neuron. The variable $a$ represents the actual output of the neuron. The variable $x_i$ represents the ith component of the input vector. This equation can easily be translated into the following Java method.
protected double trainingFunction(
double rate,double input,double error)
{
return rate*input*error;
}
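As a quick check of the arithmetic, the following sketch applies the method above to a single training step. The class name DeltaStep and the sample values (a rate of 0.5, a desired output of 1, and an actual output of 0) are assumptions chosen only for illustration.

```java
class DeltaStep {
    // Delta rule as given in the text: delta = rate * input * error
    static double trainingFunction(double rate, double input, double error) {
        return rate * input * error;
    }

    public static void main(String[] args) {
        double rate = 0.5;
        double desired = 1.0;
        double actual = 0.0;
        double error = desired - actual; // 1.0

        // An active input (1.0) receives a share of the correction.
        System.out.println(trainingFunction(rate, 1.0, error)); // prints 0.5

        // An inactive input (0.0) leaves its weight unchanged.
        System.out.println(trainingFunction(rate, 0.0, error)); // prints 0.0
    }
}
```

Note that a weight is adjusted only when its input is nonzero, so inputs that did not contribute to the output are not blamed for the error.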
We will now examine how the delta training algorithm actually works. To see this we will look at the example program shown in Listing 4.2.
Listing 4.2: Using the Delta Rule
public class Delta {
/**
* Weight for neuron 1
*/
double w1;
/**
* Weight for neuron 2
*/
double w2;
/**
* Weight for neuron 3
*/
double w3;
/**
* Learning rate
*/
double rate = 0.5;
/**
* Current epoch #
*/
int epoch = 1;
/**
* @param i1 Input to neuron 1
* @param i2 Input to neuron 2
* @param i3 Input to neuron 3
* @return the output from the neural network
*/
protected double recognize(double i1,double i2,double i3)
{
double a = (w1*i1)+(w2*i2)+(w3*i3);
return(a*.5);
}
/**
* This method will calculate the error between the
* anticipated output and the actual output.
*
* @param actual The actual output from the neural network.
* @param anticipated The anticipated neuron output.
* @return The error.
*/
protected double getError(double actual,double anticipated)
{
return(anticipated-actual);
}
/**
* The trainingFunction implements the delta rule.
* This method will return the weight adjustment for
* the specified input neuron.
*
* @param rate The learning rate
* @param input The input neuron we're processing
* @param error The error between the actual output
* and anticipated output.
* @return The amount to adjust the weight by.
*/
protected double trainingFunction(
double rate,double input,double error)
{
return rate*input*error;
}
/**
* Present a pattern and learn from it.
*
* @param i1 Input to neuron 1
* @param i2 Input to neuron 2
* @param i3 Input to neuron 3
* @param anticipated The anticipated output
*/
protected void presentPattern(
double i1,double i2,double i3,double anticipated)
{
double error;
double actual;
double delta;
// run the net as is on training data
// and get the error
System.out.print("Presented [" + i1 + "," + i2
+ "," + i3 + "]");
actual = recognize(i1,i2,i3);
error = getError(actual,anticipated);
System.out.print(" anticipated=" + anticipated);
System.out.print(" actual=" + actual);
System.out.println(" error=" + error);
// adjust weight 1
delta = trainingFunction(rate,i1,error);
w1+=delta;
// adjust weight 2
delta = trainingFunction(rate,i2,error);
w2+=delta;
// adjust weight 3
delta = trainingFunction(rate,i3,error);
w3+=delta;
}
/**
* Process one epoch. Here we present all four training
* samples, updating the weights after each one.
*/
protected void epoch()
{
System.out.println("***Beginning Epoch #" + epoch+"***");
presentPattern(0,0,1,0);
presentPattern(0,1,1,0);
presentPattern(1,0,1,0);
presentPattern(1,1,1,1);
epoch++;
}
/**
* This method loops through 100 epochs.
*/
public void run()
{
for ( int i=0;i<100;i++ ) {
epoch();
}
}
/**
* Main method just instantiates a Delta object and calls
* run.
*
* @param args Not used
*/
public static void main(String args[])
{
Delta delta = new Delta();
delta.run();
}
}
This program will train for 100 epochs. It is designed to teach the neural network to recognize four patterns. These patterns are summarized as follows.
- For 001 output 0
- For 011 output 0
- For 101 output 0
- For 111 output 1
For each epoch you will be shown what the actual and anticipated results were. By epoch 100 the network is trained. The output from epoch 100 is shown here.
***Beginning Epoch #100***
Presented [0.0,0.0,1.0] anticipated=0.0 actual=-0.33333333131711973 error=0.33333333131711973
Presented [0.0,1.0,1.0] anticipated=0.0 actual=0.333333333558949 error=-0.333333333558949
Presented [1.0,0.0,1.0] anticipated=0.0 actual=0.33333333370649876 error=-0.33333333370649876
Presented [1.0,1.0,1.0] anticipated=1.0 actual=0.6666666655103011 error=0.33333333448969893
As you can see from the above display, the outputs settle at magnitudes of roughly 0.333 and 0.667. An output near 0.333 (or -0.333 for the first pattern) corresponds to 0, and the output of 0.667 corresponds to 1. A neural network will rarely produce exactly the required output, but through rounding it gets pretty close. While the delta rule is efficient at adjusting the weights, it is not the most commonly used training method. In the next chapter we will examine the feed forward back propagation network, which is one of the most commonly used neural networks.
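The rounding step mentioned above can be made explicit. The following is a minimal sketch, assuming the same weights, learning rate, and training patterns as Listing 4.2. The class name DeltaRounding and the classify method, with its 0.5 threshold, are hypothetical additions that are not part of the original listing.

```java
class DeltaRounding {
    double w1, w2, w3;   // weights start at 0.0, as in Listing 4.2
    double rate = 0.5;   // learning rate, as in Listing 4.2

    // Same output calculation as the recognize method in Listing 4.2.
    double recognize(double i1, double i2, double i3) {
        double a = (w1 * i1) + (w2 * i2) + (w3 * i3);
        return a * 0.5;
    }

    // Present one pattern and apply the delta rule to each weight.
    void presentPattern(double i1, double i2, double i3, double anticipated) {
        double error = anticipated - recognize(i1, i2, i3);
        w1 += rate * i1 * error;
        w2 += rate * i2 * error;
        w3 += rate * i3 * error;
    }

    // Train on the four patterns for the given number of epochs.
    void train(int epochs) {
        for (int e = 0; e < epochs; e++) {
            presentPattern(0, 0, 1, 0);
            presentPattern(0, 1, 1, 0);
            presentPattern(1, 0, 1, 0);
            presentPattern(1, 1, 1, 1);
        }
    }

    // Hypothetical helper: round the raw output to the nearest class (0 or 1).
    int classify(double i1, double i2, double i3) {
        return recognize(i1, i2, i3) >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        DeltaRounding net = new DeltaRounding();
        net.train(100);
        double[][] inputs = { {0, 0, 1}, {0, 1, 1}, {1, 0, 1}, {1, 1, 1} };
        for (double[] in : inputs) {
            System.out.println("[" + in[0] + "," + in[1] + "," + in[2] + "]"
                + " raw=" + net.recognize(in[0], in[1], in[2])
                + " class=" + net.classify(in[0], in[1], in[2]));
        }
    }
}
```

The raw values printed here are taken after the final epoch has finished, so they will not match the per-pattern values shown in the epoch 100 display exactly, but each one should still round to the anticipated output.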




