Training occurs as the neuron connection weights are modified to produce more desirable results. There are several ways that training can take place. In the following sections we will discuss two simple methods for training the connection weights of a neural network. In chapter 5, we will examine backpropagation, which is a much more complex training algorithm.

    Neuron connection weights are not modified in a single pass. The process by which neuron weights are modified occurs over multiple iterations. The neural network is presented with training data and the results are then observed. Neural network learning occurs when these results change the connection weights. The exact process by which this happens is determined by the learning algorithm used.

    Learning algorithms, which are commonly called learning rules, are almost always expressed as functions. A learning function provides guidance on how a weight between two neurons should be changed. Consider a weight matrix containing the weights for the connections between four neurons, such as we saw in chapter 3, “Using a Hopfield Neural Network.” This is expressed as an array of doubles.

double weights[][] = new double[4][4];

    This matrix is used to store the weights between four neurons. Since Java array indexes begin with zero, we shall refer to these neurons as neurons zero through three. Using the above array, the weight between neuron two and neuron three would be contained in the location weights[2][3]. Therefore, we would like a learning function that will return the change to apply to the weight between neurons “i” and “j,” such that

weights[i][j] += learningRule(...)

    The hypothetical method learningRule calculates the change (delta) that must occur in the weight between the two neurons in order for learning to take place. We never discard the previous weight value altogether; rather, we compute a delta value that is used to modify the original weight. It takes more than a single modification for the neural network to learn. Once the weights of the neural network have been modified, the network is again presented with the training data and the process continues. These iterations continue until the neural network’s error rate has dropped to an acceptable level.
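    The update pattern described above can be sketched for the whole weight matrix. This is only a minimal illustration: the class name WeightUpdateSketch, the activations array, and the body of learningRule are assumptions, not code from the book. The placeholder rule simply multiplies a rate by the two neuron activations, and skipping the case where i equals j (no self-connections) is likewise an assumption.

```java
class WeightUpdateSketch {

    // Hypothetical placeholder rule: returns the change (delta) for one weight.
    static double learningRule(double rate, double ai, double aj) {
        return rate * ai * aj;
    }

    // Applies one round of updates to every weight except self-connections.
    static void update(double[][] weights, double[] activations, double rate) {
        for (int i = 0; i < weights.length; i++) {
            for (int j = 0; j < weights[i].length; j++) {
                if (i != j) {
                    weights[i][j] += learningRule(rate, activations[i], activations[j]);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] weights = new double[4][4];   // all weights start at zero
        double[] activations = { 1, -1, 1, -1 }; // assumed activations for neurons 0..3
        update(weights, activations, 0.5);
        System.out.println("weights[2][3] = " + weights[2][3]); // prints "weights[2][3] = -0.5"
    }
}
```

    A real training run would call update repeatedly, presenting the training data before each call, rather than once as shown here.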

    Another common input to the learning rule is the error. The error is the degree to which the actual output of the neural network differs from the anticipated output. If such an error is provided to the training function, then the method is called supervised training. In supervised training, the neural network is constantly adjusting the weights to attempt to better align the actual results with the anticipated outputs that were provided.

    Conversely, if no error was provided to the training function, then we are using an unsupervised training algorithm. Recall, in unsupervised training, the neural network is not told what the “correct” output is. Unsupervised training leaves the neural network to determine this for itself. Often, unsupervised training is used to allow the neural network to group the input data. The programmer does not know ahead of time exactly what the groupings will be.

    We will now examine two common training algorithms. The first, Hebb’s rule, is used for unsupervised training and does not take into account network error. The second, the delta rule, is used with supervised training and adjusts the weights so that the input to the neural network will more accurately produce the anticipated output. We will begin with Hebb’s Rule.

Hebb’s Rule

    One of the most common learning algorithms is called Hebb’s Rule. This rule was developed by Donald Hebb to assist with unsupervised training. We previously examined a hypothetical learning rule defined by the following expression:

weights[i][j] += learningRule(...)

    Rules for training neural networks are almost always represented as algebraic formulas. Hebb's rule is expressed in Equation 4.3.

Equation 4.3: Hebb’s Rule

$$\Delta w_{ij} = \mu \, a_i \, a_j$$

    The above equation calculates the needed change (delta) in the weight for the connection from neuron “i” to neuron “j.” The Greek letter mu (µ) represents the learning rate. The activation of each neuron is given as $a_i$ and $a_j$. This equation can easily be translated into the following Java method.

protected double learningRule(
  double rate, double input, double output)
{
  return rate*input*output;
}

    We will now examine how this training algorithm actually works. To do this, we will consider a simple neural network with only two neurons. In this neural network, these two neurons make up both the input and output layer. There is no hidden layer. Table 4.1 summarizes some of the possible scenarios using Hebbian training. Assume that the learning rate is one.

Table 4.1: Using Hebb’s Rule

Case      Neuron i Value    Neuron j Output    Hebb's Rule    Weight Delta
Case 1    +1                -1                 1*1*-1         -1
Case 2    -1                +1                 1*-1*1         -1
Case 3    +1                +1                 1*1*1          +1

    As you can see from Case 1 in the above table, if the activation of neuron “i” was +1 and the activation of neuron “j” was –1, the neuron connection weight between neuron “i” and neuron “j” would be decreased by one.

    Hebb's rule is unsupervised, so we are not training the neural network for some ideal output. Rather, Hebb's rule works by reinforcing what the neural network already knows. This is sometimes summarized with the catchy phrase: “Neurons that fire together, wire together.” That is, if the two neurons have similar activations, their weight is increased. If two neurons have dissimilar activations, their weight is decreased.
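    The cases in Table 4.1 can be checked with a few lines of Java. This is a minimal sketch: the class name HebbTable is an assumption, and the fourth combination (both activations –1), which the table does not list, is included only to show that the formula treats it like any other pair of similar activations.

```java
class HebbTable {

    // Hebb's rule: delta = rate * activation of i * activation of j.
    static double learningRule(double rate, double input, double output) {
        return rate * input * output;
    }

    public static void main(String[] args) {
        double rate = 1.0; // learning rate assumed by Table 4.1
        // The first three rows are the cases from Table 4.1; the last row is an added case.
        double[][] cases = { { 1, -1 }, { -1, 1 }, { 1, 1 }, { -1, -1 } };
        for (double[] c : cases) {
            double delta = learningRule(rate, c[0], c[1]);
            System.out.println("i=" + c[0] + " j=" + c[1] + " delta=" + delta);
        }
    }
}
```

    The first three deltas match the table (–1, –1, +1). The added fourth case also yields +1, since two negative activations are as similar as two positive ones.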

    An example of Hebb's rule is shown in Listing 4.2.

Listing 4.2: Using Hebb's Rule

/**
 * Introduction to Neural Networks with Java, 2nd Edition
 * Copyright 2008 by Heaton Research, Inc. 
 * http://www.heatonresearch.com/books/java-neural-2/
 * 
 * ISBN13: 978-1-60439-008-7  	 
 * ISBN:   1-60439-008-5
 *   
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 */
package com.heatonresearch.book.introneuralnet.ch4.hebb;

/**
 * Chapter 4: Machine Learning
 * 
 * Hebb: Learn, using Hebb's rule.
 * 
 * @author Jeff Heaton
 * @version 2.1
 */
public class Hebb {

	/**
	 * Main method just instantiates a Hebb object and calls run.
	 * 
	 * @param args
	 *            Not used
	 */
	public static void main(final String args[]) {
		final Hebb hebb = new Hebb();
		hebb.run();

	}

	/**
	 * Weight for neuron 1
	 */
	double w1;

	/**
	 * Weight for neuron 2
	 */
	double w2;

	/**
	 * Learning rate
	 */
	double rate = 1.0;

	/**
	 * Current epoch #
	 */
	int epoch = 1;

	public Hebb() {
		this.w1 = 1;
		this.w2 = -1;
	}

	/**
	 * Process one epoch. Here we present all four training patterns and
	 * update the weights after each pattern is presented.
	 */

	protected void epoch() {
		System.out.println("***Beginning Epoch #" + this.epoch + "***");
		presentPattern(-1, -1);
		presentPattern(-1, 1);
		presentPattern(1, -1);
		presentPattern(1, 1);
		this.epoch++;
	}

	/**
	 * Present a pattern and learn from it.
	 * 
	 * @param i1
	 *            Input to neuron 1
	 * @param i2
	 *            Input to neuron 2
	 */
	protected void presentPattern(final double i1, final double i2) {
		double result;
		double delta;

		// run the net as is on the training data
		// and get the result
		System.out.print("Presented [" + i1 + "," + i2 + "]");
		result = recognize(i1, i2);
		System.out.print(" result=" + result);

		// adjust weight 1
		delta = trainingFunction(this.rate, i1, result);
		this.w1 += delta;
		System.out.print(",delta w1=" + delta);

		// adjust weight 2
		delta = trainingFunction(this.rate, i2, result);
		this.w2 += delta;
		System.out.println(",delta w2=" + delta);

	}

	/**
	 * Calculate the output of the neural network for the given inputs.
	 * 
	 * @param i1
	 *            Input to neuron 1
	 * @param i2
	 *            Input to neuron 2
	 * @return the output from the neural network
	 */
	protected double recognize(final double i1, final double i2) {
		final double a = (this.w1 * i1) + (this.w2 * i2);
		return (a * .5);
	}

	/**
	 * This method loops through 5 epochs.
	 */
	public void run() {
		for (int i = 0; i < 5; i++) {
			epoch();
		}
	}

	/**
	 * The trainingFunction implements Hebb's rule. This method will return
	 * the weight adjustment for the specified input neuron.
	 * 
	 * @param rate
	 *            The learning rate
	 * @param input
	 *            The input neuron we're processing
	 * @param output
	 *            The output of the neural network for the current pattern.
	 * @return The amount to adjust the weight by.
	 */
	protected double trainingFunction(final double rate, final double input,
			final double output) {
		return rate * input * output;
	}
}

    The Hebb's rule example uses two input neurons and one output neuron. As a result, there are a total of two weights, one weight for each of the connections between the input neurons and the output neuron. The first weight, which is the weight between neuron one and the output neuron, is initialized to one. The second weight, which is the weight between neuron two and the output neuron, is initialized to negative one.

    Consider the output from the first epoch.

***Beginning Epoch #1***
Presented [-1.0,-1.0] result=0.0,delta w1=-0.0,delta w2=-0.0
Presented [-1.0,1.0] result=-1.0,delta w1=1.0,delta w2=-1.0
Presented [1.0,-1.0] result=2.0,delta w1=2.0,delta w2=-2.0
Presented [1.0,1.0] result=0.0,delta w1=0.0,delta w2=0.0

    The above output shows how the three-neuron network responded to four different input patterns. The middle two input patterns returned the strongest results. The second pattern was strong in the negative direction, while the third pattern was strong in the positive direction. Hebb's rule tends to strengthen the output in the direction it already has a tendency towards.

    The above output also shows the calculated deltas for weight one (w1) and weight two (w2). The first and fourth patterns both produced outputs of zero, so neither produces a nonzero delta. However, the output of –1 for pattern two produces a delta of +1 for w1 and –1 for w2, and the output of 2 for pattern three produces a delta of +2 for w1 and –2 for w2. These deltas are applied immediately and strengthen the negative or positive tendencies of the respective weights.

    It is also important to note that this example is always applying the deltas as the patterns are presented. This is why the third pattern will always have a stronger output than the second pattern; the delta for the second pattern has already been applied by the time the third pattern is presented.
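    To see what applying the deltas immediately does, the following sketch compares the approach of Listing 4.2 with a hypothetical variant that sums the deltas and applies them once at the end of the epoch. That second variant is not part of the listing; it, the class name OnlineVersusBatch, and the method names are assumptions introduced only for comparison.

```java
class OnlineVersusBatch {

    static final double[][] PATTERNS = { { -1, -1 }, { -1, 1 }, { 1, -1 }, { 1, 1 } };

    // Same output calculation as recognize() in Listing 4.2.
    static double recognize(double w1, double w2, double i1, double i2) {
        return ((w1 * i1) + (w2 * i2)) * 0.5;
    }

    // Deltas applied immediately after each pattern, as in Listing 4.2.
    static double[] online(double w1, double w2, double rate) {
        for (double[] p : PATTERNS) {
            double result = recognize(w1, w2, p[0], p[1]);
            w1 += rate * p[0] * result;
            w2 += rate * p[1] * result;
        }
        return new double[] { w1, w2 };
    }

    // Hypothetical variant: deltas are summed and applied once per epoch.
    static double[] batch(double w1, double w2, double rate) {
        double d1 = 0;
        double d2 = 0;
        for (double[] p : PATTERNS) {
            double result = recognize(w1, w2, p[0], p[1]);
            d1 += rate * p[0] * result;
            d2 += rate * p[1] * result;
        }
        return new double[] { w1 + d1, w2 + d2 };
    }

    public static void main(String[] args) {
        double[] a = online(1, -1, 1.0);
        double[] b = batch(1, -1, 1.0);
        System.out.println("online: w1=" + a[0] + " w2=" + a[1]);
        System.out.println("batch:  w1=" + b[0] + " w2=" + b[1]);
    }
}
```

    Starting from the same weights of 1 and –1, one epoch of immediate updates ends at w1=4, w2=–4, while the summed variant ends at w1=3, w2=–3. In the summed variant the third pattern still sees the original weights, so its output has the same magnitude as the output for the second pattern.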

    The second epoch continues in much the same way as the first.

***Beginning Epoch #2***
Presented [-1.0,-1.0] result=0.0,delta w1=-0.0,delta w2=-0.0
Presented [-1.0,1.0] result=-4.0,delta w1=4.0,delta w2=-4.0
Presented [1.0,-1.0] result=8.0,delta w1=8.0,delta w2=-8.0
Presented [1.0,1.0] result=0.0,delta w1=0.0,delta w2=0.0

    However, the deltas from the previous epoch have already been applied. New weight deltas are calculated that further enhance the positive or negative tendencies of the neurons.
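    Because each new delta is proportional to the current output, the weights in this example keep growing from epoch to epoch. The following sketch reuses the update logic of Listing 4.2 under a hypothetical class name, HebbGrowth, and prints the weights at the end of each of the five epochs. The method weightsAfter is an assumption added for illustration.

```java
class HebbGrowth {

    // Runs the given number of epochs with the update logic of Listing 4.2
    // and returns the final weights {w1, w2}.
    static double[] weightsAfter(int epochs) {
        double w1 = 1;     // initial weights from the Hebb constructor
        double w2 = -1;
        double rate = 1.0; // learning rate from Listing 4.2
        double[][] patterns = { { -1, -1 }, { -1, 1 }, { 1, -1 }, { 1, 1 } };
        for (int e = 0; e < epochs; e++) {
            for (double[] p : patterns) {
                double result = ((w1 * p[0]) + (w2 * p[1])) * 0.5;
                w1 += rate * p[0] * result;
                w2 += rate * p[1] * result;
            }
        }
        return new double[] { w1, w2 };
    }

    public static void main(String[] args) {
        for (int e = 1; e <= 5; e++) {
            double[] w = weightsAfter(e);
            System.out.println("after epoch " + e + ": w1=" + w[0] + " w2=" + w[1]);
        }
    }
}
```

    With these starting values the magnitude of each weight quadruples every epoch (4, 16, 64, 256, 1024), which is consistent with the results of 4 and 8 shown for the second epoch.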

Delta Rule

    The delta rule is also known as the least mean squared error rule (LMS). Using this rule, the actual output of a neural network is compared against the anticipated output. Because the anticipated output is specified, using the delta rule is considered supervised training. Algebraically, the delta rule is written as follows in Equation 4.4.

Equation 4.4: The Delta Rule

$$\Delta w_{ij} = \mu \, a_i \, (\mathrm{ideal} - \mathrm{actual})$$

    The above equation calculates the needed change (delta) in weights for the connection from neuron “i” to neuron “j.” The Greek letter mu (µ) represents the learning rate. The activation of the input neuron “i” is given as $a_i$. The variable ideal represents the desired output of the “j” neuron. The variable actual represents the actual output of the “j” neuron. As a result, (ideal-actual) is the error. This equation can easily be translated into the following Java method.

protected double trainingFunction(
    double rate, double input, double ideal, double actual)
{
  return rate*input*(ideal-actual);
}
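    As a small worked case, the method above can be applied repeatedly to a single weight until the error is acceptably small. The sketch below makes several assumptions that are not in the text: a single input and a single weight, an output computed simply as weight times input, a tolerance of 0.01, a safety cap of 1000 updates, and the names DeltaRuleStep and train.

```java
class DeltaRuleStep {

    // Delta rule as given in the text.
    static double trainingFunction(double rate, double input, double ideal, double actual) {
        return rate * input * (ideal - actual);
    }

    // Repeats the update until |ideal - actual| drops below the tolerance
    // (or 1000 updates have been made) and returns the number of updates.
    // The weight is passed in a one-element array so it can be modified.
    static int train(double[] weight, double rate, double input, double ideal, double tolerance) {
        int updates = 0;
        double actual = weight[0] * input; // assumed output: weight * input
        while (Math.abs(ideal - actual) >= tolerance && updates < 1000) {
            weight[0] += trainingFunction(rate, input, ideal, actual);
            actual = weight[0] * input;
            updates++;
        }
        return updates;
    }

    public static void main(String[] args) {
        double[] weight = { 0.0 };
        int updates = train(weight, 0.5, 1.0, 1.0, 0.01);
        System.out.println("updates=" + updates + " weight=" + weight[0]);
        // prints "updates=7 weight=0.9921875"
    }
}
```

    With a learning rate of 0.5 and an input of 1, each update halves the remaining error (1, 0.5, 0.25, and so on), so the error falls below 0.01 after seven updates.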

    We will now examine how the delta training algorithm actually works. To see this, we will look at the example program shown in Listing 4.3.

Listing 4.3: Using the Delta Rule

/**
 * Introduction to Neural Networks with Java, 2nd Edition
 * Copyright 2008 by Heaton Research, Inc. 
 * http://www.heatonresearch.com/books/java-neural-2/
 * 
 * ISBN13: 978-1-60439-008-7  	 
 * ISBN:   1-60439-008-5
 *   
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 */
package com.heatonresearch.book.introneuralnet.ch4.delta;

/**
 * Chapter 4: Machine Learning
 * 
 * Delta: Learn, using the delta rule.
 * 
 * @author Jeff Heaton
 * @version 2.1
 */
public class Delta {

	/**
	 * Main method just instantiates a delta object and calls run.
	 * 
	 * @param args
	 *            Not used
	 */
	public static void main(final String args[]) {
		final Delta delta = new Delta();
		delta.run();

	}

	/**
	 * Weight for neuron 1
	 */
	double w1;

	/**
	 * Weight for neuron 2
	 */
	double w2;

	/**
	 * Weight for neuron 3
	 */
	double w3;

	/**
	 * Learning rate
	 */
	double rate = 0.5;

	/**
	 * Current epoch #
	 */
	int epoch = 1;

	/**
	 * Process one epoch. Here we learn from all four training samples and
	 * update the weights based on error.
	 */

	protected void epoch() {
		System.out.println("***Beginning Epoch #" + this.epoch + "***");
		presentPattern(0, 0, 1, 0);
		presentPattern(0, 1, 1, 0);
		presentPattern(1, 0, 1, 0);
		presentPattern(1, 1, 1, 1);
		this.epoch++;
	}

	/**
	 * This method will calculate the error between the anticipated output and
	 * the actual output.
	 * 
	 * @param actual
	 *            The actual output from the neural network.
	 * @param anticipated
	 *            The anticipated neuron output.
	 * @return The error.
	 */
	protected double getError(final double actual, final double anticipated) {
		return (anticipated - actual);
	}

	/**
	 * Present a pattern and learn from it.
	 * 
	 * @param i1
	 *            Input to neuron 1
	 * @param i2
	 *            Input to neuron 2
	 * @param i3
	 *            Input to neuron 3
	 * @param anticipated
	 *            The anticipated output
	 */
	protected void presentPattern(final double i1, final double i2,
			final double i3, final double anticipated) {
		double error;
		double actual;
		double delta;

		// run the net as is on training data
		// and get the error
		System.out.print("Presented [" + i1 + "," + i2 + "," + i3 + "]");
		actual = recognize(i1, i2, i3);
		error = getError(actual, anticipated);
		System.out.print(" anticipated=" + anticipated);
		System.out.print(" actual=" + actual);
		System.out.println(" error=" + error);

		// adjust weight 1
		delta = trainingFunction(this.rate, i1, error);
		this.w1 += delta;

		// adjust weight 2
		delta = trainingFunction(this.rate, i2, error);
		this.w2 += delta;

		// adjust weight 3
		delta = trainingFunction(this.rate, i3, error);
		this.w3 += delta;
	}

	/**
	 * @param i1
	 *            Input to neuron 1
	 * @param i2
	 *            Input to neuron 2
	 * @param i3
	 *            Input to neuron 3
	 * @return the output from the neural network
	 */
	protected double recognize(final double i1, final double i2, final double i3) {
		final double a = (this.w1 * i1) + (this.w2 * i2) + (this.w3 * i3);
		return (a * .5);
	}

	/**
	 * This method loops through 100 epochs.
	 */
	public void run() {
		for (int i = 0; i < 100; i++) {
			epoch();
		}
	}

	/**
	 * The trainingFunction implements the delta rule. This method will return
	 * the weight adjustment for the specified input neuron.
	 * 
	 * @param rate
	 *            The learning rate
	 * @param input
	 *            The input neuron we're processing
	 * @param error
	 *            The error between the actual output and anticipated output.
	 * @return The amount to adjust the weight by.
	 */
	protected double trainingFunction(final double rate, final double input,
			final double error) {
		return rate * input * error;
	}
}

    This program will train for 100 iterations. It is designed to teach the neural network to recognize four patterns. These patterns are summarized as follows:

For 001 output 0
For 011 output 0
For 101 output 0
For 111 output 1

    For each epoch, you will be shown the actual and anticipated results. By epoch 100, the network will be trained. The output from epoch 100 is shown here:

***Beginning Epoch #100***
Presented [0.0,0.0,1.0] anticipated=0.0 actual=-0.33333333131711973 error=0.33333333131711973
Presented [0.0,1.0,1.0] anticipated=0.0 actual=0.333333333558949 error=-0.333333333558949
Presented [1.0,0.0,1.0] anticipated=0.0 actual=0.33333333370649876 error=-0.33333333370649876
Presented [1.0,1.0,1.0] anticipated=1.0 actual=0.6666666655103011 error=0.33333333448969893

    As you can see from the above display, the outputs settle near three values: -0.333, 0.333, and 0.667. When rounded to the nearest integer, the outputs of -0.333 and 0.333 correspond to 0 and the output of 0.667 corresponds to 1. This neural network does not produce the exact output desired, but through rounding it gets pretty close. While the delta rule is efficient at adjusting weights, it is not the most commonly used training algorithm.
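    The rounding step mentioned above can be sketched as follows. The class name RoundOutputs and the method classify are assumptions, and the raw values are truncated copies of the epoch 100 outputs shown earlier.

```java
class RoundOutputs {

    // Rounds a raw network output to the nearest integer (0 or 1 for these values).
    static long classify(double actual) {
        return Math.round(actual);
    }

    public static void main(String[] args) {
        // Raw outputs copied (truncated) from the epoch 100 display.
        double[] actual = { -0.333, 0.333, 0.333, 0.667 };
        for (double a : actual) {
            System.out.println(a + " -> " + classify(a));
        }
    }
}
```

    The first three outputs round to 0 and the last rounds to 1, which matches the anticipated outputs for the four patterns.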

    In the next chapter we will examine the feedforward backpropagation network, which is one of the most commonly used neural networks. Backpropagation is a more advanced form of the delta rule.


Copyright 2005 - 2010 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.