Examining the Backpropagation Process

You have now seen how to calculate the output for a feedforward neural network. You have seen both the mathematical equations and the Java implementation. As we examined how to calculate the final values for the network, we used the connection weights and threshold values to determine the final result. You may be wondering how these values were determined.

The values contained in the weight and threshold matrix were determined using the backpropagation algorithm. This is a very useful algorithm for training neural networks. The backpropagation algorithm works by running the neural network just as we did in our recognition example, as shown in the previous section. The main difference in the backpropagation algorithm is that we present the neural network with training data. As each item of training data is presented to the neural network, the error is calculated between the actual output of the neural network and the output that was expected (and specified in the training set). The weights and threshold are then modified, so there is a greater chance of the network returning the correct result when the network is next presented with the same input.

Backpropagation is a very common method for training multilayered feedforward networks. Backpropagation can be used with any feedforward network that uses an activation function that is differentiable. It is this derivative function that we will use during training. It is not necessary that you understand calculus or how to take the derivative of an equation to work with the material in this chapter. If you are using one of the common activation functions, you can simply get the activation function derivative from a chart.

To train the neural network, a method must be determined to calculate the error. As the neural network is trained, the network is presented with samples from the training set. The result obtained from the neural network is then compared with the anticipated result that is part of the training set. The degree to which the output from the neural network differs from this anticipated output is the error.
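As a sketch of the error measure described above, the root-mean-square error between an actual output and an anticipated output can be computed as follows. The class and method names here are illustrative, not part of the book's library:

```java
/** Illustrative root-mean-square (RMS) error between actual and ideal outputs. */
public class RmsError {
    public static double calculate(double[] actual, double[] ideal) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double diff = ideal[i] - actual[i];
            sum += diff * diff; // accumulate squared error
        }
        return Math.sqrt(sum / actual.length); // root of the mean squared error
    }
}
```

An error of zero means the network reproduced the anticipated output exactly; larger values indicate a greater difference.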

To train the neural network, we must try to minimize this error. To minimize the error, the neuron connection weights and thresholds must be modified. We must define a function that will calculate the rate of error of the neural network. This error function must be mathematically differentiable. Because the network uses a differentiable activation function, the activations of the output neurons can be thought of as differentiable functions of the input, weights, and thresholds. If the error function is also a differentiable function, such as the sum of square error function, the error function itself is a differentiable function of these weights. This allows us to evaluate the derivative of the error using the weights. Then, using these derivatives, we find weights and thresholds that will minimize the error function.

There are several ways to find weights that will minimize the error function. The most popular approach is to use the gradient descent method. The algorithm that evaluates the derivative of the error function is known as backpropagation, because it propagates the errors backward through the network.
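The idea behind gradient descent can be sketched in a few lines: repeatedly step each weight in the direction opposite the derivative of the error. The one-dimensional example below is an illustration only (the error surface and names are assumptions, not the book's code):

```java
/** Illustrative gradient descent on a one-dimensional quadratic error surface. */
public class GradientDescentSketch {
    /** Minimizes error(w) = (w - target)^2 by stepping against its gradient. */
    public static double minimize(double weight, double target,
                                  double learnRate, int iterations) {
        for (int i = 0; i < iterations; i++) {
            double gradient = 2.0 * (weight - target); // derivative of (w - target)^2
            weight -= learnRate * gradient;            // step downhill on the error surface
        }
        return weight;
    }
}
```

In a real network the gradient is taken with respect to every weight and threshold, which is exactly what the backpropagation of errors computes.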

The Train Interface

This book will cover three different training methods that can be used for feedforward neural networks. This chapter presents the backpropagation method. Chapter 6 will discuss using a genetic algorithm for training. Chapter 7 will discuss using simulated annealing to train a neural network. All three of these training methods implement the Train interface. The Train interface is shown in Listing 5.5.

Listing 5.5: The Train Interface

```java
/**
 * Introduction to Neural Networks with Java, 2nd Edition
 * Copyright 2008 by Heaton Research, Inc.
 * http://www.heatonresearch.com/books/java-neural-2/
 *
 * ISBN13: 978-1-60439-008-7
 * ISBN:   1-60439-008-5
 *
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 */
package com.heatonresearch.book.introneuralnet.neural.feedforward.train;

import com.heatonresearch.book.introneuralnet.neural.feedforward.FeedforwardNetwork;

/**
 * Train: Interface for all feedforward neural network training
 * methods. There are currently three training methods defined:
 *
 * Backpropagation
 * Genetic Algorithms
 * Simulated Annealing
 *
 * @author Jeff Heaton
 * @version 2.1
 */
public interface Train {

    /**
     * Get the current error percent from the training.
     * @return The current error.
     */
    public double getError();

    /**
     * Get the current best network from the training.
     * @return The best network.
     */
    public FeedforwardNetwork getNetwork();

    /**
     * Perform one iteration of training.
     */
    public void iteration();
}
```

Three methods must be implemented by any class that implements the Train interface. The getError method returns the current error level for the neural network, calculated as a root mean square (RMS) error. The getNetwork method returns a trained neural network that achieves the error level reported by the getError method. Finally, the iteration method is called to perform one training iteration. Generally, the iteration method is called repeatedly until the getError method returns an acceptable error.
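Code that consumes the Train interface typically runs iterations in a loop until the error is acceptable. The sketch below is illustrative: it uses a trimmed-down version of the interface (getNetwork is omitted for brevity) and a stand-in trainer whose error simply halves each iteration, so it is not the book's Backpropagation class:

```java
/** Trimmed-down Train interface (getNetwork omitted for brevity). */
interface Train {
    double getError();
    void iteration();
}

public class TrainLoopSketch {
    /** Runs iterations until the error falls to maxError or below, capped at maxIterations. */
    public static int trainUntil(Train train, double maxError, int maxIterations) {
        int epoch = 0;
        while (train.getError() > maxError && epoch < maxIterations) {
            train.iteration();
            epoch++;
        }
        return epoch;
    }

    /** Stand-in trainer whose error halves each iteration (for illustration only). */
    public static Train mockTrainer() {
        return new Train() {
            private double error = 1.0;
            public double getError() { return error; }
            public void iteration() { error *= 0.5; }
        };
    }
}
```

With a real trainer, the loop body stays the same; only the Train implementation changes.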

The Backpropagation Java Classes

The backpropagation algorithm is implemented in two classes. The Backpropagation class, which implements the Train interface, is the main training class used. Internally, the Backpropagation class uses the BackpropagationLayer class to hold information that the backpropagation algorithm needs for each layer of the neural network.

The two main methods that will be called by code using the Backpropagation class are the iteration and getError methods. The iteration method has the following signature:

`public void iteration() {`

The iteration function begins by looping through all of the training sets that were provided.

`for (int j = 0; j < this.input.length; j++) {`

Each training set is presented to the neural network and the outputs are calculated.

`  this.network.computeOutputs(this.input[j]);`

Once the outputs have been calculated, the training can begin. This is a two-step process. First, the error is calculated by comparing the output to the ideal values.

```java
  calcError(this.ideal[j]);
}
```

Once all the sets have been processed, then the network learns from these errors.

`learn();`

Finally, the new global error, that is, the error across all training sets, is calculated.

`this.error = this.network.calculateError(this.input, this.ideal);`

It is this error that will be returned when a call is made to getError.

Calculating the Error for Backpropagation

The first step in backpropagation error calculation is to call the calcError method of the Backpropagation object. The signature for the calcError method is shown here:

`public void calcError(final double ideal[])`

First, we verify that the size of the ideal array corresponds to the number of output neurons. Since the ideal array specifies ideal values for the output neurons, it must match the size of the output layer.

```java
if (ideal.length != this.network.getOutputLayer().getNeuronCount()) {
  throw new NeuralNetworkError(
      "Size mismatch: Can't calcError for ideal input size="
      + ideal.length + " for output layer size="
      + this.network.getOutputLayer().getNeuronCount());
}
```

The Backpropagation object contains a BackpropagationLayer object for each of the layers in the neural network. These objects must be cleared before each training set is processed. The following code zeros out any previous errors from the BackpropagationLayer objects.

```java
for (final FeedforwardLayer layer : this.network.getLayers()) {
  getBackpropagationLayer(layer).clearError();
}
```

As its name implies, the backpropagation algorithm propagates errors backward through the neural network, so the layers are visited in reverse order.

`for (int i = this.network.getLayers().size() - 1; i >= 0; i--) {`

Obtain each layer of the neural network.

`  final FeedforwardLayer layer = this.network.getLayers().get(i);`

Now call one of the two overloaded versions of the calcError method. If the layer is the output layer, pass in the ideal array for comparison. If it is not the output layer, the ideal values are not needed.

```java
  if (layer.isOutput()) {
    getBackpropagationLayer(layer).calcError(ideal);
  } else {
    getBackpropagationLayer(layer).calcError();
  }
}
```

The BackpropagationLayer class has two versions of the calcError method. The signature for the version that operates on the output layer is shown below:

`public void calcError(final double ideal[]) {`

First, calculate the error share for each neuron. Loop across all of the output neurons.

`for (int i = 0; i < this.layer.getNeuronCount(); i++) {`

Next, set the error for this neuron. The error is simply the difference between the ideal output and the actual output.

`  setError(i, ideal[i] - this.layer.getFire(i));`

Calculate the delta for this neuron. The delta is the error multiplied by the derivative of the activation function. Bound the number, so that it does not become extremely small or large.

```java
  setErrorDelta(i, BoundNumbers.bound(calculateDelta(i)));
}
```

These error deltas will be used during the learning process that is covered in the next section.
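For the commonly used sigmoid activation function, the derivative needed by calculateDelta has a particularly convenient form: it can be written entirely in terms of the neuron's output. The sketch below is an illustration with assumed names, not the book's BackpropagationLayer code:

```java
/** Illustrative output-neuron delta calculation for a sigmoid activation. */
public class OutputDelta {
    /** The sigmoid derivative, expressed in terms of the neuron's output o: o * (1 - o). */
    public static double sigmoidDerivative(double output) {
        return output * (1.0 - output);
    }

    /** The delta is the raw error multiplied by the activation function derivative. */
    public static double delta(double error, double output) {
        return error * sigmoidDerivative(output);
    }
}
```

Note that the derivative peaks when the output is 0.5 and shrinks toward zero as the output saturates near 0 or 1, which is one reason the deltas are bounded.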

All other layers in the neural network will have their errors calculated by the calcError method that does not require an ideal array.

`public void calcError() {`

First, obtain the next layer. Since we are propagating backwards, this will be the layer that was just processed.

```java
final BackpropagationLayer next = this.backpropagation
    .getBackpropagationLayer(this.layer.getNext());
```

Loop through every matrix value for connections between this layer and the next.

```java
for (int i = 0; i < this.layer.getNext().getNeuronCount(); i++) {
  for (int j = 0; j < this.layer.getNeuronCount(); j++) {
```

The error calculation methods are called once for each training set; therefore, the matrix deltas must be accumulated before the errors are cleared out. Determine this layer's contribution to the error by combining the next layer's delta with the outputs from this layer.

```java
    accumulateMatrixDelta(j, i, next.getErrorDelta(i)
        * this.layer.getFire(j));
```

Calculate this layer's error contribution by multiplying the matrix value by the next layer's delta, and add it to the running error for the neuron.

```java
    setError(j, getError(j) + this.layer.getMatrix().get(j, i)
        * next.getErrorDelta(i));
  }
```

Also, accumulate deltas that affect the threshold.

```java
  accumulateBiasDelta(i, next.getErrorDelta(i));
}
```

For the hidden layers, calculate the delta using the derivative of the activation function.

```java
if (this.layer.isHidden()) {
  // hidden layer deltas
  for (int i = 0; i < this.layer.getNeuronCount(); i++) {
    setErrorDelta(i, BoundNumbers.bound(calculateDelta(i)));
  }
}
```

Once all of the errors have been calculated, the learn method can be used to apply the deltas to the weight matrix and teach the neural network to better recognize the input pattern.
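The hidden-layer error accumulation walked through above can be summarized in a compact sketch: each hidden neuron's error is the sum of the next layer's deltas, each weighted by the connection between the two neurons. The names here are illustrative, not the book's code:

```java
/** Illustrative hidden-layer error: next-layer deltas weighted by connection strengths. */
public class HiddenError {
    /**
     * Returns the error for hidden neuron j, where weights[j][i] is the
     * connection from hidden neuron j to next-layer neuron i.
     */
    public static double errorFor(int j, double[][] weights, double[] nextDeltas) {
        double error = 0;
        for (int i = 0; i < nextDeltas.length; i++) {
            error += weights[j][i] * nextDeltas[i]; // weight times downstream delta
        }
        return error;
    }
}
```

This is the step that gives backpropagation its name: the output-layer deltas flow backward through the same weight matrix that carried the signal forward.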

Backpropagation Learning

The learning process is relatively simple. All of the desired changes were already calculated during the error calculation. It is now simply a matter of applying these changes. The values of the learning rate and momentum parameters will affect how these changes are applied.
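The roles of the learning rate and momentum can be expressed in one line per weight: the change applied this iteration is the accumulated delta scaled by the learning rate, plus the previous iteration's change scaled by the momentum. This scalar sketch uses assumed names and is not the book's matrix-based code:

```java
/** Illustrative scalar weight update combining learning rate and momentum. */
public class MomentumUpdate {
    /**
     * Returns the change applied to a weight this iteration: the accumulated
     * delta scaled by the learning rate, plus the previous iteration's change
     * scaled by the momentum.
     */
    public static double weightChange(double accumulatedDelta, double previousChange,
                                      double learnRate, double momentum) {
        return learnRate * accumulatedDelta + momentum * previousChange;
    }
}
```

A momentum of zero reduces this to plain gradient descent; a nonzero momentum lets a consistent direction of change build up speed across iterations.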

The learn method in the Backpropagation object is called to begin the learning process.

`public void learn()`

Loop across all of the layers. The order is not important. During error calculation, the results from one layer depended upon another. As a result, it was very important to ensure that the error propagated backwards. However, during the learning process, values are simply applied to the neural network layers one at a time.

`for (final FeedforwardLayer layer : this.network.getLayers()) {`

Calling the learn method of each of the BackpropagationLayer objects causes the calculated changes to be applied.

```java
  getBackpropagationLayer(layer).learn(this.learnRate, this.momentum);
}
```

The learn method provided in the BackpropagationLayer class is used to perform the actual modifications. The signature for the learn method is shown here:

`public void learn(final double learnRate, final double momentum)`

The learn method first makes sure the layer has a matrix. If there is no matrix, then there is nothing to train.

`  if (this.layer.hasMatrix()) {`

A matrix is then created that contains the accumulated matrix delta values, scaled by the learnRate. The learning rate can be thought of as a percentage: a value of one means the deltas are applied with no scaling.

```java
final Matrix m1 = MatrixMath.multiply(this.accMatrixDelta, learnRate);
```

The previous deltas are stored in the matrixDelta variable. The learning from the previous iteration is applied to the current iteration, scaled by the momentum variable. Some variants of backpropagation use no momentum; to specify no momentum, use a momentum value of zero.

`    final Matrix m2 = MatrixMath.multiply(this.matrixDelta, momentum);`

Add the two together and store the result in the matrixDelta variable. This will be used with the momentum for the next training iteration.

`    this.matrixDelta = MatrixMath.add(m1, m2);`

Add the calculated values to the current matrix. This modifies the matrix and causes learning.

```java
this.layer.setMatrix(MatrixMath.add(this.layer.getMatrix(),
    this.matrixDelta));
```

Clear the errors for the next learning iteration.

```java
  this.accMatrixDelta.clear();
}
```

The learn method on the BackpropagationLayer class is called once per layer.
