Chapter 5: Propagation Training
- How Propagation Training Works
- Backpropagation Training
- Manhattan Update Rule
- Resilient Propagation Training
Training is the means by which the weights and threshold values of a neural network are adjusted to give desirable outputs. This book will cover both supervised and unsupervised training. Propagation training is a form of supervised training, where the expected output is given to the training algorithm.
Encog also supports unsupervised training. With unsupervised training you do not provide the neural network with the expected output. Rather, the neural network is left to learn and make insights into the data with limited direction. Chapter 8 will discuss unsupervised training.
Propagation training can be a very effective form of training for feedforward, simple recurrent and other types of neural networks. There are several different forms of propagation training. This chapter will focus on the forms of propagation currently supported by Encog. These three forms are listed as follows:
- Backpropagation Training
- Manhattan Update Rule
- Resilient Propagation Training
All three of these methods work very similarly. However, there are some important differences. In the next section we will explore propagation training in general.
Understanding Propagation Training
Propagation training algorithms use supervised training. This means that the training algorithm is given a training set of inputs and the ideal output for each input. The propagation-training algorithm will go through a series of iterations. Each iteration will most likely improve the error rate of the neural network by some degree. The error rate is the percent difference between the actual output from the neural network and the ideal output provided by the training data.
Each iteration will completely loop through the training data. For each item of training data, some change to the weight matrix and thresholds will be calculated. These changes will be applied in batches. Encog uses batch training. Therefore, Encog updates the weight matrix and threshold values at the end of an iteration.
We will now examine what happens during each training iteration. Each training iteration begins by looping over all of the training elements in the training set. For each of these training elements a two-pass process is executed: a forward pass and a backward pass.
The forward pass simply presents data to the neural network as it normally would if no training had occurred. The input data is presented, and the algorithm calculates the error, which is the difference between the actual output and the ideal output. The output from each of the layers is also kept in this pass. This allows the training algorithms to see the output from each of the neural network layers.
The backward pass starts at the output layer and works its way back to the input layer. The backward pass begins by examining the difference between each of the ideal outputs and the actual output from each of the neurons. The gradient of this error is then calculated. To calculate this gradient, the actual output of the neural network is applied to the derivative of the activation function used for this level. This value is then multiplied by the error.
Because the algorithm uses the derivative function of the activation function, propagation training can only be used with activation functions that actually have a derivative function. This derivative is used to calculate the error gradient for each connection in the neural network. How exactly this value is used depends on the training algorithm used.
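The backward-pass calculation described above can be sketched for a single neuron as follows. This is only an illustration that assumes a sigmoid activation function. The class and method names (GradientSketch, SigmoidDerivative, OutputDelta, ConnectionGradient) are hypothetical and are not part of Encog.

```csharp
using System;

public static class GradientSketch
{
    // Derivative of the sigmoid, written in terms of the neuron's
    // actual output y = sigmoid(x): f'(x) = y * (1 - y).
    public static double SigmoidDerivative(double output)
    {
        return output * (1.0 - output);
    }

    // Error term for one output neuron: the derivative of the
    // activation function at the actual output, multiplied by the
    // error (ideal - actual).
    public static double OutputDelta(double actual, double ideal)
    {
        double error = ideal - actual;
        return SigmoidDerivative(actual) * error;
    }

    // Gradient for one connection: the error term of the target
    // neuron multiplied by the output of the source neuron.
    // Because the error is defined as ideal - actual, adding this
    // value to the weight moves the output toward the ideal.
    public static double ConnectionGradient(double sourceOutput, double targetDelta)
    {
        return sourceOutput * targetDelta;
    }
}
```

For example, a neuron whose actual output is 0.5 when the ideal is 1.0 has a derivative of 0.25 and an error of 0.5, giving an error term of 0.125.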
Understanding Backpropagation
Backpropagation is one of the oldest training methods for feedforward neural networks. Backpropagation uses two parameters in conjunction with the gradient calculated in the previous section. The first parameter is the learning rate. The learning rate is essentially a percentage that determines how strongly the gradient is applied to the weight matrix and threshold values. The gradient is multiplied by the learning rate and then added to the weight matrix or threshold value. This will slowly optimize the weights to values that will produce a lower error.
One of the problems with the backpropagation algorithm is that gradient descent can settle into local minima. These local minima are points of low error, but they may not be the global minimum. The second parameter provided to the backpropagation algorithm helps it escape local minima. This parameter is called momentum. Momentum specifies the degree to which the weight changes from the previous iteration should be applied to the current iteration.
The momentum parameter is essentially a percentage, just like the learning rate. To use momentum, the backpropagation algorithm must keep track of what changes were applied to the weight matrix in the previous iteration. These changes will be reapplied in the current iteration, scaled by the momentum parameter. Usually the momentum parameter will be less than one, so the weight changes from the previous training iteration are less significant than the changes calculated for the current iteration. For example, setting the momentum to 0.5 would cause fifty percent of the previous training iteration's changes to be reapplied to the weights in the current iteration.
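The way the learning rate and momentum combine can be sketched as follows. This is a minimal illustration and not Encog's implementation. The class name BackpropSketch and its parameter names are hypothetical, and the gradients are assumed to have already been summed over the whole training set.

```csharp
public static class BackpropSketch
{
    // Applies one batch update to every weight.
    // gradients[i]  : summed gradient for weight i in this iteration
    // lastDeltas[i] : change applied to weight i in the previous iteration
    public static void Update(double[] weights, double[] gradients,
        double[] lastDeltas, double learningRate, double momentum)
    {
        for (int i = 0; i < weights.Length; i++)
        {
            // Scale the gradient by the learning rate, then add the
            // previous change scaled by the momentum.
            double delta = learningRate * gradients[i]
                + momentum * lastDeltas[i];
            weights[i] += delta;
            lastDeltas[i] = delta; // remembered for the next iteration
        }
    }
}
```

With a learning rate of 0.5, a momentum of 0.5, a gradient of 0.2 and a previous change of 0.1, the weight changes by 0.5 * 0.2 + 0.5 * 0.1 = 0.15.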
Understanding the Manhattan Update Rule
One of the problems with the backpropagation training algorithm is the degree to which the weights are changed. Gradient descent can often apply too large a change to the weight matrix. The Manhattan update rule and resilient propagation training algorithms only use the sign of the gradient. The magnitude is discarded. This means it only matters whether the gradient is positive, negative or near zero.
For the Manhattan update rule, this sign is used to determine how to update the weight matrix or threshold value. If the gradient is near zero, then no change is made to the weight or threshold value. If the gradient is positive, then the weight or threshold value is increased by a specific amount. If the gradient is negative, then the weight or threshold value is decreased by a specific amount. The amount by which the weight or threshold value is changed is defined as a constant. You must provide this constant to the Manhattan update rule algorithm.
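The rule can be sketched in a few lines. This is an illustration only. The class name ManhattanSketch and the tolerance used to decide that a gradient is "near zero" are assumptions and are not taken from Encog.

```csharp
using System;

public static class ManhattanSketch
{
    // Gradients whose absolute value is below this are treated as zero.
    // The value is an assumption chosen for illustration.
    const double ZeroTolerance = 1e-10;

    // Moves every weight by a fixed step in the direction given by
    // the sign of its gradient.
    public static void Update(double[] weights, double[] gradients, double step)
    {
        for (int i = 0; i < weights.Length; i++)
        {
            if (Math.Abs(gradients[i]) < ZeroTolerance)
                continue; // near zero: leave the weight unchanged

            weights[i] += gradients[i] > 0 ? step : -step;
        }
    }
}
```

Note that a gradient of 5.0 and a gradient of 0.001 produce exactly the same change, because only the sign is consulted.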
Understanding Resilient Propagation Training
The resilient propagation training (RPROP) algorithm is usually the most efficient training algorithm provided by Encog for supervised feedforward neural networks. One particular advantage to the RPROP algorithm is that it requires no setting of parameters before using it. There are no learning rates, momentum values or update constants that need to be determined. This is good because it can be difficult to determine the exact learning rate that might be optimal.
The RPROP algorithm works similarly to the Manhattan update rule, in that only the sign of the gradient is used. However, rather than using a fixed constant to update the weights and threshold values, a much more granular approach is used. The update values, or deltas, will not remain fixed as they do in the Manhattan update rule or backpropagation algorithm. Rather, these delta values will change as training progresses.
The RPROP algorithm does not keep one global update value, or delta. Rather, individual deltas are kept for every threshold and weight matrix value. These deltas are first initialized to a very small number. Every iteration through the RPROP algorithm will update the weight and threshold values according to these delta values. However, as previously mentioned, these delta values do not remain fixed. The sign of the gradient, and whether it has changed since the previous iteration, determines how each delta is modified. This allows every individual threshold and weight matrix value to be individually trained. This is an advantage that is not provided by either the backpropagation algorithm or the Manhattan update rule.
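The per-weight deltas can be sketched as follows. This is a simplified RPROP variant written for illustration, not Encog's implementation. The class name RpropSketch is hypothetical, and the growth factor, shrink factor and delta limits are typical values from the RPROP literature, assumed here rather than taken from Encog.

```csharp
using System;

public static class RpropSketch
{
    // Typical values from the RPROP literature. These are assumptions,
    // not necessarily the defaults Encog uses.
    const double IncreaseFactor = 1.2;
    const double DecreaseFactor = 0.5;
    const double MaxDelta = 50.0;
    const double MinDelta = 1e-6;

    public static void Update(double[] weights, double[] gradients,
        double[] lastGradients, double[] deltas)
    {
        for (int i = 0; i < weights.Length; i++)
        {
            double change = gradients[i] * lastGradients[i];
            if (change > 0)
            {
                // Same sign as last iteration: take a larger step.
                deltas[i] = Math.Min(deltas[i] * IncreaseFactor, MaxDelta);
            }
            else if (change < 0)
            {
                // Sign flipped: the last step overshot, so shrink the
                // delta and skip the update for this iteration.
                deltas[i] = Math.Max(deltas[i] * DecreaseFactor, MinDelta);
                lastGradients[i] = 0.0;
                continue;
            }

            // Only the sign of the gradient chooses the direction.
            weights[i] += Math.Sign(gradients[i]) * deltas[i];
            lastGradients[i] = gradients[i];
        }
    }
}
```

Each weight has its own entry in deltas, so a weight whose gradient keeps the same sign accelerates while a weight that oscillates slows down.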
Propagation Training with Encog
Now that you understand the primary differences between the three different types of propagation training used by Encog, we will see how to actually implement each of them. The following sections will show C# examples that make use of all three. The XOR operator, which was introduced in the last chapter, will be used as an example. The XOR operator is trivial to implement, so it is a good example for a new training algorithm.
Using Backpropagation
In the last chapter we saw how to use the Encog Workbench to implement a solution with the XOR operator using a neural network. In this chapter we will now see how to do this with a C# program. Listing 5.1 shows a simple C# program that will train a neural network to recognize the XOR operator.
Listing 5.1: Using Backpropagation
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Activation;
using Encog.Neural.NeuralData;
using Encog.Neural.Data.Basic;
using Encog.Neural.Networks.
using Encog.Neural.Networks.
using Encog.Neural.Data;
using ConsoleExamples.Examples;
namespace Encog.Examples.XOR.Anneal.
{
/// <summary>
/// Learn to recognize the XOR pattern using
/// a backpropagation training algorithm.
/// </summary>
public class XorBackprop : IExample
{
public static ExampleInfo Info
{
get
{
ExampleInfo info = new ExampleInfo(
typeof(XorBackprop),
"xor-backprop",
"XOR Operator with Backpropagation",
"Use backpropagation to learn the XOR operator.");
return info;
}
}
/// <summary>
/// Input for the XOR function.
/// </summary>
public static double[][] XOR_INPUT ={
new double[2] { 0.0, 0.0 },
new double[2] { 1.0, 0.0 },
new double[2] { 0.0, 1.0 },
new double[2] { 1.0, 1.0 } };
/// <summary>
/// Ideal output for the XOR function.
/// </summary>
public static double[][] XOR_IDEAL = {
new double[1] { 0.0 },
new double[1] { 1.0 },
new double[1] { 1.0 },
new double[1] { 0.0 } };
/// <summary>
/// Program entry point.
/// </summary>
/// <param name="app">Not used.</param>
public void Execute(IExampleInterface app)
{
BasicNetwork network = new BasicNetwork();
network.AddLayer(new BasicLayer(
new ActivationSigmoid(), true, 2));
network.AddLayer(new BasicLayer(
new ActivationSigmoid(), true, 3));
network.AddLayer(new BasicLayer(
new ActivationSigmoid(), true, 1));
network.Structure.FinalizeStructure();
network.Reset();
INeuralDataSet trainingSet =
new BasicNeuralDataSet(XOR_INPUT, XOR_IDEAL);
// train the neural network
ITrain train =
new Backpropagation(network, trainingSet,
0.7, 0.8);
int epoch = 1;
do
{
train.Iteration();
Console.WriteLine("Epoch #" + epoch + " Error:"
+ train.Error);
epoch++;
} while ((epoch < 5000) && (train.Error > 0.01));
// test the neural network
Console.WriteLine("Neural Network Results:");
foreach (INeuralDataPair pair in trainingSet)
{
INeuralData output =
network.Compute(pair.Input);
Console.WriteLine(pair.Input[0] + ","
+ pair.Input[1]
+ ", actual=" + output[0] + ",ideal="
+ pair.Ideal[0]);
}
}
}
}
We will now examine the parts of the program necessary to implement the XOR backpropagation example.
A truth table defines the possible inputs and ideal outputs for a mathematical operator. The truth table for XOR is shown below.
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0
The backpropagation XOR example must store the XOR truth table as a 2D array. This will allow a training set to be constructed. We begin by creating XOR_INPUT, which will hold the input values for each of the rows in the XOR truth table.
public static double[][] XOR_INPUT ={
new double[2] { 0.0, 0.0 },
new double[2] { 1.0, 0.0 },
new double[2] { 0.0, 1.0 },
new double[2] { 1.0, 1.0 } };
Next we create the array XOR_IDEAL, which will hold the expected output for each of the inputs previously defined.
public static double[][] XOR_IDEAL
= {
new double[1] { 0.0 },
new double[1] { 1.0 },
new double[1] { 1.0 },
new double[1] { 0.0 } };
You may wonder why it is necessary to use a 2D array for XOR_IDEAL. In this case it looks unnecessary, because the XOR neural network has a single output value. However, neural networks can have many output neurons. Because of this, a 2D array is used to allow each row to potentially have multiple outputs.
Constructing the Neural Network
First, the neural network must be constructed. We begin by creating a BasicNetwork object. The BasicNetwork class is very extensible. It is currently the only implementation of the more generic INetwork interface needed by Encog.
BasicNetwork network = new BasicNetwork();
This neural network will have three layers. The input layer will have two input neurons, and the output layer will have a single output neuron. There will also be a three-neuron hidden layer to assist with processing. All three of these layers can use the BasicLayer class. This implements a feedforward neural network, or a multilayer perceptron. Each of these layers makes use of the ActivationSigmoid activation function. Sigmoid is a good activation function for XOR because the sigmoid function only produces positive numbers, and the XOR outputs are never negative. Finally, the true value specifies that this network should have thresholds.
network.AddLayer(new BasicLayer(new ActivationSigmoid(),true,2));
network.AddLayer(new BasicLayer(new ActivationSigmoid(),true,3));
network.AddLayer(new BasicLayer(new ActivationSigmoid(),true,1));
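The output range of the sigmoid can be checked with the standard logistic formula. The sketch below is independent of Encog. The class name SigmoidSketch is hypothetical, and it is only assumed that ActivationSigmoid computes this same function.

```csharp
using System;

public static class SigmoidSketch
{
    // Standard logistic sigmoid: maps any real input to a value
    // between 0 and 1.
    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }
}
```

An input of zero produces 0.5, large negative inputs approach 0 and large positive inputs approach 1, which matches the 0 and 1 values in the XOR truth table.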
Lastly, the neural network structure is finalized. This builds temporary structures to allow the network to be quickly accessed. It is very important that FinalizeStructure is always called after the network has been built.
network.Structure.FinalizeStructure();
network.Reset();
Finally, the Reset method is called to initialize the weights and thresholds to random values. The training algorithm will organize these random values into meaningful weights and thresholds that produce the desired result.
Constructing the Training Set
Now that the network has been created, the training data must be constructed. We already saw the input and ideal arrays created earlier. Now, we must take these arrays and represent them as an INeuralDataSet. The following code does this.
INeuralDataSet trainingSet = new BasicNeuralDataSet(
XOR_INPUT, XOR_IDEAL);
A BasicNeuralDataSet is used here. It is one of several training set types that implement the INeuralDataSet interface. Other implementations of INeuralDataSet can pull data from a variety of sources, such as SQL, HTTP or image files.
Training the Neural Network
We now have a BasicNetwork object and an INeuralDataSet object. This is all that is needed to train a neural network. To implement backpropagation training we instantiate a Backpropagation object, as follows.
ITrain train = new Backpropagation(network, trainingSet,
0.7, 0.8);
As previously discussed, backpropagation training makes use of a learning rate and a momentum. The value 0.7 is used for the learning rate, and the value 0.8 is used for the momentum. Picking proper values for the learning rate and momentum is something of a trial and error process. If the learning rate is too high, the network will no longer decrease its error rate. If it is too low, training will take too long. If the error rate refuses to fall, even with a lower learning rate, the momentum should be increased to help the neural network get out of a local minimum.
Propagation training is very much an iterative process. The Iteration method is called over and over; each time the network is slightly adjusted for a better error rate. The following loop trains the neural network until the error rate has fallen below one percent.
do
{
train.Iteration();
Console.WriteLine("Epoch #" + epoch + " Error:"
+ train.Error);
epoch++;
} while ((epoch < 5000) && (train.Error > 0.01));
Each trip through the loop is called an epoch, or an iteration. The error rate is the amount that the actual output from the neural network differs from the ideal output provided to the training set.
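One common way to reduce the difference between actual and ideal outputs to a single number is the mean squared error. The sketch below uses that measure purely as an assumption for illustration. Encog's own error calculation may differ, and the class name ErrorSketch is hypothetical.

```csharp
public static class ErrorSketch
{
    // Averages the squared difference between every ideal value and
    // the corresponding actual value.
    public static double MeanSquaredError(double[][] actual, double[][] ideal)
    {
        double sum = 0.0;
        int count = 0;
        for (int row = 0; row < actual.Length; row++)
        {
            for (int col = 0; col < actual[row].Length; col++)
            {
                double diff = ideal[row][col] - actual[row][col];
                sum += diff * diff;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }
}
```

The arrays have the same jagged layout as XOR_IDEAL, so networks with more than one output neuron are handled without changes.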
Evaluating the Neural Network
Now that the neural network has been trained, it should be executed to see how well it functions. We begin by displaying a heading as follows:
Console.WriteLine("Neural Network Results:");
We will now loop through each of the training set elements. An INeuralDataSet is made up of a collection of INeuralDataPair objects. Each INeuralDataPair object contains an input and an ideal property. Each of these two properties is an INeuralData object that essentially contains an array. This is how Encog stores the training data. We begin by looping over all of the INeuralDataPair objects contained in the INeuralDataSet object.
foreach (INeuralDataPair pair in trainingSet)
{
For each of the INeuralDataPair objects, we compute the neural network's output using the input property of the INeuralDataPair object.
INeuralData output = network.Compute(pair.Input);
We now display the ideal output, as well as the actual output for the neural network.
Console.WriteLine(pair.Input[0] + ","
+ pair.Input[1]
+ ", actual=" + output[0] + ",ideal="
+ pair.Ideal[0]);
}
The output from this neural network is shown here.
Epoch #1 Error:0.504998283847474
Epoch #2 Error:0.504948046227928
Epoch #3 Error:0.5028968616826613
Epoch #4 Error:0.5034596686580215
Epoch #5 Error:0.5042340438643891
Epoch #6 Error:0.5034282078077391
Epoch #7 Error:0.501995999394481
Epoch #8 Error:0.5014532303103851
Epoch #9 Error:0.5016773751196401
Epoch #10 Error:0.5016348354128658
...
Epoch #3340 Error:0.01000800225100623
Epoch #3341 Error:0.010006374293649473
Epoch #3342 Error:0.01000474710532496
Epoch #3343 Error:0.010003120685432222
Epoch #3344 Error:0.010001495033371149
Epoch #3345 Error:0.009999870148542572
Neural Network Results:
0.0,0.0, actual=0.010977229866756838,
1.0,0.0, actual=0.9905671966735671,
0.0,1.0, actual=0.989931152973507,
1.0,1.0, actual=0.009434016119752921,
First, you will see the training epochs counting upward as the error decreases. The error starts out at about 0.505, which is just above 50%. At epoch 3,345, the error has dropped below one percent and training can stop.
The program then evaluates the neural network by cycling through the training data and presenting each training element to the neural network. You will notice from the above data that the results do not exactly match the ideal results. For instance the value 0.0109 does not exactly match 0.0. However, it is close. Remember that the network was only trained to a one percent error. As a result, the data is not going to match precisely.
In this example, we are evaluating the neural network with the very data that it was trained with. This is fine for a simple example, where we only have four training elements. However, you will usually want to hold back some of your data with which to validate the neural network. Validating the network with the same data that it was trained with does not prove much. However, getting good results with data other than what the neural network was trained with shows that the neural network has gained some sort of insight into the data that it is processing.
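Holding back data can be as simple as splitting the arrays before the training set is built. The following sketch shows one way to do this. The class name HoldoutSketch and its Split method are hypothetical helpers and are not part of Encog.

```csharp
using System;

public static class HoldoutSketch
{
    // Places the first trainCount rows in 'train' and the remaining
    // rows in 'validate'. The rows themselves are shared, not copied.
    public static void Split(double[][] data, int trainCount,
        out double[][] train, out double[][] validate)
    {
        if (trainCount < 0 || trainCount > data.Length)
            throw new ArgumentOutOfRangeException("trainCount");

        train = new double[trainCount][];
        validate = new double[data.Length - trainCount][];
        Array.Copy(data, 0, train, 0, trainCount);
        Array.Copy(data, trainCount, validate, 0, validate.Length);
    }
}
```

The input and ideal arrays must be split at the same index so that each input stays paired with its ideal output. For real data it is usually wise to shuffle the rows first, applying the same ordering to both arrays.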
Something else that is interesting to note is the number of iterations it took to get an acceptable error. Backpropagation took 3,345 iterations to get to an acceptable error. Different runs of this example produce different results, as we are starting from randomly generated weights and thresholds. However, the number 3,345 is a fairly good indication of the efficiency of the backpropagation algorithm. This number will be compared to the other propagation training algorithms.
Using the Manhattan Update Rule
Next, we will look at how to implement the Manhattan update rule. There are very few changes that are needed to the backpropagation example to cause it to use the Manhattan update rule. Listing 5.2 shows the complete Manhattan update rule example.
Listing 5.2: Using the Manhattan Update Rule
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Activation;
using Encog.Neural.NeuralData;
using Encog.Neural.Data.Basic;
using Encog.Neural.Networks.
using Encog.Neural.Data;
using Encog.Neural.Networks.
using ConsoleExamples.Examples;
namespace Encog.Examples.XOR.Manhattan
{
/// <summary>
/// Learn to recognize the XOR pattern using a
/// Manhattan update rule training algorithm.
/// </summary>
public class XORManhattan:IExample
{
public static ExampleInfo Info
{
get
{
ExampleInfo info = new ExampleInfo(
typeof(XORManhattan),
"xor-manhattan",
"XOR Operator with Manhattan Update Rule",
"Use the Manhattan Update Rule to learn the XOR operator.");
return info;
}
}
/// <summary>
/// Input for the XOR function.
/// </summary>
public static double[][] XOR_INPUT ={
new double[2] { 0.0, 0.0 },
new double[2] { 1.0, 0.0 },
new double[2] { 0.0, 1.0 },
new double[2] { 1.0, 1.0 } };
/// <summary>
/// Ideal output for the XOR function.
/// </summary>
public static double[][] XOR_IDEAL = {
new double[1] { 0.0 },
new double[1] { 1.0 },
new double[1] { 1.0 },
new double[1] { 0.0 } };
/// <summary>
/// Program entry point.
/// </summary>
/// <param name="app">Not used.</param>
public void Execute(IExampleInterface app)
{
BasicNetwork network = new BasicNetwork();
network.AddLayer(
new BasicLayer(new ActivationSigmoid(), true, 2));
network.AddLayer(
new BasicLayer(new ActivationSigmoid(), true, 3));
network.AddLayer(
new BasicLayer(new ActivationSigmoid(), true, 1));
network.Structure.FinalizeStructure();
network.Reset();
INeuralDataSet trainingSet =
new BasicNeuralDataSet(XOR_INPUT, XOR_IDEAL);
// train the neural network
ITrain train =
new ManhattanPropagation(network,
trainingSet, 0.0001);
int epoch = 1;
do
{
train.Iteration();
Console.WriteLine(
"Epoch #" + epoch + " Error:" + train.Error);
epoch++;
} while (train.Error > 0.01);
// test the neural network
Console.WriteLine("Neural Network Results:");
foreach (INeuralDataPair pair in trainingSet)
{
INeuralData output =
network.Compute(pair.Input);
Console.WriteLine(pair.Input[0] + ","
+ pair.Input[1]
+ ", actual=" + output[0]
+ ",ideal=" + pair.Ideal[0]);
}
}
}
}
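There is really only one line that has changed from the backpropagation example. Because the ManhattanPropagation object uses the same ITrain interface as the Backpropagation object, only the line that creates the trainer is different.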
ITrain train =
new ManhattanPropagation(network,
trainingSet, 0.0001);
As previously discussed, the Manhattan update rule works by using a single constant value to adjust the weights and thresholds. This is usually a very small number so as not to introduce too rapid a change into the network. For this example, the number 0.0001 was chosen. Picking this number usually comes down to trial and error, as was the case with backpropagation. A value that is too high causes the network to change randomly and never converge.
The Manhattan update rule will tend to behave somewhat randomly at first. The error rate will seem to improve and then worsen. But it will gradually trend lower. After 710,954 iterations the error rate is acceptable.
Epoch #710941 Error:0.011714647667850289
Epoch #710942 Error:0.011573263349587842
Epoch #710943 Error:0.011431878106128258
Epoch #710944 Error:0.011290491948778713
Epoch #710945 Error:0.011149104888883382
Epoch #710946 Error:0.011007716937768005
Epoch #710947 Error:0.010866328106765183
Epoch #710948 Error:0.010724938407208937
Epoch #710949 Error:0.010583547850435736
Epoch #710950 Error:0.010442156447783919
Epoch #710951 Error:0.010300764210593727
Epoch #710952 Error:0.01015937115020837
Epoch #710953 Error:0.010017977277972472
Epoch #710954 Error:0.009876582605234318
Neural Network Results:
0.0,0.0, actual=-0.013777528025884167,
1.0,0.0, actual=0.9999999999999925,
0.0,1.0, actual=0.9999961061923577,
1.0,1.0, actual=-0.013757731687977337,
As you can see, the Manhattan update rule took considerably more iterations to find a solution than backpropagation. There are certain cases where the Manhattan rule is preferable to backpropagation training. However, for a simple case like the XOR problem, backpropagation is a better solution than the Manhattan rule. Finding a better value for the update constant may improve the efficiency of the Manhattan update rule.
Using Resilient Propagation
One of the most difficult aspects of backpropagation and Manhattan update rule learning is picking the correct training parameters. If a bad choice is made for the learning rate, momentum or delta values, training will not be as successful as it might have been. Resilient propagation does have training parameters, but it is extremely rare that they need to be changed from their default values. This makes resilient propagation a very easy training algorithm to use. Listing 5.3 shows an XOR example using the resilient propagation algorithm.
Listing 5.3: Using Resilient Propagation
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Activation;
using Encog.Neural.Data.Basic;
using Encog.Neural.NeuralData;
using Encog.Neural.Networks.
using Encog.Neural.Data;
using Encog.Neural.Networks.
using ConsoleExamples.Examples;
namespace Encog.Examples.XOR.Resilient
{
/// <summary>
/// XOR: This example is essentially the "Hello World" of
/// neural network
/// programming. This example shows how to construct an
/// Encog neural
/// network to predict the output from the XOR operator.
/// This example
/// uses RPROP to train the neural network.
/// </summary>
public class XORResilient : IExample
{
public static ExampleInfo Info
{
get
{
ExampleInfo info = new ExampleInfo(
typeof(XORResilient),
"xor-rprop",
"XOR Operator with Resilient Propagation",
"Use RPROP to learn the XOR operator.");
return info;
}
}
/// <summary>
/// Input for the XOR function.
/// </summary>
public static double[][] XOR_INPUT ={
new double[2] { 0.0, 0.0 },
new double[2] { 1.0, 0.0 },
new double[2] { 0.0, 1.0 },
new double[2] { 1.0, 1.0 } };
/// <summary>
/// Ideal output for the XOR function.
/// </summary>
public static double[][] XOR_IDEAL = {
new double[1] { 0.0 },
new double[1] { 1.0 },
new double[1] { 1.0 },
new double[1] { 0.0 } };
/// <summary>
/// Program entry point.
/// </summary>
/// <param name="app">Not used.</param>
public void Execute(IExampleInterface app)
{
BasicNetwork network = new BasicNetwork();
network.AddLayer(
new BasicLayer(new ActivationSigmoid(), true, 2));
network.AddLayer(
new BasicLayer(new ActivationSigmoid(), true, 6));
network.AddLayer(
new BasicLayer(new ActivationSigmoid(), true, 1));
network.Structure.FinalizeStructure();
network.Reset();
INeuralDataSet trainingSet =
new BasicNeuralDataSet(XOR_INPUT, XOR_IDEAL);
// train the neural network
ITrain train =
new ResilientPropagation(network, trainingSet);
int epoch = 1;
do
{
train.Iteration();
Console.WriteLine("Epoch #" + epoch
+ " Error:" + train.Error);
epoch++;
} while ((epoch < 5000) && (train.Error > 0.01));
// test the neural network
Console.WriteLine("Neural Network Results:");
foreach (INeuralDataPair pair in trainingSet)
{
INeuralData output =
network.Compute(pair.Input);
Console.WriteLine(pair.Input[0] + ","
+ pair.Input[1]
+ ", actual=" + output[0] + ",ideal="
+ pair.Ideal[0]);
}
}
}
}
The following line of code creates a ResilientPropagation object that will be used to train the neural network.
ITrain train =
new ResilientPropagation(network, trainingSet);
As you can see, there are no training parameters provided to the ResilientPropagation object. Running this example program will produce the following results.
Epoch #1 Error:0.5108505683309112
Epoch #2 Error:0.5207537811846186
Epoch #3 Error:0.5087933421445957
Epoch #4 Error:0.5013907858935785
Epoch #5 Error:0.5013907858935785
Epoch #6 Error:0.5000489677062201
Epoch #7 Error:0.49941437656150733
Epoch #8 Error:0.49798185395576444
Epoch #9 Error:0.4980795840636415
Epoch #10 Error:0.4973134271412919
...
Epoch #270 Error:0.010865894525995278
Epoch #271 Error:0.010018272841993655
Epoch #272 Error:0.010068462218315439
Epoch #273 Error:0.009971267210982099
Neural Network Results:
0.0,0.0, actual=0.00426845952539745,
1.0,0.0, actual=0.9849930511468161,
0.0,1.0, actual=0.9874048605752819,
1.0,1.0, actual=0.0029321659866812233,
Not only is the resilient propagation algorithm easier to use, it is also considerably more efficient than backpropagation or the Manhattan update rule.
Propagation and Multithreading
As of the writing of this book, single-core computers are becoming much less common than multicore computers. A dual-core computer effectively has two complete processors on a single chip. Quad-core computers have four processors on a single chip. The latest generation of quad-core processors, the Intel i7, comes with hyperthreading as well. Hyperthreading allows one core to appear as two by simultaneously executing multiple instructions. A computer that uses hyperthreading technology will report twice the number of cores that are actually installed.
Processors seem to have maxed out their speeds at around 3 gigahertz. Growth in computing power will not be in the processing speed of individual processors. Rather, future growth will be in the number of cores a computer has. However, taking advantage of these additional cores can be a challenge for the computer programmer. To take advantage of these cores you must write multithreaded software.
Entire books are written on multithreaded programming, so it will not be covered in depth here. However, the general idea is to take a large problem and break it down into manageable pieces that can be executed independently by multiple threads. The final solution must then be pieced back together from each of the threads. This process is called aggregation.
Encog makes use of multithreading in many key areas. One such area is training. By default the propagation training techniques will use multithreading if it appears that multithreading will help performance. Specifically, there should be more than one core and sufficient training data for multithreading to be worthwhile. If both of these elements are present, any of the propagation techniques will make use of multithreading.
It is possible to tell Encog to use a specific number of threads, or to disable threading completely. The NumThreads property provided by all of the propagation training algorithms does this. To run in single-threaded mode, specify one thread. To use a specific number of threads, set the property to that number. Finally, to allow Encog to determine the optimal number of threads, specify zero. Zero is the default value.
When Encog is requested to determine the optimal number of threads to use, several things are considered. Encog considers the number of cores that are available. Encog also considers the size of the training data. Multithreaded training works best with larger training sets.
How Multithreaded Training Works
Multithreaded training works particularly well with larger training sets and machines with multiple cores. If Encog does not detect that both are present, it will fall back to single-threaded training. When there is more than one processing core, and enough training set items to keep the cores busy, multithreaded training will run significantly faster than single-threaded training.
We've already looked at three propagation-training techniques. All propagation-training techniques work similarly. Whether it is backpropagation, resilient propagation or the Manhattan update rule, the technique is similar. There are three distinct steps:
1. Perform a regular feedforward pass.
2. Process the levels backwards, and determine the errors at each level.
3. Apply the changes to the weights and thresholds.
First, a regular feed forward pass is performed. The output from each level is kept so the error for each level can be evaluated independently. Second, the errors are calculated at each level, and the derivatives of each of the activation functions are used to calculate the gradients. These gradients show the direction in which each weight must be modified to reduce the error of the network. These gradients will be used in the third step.
The third step is what varies among the different training algorithms. Backpropagation simply takes the gradients and scales them by a learning rate. The scaled gradients are then applied directly to the weights and thresholds. The Manhattan update rule uses only the sign of the gradient to decide in which direction to change the weight. The weight is then changed in either the positive or negative direction by a fixed constant.
RPROP keeps an individual delta value for every weight and threshold, and uses only the sign of the gradient to increase or decrease the delta amounts. The delta amounts are then applied to the weights and thresholds.
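The difference between the three algorithms in the third step can be sketched as follows. This is a simplified illustration, not Encog's implementation. The UpdateRules class is a hypothetical name. The sketch assumes that each gradient value is the derivative of the error with respect to the weight, so weights move against the gradient. Momentum is left out of the backpropagation rule, and the RPROP constants (1.2, 0.5, 50 and 1e-6) are the values conventionally used with that algorithm rather than values taken from this chapter.

```csharp
using System;

// Illustrative only: simplified versions of the three "step 3" update rules.
// Assumption: gradient[i] is dError/dWeight, so weights move against it.
public static class UpdateRules
{
    // Backpropagation (without momentum): scale the gradient by a learning rate.
    public static void Backprop(double[] weights, double[] gradient, double learningRate)
    {
        for (int i = 0; i < weights.Length; i++)
            weights[i] -= learningRate * gradient[i];
    }

    // Manhattan update rule: use only the sign, move by a fixed constant.
    public static void Manhattan(double[] weights, double[] gradient, double step)
    {
        for (int i = 0; i < weights.Length; i++)
            weights[i] -= Math.Sign(gradient[i]) * step;
    }

    // Simplified RPROP: every weight has its own delta. The delta grows while
    // the gradient keeps its sign and shrinks when the sign flips.
    public static void Rprop(double[] weights, double[] gradient,
        double[] lastGradient, double[] delta)
    {
        const double Grow = 1.2, Shrink = 0.5, MaxDelta = 50.0, MinDelta = 1e-6;

        for (int i = 0; i < weights.Length; i++)
        {
            double signChange = gradient[i] * lastGradient[i];
            if (signChange > 0)
                delta[i] = Math.Min(delta[i] * Grow, MaxDelta);
            else if (signChange < 0)
                delta[i] = Math.Max(delta[i] * Shrink, MinDelta);

            weights[i] -= Math.Sign(gradient[i]) * delta[i];
            lastGradient[i] = gradient[i];
        }
    }
}
```

Notice that only Backprop uses the size of the gradient. Manhattan and Rprop both reduce the gradient to a sign, which is why neither is sensitive to how large the gradient values happen to be.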
The multithreaded algorithm uses threads to perform Steps 1 and 2. The training data is broken into packets that are distributed among the threads. At the beginning of each iteration, threads are started to handle each of these packets. Once all threads have completed, a single thread aggregates all of the results from the threads and applies them to the neural network. There is a very brief amount of time where only one thread is executing, at the end of the iteration. This can be seen from Figure 5.1.
Figure 5.1: Encog Training on a Hyperthreaded Quadcore
As you can see from the above image, the i7 is currently running at 100%. You can clearly see the end of each iteration, where each of the processors falls briefly. Fortunately, this is a very brief time, and does not have a large impact on overall training efficiency. I did try implementations where I did not force the threads to wait at the end of the iteration for a resynchronization. However, these did not provide efficient training because the propagation training algorithms need all changes applied before the next iteration begins.
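The packet approach can be sketched with the standard .NET task library. This is a minimal illustration under stated assumptions, not the code Encog uses internally. The PacketTraining class and its SumGradients method are hypothetical names, and the accumulate delegate stands in for the forward and backward pass of a single training item.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of packet-based gradient calculation; not Encog's code.
public static class PacketTraining
{
    // Splits 'itemCount' training items into one contiguous packet per thread
    // (steps 1 and 2), waits for all packets, then sums the partial gradients
    // on the calling thread so that step 3 can be applied once.
    // 'accumulate(itemIndex, partial)' adds one item's gradient to 'partial'.
    public static double[] SumGradients(int itemCount, int weightCount,
        int threadCount, Action<int, double[]> accumulate)
    {
        var partials = new double[threadCount][];
        var tasks = new Task[threadCount];
        int packetSize = (itemCount + threadCount - 1) / threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            int packet = t; // local copy so each closure sees its own index
            partials[packet] = new double[weightCount];
            tasks[packet] = Task.Run(() =>
            {
                int start = packet * packetSize;
                int end = Math.Min(start + packetSize, itemCount);
                for (int i = start; i < end; i++)
                    accumulate(i, partials[packet]);
            });
        }

        // Resynchronize: every packet must finish before weights change.
        Task.WaitAll(tasks);

        // Single-threaded aggregation at the end of the iteration.
        var total = new double[weightCount];
        foreach (double[] partial in partials)
            for (int w = 0; w < weightCount; w++)
                total[w] += partial[w];
        return total;
    }
}
```

Each task writes only to its own partial array, so no locking is needed while the packets are processed. The call to Task.WaitAll corresponds to the brief dip at the end of each iteration, where a single thread is left to combine the results.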
Using Multithreaded Training
To see multithreaded training really shine, a larger training set is needed. In the next chapter we will see how to gather information for Encog, and larger training sets will be used. However, for now, we will look at a simple benchmarking example that generates a random training set and compares multithreaded and single-threaded training times.
A simple benchmark is shown that makes use of an input layer of 40 neurons, a hidden layer of 60 neurons, and an output layer of 20 neurons. A training set of 50,000 elements is used. This example is shown in Listing 5.4.
Listing 5.4: Using Multithreaded Training
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ConsoleExamples.Examples;
using Encog.Neural.Networks;
using Encog.Neural.NeuralData;
using Encog.Neural.Networks.Layers;
using Encog.Util.Banchmark;
using Encog.Neural.Networks.
using Encog.Util.Logging;
namespace Encog.Examples.MultiBench
{
public class MultiThreadBenchmark:IExample
{
public const int INPUT_COUNT = 40;
public const int HIDDEN_COUNT = 60;
public const int OUTPUT_COUNT = 20;
private IExampleInterface app;
public static ExampleInfo Info
{
get
{
ExampleInfo info = new ExampleInfo(
typeof(MultiThreadBenchmark),
"multibench",
"Multithreading Benchmark",
"See the effects that multithreading has on performance.");
return info;
}
}
public BasicNetwork generateNetwork()
{
BasicNetwork network = new BasicNetwork();
network.AddLayer(
new BasicLayer(
network.AddLayer(
new BasicLayer(
network.AddLayer(
new BasicLayer(
network.Structure.
network.Reset();
return network;
}
public INeuralDataSet generateTraining()
{
INeuralDataSet training =
RandomTrainingFactory.
INPUT_COUNT, OUTPUT_COUNT, -1, 1);
return training;
}
public double evaluateRPROP(
BasicNetwork network, INeuralDataSet data)
{
ResilientPropagation train =
new ResilientPropagation(network, data);
// Force single-threaded training for the baseline measurement.
train.NumThreads = 1;
long start = DateTime.Now.Ticks;
Console.WriteLine(
"Training 20 Iterations with RPROP");
for (int i = 1; i <= 20; i++)
{
train.Iteration();
Console.WriteLine("Iteration #" + i + " Error:"
+ train.Error);
}
//train.FinishTraining();
long stop = DateTime.Now.Ticks;
double diff = new TimeSpan(stop - start).TotalSeconds;
Console.WriteLine("RPROP Result:" + diff
+ " seconds.");
Console.WriteLine("Final RPROP error: "
+ network.CalculateError(data));
return diff;
}
public double evaluateMPROP(
BasicNetwork network, INeuralDataSet data)
{
ResilientPropagation train =
new ResilientPropagation(network, data);
long start = DateTime.Now.Ticks;
Console.WriteLine(
"Training 20 Iterations with MPROP");
for (int i = 1; i <= 20; i++)
{
train.Iteration();
Console.WriteLine("Iteration #" + i + " Error:"
+ train.Error);
}
//train.finishTraining();
long stop = DateTime.Now.Ticks;
double diff = new TimeSpan(stop - start).TotalSeconds;
Console.WriteLine("MPROP Result:"
+ diff + " seconds.");
Console.WriteLine("Final MPROP error: "
+ network.CalculateError(data));
return diff;
}
public void Execute(IExampleInterface app)
{
this.app = app;
Logging.StopConsoleLogging();
BasicNetwork network = generateNetwork();
INeuralDataSet data = generateTraining();
double rprop = evaluateRPROP(network, data);
double mprop = evaluateMPROP(network, data);
double factor = rprop / mprop;
Console.WriteLine("Factor improvement:" + factor);
}
}
}
I executed this program on a Quadcore i7 with Hyperthreading. The following was the result.
Training 20 Iterations with Single-threaded
Iteration #1 Error:1.0594453784075148
Iteration #2 Error:1.0594453784075148
Iteration #3 Error:1.0059791059086385
Iteration #4 Error:0.955845375587124
Iteration #5 Error:0.934169803870454
Iteration #6 Error:0.9140418793336804
Iteration #7 Error:0.8950880473422747
Iteration #8 Error:0.8759150228219456
Iteration #9 Error:0.8596693523930371
Iteration #10 Error:0.843578483629412
Iteration #11 Error:0.8239688415389107
Iteration #12 Error:0.8076160458145523
Iteration #13 Error:0.7928442431442133
Iteration #14 Error:0.7772585699972144
Iteration #15 Error:0.7634533283610793
Iteration #16 Error:0.7500401666509937
Iteration #17 Error:0.7376158116045242
Iteration #18 Error:0.7268954113068246
Iteration #19 Error:0.7155784667628093
Iteration #20 Error:0.705537166118038
RPROP Result:35.134 seconds.
Final RPROP error: 0.6952141684716632
Training 20 Iterations with Multithreading
Iteration #1 Error:0.6952126315707992
Iteration #2 Error:0.6952126315707992
Iteration #3 Error:0.90915249248788
Iteration #4 Error:0.8797061675258835
Iteration #5 Error:0.8561169673033431
Iteration #6 Error:0.7909509694056177
Iteration #7 Error:0.7709539415065737
Iteration #8 Error:0.7541971172618358
Iteration #9 Error:0.7287094412886507
Iteration #10 Error:0.715814914438935
Iteration #11 Error:0.7037730808705016
Iteration #12 Error:0.6925902585055886
Iteration #13 Error:0.6784038181007823
Iteration #14 Error:0.6673310323078667
Iteration #15 Error:0.6585209150749294
Iteration #16 Error:0.6503710867148986
Iteration #17 Error:0.6429473784897797
Iteration #18 Error:0.6370962075614478
Iteration #19 Error:0.6314478792705961
Iteration #20 Error:0.6265724296587237
Multi-Threaded Result:8.793 seconds.
Final Multi-thread error: 0.6219704300851074
Factor improvement:4.0106783805299674
As you can see from the above results, the single-threaded RPROP algorithm finished in about 35 seconds, while the multithreaded RPROP algorithm finished in under 9 seconds. Multithreading improved performance by a factor of four. Your results running the above example will depend on how many cores your computer has. If your computer is single core, with no hyperthreading, then the factor will be close to one. This is because the second, multithreaded training run will fall back to a single thread.
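If you want to take similar measurements in your own programs, the Stopwatch class in System.Diagnostics is a convenient alternative to subtracting DateTime ticks. The following is a small sketch of that approach. The BenchmarkTimer class and its method names are hypothetical and are not part of Encog or of the listing above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; not part of Encog.
public static class BenchmarkTimer
{
    // Runs 'work' once and returns the elapsed wall-clock time in seconds,
    // including the fractional part.
    public static double TimeSeconds(Action work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    // Factor improvement: values above one mean the multithreaded run was faster.
    public static double SpeedupFactor(double singleSeconds, double multiSeconds)
    {
        if (multiSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(multiSeconds));
        return singleSeconds / multiSeconds;
    }
}
```

TotalSeconds returns the whole elapsed interval as a fractional number. The Seconds property of a TimeSpan returns only the whole-seconds component, between 0 and 59, which is easy to confuse with it.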
Summary
In this chapter you saw how to use three different propagation training algorithms with Encog. Propagation training is a very common class of supervised training algorithms. Resilient propagation training is usually the best choice; however, the Manhattan update rule and backpropagation may be useful for certain situations.
Backpropagation was one of the original training algorithms for feedforward neural networks. Though Encog supports it mostly for historic purposes, it can sometimes be used to further refine a neural network after resilient propagation has been used. Backpropagation uses a learning rate and momentum. The learning rate defines how quickly the neural network will learn; the momentum helps the network get out of local minima.
The Manhattan update rule uses a delta value to update the weight and threshold values. It can be difficult to choose this delta value correctly. Too high a value will cause the network to learn nothing at all.
Resilient propagation (RPROP) is one of the best training algorithms offered by Encog. It does not require you to provide training parameters, like the other two propagation-training algorithms. This makes it much easier to use. Additionally, resilient propagation is considerably more efficient than the Manhattan update rule or backpropagation.
Multithreaded training is a training technique that adapts propagation training to perform faster with multicore computers. Given a computer with multiple cores and a large enough training set, multithreaded training is considerably faster than single-threaded training. If these conditions are not present, Encog will fall back to single-threaded training. Encog can also automatically set an optimal number of threads.
Propagation training is not the only type of supervised training that can be used with Encog. In the next chapter we will see some other types of training algorithms that can be used for supervised training. You will see how training techniques such as simulated annealing and genetic algorithms can be used.
Questions for Review
1. What is the primary difference in the way backpropagation and the Manhattan update rule function?
2. What training parameters must be provided to the backpropagation algorithm?
3. What training parameters must be provided to the Manhattan update rule?
4. What is the difference between learning rate and momentum?
5. What is the “error rate” for a neural network using supervised training?
Terms
Backpropagation
Backward Pass
Batch Training
Epoch
Error Rate
Forward Pass
Gradient
Ideal Output
Iteration
Learning rate
Manhattan Update Rule
Momentum
Multicore
Multithreaded
Propagation Training
Resilient Propagation
Single Threaded
Update Delta




