Chapter 1: Introduction to Encog

  • The Encog Framework
  • What is a Neural Network?
  • Using a Neural Network
  • Training a Neural Network

Artificial neural networks are programming techniques that attempt to emulate the human brain's biological neural networks. Artificial neural networks (ANNs) are just one branch of artificial intelligence (AI). This book focuses primarily on artificial neural networks, frequently called simply neural networks, and the use of the Encog Artificial Intelligence Framework, usually just referred to as Encog. Encog is an open source project that provides neural network and HTTP bot functionality.

This book explains how to use neural networks with Encog and the C# programming language. Though this book focuses on C#, it could be used as a guide for other .NET languages, such as VB.NET. The code contained in this book would, of course, need to be translated from C# to the .NET language of your choice.

The emphasis of this book is on how to use the neural networks, rather than how to actually create the software necessary to implement a neural network. Encog provides all of the low-level code necessary to construct many different kinds of neural networks. If you are interested in learning to actually program the internals of a neural network, using C#, you may be interested in the book “Introduction to Neural Networks with C#” (ISBN: 978-1604390094).

Encog provides the tools to create many different neural network types. Encog supports feedforward, recurrent, self-organizing maps, radial basis function and Hopfield neural networks. The low-level types provided by Encog can be recombined and extended to support additional neural network architectures as well. The Encog Framework can be obtained from the following URL:

http://www.encog.org/

Encog is released under the GNU Lesser General Public License (LGPL). All of the source code for Encog is provided in a Subversion (SVN) source code repository hosted by the Google Code project. Encog is also available for the Java and Silverlight platforms.

Encog neural networks, and related data, can be stored in .EG files. These files can be edited by a GUI editor provided with Encog. The Encog Workbench allows you to edit, train and visualize neural networks. The Encog Workbench can generate code in Java, Visual Basic or C#. The Encog Workbench can be downloaded from the above URL.

What is a Neural Network?

We will begin by examining what exactly a neural network is. A simple feedforward neural network can be seen in Figure 1.1. This diagram was created with the Encog Workbench. It is not just a diagram; this is an actual functioning neural network from Encog as you would actually edit it.

Figure 1.1: Simple Feedforward Neural Network

Networks can also become more complex than the simple network above. Figure 1.2 shows a recurrent neural network.

Figure 1.2: Simple Recurrent Neural Network

Looking at the above two neural networks you will notice that they are composed of layers, represented by the boxes. These layers are connected by lines, which represent synapses. Synapses and layers are the primary building blocks for neural networks created by Encog. The next chapter focuses solely on layers and synapses.

Before we learn to build neural networks with layers and synapses, let’s first look at what exactly a neural network is. Look at Figures 1.1 and 1.2. They are quite different, but they share one very important characteristic: both contain a single input layer and a single output layer. What happens between these two layers differs greatly between the two networks. In this chapter, we will focus on what comes into the input layer and goes out of the output layer. The rest of the book will focus on what happens between these two layers.

Almost every neural network seen in this book will have, at a minimum, an input and output layer. In some cases, the same layer will function as both input and output layer. You can think of the general format of any neural network found in this book as shown in Figure 1.3.

Figure 1.3: Generic Form of a Neural Network

To adapt a problem to a neural network, you must determine how to feed the problem into the input layer of a neural network, and receive the solution through the output layer of a neural network. We will look at the input and output layers in this chapter. We will then determine how to structure the input and interpret the output. The input layer is where we will start.

Understanding the Input Layer

The input layer is the first layer in a neural network. This layer, like all layers, has a specific number of neurons in it. The neurons in a layer all contain similar properties. The number of neurons determines how the input to that layer is structured. For each input neuron, one double value is stored. For example, the following array could be used as input to a layer that contained five neurons.

double[] input = new double[5];

The input to a neural network is always an array of doubles. The size of this array directly corresponds to the number of neurons on the input layer. Encog uses the INeuralData interface to hold these arrays. You could easily convert the above array into an INeuralData object with the following line of code.

INeuralData data = new BasicNeuralData(input);

The interface INeuralData defines any “array like” data that may be presented to Encog. You must always present the input to the neural network inside of an INeuralData object. The class BasicNeuralData implements the INeuralData interface. The class BasicNeuralData is not the only way to provide Encog with data. There are other implementations of INeuralData, as well. We will see other implementations later in the book.

The BasicNeuralData class simply provides a memory-based data holder for the neural network. Once the neural network processes the input, an INeuralData based class will be returned from the neural network's output layer. The output layer is discussed in the next section.

Understanding the Output Layer

The output layer is the final layer in a neural network. The output layer provides the output after all of the previous layers have had a chance to process the input. The output from the output layer is very similar in format to the data that was provided to the input layer. The neural network outputs an array of doubles.

The neural network wraps the output in a class based on the INeuralData interface. Most of the built-in neural network types will return a BasicNeuralData object as the output. However, future and third-party neural network classes may return other classes based on other implementations of the INeuralData interface.

Neural networks are designed to accept input, which is an array of doubles, and then produce output, which is also an array of doubles. Determining how to structure the input data, and attaching meaning to the output, are two of the main challenges to adapting a problem to a neural network. The real power of a neural network comes from its pattern recognition capabilities. The neural network should be able to produce the desired output even if the input has been slightly distorted.
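To make the idea of structuring input and attaching meaning to output concrete, here is a minimal sketch in plain C# that does not use Encog. The class name BooleanEncoding, the Encode and Decode helpers, and the 0.5 cutoff are all assumptions made for illustration; they are not part of the Encog API.

```csharp
using System;

public static class BooleanEncoding
{
    // Encode two Boolean values as the array of doubles that an
    // input layer with two neurons would receive.
    public static double[] Encode(bool a, bool b)
    {
        return new double[] { a ? 1.0 : 0.0, b ? 1.0 : 0.0 };
    }

    // Attach meaning to a single output neuron: values at or above
    // the cutoff are read as true. The 0.5 cutoff is an assumption
    // for this sketch, not an Encog setting.
    public static bool Decode(double[] output)
    {
        return output[0] >= 0.5;
    }

    public static void Main()
    {
        double[] input = Encode(true, false);
        Console.WriteLine("input: " + input[0] + "," + input[1]);

        // An output that is close to, but not exactly, 1.0 still
        // decodes to the intended answer.
        double[] output = { 0.97 };
        Console.WriteLine("decoded: " + Decode(output));
    }
}
```

Decoding with a cutoff is one simple way to tolerate output that is slightly off from the ideal value. Other problems may call for a different interpretation of the output array.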

Hidden Layers

As previously discussed, neural networks contain an input layer and an output layer. Sometimes the input layer and output layer are the same. Often the input and output layers are two separate layers. Additionally, other layers may exist between the input and output layers. These layers are called hidden layers. Hidden layers can simply be inserted between the input and output layers. They can also take on more complex structures.

The only purpose of the hidden layers is to allow the neural network to better produce the expected output for the given input. Neural network programming involves first defining the input and output layer neuron counts. Once you have defined how to translate the programming problem into the input and output neuron counts, it is time to define the hidden layers.

The hidden layers are very much a “black box”. You define the problem in terms of the neuron counts for the input and output layers. How the neural network produces the correct output is determined, in part, by the hidden layers. Once you have defined the structure of the input and output layers, you must define a hidden layer structure that optimally learns the problem. If the structure of the hidden layer is too simple, it may not learn the problem. If the structure is too complex, it will learn the problem but will be very slow to train and execute.

Later chapters in this book will discuss many different hidden layer structures. You will learn how to pick a good structure, based on the problem that you are trying to solve. Encog also contains some functionality to automatically determine a potentially optimal hidden layer structure. Additionally, Encog also contains functions to prune back an overly complex structure. Chapter 13, “Pruning and Structuring Networks” shows how Encog can help create a potentially optimal structure.

Some neural networks have no hidden layers. The input layer may be directly connected to the output layer. Further, some neural networks have only a single layer. A single layer neural network has the single layer self-connected. These connections permit the network to learn. Contained in these connections, called synapses, are individual weight matrixes. These values are changed as the neural network learns. We will learn more about weight matrixes in the next chapter.

Using a Neural Network

We will now look at how to structure a neural network for a very simple problem. We will consider creating a neural network that can function as an XOR operator. Learning the XOR operator is a frequent “first example” when demonstrating the architecture of a new neural network. Just as most new programming languages are first demonstrated with a program that simply displays “Hello World”, neural networks are frequently demonstrated with the XOR operator. Learning the XOR operator is sort of the “Hello World” application for neural networks.

The XOR Operator and Neural Networks

The XOR operator is one of three commonly used Boolean logical operators. The other two are the AND and OR operators. Each of these operators takes two inputs, so there are four possible input combinations. For example, all possible combinations for the AND operator are shown below.

0 AND 0 = 0

1 AND 0 = 0

0 AND 1 = 0

1 AND 1 = 1

This should be consistent with how you learned the AND operator for computer programming. As its name implies, the AND operator will only return true, or one, when both inputs are true.

The OR operator behaves as follows.

0 OR 0 = 0

1 OR 0 = 1

0 OR 1 = 1

1 OR 1 = 1

This also should be consistent with how you learned the OR operator for computer programming. For the OR operator to be true, at least one of the inputs must be true.

The “exclusive or” (XOR) operator is less frequently used in computer programming, so you may not be familiar with it. XOR has the same output as the OR operator, except for the case where both inputs are true. The possible combinations for the XOR operator are shown here.

0 XOR 0 = 0

1 XOR 0 = 1

0 XOR 1 = 1

1 XOR 1 = 0

As you can see, the XOR operator only returns true when the two inputs differ. In the next section we will see how to structure the input, output and hidden layers for the XOR operator.
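The three truth tables above can be reproduced with C#'s built-in Boolean operators. The following is a small sketch; the class name TruthTables and its helper methods are hypothetical names chosen for this example.

```csharp
using System;

public static class TruthTables
{
    // Each operator maps two Boolean inputs to one Boolean output.
    public static bool And(bool a, bool b) { return a && b; }
    public static bool Or(bool a, bool b) { return a || b; }

    // On bool operands, ^ is logical XOR: true only when the inputs differ.
    public static bool Xor(bool a, bool b) { return a ^ b; }

    // Convert a Boolean to the 0/1 notation used in the tables above.
    public static int Bit(bool value) { return value ? 1 : 0; }

    public static void Main()
    {
        bool[] values = { false, true };

        // Walk all four input combinations and print one row per operator.
        foreach (bool b in values)
        {
            foreach (bool a in values)
            {
                Console.WriteLine(Bit(a) + " AND " + Bit(b) + " = " + Bit(And(a, b)));
                Console.WriteLine(Bit(a) + " OR " + Bit(b) + " = " + Bit(Or(a, b)));
                Console.WriteLine(Bit(a) + " XOR " + Bit(b) + " = " + Bit(Xor(a, b)));
            }
        }
    }
}
```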

Structuring a Neural Network for XOR

There are two inputs to the XOR operator and one output. The input and output layers will be structured accordingly. We will feed the input neurons the following double values:

0.0,0.0

1.0,0.0

0.0,1.0

1.0,1.0

These values correspond to the inputs to the XOR operator, shown above. We will expect the one output neuron to produce the following double values:

0.0

1.0

1.0

0.0

This is one way that the neural network can be structured. This method allows a simple feedforward neural network to learn the XOR operator. The feedforward neural network, also called a multilayer perceptron when it contains hidden layers, is one of the first neural network architectures that we will learn.

There are other ways that the XOR data could be presented to the neural network. Later in this book we will see two examples of recurrent neural networks. We will examine the Elman and Jordan styles of neural networks. These methods would treat the XOR data as one long sequence. Basically concatenate the truth table for XOR together and you get one long XOR sequence, such as:

0.0,0.0,0.0,

0.0,1.0,1.0,

1.0,0.0,1.0,

1.0,1.0,0.0

The line breaks are only for readability. This is just treating XOR as a long sequence. By using the data above, the network would have a single input neuron and a single output neuron. The input neuron would be fed one value from the list above, and the output neuron would be expected to return the next value.
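As a rough illustration of this sequence model, the sketch below pairs each value in the list with the value that follows it, which is the shape of training data the paragraph above describes. It uses plain C# rather than Encog, and the names XorSequence and BuildPairs are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class XorSequence
{
    // The XOR truth table written as one long sequence:
    // input A, input B, result, repeated for each row.
    public static readonly double[] Sequence =
    {
        0.0, 0.0, 0.0,
        0.0, 1.0, 1.0,
        1.0, 0.0, 1.0,
        1.0, 1.0, 0.0
    };

    // Pair each value (the single input) with the value that
    // follows it (the expected single output).
    public static List<double[]> BuildPairs(double[] sequence)
    {
        List<double[]> pairs = new List<double[]>();
        for (int i = 0; i < sequence.Length - 1; i++)
        {
            pairs.Add(new double[] { sequence[i], sequence[i + 1] });
        }
        return pairs;
    }

    public static void Main()
    {
        foreach (double[] pair in BuildPairs(Sequence))
        {
            Console.WriteLine("input=" + pair[0] + ", ideal=" + pair[1]);
        }
    }
}
```

A sequence of twelve values yields eleven input/ideal pairs, because the last value has no successor.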

This shows that there is often more than one way to model the data for a neural network. How you model the data will greatly influence the success of your neural network. If one particular model is not working, you may need to consider another. For the examples in this book we will consider the first model we looked at for the XOR data.

Because the XOR operator has two inputs and one output, the neural network will follow suit. Additionally, the neural network will have a single hidden layer, with two neurons to help process the data. The choice for 2 neurons in the hidden layer is arbitrary, and often comes down to trial and error. The XOR problem is simple, and two hidden neurons are sufficient to solve it. A diagram for this network can be seen in Figure 1.4.

Figure 1.4: Neuron Diagram for the XOR Network

Usually, the individual neurons are not drawn on neural network diagrams. There are often too many. Similar neurons are grouped into layers. The Encog workbench displays neural networks on a layer-by-layer basis. Figure 1.5 shows how the above network is represented in Encog.

Figure 1.5: Encog Layer Diagram for the XOR Network

The code needed to create this network is relatively simple.

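BasicNetwork network = new BasicNetwork();
network.AddLayer(new BasicLayer(2)); // input layer: two neurons
network.AddLayer(new BasicLayer(2)); // hidden layer: two neurons
network.AddLayer(new BasicLayer(1)); // output layer: one neuron
network.Structure.FinalizeStructure(); // no more layers will be added
network.Reset(); // randomize the weights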

In the above code you can see a BasicNetwork being created. Three layers are added to this network. The first layer, which becomes the input layer, has two neurons. The hidden layer is added second, and it has two neurons also. Lastly, the output layer is added, which has a single neuron. Finally, the FinalizeStructure method must be called to inform the network that no more layers are to be added. The call to Reset randomizes the weights in the connections between these layers.

Neural networks frequently start with a random weight matrix. This provides a starting point for the training methods. These random values will be tested and refined into an acceptable solution. However, sometimes the initial random values are too far off. Sometimes it may be necessary to reset the weights again, if training is ineffective.

These weights make up the long-term memory of the neural network. Additionally, some layers have threshold values that also contribute to the long-term memory of the neural network. Some neural networks also contain context layers, which give the neural network a short-term memory as well. The neural network learns by modifying these weight and threshold values. We will learn more about weights and threshold values in Chapter 2, “The Parts of an Encog Neural Network”.
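To show how weights and thresholds alone can hold a solution, the following sketch runs the four XOR inputs through a 2-2-1 network written in plain C#. Nothing here comes from Encog: the class HandWeightedXor, the hand-picked weight and threshold values, and the choice to add the threshold to the weighted sum before a sigmoid activation are all assumptions made for illustration. A trained network would arrive at its own values.

```csharp
using System;

public static class HandWeightedXor
{
    // Sigmoid activation: squashes any value into the range (0, 1).
    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // One neuron: the weighted sum of its inputs plus a threshold
    // value, passed through the activation function.
    public static double Neuron(double[] inputs, double[] weights, double threshold)
    {
        double sum = threshold;
        for (int i = 0; i < inputs.Length; i++)
        {
            sum += inputs[i] * weights[i];
        }
        return Sigmoid(sum);
    }

    // Forward pass through the 2-2-1 network. The weights are
    // hand-picked (not trained) so that the result approximates XOR.
    public static double[] Compute(double[] input)
    {
        double h1 = Neuron(input, new double[] { 20.0, 20.0 }, -10.0);  // behaves like OR
        double h2 = Neuron(input, new double[] { -20.0, -20.0 }, 30.0); // behaves like NAND
        double[] hidden = { h1, h2 };
        double output = Neuron(hidden, new double[] { 20.0, 20.0 }, -30.0); // behaves like AND
        return new double[] { output };
    }

    public static void Main()
    {
        double[][] inputs =
        {
            new double[] { 0.0, 0.0 },
            new double[] { 1.0, 0.0 },
            new double[] { 0.0, 1.0 },
            new double[] { 1.0, 1.0 }
        };

        foreach (double[] input in inputs)
        {
            double[] output = Compute(input);
            Console.WriteLine(input[0] + "," + input[1] + " -> " + output[0]);
        }
    }
}
```

Changing any of the nine numbers changes what the network computes, which is the sense in which weights and thresholds act as its long-term memory.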

Now that the neural network has been created, it must be trained. Training is discussed in the next section.

Training a Neural Network

To train the neural network, we must construct an INeuralDataSet object. This object contains the inputs and the expected outputs. To construct this object, we must create two arrays. The first array will hold the input values for the XOR operator. The second array will hold the ideal outputs for each of the corresponding input values. These will correspond to the possible values for XOR. To review, the four possible values are as follows:

0 XOR 0 = 0

1 XOR 0 = 1

0 XOR 1 = 1

1 XOR 1 = 0

First we will construct an array to hold the four input values to the XOR operator. This is done using a two dimensional double array. This array is as follows:

public static double[][] XOR_INPUT = {
    new double[2] { 0.0, 0.0 },
    new double[2] { 1.0, 0.0 },
    new double[2] { 0.0, 1.0 },
    new double[2] { 1.0, 1.0 } };

Likewise, an array must be created for the expected outputs for each of the input values. This array is as follows:

public static double[][] XOR_IDEAL = {
    new double[1] { 0.0 },
    new double[1] { 1.0 },
    new double[1] { 1.0 },
    new double[1] { 0.0 } };

Even though there is only one output value, we must still use a two-dimensional array to represent the output. If there had been more than one output neuron, there would have been additional columns in the above array.

Now that the input and ideal arrays have been constructed, an INeuralDataSet object must be created to hold the training set. This object is created as follows.

INeuralDataSet trainingSet = new BasicNeuralDataSet(XOR_INPUT, XOR_IDEAL);

Now that the training set has been created, the neural network can be trained. Training is the process where the neural network's weights are adjusted to better produce the expected output. Training will continue for many iterations, until the error rate of the network is below an acceptable level. First, a training object must be created. Encog supports many different types of training.

For this example we are going to use Resilient Propagation (RPROP). RPROP is perhaps the best general-purpose training algorithm supported by Encog. Other training techniques are provided as well, as certain problems are solved better with certain training techniques. The following code constructs an RPROP trainer.

ITrain train = new ResilientPropagation(network, trainingSet);

All training classes implement the ITrain interface. The RPROP algorithm is implemented by the ResilientPropagation class, which is constructed above.

Once the trainer has been constructed, the neural network should be trained. Training the neural network involves calling the Iteration method on the ITrain object until the error is below a specific value.

int epoch = 1;
do
{
    train.Iteration(); // run one training pass over the training set
    Console.WriteLine("Epoch #" + epoch + " Error:" + train.Error);
    epoch++;
} while ((epoch < 5000) && (train.Error > 0.01));

The above code loops through as many iterations, or epochs, as it takes to get the error rate for the neural network below 1%, giving up if the epoch counter reaches 5,000 first. Once the neural network has been trained, it is ready for use. The next section will explain how to use a neural network.

Executing a Neural Network

Making use of the neural network involves calling the Compute method on the BasicNetwork class. Here we loop through every training set value and display the output from the neural network.

Console.WriteLine("Neural Network Results:");
foreach (INeuralDataPair pair in trainingSet)
{
    INeuralData output = network.Compute(pair.Input);
    Console.WriteLine(pair.Input[0] + "," + pair.Input[1]
        + ", actual=" + output[0] + ",ideal=" + pair.Ideal[0]);
}

The Compute method accepts an INeuralData object and also returns an INeuralData object. This contains the output from the neural network. This output is displayed to the user. When the program is run, the training results are displayed first. For each epoch, the current error rate is displayed.

Epoch #1 Error:0.5604437512295236

Epoch #2 Error:0.5056375155784316

Epoch #3 Error:0.5026960720526166

Epoch #4 Error:0.4907299498390594

...

Epoch #104 Error:0.01017278345766472

Epoch #105 Error:0.010557202078697751

Epoch #106 Error:0.011034965164672806

Epoch #107 Error:0.009682102808616387

The error starts at 56% at epoch 1. By epoch 107 the error has dropped below 1% and training stops. Because the neural network was initialized with random weights, it may take a different number of iterations to train each time the program is run. Additionally, though the final error rate may differ, it should end below 1% unless the 5,000 epoch limit is reached first.

Finally, the program displays the results from each of the training items as follows:

Neural Network Results:

0.0,0.0, actual=0.002782538818034049,ideal=0.0

1.0,0.0, actual=0.9903741937121177,ideal=1.0

0.0,1.0, actual=0.9836807956566187,ideal=1.0

1.0,1.0, actual=0.0011646072586172778,ideal=0.0

As you can see, the network has not been trained to give the exact results. This is normal. Because the network was trained to an overall error of 1%, each individual result will generally fall within a percent or two of the expected value.
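To relate the individual results to a single error figure, the sketch below computes the mean squared error and its square root over the four results printed above. This chapter does not state how Encog calculates train.Error, so treating it as one of these measures is an assumption; the class ErrorSketch and its methods are hypothetical names.

```csharp
using System;

public static class ErrorSketch
{
    // Average of the squared differences between ideal and actual values.
    public static double MeanSquaredError(double[] actual, double[] ideal)
    {
        double sum = 0.0;
        for (int i = 0; i < actual.Length; i++)
        {
            double delta = ideal[i] - actual[i];
            sum += delta * delta;
        }
        return sum / actual.Length;
    }

    // Square root of the mean squared error, in the same units as the output.
    public static double RootMeanSquare(double[] actual, double[] ideal)
    {
        return Math.Sqrt(MeanSquaredError(actual, ideal));
    }

    public static void Main()
    {
        // Actual and ideal values copied from the first run shown above.
        double[] actual =
        {
            0.002782538818034049,
            0.9903741937121177,
            0.9836807956566187,
            0.0011646072586172778
        };
        double[] ideal = { 0.0, 1.0, 1.0, 0.0 };

        Console.WriteLine("MSE: " + MeanSquaredError(actual, ideal));
        Console.WriteLine("RMS: " + RootMeanSquare(actual, ideal));
    }
}
```

An aggregate measure like this explains why a single result can be a little more than 1% away from its ideal value while the overall error is still below the 1% target.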

Because the neural network is initialized to random values, the final output will be different on a second run of the program.

Neural Network Results:

0.0,0.0, actual=0.005489822214926685,ideal=0.0

1.0,0.0, actual=0.985425090860287,ideal=1.0

0.0,1.0, actual=0.9888064742994463,ideal=1.0

1.0,1.0, actual=0.005923146369557053,ideal=0.0

Above, you see a second run of the program. The output is slightly different. This is normal.

This is the first Encog example. You can see the complete program in Listing 1.1. All of the examples contained in this book are also included with the examples downloaded with Encog. For more information on how to download these examples and where this particular example is located, refer to Appendix A, “Installing Encog”.

Listing 1.1: Solve XOR with RPROP

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Activation;
using Encog.Neural.Data.Basic;
using Encog.Neural.NeuralData;
using Encog.Neural.Networks.Training;
using Encog.Neural.Data;
using Encog.Neural.Networks.Training.Propagation.Resilient;
using ConsoleExamples.Examples;

namespace Encog.Examples.XOR.Resilient
{
    public class XORResilient : IExample
    {
        public static ExampleInfo Info
        {
            get
            {
                ExampleInfo info = new ExampleInfo(
                    typeof(XORResilient),
                    "xor-rprop",
                    "XOR Operator with Resilient Propagation",
                    "Use RPROP to learn the XOR operator.");
                return info;
            }
        }

        /// <summary>
        /// Input for the XOR function.
        /// </summary>
        public static double[][] XOR_INPUT = {
            new double[2] { 0.0, 0.0 },
            new double[2] { 1.0, 0.0 },
            new double[2] { 0.0, 1.0 },
            new double[2] { 1.0, 1.0 } };

        /// <summary>
        /// Ideal output for the XOR function.
        /// </summary>
        public static double[][] XOR_IDEAL = {
            new double[1] { 0.0 },
            new double[1] { 1.0 },
            new double[1] { 1.0 },
            new double[1] { 0.0 } };

        /// <summary>
        /// Program entry point.
        /// </summary>
        /// <param name="app">Not used.</param>
        public void Execute(IExampleInterface app)
        {
            // build a 2-2-1 network with sigmoid activation
            BasicNetwork network = new BasicNetwork();
            network.AddLayer(new BasicLayer(
                new ActivationSigmoid(), true, 2));
            network.AddLayer(new BasicLayer(
                new ActivationSigmoid(), true, 2));
            network.AddLayer(new BasicLayer(
                new ActivationSigmoid(), true, 1));
            network.Structure.FinalizeStructure();
            network.Reset();

            INeuralDataSet trainingSet =
                new BasicNeuralDataSet(XOR_INPUT, XOR_IDEAL);

            // train the neural network
            ITrain train =
                new ResilientPropagation(network, trainingSet);

            int epoch = 1;
            do
            {
                train.Iteration();
                Console.WriteLine("Epoch #" + epoch
                    + " Error:" + train.Error);
                epoch++;
            } while ((epoch < 5000) && (train.Error > 0.01));

            // test the neural network
            Console.WriteLine("Neural Network Results:");
            foreach (INeuralDataPair pair in trainingSet)
            {
                INeuralData output = network.Compute(pair.Input);
                Console.WriteLine(pair.Input[0] + "," + pair.Input[1]
                    + ", actual=" + output[0] + ",ideal=" + pair.Ideal[0]);
            }
        }
    }
}

Chapter Summary

Encog is a framework that allows you to create neural networks or bot applications. This book focuses on using Encog to create neural network applications. This chapter focused on the overall layout of a neural network. In this chapter, you saw how to create an Encog application that could learn the XOR operator.

Neural networks are made up of layers. These layers are connected by synapses. The synapses contain weights that make up the memory of the neural network. Some layers also contain threshold values that also contribute to the memory of the neural network. Together, thresholds and weights make up the long-term memory of the neural network. Networks can also contain context layers. Context layers are used to form a short-term memory.

There are several different layer types supported by Encog. However, these layers fall into three groups, depending on where they are placed in the neural network. The input layer accepts input from the outside. Hidden layers accept data from the input layer for further processing. The output layer takes data, either from the input or final hidden layer, and presents it on to the outside world.

The XOR operator was used as an example for this chapter. The XOR operator is frequently used as a simple “Hello World” application for neural networks. The XOR operator provides a very simple pattern that most neural networks can easily learn. It is important to know how to structure data for a neural network. Neural networks both accept and return an array of floating point numbers.

This chapter introduced layers and synapses. You saw how they are used to construct a simple neural network. The next chapter will greatly expand on layers and synapses. You will see how to use the various layer and synapse types offered by Encog to construct neural networks.

Questions for Review

1. Explain the role of the input layer, the output layer and hidden layers.

2. What form does the input to a neural network take? What form is the output from a neural network?

3. How does a neural network implement long-term memory? How does a neural network implement short-term memory?

4. Where does Encog store the weight matrix values? Where does Encog store the threshold values?

5. What is the best “general purpose” training method for an Encog neural network?

Terms

Artificial Intelligence

Artificial Neural Network

Biological Neural Network

Black Box

Context Layer

Encog

Encog File

Encog Workbench

Error Rate

Feedforward Neural Network

Hidden Layer

Input Layer

Iteration

Layer

LGPL

Long Term Memory

Neural Network

Output Layer

Recurrent Neural Network

Resilient Propagation

Short Term Memory

Synapse

Training

Training Set

XOR Operator




Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.