A Feed Forward Neural Network | Heaton Research

A Feed Forward Neural Network

Introduction to Neural Networks with Java

A "feed forward" neural network is similar to the types of neural networks that we have already examined. Like many other neural network types, the feed forward neural network begins with an input layer. This input layer must be connected to a hidden layer. This hidden layer can then be connected to another hidden layer or directly to the output layer. There can be any number of hidden layers, so long as at least one hidden layer is provided. In common use, most neural networks will have only one hidden layer. It is very rare for a neural network to have more than two hidden layers. We will now examine, in detail, the structure of a "feed forward neural network".

The Structure of a Feed Forward Neural Network

A "feed forward" neural network differs from the neural networks previously examined. Figure 5.1 shows a typical feed forward neural network with a single hidden layer.


Figure 5.1: A typical feed forward neural network with a single hidden layer

It is also possible to have more than one layer of hidden neurons. Figure 5.2 shows a feed forward neural network that has two hidden layers.


Figure 5.2: A typical feed forward neural network with two hidden layers

As previously mentioned, neural networks with more than two hidden layers are less common.
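
The layered structure described above can be sketched in Java as a single pass from the input layer, through one hidden layer, to the output layer. This is only a minimal illustration, not the implementation used in the book: the class and method names are hypothetical, the sigmoid activation function is an assumed choice, and threshold (bias) values are left out to keep the sketch short.

```java
/** Minimal feed forward pass through one hidden layer (illustrative sketch only). */
class FeedForwardSketch {

    /** Sigmoid activation; an assumed choice, not specified by the text above. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * Computes one layer: output[j] = sigmoid(sum over i of input[i] * weights[i][j]).
     * The weights array has one row per source neuron and one column per target neuron.
     */
    static double[] computeLayer(double[] input, double[][] weights) {
        int targets = weights[0].length;
        double[] output = new double[targets];
        for (int j = 0; j < targets; j++) {
            double sum = 0.0;
            for (int i = 0; i < input.length; i++) {
                sum += input[i] * weights[i][j];
            }
            output[j] = sigmoid(sum);
        }
        return output;
    }

    /** Feeds a pattern forward: input layer -> hidden layer -> output layer. */
    static double[] feedForward(double[] input,
                                double[][] inputToHidden,
                                double[][] hiddenToOutput) {
        double[] hidden = computeLayer(input, inputToHidden);
        return computeLayer(hidden, hiddenToOutput);
    }
}
```

A second hidden layer would simply add one more weight matrix and one more call to `computeLayer` between the two existing calls.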

Choosing your Network Structure

As we saw in the previous section, there are many ways that feed forward neural networks can be constructed. You must decide how many neurons will be inside the input and output layers. You must also decide how many hidden layers you are going to have, as well as how many neurons will be in each of these hidden layers.

There are many techniques for choosing these parameters. In this section we will cover some of the general "rules of thumb" that you can use to assist you in these decisions. Rules of thumb will only take you so far. In nearly all cases some experimentation will be required to determine the optimal structure for your "feed forward neural network". There are many books dedicated entirely to this topic. For a thorough discussion on structuring feed forward neural networks you should refer to the book "Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks" (MIT Press, 1999).

The Input Layer

The input layer to the neural network is the conduit through which the external environment presents a pattern to the neural network. Once a pattern is presented to the input layer of the neural network, the output layer will produce another pattern. In essence this is all the neural network does. The input layer should represent the condition for which we are training the neural network. Every input neuron should represent some independent variable that has an influence over the output of the neural network.

It is important to remember that the inputs to the neural network are floating point numbers. These values are expressed as the primitive Java data type "double". This is not to say that you can only process numeric data with the neural network. If you wish to process a form of data that is non-numeric, you must develop a process that normalizes this data to a numeric representation. In Chapter 7, "Applying Pattern Recognition", I will show you how to communicate graphic information to a neural network.
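
As a small illustration of turning non-numeric data into "double" inputs, the sketch below maps a category name to an array with one input neuron per category. This one-of-n style encoding is only one possible normalization and is an assumption made here, not a method taken from the text; the class name `CategoryEncoder` and its method are hypothetical.

```java
import java.util.List;

/** Encodes a non-numeric category as double inputs (illustrative sketch only). */
class CategoryEncoder {

    /**
     * Returns an array with one element per known category. The element that
     * matches the given value is 1.0 and all other elements are 0.0.
     */
    static double[] encode(List<String> categories, String value) {
        int index = categories.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        double[] inputs = new double[categories.size()]; // initialized to 0.0
        inputs[index] = 1.0;
        return inputs;
    }
}
```

For example, with the categories "red", "green" and "blue", the value "green" would be presented to three input neurons as 0.0, 1.0 and 0.0.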

The Output Layer

The output layer of the neural network is what actually presents a pattern to the external environment. Whatever pattern is presented by the output layer can be directly traced back to the input layer. The number of output neurons should be directly related to the type of work that the neural network is to perform.

To determine the number of neurons to use in your output layer, you must consider the intended use of the neural network. If the neural network is to be used to classify items into groups, then it is often preferable to have one output neuron for each group that an item can be assigned to. If the neural network is to perform noise reduction on a signal, then it is likely that the number of input neurons will match the number of output neurons. In this sort of neural network you would want the patterns to leave the neural network in the same format as they entered.

For a specific example of how to choose the numbers of input and output neurons, consider a program that is used for optical character recognition, or OCR. This is the example program that will be presented in Chapter 7. To determine the number of neurons used for the OCR example, we will first consider the input layer. The number of input neurons that we will use is the number of pixels that might represent any given character. Characters processed by this program are normalized to a universal size that is represented by a 5x7 grid. A 5x7 grid contains a total of 35 pixels. The optical character recognition program therefore has 35 input neurons.

The number of output neurons used by the OCR program will vary depending on how many characters the program has been trained for. The default training file that is provided with the optical character recognition program is trained to recognize 26 characters. As a result, when using this file the neural network would have 26 output neurons. Presenting a pattern to the input neurons will fire the output neuron that corresponds to the letter the input pattern represents.
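
The sizing of the OCR example can be sketched as follows. This is not the Chapter 7 program. It is a minimal illustration that assumes a set pixel is presented as 1.0 and a clear pixel as 0.0, that the 26 trained characters are the letters A to Z in order, and that the output neuron with the largest value is the one that "fires". The class `OcrLayerMapping` and its methods are hypothetical names.

```java
/** Maps a 5x7 character grid to 35 inputs and 26 outputs to a letter (sketch only). */
class OcrLayerMapping {
    static final int GRID_WIDTH = 5;
    static final int GRID_HEIGHT = 7;

    /** Flattens a 7-row by 5-column pixel grid into 35 input values, row by row. */
    static double[] toInputs(boolean[][] grid) {
        double[] inputs = new double[GRID_WIDTH * GRID_HEIGHT];
        for (int row = 0; row < GRID_HEIGHT; row++) {
            for (int col = 0; col < GRID_WIDTH; col++) {
                // Assumed encoding: 1.0 for a set pixel, 0.0 for a clear pixel.
                inputs[row * GRID_WIDTH + col] = grid[row][col] ? 1.0 : 0.0;
            }
        }
        return inputs;
    }

    /** Returns the letter for the output neuron with the largest value (index 0 = 'A'). */
    static char toLetter(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return (char) ('A' + best);
    }
}
```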

The Number of Hidden Layers

There are really two decisions that must be made with regards to the hidden layers. The first is how many hidden layers to actually have in the neural network. Secondly, you must determine how many neurons will be in each of these layers. We will first examine how to determine the number of hidden layers to use with the neural network.

Neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. Further, for many practical problems there is no reason to use any more than one hidden layer. Problems that require two hidden layers are rarely encountered. Differences between the numbers of hidden layers are summarized in Table 5.1.

Table 5.1: Determining the number of hidden layers

Number of Hidden Layers | Result
none | Only capable of representing linearly separable functions or decisions.
1 | Can approximate arbitrarily well any function that contains a continuous mapping from one finite space to another.
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.

Just deciding the number of hidden neuron layers is only a small part of the problem. You must also determine how many neurons will be in each of these hidden layers. This process is covered in the next section.

The Number of Neurons in the Hidden Layers

Deciding the number of neurons in the hidden layers is a very important part of deciding your overall neural network architecture. Though these layers do not directly interact with the external environment, they have a tremendous influence on the final output. Both the number of hidden layers and the number of neurons in each of these hidden layers must be considered.

Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.

Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when there is sufficient training data. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase enough that it becomes impossible to adequately train the neural network. Obviously some compromise must be reached between too many and too few neurons in the hidden layers.

There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers. Some of them are summarized as follows.

  • The number of hidden neurons should be in the range between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 of the input layer size, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the input layer size.
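
The three rules listed above reduce to simple arithmetic on the layer sizes. The sketch below is only an illustration; the class and method names are hypothetical, and rounding the 2/3 rule to the nearest whole neuron is an assumption.

```java
/** Rule-of-thumb starting points for the hidden neuron count (illustrative sketch only). */
class HiddenNeuronRules {

    /** Rule 1: a range between the output layer size and the input layer size, as {low, high}. */
    static int[] betweenRule(int inputs, int outputs) {
        return new int[] { Math.min(inputs, outputs), Math.max(inputs, outputs) };
    }

    /** Rule 2: 2/3 of the input layer size plus the output layer size, rounded to the nearest integer. */
    static int twoThirdsRule(int inputs, int outputs) {
        return (int) Math.round(2.0 * inputs / 3.0 + outputs);
    }

    /** Rule 3: the largest count that is still less than twice the input layer size. */
    static int upperBoundRule(int inputs) {
        return 2 * inputs - 1;
    }
}
```

For a network with 35 input neurons and 26 output neurons, the first rule gives a range of 26 to 35, the second gives 49 (35 x 2/3 is about 23.3, plus 26), and the third allows at most 69. The rules do not agree with one another for these sizes.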

These three rules are only starting points that you may want to consider. Ultimately the selection of the architecture of your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random layers and numbers of neurons at your network. To do so would be very time-consuming. There are two methods that can be used to organize your trial and error search for the optimum network architecture.

The two trial and error approaches that you may use in determining the number of hidden neurons are the "forward" and "backward" selection methods. The first method, the "forward selection method", begins by selecting a small number of hidden neurons. This method usually begins with only two hidden neurons. Then the neural network is trained and tested. The number of hidden neurons is then increased and the process is repeated so long as the overall results of the training and testing improve. The "forward selection method" is summarized in Figure 5.3.


Figure 5.3: Selecting the number of hidden neurons with forward selection
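
Forward selection can be sketched as a simple loop. This is a minimal illustration, not the book's code. The `trainAndTest` callback is a hypothetical stand-in for whatever routine trains a network with a given number of hidden neurons and returns its test error (lower is better). Adding one neuron per step and capping the search at `max` are assumptions.

```java
import java.util.function.IntToDoubleFunction;

/** Forward selection of the hidden neuron count (illustrative sketch only). */
class ForwardSelection {

    /**
     * Starts with a small number of hidden neurons and keeps adding one
     * for as long as the test error keeps improving.
     *
     * @param trainAndTest hypothetical callback: trains and tests a network with
     *                     the given hidden neuron count and returns its error
     * @param start        initial hidden neuron count (the text suggests two)
     * @param max          largest hidden neuron count to try
     * @return the hidden neuron count with the lowest error found
     */
    static int select(IntToDoubleFunction trainAndTest, int start, int max) {
        int best = start;
        double bestError = trainAndTest.applyAsDouble(start);
        for (int n = start + 1; n <= max; n++) {
            double error = trainAndTest.applyAsDouble(n);
            if (error >= bestError) {
                break; // results no longer improve, so stop growing the layer
            }
            best = n;
            bestError = error;
        }
        return best;
    }
}
```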

The second method, the "backward selection method", begins by using a large number of hidden neurons. Then the neural network is trained and tested. The number of hidden neurons is then decreased and the process is repeated until the performance improvement of the neural network is no longer significant. The "backward selection method" is summarized in Figure 5.4.


Figure 5.4: Selecting the number of hidden neurons with backward selection
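
Backward selection mirrors the forward method and can be sketched the same way. Again this is only an illustration under stated assumptions. The `trainAndTest` callback is a hypothetical routine that trains and tests a network with the given hidden neuron count and returns its error. Removing one neuron per step, and treating an improvement smaller than `minImprovement` as "no longer significant", are choices made for this sketch.

```java
import java.util.function.IntToDoubleFunction;

/** Backward selection of the hidden neuron count (illustrative sketch only). */
class BackwardSelection {

    /**
     * Starts with a large number of hidden neurons and keeps removing one
     * for as long as the test error improves by at least minImprovement.
     *
     * @param trainAndTest   hypothetical callback returning the test error
     * @param start          initial (large) hidden neuron count
     * @param min            smallest hidden neuron count to try
     * @param minImprovement smallest error reduction still counted as significant
     * @return the last hidden neuron count that gave a significant improvement
     */
    static int select(IntToDoubleFunction trainAndTest, int start, int min,
                      double minImprovement) {
        int best = start;
        double bestError = trainAndTest.applyAsDouble(start);
        for (int n = start - 1; n >= min; n--) {
            double error = trainAndTest.applyAsDouble(n);
            if (bestError - error < minImprovement) {
                break; // improvement is no longer significant
            }
            best = n;
            bestError = error;
        }
        return best;
    }
}
```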

One additional method that can be used to reduce the number of hidden neurons is called pruning. In the simplest sense, pruning involves evaluating the weighted connections between the layers. If the network contains any hidden neurons that have only zero-weighted connections, they can be removed. Pruning is a very important concept for neural networks and will be discussed in Chapter 11, "Pruning Neural Networks".
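
The simplest form of pruning described above can be sketched as a scan over the weight matrices of a network with a single hidden layer. This is not the Chapter 11 implementation. The class name, the matrix layout (`inputToHidden[i][h]` and `hiddenToOutput[h][o]`), and the exact comparison against 0.0 are assumptions made for illustration. Trained weights are seldom exactly zero, so a small tolerance might be substituted.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds hidden neurons whose connections all have zero weight (illustrative sketch only). */
class ZeroWeightPruning {

    /**
     * @param inputToHidden  weights indexed as [input neuron][hidden neuron]
     * @param hiddenToOutput weights indexed as [hidden neuron][output neuron]
     * @return indices of hidden neurons whose incoming and outgoing weights are all zero
     */
    static List<Integer> findPrunable(double[][] inputToHidden, double[][] hiddenToOutput) {
        List<Integer> prunable = new ArrayList<>();
        int hiddenCount = hiddenToOutput.length;
        for (int h = 0; h < hiddenCount; h++) {
            boolean allZero = true;
            // Check every connection coming into hidden neuron h.
            for (double[] fromInput : inputToHidden) {
                if (fromInput[h] != 0.0) {
                    allZero = false;
                    break;
                }
            }
            // Check every connection leaving hidden neuron h.
            for (int o = 0; allZero && o < hiddenToOutput[h].length; o++) {
                if (hiddenToOutput[h][o] != 0.0) {
                    allZero = false;
                }
            }
            if (allZero) {
                prunable.add(h);
            }
        }
        return prunable;
    }
}
```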

Copyright 2005-2009 by Heaton Research, Inc.