Introduction to Neural Networks for Java, Session 4
| Course Name | Introduction to Neural Networks for Java |
| Instructor | jeffheaton |
| Session Title | Feedforward Backpropagation Neural Networks |
| Session Number | 4 |
Session Material
In this class session we learned about the feedforward backpropagation neural network. Feedforward neural networks are one class of neural network. Backpropagation is a common method for training these networks. Training is the process of automatically adjusting the weight matrix of a neural network so that it produces the desired results. Although backpropagation is commonly used with feedforward neural networks, it is by no means the only training method available for them. In this course, we will learn about two other methods that can be used to train a feedforward neural network. Other methods exist as well, but they are beyond the scope of this course.
Activation Functions
Feedforward neural networks make use of activation functions. An activation function scales the output of a layer of a neural network. It is simply a mathematical formula that is applied to the output of the layer. In this course we will deal with three different activation functions.
- Linear Activation Function – Rarely used. A simple linear function that returns the same number as was passed to it, which is the same as having no activation function at all.
- Sigmoid Activation Function – The sigmoid activation function will scale the outputs to the range between 0 and 1. All numbers passed into the sigmoid activation function will be converted to positive numbers.
- Hyperbolic Tangent Activation Function – The hyperbolic tangent activation function will scale the outputs to the range between -1 and 1. The hyperbolic tangent function should be used when both positive and negative numbers must be presented to the neural network.
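The three activation functions listed above can be sketched in a few lines of Java. This is a minimal illustration only: the class name `ActivationSketch` and its method names are assumptions made for this example and are not taken from the course's own library.

```java
// Illustrative sketch of the three activation functions described above.
// The class and method names are hypothetical, chosen for this example.
class ActivationSketch {

    // Linear: returns its input unchanged.
    static double linear(double x) {
        return x;
    }

    // Sigmoid: scales any input into the range between 0 and 1.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Hyperbolic tangent: scales any input into the range between -1 and 1.
    static double tanh(double x) {
        return Math.tanh(x);
    }

    public static void main(String[] args) {
        double[] samples = {-2.0, 0.0, 2.0};
        for (double x : samples) {
            System.out.printf("x=%5.1f linear=%7.4f sigmoid=%7.4f tanh=%7.4f%n",
                    x, linear(x), sigmoid(x), tanh(x));
        }
    }
}
```

Note that the sigmoid of a negative input is still positive, while the hyperbolic tangent of a negative input stays negative, which is why the hyperbolic tangent suits data that contains both signs.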
How Many Hidden Layers to Choose
Feedforward neural networks can have one or more hidden layers. To solve more complex problems, a hidden layer is usually required. Sometimes adding a second hidden layer can enhance the processing power of a neural network. However, adding too many layers or hidden neurons can detract somewhat from the trainability of the neural network. It will take some experimentation to determine a good number of hidden layers and hidden neurons.
There are several “rules of thumb” that can be used to determine a good number of hidden layers.
- A neural network with no hidden layers is only capable of representing linearly separable functions or decisions.
- A neural network with one hidden layer can approximate any function that contains a continuous mapping from one finite space to another.
- A neural network with two hidden layers can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
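To make the linear separability limit concrete, the following sketch searches a small grid of weights for a single threshold neuron (no hidden layer) that reproduces a two-input truth table. The class `LinearSeparabilitySketch`, the step-function neuron, and the grid of candidate weights are all assumptions made for this illustration. The search succeeds for AND and OR but fails for XOR. The grid search itself is not a proof, although it is a known result that no choice of weights lets a single such neuron represent XOR.

```java
// Illustrative sketch: can one threshold neuron with no hidden layer
// reproduce a given two-input truth table? Names are hypothetical.
class LinearSeparabilitySketch {

    // The four possible input pairs, in the order 00, 01, 10, 11.
    static final int[][] INPUTS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};

    // True if the neuron fires exactly on the rows marked true in targets.
    static boolean matches(double w1, double w2, double bias, boolean[] targets) {
        for (int n = 0; n < INPUTS.length; n++) {
            boolean fired = w1 * INPUTS[n][0] + w2 * INPUTS[n][1] + bias > 0;
            if (fired != targets[n]) {
                return false;
            }
        }
        return true;
    }

    // Tries weights and bias from -2.0 to 2.0 in steps of 0.5.
    static boolean representable(boolean[] targets) {
        for (int i = -4; i <= 4; i++) {
            for (int j = -4; j <= 4; j++) {
                for (int k = -4; k <= 4; k++) {
                    if (matches(i * 0.5, j * 0.5, k * 0.5, targets)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[] and = {false, false, false, true};
        boolean[] xor = {false, true, true, false};
        System.out.println("AND representable: " + representable(and));
        System.out.println("XOR representable: " + representable(xor));
    }
}
```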
How Many Neurons in the Hidden Layers
In addition to deciding the number of hidden layers to use, you must also consider the number of neurons to place inside of each of these hidden layers. It will often come down to trial and error to determine a workable number of hidden neurons to use.
There are several rules of thumb that can be applied to help determine the number of neurons to place in the hidden layer.
- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.
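The three rules of thumb for hidden neuron counts are simple arithmetic and can be sketched as follows. The class `HiddenNeuronRules` and its method names are hypothetical, and rounding the 2/3 term to the nearest whole neuron is an assumption, since the rule itself does not say how to handle fractions.

```java
// Illustrative sketch of the three rules of thumb for hidden neuron counts.
// Names are hypothetical; the rounding in twoThirdsRule is an assumption.
class HiddenNeuronRules {

    // Rule 1: the count lies between the input and output layer sizes.
    // Returns {lower bound, upper bound}.
    static int[] betweenBounds(int inputs, int outputs) {
        return new int[] {Math.min(inputs, outputs), Math.max(inputs, outputs)};
    }

    // Rule 2: 2/3 of the input layer size, plus the output layer size.
    static int twoThirdsRule(int inputs, int outputs) {
        return (int) Math.round(2.0 * inputs / 3.0) + outputs;
    }

    // Rule 3: the count must be less than this value (twice the input size).
    static int exclusiveUpperLimit(int inputs) {
        return 2 * inputs;
    }

    public static void main(String[] args) {
        int inputs = 10;
        int outputs = 2;
        int[] bounds = betweenBounds(inputs, outputs);
        System.out.println("Rule 1: between " + bounds[0] + " and " + bounds[1]);
        System.out.println("Rule 2: " + twoThirdsRule(inputs, outputs));
        System.out.println("Rule 3: fewer than " + exclusiveUpperLimit(inputs));
    }
}
```

For a hypothetical network with 10 input neurons and 2 output neurons, the rules suggest between 2 and 10 hidden neurons, 9 hidden neurons, and fewer than 20 hidden neurons respectively. The rules give starting points for experimentation rather than a single answer.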
Using Backpropagation
Backpropagation works by calculating the overall error rate of a neural network. The output layer is then analyzed to see how much each of its neurons contributed to that error. Each neuron's weights and threshold values are then adjusted, according to how much that neuron contributed to the error, so that the error is smaller the next time. There are two training parameters that can be passed to the backpropagation algorithm to adjust how it trains. These parameters are listed here.
- Learning Rate – The learning rate specifies to what degree the calculated changes will be applied to the neural network's weight matrix and threshold values.
- Momentum – Determines how much influence the previous iteration's learning will have on the current iteration's learning. To disable momentum, set this parameter to zero. Momentum can be useful to prevent a training algorithm from getting trapped in a local minimum.
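The way the learning rate and momentum combine in a single weight update can be sketched as below. This is not a full backpropagation implementation: it shows only the update step, applied to a toy error function E(w) = (w - target)^2 with one weight. The class `MomentumUpdateSketch`, its method names, and the toy error function are assumptions made for this example, and the sign convention (stepping against the error gradient) is one common choice.

```java
// Illustrative sketch of a weight update using learning rate and momentum.
// Names and the toy error function are hypothetical, not the course library.
class MomentumUpdateSketch {

    // The learning rate scales the step against the current error gradient;
    // momentum adds back a fraction of the previous iteration's change.
    static double computeDelta(double gradient, double previousDelta,
                               double learningRate, double momentum) {
        return -learningRate * gradient + momentum * previousDelta;
    }

    // Repeatedly updates one weight to reduce E(w) = (w - target)^2,
    // whose gradient with respect to w is 2 * (w - target).
    static double trainToyWeight(double target, double learningRate,
                                 double momentum, int iterations) {
        double weight = 0.0;
        double previousDelta = 0.0;
        for (int i = 0; i < iterations; i++) {
            double gradient = 2.0 * (weight - target);
            double delta = computeDelta(gradient, previousDelta, learningRate, momentum);
            weight += delta;
            previousDelta = delta; // remembered for the next iteration
        }
        return weight;
    }

    public static void main(String[] args) {
        System.out.println("With momentum:    " + trainToyWeight(3.0, 0.1, 0.5, 200));
        System.out.println("Without momentum: " + trainToyWeight(3.0, 0.1, 0.0, 200));
    }
}
```

Setting the momentum argument to zero removes the second term entirely, which matches the statement above that a momentum of zero disables it.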








