Neural networks learn by adjusting numeric values called weights and thresholds. A weight specifies the strength of the connection between two neurons. A threshold is a value stored on each neuron that is added to the weighted sum of that neuron's incoming connections, shifting the result up or down. Training is the process by which these weights and thresholds are adjusted so that the neural network produces useful results.
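
For example, a single neuron's output is its activation function applied to the weighted sum of its inputs plus its threshold. A minimal sketch follows; the sigmoid activation is an illustrative choice, not something specified above.

```java
// Minimal sketch of a single neuron's computation:
// output = activation(sum(weight[i] * input[i]) + threshold)
public class Neuron {
    double[] weights;   // one weight per incoming connection
    double threshold;   // the value stored on this neuron

    double compute(double[] inputs) {
        double sum = threshold;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        // sigmoid activation, an illustrative choice
        return 1.0 / (1.0 + Math.exp(-sum));
    }
}
```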

Usually, neural network weights and thresholds are simply set to random numbers. This gives the training algorithm a starting point from which to work towards a solution. However, pure random numbers are not always the most desirable starting point; adjusting them can lead to quicker training. This article compares the following weight randomization techniques:

- "Hard" Range Randomization
- Gaussian Random Numbers
- Nguyen-Widrow Randomization
- Fan-In Randomization

After describing each method, the article concludes by comparing their performance. Before we can compare these techniques, we first need a way to visualize the weight and threshold values. We will view them using histograms, which the Encog Workbench can display.

This article will examine a neural network trained to predict trends in the stock market. It contains 7 input neurons, which hold 7 days of price information. A single output neuron predicts the next day's percentage movement. Finally, 25 hidden neurons are provided to help detect trends in the data.
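
Under Encog 3's API, this 7-25-1 network might be constructed as follows. The hyperbolic tangent activation is an assumption; the article does not name one.

```java
import org.encog.engine.network.activation.ActivationTANH;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

public class MarketNetwork {
    // Sketch of the 7-25-1 market prediction network described above.
    // The tanh activation is an assumption; the article does not name one.
    public static BasicNetwork create() {
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 7));                  // 7 input neurons
        network.addLayer(new BasicLayer(new ActivationTANH(), true, 25)); // 25 hidden neurons
        network.addLayer(new BasicLayer(new ActivationTANH(), false, 1)); // 1 output neuron
        network.getStructure().finalizeStructure();
        network.reset(); // assign initial random weights
        return network;
    }
}
```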

We will begin with the simplest neural network weight and threshold initialization technique. In this technique, called "hard" range randomization, every weight and threshold is simply assigned a random number within a specific range. This is the type of weight initialization used by most simple neural network examples.
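
A minimal sketch of this technique in plain Java follows; in Encog, the same thing is accomplished with `new RangeRandomizer(min, max).randomize(network)`.

```java
import java.util.Random;

// "Hard" range randomization: every weight and threshold receives a
// uniform random value in [min, max].
public class HardRangeSketch {
    private final Random rnd = new Random();

    public double next(double min, double max) {
        return min + rnd.nextDouble() * (max - min);
    }
}
```

Figure 1 shows our predictive neural network randomized in this way.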

**Figure 1: Range Random**

The graph is essentially flat. If the random number generator is doing a good job, adding more weights, by adding additional neurons, should flatten the graph even further.

These random weights and thresholds provide a starting point for training. The neural network is then trained with 20 years of market data, allowing it to detect patterns of price movement in the security it was trained on. Once a network has been trained, the histogram looks considerably different. Figure 2 shows a trained neural network.

**Figure 2: A Trained Neural Network**

Trained neural networks almost always look more organized than their untrained counterparts. The weights and thresholds fall into a much narrower range. In Figure 2 you can see that the majority of the weights and thresholds are near zero, with the counts decreasing rapidly in both directions.

The central question of random weight initialization, then, is what distribution of weights provides the best starting point from which to train a neural network.

The weights of a trained neural network are clustered about

the origin and taper off quickly on each side. This is a shape somewhat

similar to a narrow Gaussian function. This has led some to use a random

Gaussian distribution as a weight initialization technique for neural networks.

Figure 3 shows a neural network that was initialized using a

Gaussian random number generator.

**Figure 3: Gaussian Neural Network Weights**

You can see that the above random numbers fall into something of a Gaussian curve. The additional weight values produced by a larger neural network would approximate the Gaussian curve more closely. Of course, the fit will never be exact, because the values are random. A Gaussian random number generator produces values near zero most often, with the counts tapering off as we travel further from the origin.

Gaussian random numbers have a variety of uses in science

and engineering beyond artificial intelligence programming. Because of this,

there are several methods for producing them. Encog makes use of the Box–Muller transformation. For more information about the Box–Muller transformation, refer to the following Wikipedia article.

http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
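
A minimal sketch of the basic Box–Muller transformation is shown below; Encog's actual implementation may differ in its details. The transformation converts two uniform random numbers into one normally distributed number.

```java
import java.util.Random;

// Basic Box–Muller transformation: turns two uniform random numbers in
// (0, 1] into one standard-normal random number (mean 0, stddev 1).
public class BoxMullerSketch {
    private final Random rnd = new Random();

    public double nextGaussian() {
        double u1 = rnd.nextDouble();
        while (u1 == 0.0) {          // guard against log(0)
            u1 = rnd.nextDouble();
        }
        double u2 = rnd.nextDouble();
        return Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
    }
}
```

To produce weights, the standard-normal value can be scaled by a desired standard deviation.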

The Gaussian random initialization does produce more

trainable neural networks than simple “hard” range random numbers. The results

of the Gaussian initialization will be compared with the other methods at the

end of this article.

Another neural network weight initialization technique is

called the Nguyen-Widrow technique. This technique is one of the most

effective neural network weight initialization methods available. Because of

this, the Encog Neural Network Framework uses this technique by default.

This technique was invented by Derrick Nguyen and Bernard Widrow. It was first introduced in their paper "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights," Proceedings of the International Joint Conference on Neural Networks, 3:21–26, 1990.

A neural network that was randomized with this approach is

shown in Figure 4.

**Figure 4: Nguyen-Widrow Randomization**

Shape-wise, a Nguyen-Widrow randomization typically looks somewhat like a wide, steep Gaussian curve. Near the edges of the distribution the counts fall off quickly.

To implement a Nguyen-Widrow randomization, first initialize the neural network with random weight values in a specific range. This is exactly the same technique described earlier in this article for the "hard" range random numbers.

Next, calculate a value beta, as follows:

$$\beta = 0.7\, h^{1/i}$$

The variable $h$ represents the number of hidden neurons, whereas the variable $i$ represents the number of input neurons. For the network used in this article, $\beta = 0.7 \cdot 25^{1/7} \approx 1.11$.

Next, calculate the Euclidean norm of all of the weights on a layer:

$$\|w\| = \sqrt{\sum_j w_j^2}$$

Once the beta and norm values have been calculated, the random numbers can be adjusted. Each weight is rescaled using the previously calculated values:

$$w_{\text{new}} = \frac{\beta\, w_{\text{old}}}{\|w\|}$$

The same equation is used to adjust the thresholds. Once the weights and thresholds have been adjusted, the neural network is ready for training.
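
Putting these steps together, here is a standalone sketch of the adjustment, independent of Encog's own implementation (which appears at the end of this article):

```java
// Nguyen-Widrow adjustment applied to a layer's range-randomized weights.
// h is the number of hidden neurons; i is the number of input neurons.
public static void nguyenWidrowAdjust(double[] weights, int h, int i) {
    double beta = 0.7 * Math.pow(h, 1.0 / i);

    // Euclidean norm of the layer's weights
    double norm = 0;
    for (double w : weights) {
        norm += w * w;
    }
    norm = Math.sqrt(norm);

    // rescale each weight; thresholds are adjusted the same way
    for (int j = 0; j < weights.length; j++) {
        weights[j] = beta * weights[j] / norm;
    }
}
```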

Another technique, similar to the Nguyen-Widrow randomization, is called "fan-in weight randomization". This technique was introduced by Simon Haykin in Section 6.7 of "Neural Networks: A Comprehensive Foundation". It is an effective technique, but it generally does not provide results as good as the Nguyen-Widrow method.

Figure 5 shows a neural network that was randomized using

the Fan-in Weight Randomization method.

**Figure 5: Fan-In Randomization Method**

The weights are initialized using a formula built from the following quantities. The minimum and maximum range for the random numbers must be specified. The variable "r" represents the number of neurons in the layer being randomized. The variable "E" represents a random number between -1 and +1.
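
The sketch below shows one plausible reading of this technique, in which a uniform range-random value is scaled down by the layer's neuron count; the exact scaling used by Encog's FanInRandomizer (r versus the square root of r) should be treated as an assumption here.

```java
import java.util.Random;

// Hypothetical sketch of fan-in weight randomization: E in [-1, +1] is
// mapped into [min, max], then scaled down by the neuron count r of the
// layer being randomized. The divisor (r vs. sqrt(r)) is an assumption.
public class FanInSketch {
    private final Random rnd = new Random();

    public double next(double min, double max, int r) {
        double e = rnd.nextDouble() * 2.0 - 1.0;          // E in [-1, +1]
        double value = (min + max) / 2.0 + e * (max - min) / 2.0;
        return value / r;
    }
}
```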

The following Java application evaluates the performance of each of these initialization techniques. The program begins by initializing the weights of the neural network using one of the previously discussed methods. Then 50 iterations of resilient propagation training are performed, teaching the neural network to act as an XOR operator. The drop in error over these 50 iterations is recorded. The entire process is repeated 1,000 times and the average error drop is reported. A minimal sketch of the evaluation loop is shown below, followed by the program's output.
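
This sketch assumes Encog 3's Randomizer, BasicNetwork, and ResilientPropagation classes; exactly how the error drop is normalized in the original program is also an assumption.

```java
import org.encog.mathutil.randomize.Randomizer;
import org.encog.ml.data.MLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

// Evaluate one initialization technique: average error drop over 1,000
// trials of 50 resilient propagation iterations on the XOR problem.
public static double evaluate(Randomizer randomizer, BasicNetwork network,
        MLDataSet xorTraining) {
    final int TRIALS = 1000;
    double total = 0;
    for (int trial = 0; trial < TRIALS; trial++) {
        randomizer.randomize(network);
        ResilientPropagation train =
                new ResilientPropagation(network, xorTraining);
        train.iteration();
        double startError = train.getError();
        for (int i = 0; i < 49; i++) { // 50 iterations in total
            train.iteration();
        }
        total += startError - train.getError(); // drop in error
    }
    return total / TRIALS;
}
```

The following is the output from this program.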

```
Range random:  0.8657535921007118
Nguyen-Widrow: 0.9764568786444707
Fan-In:        0.8892070780666499
Gaussian:      0.9075669364869697
```

As you can see, the Nguyen-Widrow randomizer performed the best. Because of this, Nguyen-Widrow is the default randomizer used by Encog. A C# version of this example is provided with the Encog C# examples download.

Encog's own NguyenWidrowRandomizer.randomize method begins with ordinary range randomization, exactly as described above; the truncated tail of the listing is summarized in comments from the algorithm given earlier.

```java
public final void randomize(final MLMethod method) {
    if (!(method instanceof BasicNetwork)) {
        throw new EncogError("Nguyen-Widrow only works on BasicNetwork.");
    }
    BasicNetwork network = (BasicNetwork) method;

    // begin with ordinary "hard" range randomization
    new RangeRandomizer(getMin(), getMax()).randomize(network);

    int hiddenNeurons = network.getLayerNeuronCount(1);
    // can't really do much, use regular randomization
    if (hiddenNeurons < 1) {
        return;
    }

    // the method continues by computing beta = 0.7 * h^(1/i), taking the
    // Euclidean norm of the randomized weights, and rescaling each weight
    // and threshold by beta / norm, as described earlier in this article
    double beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / network.getInputCount());
    // ...
}
```
