
Nguyen-Widrow and other Neural Network Weight/Threshold Initialization Methods

Neural networks learn by adjusting numeric values called
weights and thresholds.  A weight specifies how strong a connection exists
between two neurons.  A threshold is a value, stored on each neuron, that is
added to or subtracted from the weighted sum of the signals arriving from other
neurons.  Training is the process by which these weights and thresholds are
adjusted so that the neural network produces useful results.
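
To make the role of these values concrete, the following minimal sketch shows how a single neuron might combine its incoming weights and its threshold, treating the threshold as a bias term added to the weighted sum.  The three-input neuron, the sample numbers, and the sigmoid activation are arbitrary choices for the illustration.

public class NeuronExample {

    public static void main(String[] args) {
        double[] inputs  = {0.5, 0.9, 0.1};   // signals from previous neurons
        double[] weights = {0.4, -0.6, 0.2};  // connection strengths
        double threshold = 0.1;               // per-neuron value added to the sum

        // weighted sum of the incoming signals, plus the threshold
        double sum = threshold;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }

        // pass the sum through a sigmoid activation function
        double output = 1.0 / (1.0 + Math.exp(-sum));
        System.out.println("Neuron output: " + output);
    }
}

Training adjusts the weights and the threshold in this calculation until the outputs become useful.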

Usually, neural network weights and thresholds are simply set
to random numbers.  This gives the training algorithm a starting point from
which to work toward a solution.  However, purely random numbers are not always
the most desirable starting point; adjusting them can lead to quicker training.
This article will compare the following weight randomization techniques.

  • "Hard" Range Randomization
  • Gaussian Random Numbers
  • Nguyen-Widrow Randomization
  • Fan-In Randomization

After describing each method, the article will conclude by
comparing the performance of these weight initialization techniques.  Before we
can compare them, we first need a way to visualize the weights and threshold
values.  We will view them as a histogram.  The Encog Workbench can display a
histogram of the weights and thresholds, and we will make use of these
histograms throughout this article.

This article will examine a neural network trained to
predict trends in the stock market.  It contains 7 input neurons, which hold 7
days of prior price information.  A single output neuron predicts the next
day’s percentage movement.  Finally, 25 hidden neurons are provided to help
detect trends in the price data.  This neural network is shown here.

Neural Network
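
For reference, a network with this structure could be created in Encog roughly as follows.  The article does not state which activation functions were used, so the sigmoid used here is an assumption for the sketch.

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

public class MarketNetwork {

    public static BasicNetwork createNetwork() {
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 7));                      // 7 input neurons
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 25));  // 25 hidden neurons
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));  // 1 output neuron
        network.getStructure().finalizeStructure();
        network.reset();  // randomize the initial weights and thresholds
        return network;
    }
}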

“Hard” Range Randomization

We will begin with the simplest neural network weight and
threshold initialization technique.  In this technique, called “hard” range
randomization, every weight and threshold is simply assigned a random number
within a specific range.  This is the type of weight initialization used by
most simple neural network examples.  Figure 1 shows our predictive neural
network randomized in this way.

Figure 1: Range Random

The histogram is essentially flat.  If the random number
generator is doing a good job, then adding more weights, by adding additional
neurons, would only flatten the graph further.
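
A minimal sketch of this kind of initialization in plain Java is shown below; the range of -1 to +1 and the array size are arbitrary choices for the example.  (Encog provides a RangeRandomizer class that does the same thing for an entire network.)

import java.util.Random;

public class RangeRandomExample {

    public static void main(String[] args) {
        final double min = -1.0;
        final double max = 1.0;
        Random rnd = new Random();

        // assign every weight (and threshold) a uniform random value in [min, max]
        double[] weights = new double[10];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = min + rnd.nextDouble() * (max - min);
            System.out.println(weights[i]);
        }
    }
}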

These random weights and thresholds provide a workable
starting point for training.  The neural network is then trained with 20 years
of market data, which allows it to detect some patterns in the price movement
of the security it was trained on.  Once the network has been trained, the
histogram looks considerably different.  Figure 2 shows a trained neural
network.

Figure 2: A Trained Neural Network

Trained neural networks almost always look more organized than
their untrained counterparts.  The weights and thresholds fall into a much
narrower range.  In Figure 2 you can see that the majority of the weights and
thresholds are near zero, and the counts fall off rapidly in both directions.

The question for random weight initialization, then, is what
distribution of weights provides the best starting point for training a neural
network.

Gaussian Random Weight Initialization

The weights of a trained neural network are clustered about
the origin and taper off quickly on each side.  This is a shape somewhat
similar to a narrow Gaussian function.  This has led some to use a random
Gaussian distribution as a weight initialization technique for neural networks.

Figure 3 shows a neural network that was initialized using a
Gaussian random number generator.

Figure 3: Gaussian Neural Network Weights

You can see that the above random numbers fall into
something like a Gaussian curve.  Additional weight values, from a larger
neural network, would produce a closer fit to the Gaussian curve.  Of course,
the fit will never be exact, because these are random numbers.  A Gaussian
random number generator produces values near zero most often, with the
frequency tapering off as we move further from the origin.

Gaussian random numbers have a variety of uses in science
and engineering beyond artificial intelligence programming.  Because of this,
there are several methods for producing them.  Encog makes use of the
Box-Muller transformation.  For more information about the Box-Muller
transformation, refer to the following Wikipedia article.

http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
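
As an illustration, here is a minimal sketch of the Box-Muller transformation producing Gaussian-distributed weight values from a standard java.util.Random.  The mean of 0, standard deviation of 1, and array size used here are arbitrary choices for the example.

import java.util.Random;

public class BoxMullerExample {

    // produce one normally distributed value using the Box-Muller transform
    public static double gaussian(Random rnd, double mean, double stdDev) {
        double u1 = rnd.nextDouble();
        double u2 = rnd.nextDouble();
        // 1.0 - u1 avoids taking the log of zero
        double z = Math.sqrt(-2.0 * Math.log(1.0 - u1)) * Math.cos(2.0 * Math.PI * u2);
        return mean + stdDev * z;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double[] weights = new double[10];
        // initialize a small weight array with Gaussian random numbers
        for (int i = 0; i < weights.length; i++) {
            weights[i] = gaussian(rnd, 0.0, 1.0);
            System.out.println(weights[i]);
        }
    }
}

Note that java.util.Random also offers nextGaussian(), which produces normally distributed values directly; the sketch above simply makes the Box-Muller idea explicit.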

The Gaussian random initialization does produce more
trainable neural networks than simple “hard” range random numbers.  The results
of the Gaussian initialization will be compared with the other methods at the
end of this article.

Nguyen-Widrow Randomization

Another neural network weight initialization technique is
called the Nguyen-Widrow technique.  This technique is one of the most
effective neural network weight initialization methods available.  Because of
this, the Encog Neural Network Framework uses this technique by default.

This technique was invented by Derrick Nguyen and Bernard
Widrow.  It was first introduced in their paper “Improving the learning speed
of 2-layer neural networks by choosing initial values of the adaptive weights,”
Proceedings of the International Joint Conference on Neural Networks, 3:21–26,
1990.

A neural network that was randomized with this approach is
shown in Figure 4.

Figure 4: Nguyen-Widrow Randomization

Shape-wise, a Nguyen-Widrow randomization typically looks
somewhat like a wide, steep Gaussian curve.  Near the edges of the
distribution the counts fall off quickly.

To implement a Nguyen-Widrow randomization, first initialize
the neural network with random weight values in a specific range.  This is
exactly the same technique as described earlier in this article for the
“hard” range random numbers.

Next, calculate a value beta, as follows:

beta = 0.7 * h^(1/i)

The variable “h” represents the number of hidden neurons,
whereas the variable “i” represents the number of input neurons.  For the
network described in this article (h = 25, i = 7), beta is approximately
0.7 * 25^(1/7), or about 1.11.

Next, calculate the Euclidean norm of all of the weights on
a layer.  This is calculated as follows:

norm = sqrt(w1^2 + w2^2 + ... + wn^2)

Once the beta and norm values have been calculated, the
random numbers can be adjusted.  The equation below shows how each weight w is
adjusted using the previously calculated values:

w = (beta * w) / norm

The same equation is used to adjust the thresholds.  Once
the weights and thresholds have been adjusted, the neural network is ready for
training.
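
Putting the steps together, a minimal framework-free sketch of the procedure on a plain weight array might look like the following.  The layer sizes match the network in this article; the -0.5 to +0.5 starting range is an arbitrary choice for the example.

import java.util.Random;

public class NguyenWidrowExample {

    public static void main(String[] args) {
        final int inputNeurons = 7;
        final int hiddenNeurons = 25;
        Random rnd = new Random();

        // step 1: "hard" range randomization of the input-to-hidden weights
        double[][] weights = new double[inputNeurons][hiddenNeurons];
        for (int i = 0; i < inputNeurons; i++) {
            for (int h = 0; h < hiddenNeurons; h++) {
                weights[i][h] = rnd.nextDouble() - 0.5;
            }
        }

        // step 2: beta = 0.7 * h^(1/i)
        double beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / inputNeurons);

        // step 3: Euclidean norm of all of the weights on the layer
        double norm = 0;
        for (int i = 0; i < inputNeurons; i++) {
            for (int h = 0; h < hiddenNeurons; h++) {
                norm += weights[i][h] * weights[i][h];
            }
        }
        norm = Math.sqrt(norm);

        // step 4: rescale each weight by beta / norm
        for (int i = 0; i < inputNeurons; i++) {
            for (int h = 0; h < hiddenNeurons; h++) {
                weights[i][h] = beta * weights[i][h] / norm;
            }
        }

        System.out.println("beta = " + beta + ", norm before scaling = " + norm);
    }
}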

Fan-In Weight Randomization

Another technique, similar to the Nguyen-Widrow randomization,
is called “Fan-In Weight Randomization”.  This technique was introduced by
Simon Haykin in Chapter 6.7 of “Neural Networks - A Comprehensive Foundation”. 
It is an effective technique, but it does not generally produce results as good
as the Nguyen-Widrow randomization method.

Figure 5 shows a neural network that was randomized using
the Fan-in Weight Randomization method.

Figure 5: Fan-In Randomization Method

The weights are initialized by scaling a random value by the
specified range and by the size of the layer.  The minimum and maximum of the
range must be specified.  The variable “r” represents the number of neurons in
the layer being randomized, and the variable “E” represents a random number
between -1 and +1.  The effect is that the larger the layer, the smaller the
initial weights, as sketched below.
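
The following is a minimal sketch of this idea, assuming the simple form in which the specified range is divided by r.  The range of -2.4 to +2.4 (Haykin's commonly cited value) and the layer size of 25 are illustrative assumptions, not values taken from the article.

import java.util.Random;

public class FanInExample {

    public static void main(String[] args) {
        final double min = -2.4;   // assumed lower bound of the range
        final double max = 2.4;    // assumed upper bound of the range
        final int r = 25;          // neurons in the layer being randomized
        Random rnd = new Random();

        double[] weights = new double[r];
        for (int j = 0; j < r; j++) {
            double e = rnd.nextDouble() * 2.0 - 1.0;   // E: random number in [-1, +1]
            // scale the range by the layer size: larger layers get smaller weights
            weights[j] = e * (max - min) / (2.0 * r);
            System.out.println(weights[j]);
        }
    }
}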

Evaluating the Randomization Techniques

The following Java application evaluates the performance of
each of these initialization techniques.  The program begins by initializing
the weights of the neural network using one of the previously discussed
methods.  Then 50 iterations of resilient propagation training are performed,
teaching the neural network to act as an XOR operator.  The drop in error over
these 50 iterations is recorded.  The entire process is repeated 1,000 times
and the average error drop is reported.  The following is the output from this
program.

Range random: 0.8657535921007118
Nguyen-Widrow: 0.9764568786444707
Fan-In: 0.8892070780666499
Gaussian: 0.9075669364869697
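
A rough sketch of this kind of evaluation loop is shown below.  It assumes the Encog 3 API and is not the exact program that produced the numbers above; the hidden layer size of 4 and the way the error drop is measured are choices made for the sketch.

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.mathutil.randomize.RangeRandomizer;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class RandomizerEvaluation {

    public static final double[][] XOR_INPUT = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
    public static final double[][] XOR_IDEAL = {{0}, {1}, {1}, {0}};

    public static void main(String[] args) {
        MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);
        double totalDrop = 0;

        for (int run = 0; run < 1000; run++) {
            // build a small XOR network
            BasicNetwork network = new BasicNetwork();
            network.addLayer(new BasicLayer(null, true, 2));
            network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 4));
            network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
            network.getStructure().finalizeStructure();
            network.reset();

            // initialize the weights; swap in Encog's NguyenWidrowRandomizer,
            // GaussianRandomizer, or FanInRandomizer to compare techniques
            new RangeRandomizer(-1, 1).randomize(network);

            // 50 iterations of resilient propagation, recording the error drop
            ResilientPropagation train = new ResilientPropagation(network, trainingSet);
            train.iteration();
            double startError = train.getError();
            for (int i = 0; i < 49; i++) {
                train.iteration();
            }
            totalDrop += (startError - train.getError()) / startError;
        }

        System.out.println("Average error drop: " + (totalDrop / 1000));
    }
}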

As the output above shows, the Nguyen-Widrow randomizer performed the best. 
Because of this, Nguyen-Widrow is the default randomizer used by Encog.  A C# version of this example is provided with the Encog C# examples download.

The heart of Encog's Java implementation of the Nguyen-Widrow randomizer looks roughly like the following; the rescaling at the end follows the steps described earlier in this article.

public final void randomize(final MLMethod method) {
	if( !(method instanceof BasicNetwork) ) {
		throw new EncogError("Nguyen-Widrow only works on BasicNetwork.");
	}
	BasicNetwork network = (BasicNetwork)method;

	// step 1: begin with a plain "hard" range randomization
	new RangeRandomizer(getMin(), getMax()).randomize(network);

	int hiddenNeurons = network.getLayerNeuronCount(1);

	// can't really do much, use regular randomization
	if (hiddenNeurons < 1) {
		return;
	}

	// step 2: beta = 0.7 * h^(1/i)
	double beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / network.getInputCount());

	// steps 3-4 (sketch): rescale the input-to-hidden weights by beta / norm
	double norm = 0;
	for (int from = 0; from < network.getLayerTotalNeuronCount(0); from++)
		for (int to = 0; to < hiddenNeurons; to++)
			norm += Math.pow(network.getWeight(0, from, to), 2);
	norm = Math.sqrt(norm);
	for (int from = 0; from < network.getLayerTotalNeuronCount(0); from++)
		for (int to = 0; to < hiddenNeurons; to++)
			network.setWeight(0, from, to, beta * network.getWeight(0, from, to) / norm);
}