There are really two decisions that must be made regarding the hidden layers: how many hidden layers to actually have in the neural network and how many neurons will be in each of these layers. We will first examine how to determine the number of hidden layers to use with the neural network.

    Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer. Table 5.1 summarizes the capabilities of neural network architectures with various hidden layers.

Table 5.1: Determining the Number of Hidden Layers

Number of Hidden Layers Result
none Only capable of representing linear separable functions or decisions.
1 Can approximate any function that contains a continuous mapping from one finite space to another.
2 Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.

    Deciding the number of hidden neuron layers is only a small part of the problem. You must also determine how many neurons will be in each of these hidden layers. This process is covered in the next section.

The Number of Neurons in the Hidden Layers

    Deciding the number of neurons in the hidden layers is a very important part of deciding your overall neural network architecture. Though these layers do not directly interact with the external environment, they have a tremendous influence on the final output. Both the number of hidden layers and the number of neurons in each of these hidden layers must be carefully considered.

    Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.

    Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when the training data is sufficient. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to adequately train the neural network. Obviously, some compromise must be reached between too many and too few neurons in the hidden layers.

    There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

  • The number of hidden neurons should be between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the size of the input layer.

    These three rules provide a starting point for you to consider. Ultimately, the selection of an architecture for your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random numbers of layers and neurons at your network. To do so would be very time consuming. Chapter 8, “Pruning a Neural Network” will explore various ways to determine an optimal structure for a neural network.

Comments

neural network structure with constant weights

aynna's picture

The absolute values of the weights determined by the constructive algorithm become larger and larger when the number of nested layers of a convex recursive deletion region increases. The absolute values of the weights also depend on the complexity of the structure of the convex recursive deletion region. If the structure of the convex recursive deletion region is very complicated, the absolute values of the weights determined by the constructive algorithm could be very large. Besides, we still need to use the constructive procedure to get the parameters (weights and thresholds) for the neural networks. In this paper, we propose a simple three-layer network structure to implement the convex recursive deletion regions in which all weights of the second and third layers are all 1's and the thresholds for the nodes in the second layer are pre-determined according to the structures of the convex recursive deletion regions. We also provide the activation function for internet marketing the output node. In brief, all of parameters (weights and activation functions) in the proposed structure are pre-determined and no constructive algorithm is needed for solving the convex recursive deletion region problems. We prove the feasibility of the proposed structure and give an illustrative example to demonstrate how the proposed structure implements the convex recursive deletion regions.


Copyright 2005 - 2010 by Heaton Research, Inc.. Heaton Research™ and Encog™ are trademarks of Heaton Research. Click here for copyright and trademark information.