The Number of Hidden Layers

    There are really two decisions that must be made regarding the hidden layers: how many hidden layers to use in the neural network, and how many neurons to place in each of these layers. We will first examine how to determine the number of hidden layers.

    Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with more than two hidden layers; in fact, for many practical problems, there is no reason to use more than one. Table 5.1 summarizes the capabilities of neural network architectures with various numbers of hidden layers.

Table 5.1: Determining the Number of Hidden Layers

Number of Hidden Layers   Result
none                      Only capable of representing linearly separable functions or decisions.
1                         Can approximate any function that contains a continuous mapping from one finite space to another.
2                         Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
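
    As a concrete illustration of the first two rows of Table 5.1, the sketch below trains a tiny one-hidden-layer network on XOR, a function that no zero-hidden-layer (purely linear) network can represent. This is a minimal numpy example written for this discussion, not code from the book; the layer sizes, learning rate, and iteration count are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR training data: not linearly separable.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer; 4 neurons is an arbitrary but sufficient choice.
    n_hidden = 4
    W1 = rng.normal(size=(2, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=(n_hidden, 1)); b2 = np.zeros(1)

    lr = 0.5
    for _ in range(10000):
        # Forward pass through the hidden and output layers.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # Backward pass: squared-error loss with sigmoid derivatives.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)

        # Gradient-descent updates.
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0)

    print(out.round(2).ravel())  # typically approaches [0, 1, 1, 0]

    If you remove the hidden layer and feed the inputs straight into a single sigmoid output, the same training loop stalls near 50% accuracy on XOR, which is exactly the linear-separability limitation described by the first row of the table.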

    Deciding the number of hidden layers is only a small part of the problem. You must also determine how many neurons will be in each of these hidden layers. This process is covered in the next section.

The Number of Neurons in the Hidden Layers

    Choosing the number of neurons in the hidden layers is a very important part of deciding your overall neural network architecture. Though these layers do not directly interact with the external environment, they have a tremendous influence on the final output. Both the number of hidden layers and the number of neurons in each of these hidden layers must be carefully considered.

    Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.

    Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when the training data is sufficient. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to adequately train the neural network. Obviously, some compromise must be reached between too many and too few neurons in the hidden layers.
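
    The tradeoff is easy to see empirically. The sketch below, a hedged illustration using scikit-learn rather than anything from the book, trains the same network twice on a small, noisy data set: once with too few hidden neurons and once with far too many. The sample counts and neuron counts are assumptions chosen only to make the effect visible.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # A small, noisy data set (all sizes are illustrative assumptions).
    X, y = make_classification(n_samples=80, n_features=10, flip_y=0.2,
                               random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    for n_hidden in (2, 200):  # too few vs. far too many hidden neurons
        clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=5000,
                            random_state=0).fit(X_train, y_train)
        print(n_hidden, "train:", clf.score(X_train, y_train),
              "validation:", clf.score(X_val, y_val))

    With 2 hidden neurons the network typically scores poorly on both sets (underfitting); with 200 it typically scores near-perfectly on the training set while the validation score lags behind (overfitting).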

    There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

  • The number of hidden neurons should be between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the size of the input layer.

    These three rules provide a starting point for you to consider. Ultimately, the selection of an architecture for your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random numbers of layers and neurons at your network; doing so would be very time-consuming. Chapter 8, “Pruning a Neural Network,” will explore various ways to determine an optimal structure for a neural network.
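
    To make the three rules of thumb concrete, here is a short sketch that turns them into candidate neuron counts for a hypothetical network with 10 inputs and 1 output (both sizes are assumptions for illustration only):

    # Hypothetical layer sizes, chosen only to make the arithmetic concrete.
    n_inputs, n_outputs = 10, 1

    # Candidate hidden-neuron counts suggested by the three rules of thumb.
    between = (n_inputs + n_outputs) // 2         # between input and output size
    two_thirds = (2 * n_inputs) // 3 + n_outputs  # 2/3 of input size, plus output size
    upper = 2 * n_inputs - 1                      # just under twice the input size

    print(between, two_thirds, upper)             # 5 7 19

    Trying each candidate against a held-out validation set, as in the earlier sketch, is a far more structured form of trial and error than guessing at random.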

Comments

jnterry:

Those layers are very important to me. I will require them for my new project. Thanks for sharing.

jeffheaton:

You are welcome.

elisagirola:

Hi, I have a question for you. I hope you can help me with this.
I’m using a 2-layer backpropagation NN to perform a series of pairwise comparisons. I’m using the error on the testing set as a measure of similarity between 2 groups of data (i.e., the more similar the 2 groups, the higher the error in the network, and vice versa). I’m currently trying to decide the number of neurons in the hidden layer and saw your post. Since I’m not trying to generalize my network, but instead I’m trying to use it as a measure of similarity, do you think it makes sense to overfit the network using a higher number of neurons? In particular, it looks like my network works better at calculating similarities when the number of neurons is twice the size of the input layer.

Cheers

aiom:

great book!! thanks for sharing
