Examining the Back Propagation Process | Heaton Research

Examining the Back Propagation Process

Introduction to Neural Networks with Java

You have now seen how to calculate the output for a feed forward neural network, both as mathematical equations and as a Java implementation. As we examined how to calculate the final values for the network, we used the connection weights and bias values to determine the final result. You may be wondering where these values actually came from.

The values contained in the weight matrix and bias variables were determined using the back propagation algorithm, a widely used algorithm for training neural networks. Back propagation works by running the neural network just as we did when performing a recognition in the previous section. The main difference is that we present the neural network with training data. As each item of training data is presented to the neural network, an error is calculated between the actual output of the network and the output that was expected (as specified in the training set). The weights and biases are then modified so that the network is more likely to return the correct result the next time it is presented with the same input.

Backpropagation is a very common method for training multilayer feed forward networks. It can be used with any feed forward network whose threshold function is differentiable.

It is the derivative of this threshold function that we will use during training. You do not need to understand calculus, or know how to take the derivative of an equation, to work with this chapter. If you are using one of the common threshold functions, you can simply look up its derivative in a chart. Appendix E, Neuron layer types, lists each of the common threshold functions along with its derivative.
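As an illustration, here is one of the common threshold functions, the sigmoid (logistic) function, together with its derivative. A convenient property of the sigmoid is that its derivative can be computed from the function's own output y as y(1 - y); this is the factor outs[x] * (1 - outs[x]) that appears in SigmoidLayer.backward (Listing 5.7). The class name Sigmoid is hypothetical and is not part of JOONE.

```java
/**
 * Sketch (not JOONE code): the sigmoid threshold function and its derivative.
 * The derivative is expressed through the output y, so a layer only needs
 * to remember its outputs in order to train.
 */
final class Sigmoid {
    private Sigmoid() {}

    /** Logistic sigmoid: 1 / (1 + e^-x). */
    static double activate(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Derivative of the sigmoid, given its output y = activate(x): y * (1 - y). */
    static double derivativeFromOutput(double y) {
        return y * (1.0 - y);
    }

    public static void main(String[] args) {
        double y = activate(0.0);
        System.out.println(y);                       // prints 0.5
        System.out.println(derivativeFromOutput(y)); // prints 0.25
    }
}
```

Working from the output rather than the raw net input avoids a second call to Math.exp during training.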

To train the neural network, we need a method to calculate the error. As the network is trained, it is presented with samples from the training set. The result obtained from the network is then compared with the anticipated result that is part of the training set. The degree to which the output from the neural network differs from this anticipated output is the error.
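One common way to measure this error is the sum of the squared differences between the anticipated and actual outputs. The sketch below assumes the frequently used convention of halving that sum, E = ½ Σ (expected - actual)², which makes its derivative simpler. The class and method names are hypothetical, and JOONE's own error reporting is not shown here.

```java
/**
 * Sketch of the sum-of-squares error between expected and actual outputs,
 * using the convention E = 0.5 * sum((expected - actual)^2).
 */
final class SumOfSquaresError {
    private SumOfSquaresError() {}

    static double error(double[] expected, double[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("expected and actual must have the same length");
        }
        double sum = 0.0;
        for (int k = 0; k < expected.length; k++) {
            double diff = expected[k] - actual[k]; // how far this output is from its target
            sum += diff * diff;                   // squaring makes every miss count positively
        }
        return 0.5 * sum; // the 1/2 cancels the 2 produced when differentiating
    }

    public static void main(String[] args) {
        double[] expected = {0.0, 1.0};
        double[] actual = {0.5, 0.75};
        System.out.println(error(expected, actual)); // prints 0.15625
    }
}
```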

To train the neural network, we must try to minimize this error, which means modifying the neuron connection weights and biases. We must therefore define a function that calculates the error of the neural network, and this error function must be differentiable. Because the network uses a differentiable threshold function, the activations of the output neurons can be thought of as differentiable functions of the inputs, weights and biases. If the error function is also differentiable, as the "sum of squares" error function is, then the error itself is a differentiable function of these weights. This allows us to evaluate the derivative of the error with respect to the weights. Using these derivatives, we then find the weights and biases that minimize the error function.
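For a single layer of sigmoid output neurons, the standard chain-rule derivation can be written out as follows. The symbols are notation introduced here rather than names from the JOONE code: t_k is the expected output, y_k the actual output, w_jk the weight from input x_j to output k, b_k the bias, and η the learning rate.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_{k} (t_k - y_k)^2,
  \qquad y_k = f(\mathrm{net}_k),
  \qquad \mathrm{net}_k = \sum_{j} w_{jk}\,x_j + b_k \\
\frac{\partial E}{\partial w_{jk}}
  &= \frac{\partial E}{\partial y_k}\,
     \frac{\partial y_k}{\partial \mathrm{net}_k}\,
     \frac{\partial \mathrm{net}_k}{\partial w_{jk}}
   = -(t_k - y_k)\,f'(\mathrm{net}_k)\,x_j \\
f'(\mathrm{net}_k) &= y_k\,(1 - y_k) \qquad \text{(sigmoid)} \\
\Delta w_{jk} &= -\eta\,\frac{\partial E}{\partial w_{jk}} = \eta\,\delta_k\,x_j,
  \qquad \Delta b_k = \eta\,\delta_k,
  \qquad \delta_k = (t_k - y_k)\,y_k\,(1 - y_k)
\end{aligned}
```

The term δ_k has the same form as the gradient computed in Listing 5.7, where the incoming error is multiplied by outs[x] * (1 - outs[x]).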

There are several ways to find the weights that minimize the error function. The most popular is the gradient descent method. The algorithm that evaluates the derivative of the error function is known as backpropagation, because it propagates the errors backward through the network. In the next section we will walk through how JOONE implements the backpropagation algorithm. As we step through this process, you will see how the backpropagation algorithm actually works.
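Before looking at JOONE, here is a minimal, self-contained sketch of gradient descent on the smallest possible network: a single sigmoid neuron learning the logical OR function. It is not JOONE's implementation. The class GradientDescentNeuron, the learning rate of 0.5, the zero starting weights, the acceptable error of 0.01 and the 100,000-epoch cap are all assumptions chosen for illustration. Each step moves every weight by learningRate * delta * input, where delta = (t - y) * y * (1 - y), and the whole training set is presented repeatedly until the error is acceptable.

```java
/**
 * Gradient-descent sketch (not JOONE code): one sigmoid neuron trained
 * online to minimize the sum-of-squares error E = 0.5 * (t - y)^2.
 */
final class GradientDescentNeuron {
    private final double[] weights;
    private double bias;
    private final double learningRate;

    GradientDescentNeuron(int inputCount, double learningRate) {
        this.weights = new double[inputCount]; // all weights start at 0.0
        this.learningRate = learningRate;
    }

    /** Forward pass: weighted sum plus bias, passed through the sigmoid. */
    double output(double[] x) {
        double net = bias;
        for (int j = 0; j < weights.length; j++) {
            net += weights[j] * x[j];
        }
        return 1.0 / (1.0 + Math.exp(-net));
    }

    /** One gradient-descent step for a single training pattern. */
    void trainPattern(double[] x, double target) {
        double y = output(x);
        // delta = -dE/dnet = (t - y) * f'(net), with f'(net) = y * (1 - y) for the sigmoid
        double delta = (target - y) * y * (1.0 - y);
        for (int j = 0; j < weights.length; j++) {
            weights[j] += learningRate * delta * x[j]; // step against the gradient
        }
        bias += learningRate * delta; // the bias acts like a weight on a constant input of 1
    }

    /** Sum-of-squares error over the whole training set. */
    double totalError(double[][] inputs, double[] targets) {
        double e = 0.0;
        for (int p = 0; p < inputs.length; p++) {
            double diff = targets[p] - output(inputs[p]);
            e += 0.5 * diff * diff;
        }
        return e;
    }

    /** Repeats epochs until the error is acceptable or maxEpochs is reached; returns the final error. */
    double train(double[][] inputs, double[] targets, double acceptableError, int maxEpochs) {
        double error = totalError(inputs, targets);
        for (int epoch = 0; epoch < maxEpochs && error >= acceptableError; epoch++) {
            for (int p = 0; p < inputs.length; p++) {
                trainPattern(inputs[p], targets[p]);
            }
            error = totalError(inputs, targets);
        }
        return error;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 1, 1, 1}; // logical OR
        GradientDescentNeuron neuron = new GradientDescentNeuron(2, 0.5);
        double error = neuron.train(inputs, targets, 0.01, 100_000);
        System.out.println("final error: " + error);
        for (double[] x : inputs) {
            System.out.println(x[0] + " OR " + x[1] + " -> " + neuron.output(x));
        }
    }
}
```

OR is linearly separable, so a single neuron suffices here; XOR is not, which is why the XOR example needs a hidden layer and the backward propagation of errors described next.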

Implementing Back Propagation

I will now show you how the JOONE neural network implements back propagation training. The training process uses many of the same methods as the recognition process that we just evaluated. In fact, the back propagation method works by first running a recognition against the training data and then adjusting the weights and biases to reduce the error.

You can see this process by examining the Layer.run method, which is shown in Listing 5.1. We already examined the first part of the Layer.run method in the previous section. The second half of the Layer.run method is responsible for training the neural network and can be thought of as the main loop of the training process. This section of code is run against each item in the training set, and the training data is run repeatedly until the error of the neural network falls within an acceptable level. The Layer.run method first checks to see if the neural network is in training mode.

    if ( step != -1 )
      // Checks if the next step is a learning step
      m_learning = monitor.isLearningCicle(step);
    else
      // Stops the net
      running = false;

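To determine whether we are learning, we examine the step variable. A step of -1 means the net should stop, so the layer stops running. Otherwise, the monitor's isLearningCicle method reports whether the current step is a learning cycle.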

    if ( (m_learning) && (running) ) {  // Learning

If we are in fact learning, then we must gather the gradient inputs. The concept of gradient was discussed earlier in this chapter. For now, we simply allocate an array large enough to hold the gradient values.

      gradientInps = new double[dimO];

Next we call the fireRevGet method.

      fireRevGet();

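The fireRevGet method collects the error patterns from the layer's output synapses and sums them into gradientInps. The backward method then uses these summed errors to calculate the layer's gradient outputs (and, for a SigmoidLayer, to adjust its biases). The resulting gradientOuts array is wrapped in a new Pattern object stamped with the current step.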

      backward(gradientInps);
      m_pattern = new Pattern(gradientOuts);
      m_pattern.setCount(step);

Then the pattern is passed back to the layer's input synapses by calling fireRevPut.

      fireRevPut(m_pattern);
    }
  }  // END while (running = false)

The code that we just examined implements back propagation learning from a high level. Next we will examine the individual methods that were called by the Layer.run method to see how the learning actually takes place.

Listing 5.5: The Layer.fireRevGet Method

protected void fireRevGet() {
  if ( aOutputPatternListener == null )
    return;
  double[] patt;
  int currentSize = aOutputPatternListener.size();
  OutputPatternListener tempListener = null;
  for ( int index = 0; index < currentSize; index++ ){
    tempListener = (OutputPatternListener)aOutputPatternListener.elementAt(index);
    if ( tempListener != null ) {
      m_pattern = tempListener.revGet();
      if ( m_pattern != null ) {
        patt = m_pattern.getArray();
        if ( patt.length != gradientInps.length )
          gradientInps = new double[patt.length];
        sumBackInput(patt);
      }
    };
  };
}

The Layer.fireRevGet method is very similar to the fireFwdGet method in that both are used to sum the patterns obtained from multiple layers into one. For each output listener, fireRevGet calls revGet to obtain an error pattern, reallocates gradientInps if the pattern's length differs, and adds the pattern to the running total. In most cases there will be only one layer to sum, as is the case with the XOR example. The summation itself is performed by the Layer.sumBackInput method, shown in Listing 5.6.

Listing 5.6: The Layer.sumBackInput Method

protected void sumBackInput(double[] pattern) {
  int x;
  int n = getRows();
  for ( x=0; x < n; ++x )
    gradientInps[x] += pattern[x];
}

As you can see, the Layer.sumBackInput method simply adds each element of the pattern it is passed to the corresponding element of gradientInps. The effect is cumulative, because fireRevGet calls sumBackInput once for each output synapse.
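The accumulation can be seen in isolation with a small standalone sketch. The class BackInputSummer and its reset and totals methods are hypothetical; reset mirrors the fresh new double[dimO] allocation made in Layer.run before fireRevGet is called. Unlike fireRevGet, which reallocates gradientInps when a pattern's length differs, this sketch simply rejects mismatched lengths.

```java
/**
 * Standalone sketch of the accumulation done by sumBackInput: each incoming
 * error pattern is added element by element to gradientInps.
 */
final class BackInputSummer {
    private double[] gradientInps = new double[0];

    /** Starts a new backward pass with a zeroed accumulator. */
    void reset(int size) {
        gradientInps = new double[size]; // Java zero-fills new arrays
    }

    /** Adds one output synapse's error pattern to the running total. */
    void sumBackInput(double[] pattern) {
        if (pattern.length != gradientInps.length) {
            throw new IllegalArgumentException("pattern length does not match accumulator");
        }
        for (int x = 0; x < gradientInps.length; x++) {
            gradientInps[x] += pattern[x];
        }
    }

    /** Returns a copy of the accumulated errors. */
    double[] totals() {
        return gradientInps.clone();
    }

    public static void main(String[] args) {
        BackInputSummer summer = new BackInputSummer();
        summer.reset(2);
        summer.sumBackInput(new double[] {0.25, -0.5}); // errors from one output synapse
        summer.sumBackInput(new double[] {0.5, 0.25});  // errors from a second output synapse
        double[] t = summer.totals();
        System.out.println(t[0] + ", " + t[1]);         // prints 0.75, -0.25
    }
}
```

Resetting before each backward pass matters: without it, errors from earlier training patterns would leak into the current one.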

Once the fireRevGet method completes, control returns to the Layer.run method. The next method that Layer.run calls is backward; for a sigmoid layer this is the SigmoidLayer.backward method, shown in Listing 5.7.

Listing 5.7: The SigmoidLayer.backward Method

public void backward(double[] pattern) {
  super.backward(pattern);
  double dw, absv;
  int x;
  int n = getRows();
  for ( x = 0; x < n; ++x ) {
    gradientOuts[x] = pattern[x] * outs[x] * (1 - outs[x]);
    // Adjust the bias
    if ( getMomentum() < 0 ) {
      if ( gradientOuts[x] < 0 )
        absv = -gradientOuts[x];
      else
        absv = gradientOuts[x];
      dw = getLearningRate() * gradientOuts[x] + absv * bias.delta[x][0];
    } else
      dw = getLearningRate() * gradientOuts[x] + getMomentum() * bias.delta[x][0];
    bias.value[x][0] += dw;
    bias.delta[x][0] = dw;
  }
}

This method is where much of the training actually takes place. It is here that the gradient outputs and new bias values are calculated; the weights will be adjusted later. Next, the other layers must be given a chance to update and train. To do this, the layer passes control to its synapses, where the connection weights will be updated. To pass control to the synapses, the Layer.fireRevPut method is called; it calls every input synapse connected to this layer. This method can be seen in Listing 5.8.
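Returning briefly to Listing 5.7, the bias update is easier to follow when the same arithmetic is run on plain arrays. In this hypothetical sketch, errorIn stands in for the summed gradientInps, and biasValue[x] and biasDelta[x] stand in for bias.value[x][0] and bias.delta[x][0]; the call to super.backward is omitted. As in the listing, a negative momentum setting makes the code use the magnitude of the current gradient as the momentum factor.

```java
/**
 * Standalone sketch of the arithmetic in SigmoidLayer.backward (Listing 5.7).
 * biasValue[x] and biasDelta[x] stand in for bias.value[x][0] and bias.delta[x][0].
 */
final class SigmoidBackwardSketch {
    private SigmoidBackwardSketch() {}

    static double[] backward(double[] errorIn, double[] outs,
                             double[] biasValue, double[] biasDelta,
                             double learningRate, double momentum) {
        int n = outs.length;
        double[] gradientOuts = new double[n];
        for (int x = 0; x < n; x++) {
            // Incoming error times the sigmoid derivative, written via the stored output.
            gradientOuts[x] = errorIn[x] * outs[x] * (1 - outs[x]);
            // Negative momentum: use |gradient| as the momentum factor, as in the listing.
            double m = (momentum < 0) ? Math.abs(gradientOuts[x]) : momentum;
            double dw = learningRate * gradientOuts[x] + m * biasDelta[x];
            biasValue[x] += dw; // apply the bias change
            biasDelta[x] = dw;  // remember it for the next step's momentum term
        }
        return gradientOuts;
    }

    public static void main(String[] args) {
        double[] errorIn = {1.0};
        double[] outs = {0.5};
        double[] biasValue = {0.0};
        double[] biasDelta = {0.0};
        backward(errorIn, outs, biasValue, biasDelta, 0.5, 0.5);
        System.out.println(biasValue[0]); // prints 0.125
        backward(errorIn, outs, biasValue, biasDelta, 0.5, 0.5);
        System.out.println(biasValue[0]); // prints 0.3125 (0.125 + 0.125 + 0.5 * 0.125)
    }
}
```

The second call shows momentum at work: the same gradient produces a larger change, because half of the previous change is carried forward.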

Listing 5.8: The Layer.fireRevPut Method

protected void fireRevPut(Pattern pattern) {
  if ( aInputPatternListener == null ) {
    return;
  };
  int currentSize = aInputPatternListener.size();
  InputPatternListener tempListener = null;
  for ( int index = 0; index < currentSize; index++ ){
    tempListener = (InputPatternListener)aInputPatternListener.elementAt(index);
    if ( tempListener != null ) {
      tempListener.revPut((Pattern)pattern.clone());
    };
  };
}

As you can see from the above listing, the fireRevPut method loops through each connected input synapse and passes each one its own copy (a clone) of the pattern. You may notice that we are passing data to our own input synapses. It may seem backward to send data to a layer's inputs, but that is exactly the pattern that back propagation follows: errors flow from the output side of the network toward the input side. The revPut method that is called on each of the input synapses is shown in Listing 5.9.
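Listing 5.8 hands each input synapse pattern.clone() rather than the pattern itself. A plausible reason, which the listing does not state, is that every synapse then works on its own copy, so one synapse cannot disturb the data another receives. The sketch below isolates that idea. It uses java.util.function.Consumer as a stand-in for JOONE's InputPatternListener, and the class name ReverseBroadcaster is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of the fan-out in Layer.fireRevPut: every listener receives its
 * own copy of the error pattern.
 */
final class ReverseBroadcaster {
    private final List<Consumer<double[]>> inputListeners = new ArrayList<>();

    void addListener(Consumer<double[]> listener) {
        inputListeners.add(listener);
    }

    /** Sends a separate clone of the pattern to each registered listener. */
    void fireRevPut(double[] pattern) {
        for (Consumer<double[]> listener : inputListeners) {
            if (listener != null) {
                listener.accept(pattern.clone()); // a fresh copy per listener
            }
        }
    }

    public static void main(String[] args) {
        ReverseBroadcaster layer = new ReverseBroadcaster();
        double[][] received = new double[2][];
        layer.addListener(p -> { p[0] = 99.0; received[0] = p; }); // mutates its copy
        layer.addListener(p -> received[1] = p);                    // only stores its copy
        double[] errors = {1.0, 2.0};
        layer.fireRevPut(errors);
        System.out.println(received[1][0] + " " + errors[0]); // prints 1.0 1.0
    }
}
```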

Listing 5.9: The Synapse.revPut Method

public synchronized void revPut(Pattern pattern) {
  if ( isEnabled() ) {
    count = pattern.getCount();

    while ( bitems > 0 ) {
      try {
        wait();
      } catch ( InterruptedException e )
      { 
        e.printStackTrace();
        return;
      }
    }
    m_pattern = pattern;
    backward(pattern.getArray());
    ++bitems;
    notifyAll();
  }
}

As you can see, the Synapse.revPut method is synchronized. Each layer's run method executes in its own thread, so a synapse is accessed by the threads of the two layers it connects. Synchronizing this handoff lets the layers operate concurrently, which allows training to take advantage of a multiprocessor computer or a distributed environment.

The Synapse.revPut method first records the pattern's count and then waits while bitems is greater than zero, that is, until the previously stored pattern has been consumed. It then stores the new pattern and adjusts the connection weights by calling the Synapse.backward method. Finally, it increments bitems and calls notifyAll() to inform any thread that might be waiting on data that data is available. You will now be shown how the Synapse.backward method works.
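The wait/notifyAll handoff in Listing 5.9 is a classic one-slot buffer. Below is a minimal sketch of that pattern on its own. Only the producer side (revPut) appears in this section, so the take method is an assumed counterpart written for illustration, and the class name OneSlotChannel is hypothetical. Unlike Listing 5.9, which prints the stack trace and returns on interruption, the sketch propagates InterruptedException to the caller.

```java
/**
 * Sketch of the one-slot handoff used by Synapse.revPut: a producer thread
 * blocks until the previous pattern has been consumed. take() is an assumed
 * counterpart; the real consumer side is not shown in this section.
 */
final class OneSlotChannel {
    private double[] item;
    private int bitems; // number of patterns waiting: 0 or 1

    /** Blocks while a pattern is still waiting, then stores the new one. */
    synchronized void put(double[] pattern) throws InterruptedException {
        while (bitems > 0) {
            wait(); // releases the lock until another thread calls notifyAll()
        }
        item = pattern;
        ++bitems;
        notifyAll(); // wake any thread waiting to take a pattern
    }

    /** Blocks until a pattern is available, then removes and returns it. */
    synchronized double[] take() throws InterruptedException {
        while (bitems == 0) {
            wait();
        }
        double[] result = item;
        item = null;
        --bitems;
        notifyAll(); // wake any thread waiting to put a pattern
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        OneSlotChannel channel = new OneSlotChannel();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    channel.put(new double[] {i});
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        for (int i = 0; i < 3; i++) {
            System.out.println(channel.take()[0]); // prints 0.0, 1.0, 2.0 on separate lines
        }
        producer.join();
    }
}
```

The wait calls sit inside while loops rather than if statements, so a thread that wakes up re-checks the condition before proceeding, just as revPut does with while ( bitems > 0 ).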

Listing 5.10: The Synapse.backward Method

protected void backward(double[] pattern) {
  int x;
  int y;
  double s, dw;
  int m_rows = getInputDimension();
  int m_cols = getOutputDimension();

  // Weights adjustment
  for ( x=0; x < m_rows; ++x ) {
    double absv;
    s = 0;
    for ( y=0; y < m_cols; ++y ) {
      s += pattern[y] * array.value[x][y];
      if ( getMomentum() < 0 ) {
        if ( pattern[y] < 0 )
          absv = -pattern[y];
        else
          absv = pattern[y];
        dw = getLearningRate() * pattern[y] * inps[x] + absv * array.delta[x][y];
      } else
        dw = getLearningRate() * pattern[y] * inps[x] + getMomentum() * array.delta[x][y];
      array.value[x][y] += dw;
      array.delta[x][y] = dw;
    }
    bouts[x] = s;
  }
}

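The Synapse.backward method closely parallels the SigmoidLayer.backward method. Both use the learning rate and momentum to adjust values according to the back propagation training algorithm; only this time we are modifying the connection weights rather than the biases of the neurons. While adjusting the weights, Synapse.backward also accumulates in bouts each input's share of the error, weighted by the connection weights as they were before this update. This is the error that is handed back toward the previous layer.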
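To check the arithmetic of Listing 5.10 by hand, here is a standalone sketch on plain arrays. The class name is hypothetical; weights[x][y] and deltas[x][y] stand in for array.value[x][y] and array.delta[x][y], gradient is the incoming pattern, and inps holds the synapse's inputs from the forward pass. Note that, as in the listing, each weight contributes to bouts before it is changed.

```java
/**
 * Standalone sketch of the arithmetic in Synapse.backward (Listing 5.10).
 * weights[x][y] and deltas[x][y] stand in for array.value[x][y] and array.delta[x][y].
 */
final class SynapseBackwardSketch {
    private SynapseBackwardSketch() {}

    /** Updates weights in place; returns the error passed toward the previous layer. */
    static double[] backward(double[] gradient, double[] inps,
                             double[][] weights, double[][] deltas,
                             double learningRate, double momentum) {
        int rows = inps.length;     // input dimension
        int cols = gradient.length; // output dimension
        double[] bouts = new double[rows];
        for (int x = 0; x < rows; x++) {
            double s = 0;
            for (int y = 0; y < cols; y++) {
                s += gradient[y] * weights[x][y]; // uses the weight before it changes
                double m = (momentum < 0) ? Math.abs(gradient[y]) : momentum;
                double dw = learningRate * gradient[y] * inps[x] + m * deltas[x][y];
                weights[x][y] += dw; // apply the weight change
                deltas[x][y] = dw;   // remember it for the next momentum term
            }
            bouts[x] = s;
        }
        return bouts;
    }

    public static void main(String[] args) {
        double[] gradient = {0.25};
        double[] inps = {1.0, 0.5};
        double[][] weights = {{0.5}, {-1.0}};
        double[][] deltas = {{0.0}, {0.0}};
        double[] bouts = backward(gradient, inps, weights, deltas, 0.5, 0.0);
        System.out.println(bouts[0] + " " + bouts[1]);           // prints 0.125 -0.25
        System.out.println(weights[0][0] + " " + weights[1][0]); // prints 0.625 -0.9375
    }
}
```

Each weight moves in proportion to both the gradient arriving at its output and the input that flowed through it, so the connection from the smaller input (0.5) changes half as much.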

Copyright 2005-2008 by Heaton Research, Inc.