Activation Functions

    Most neural networks pass the output of their layers through activation functions, which map each layer's output into a desired range, such as 0 to 1 for the sigmoid function. The neural network program in the last section used the sigmoid activation function, which is the default choice for the FeedforwardLayer class. It is possible to use others. For example, to use the hyperbolic tangent activation function, the following lines of code would be used to create the layers.

network.addLayer(new FeedforwardLayer(new ActivationTANH(), 2));
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 3));
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 1));

    As you can see from the above code, a new instance of ActivationTANH is created and passed to each layer of the network. This specifies that the hyperbolic tangent should be used, rather than the sigmoid function.

    You may notice that it would be possible to use a different activation function for each layer of the neural network. While technically there is nothing stopping you from doing this, such practice would be unusual.
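
    For illustration only, such a mixed network could be built with the same addLayer calls shown above; the layer sizes here simply mirror the earlier example.

network.addLayer(new FeedforwardLayer(new ActivationSigmoid(), 2));
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 3));
network.addLayer(new FeedforwardLayer(new ActivationLinear(), 1));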

    There are a total of three activation functions provided:

  • Hyperbolic Tangent
  • Sigmoid
  • Linear

    It is also possible to create your own activation function. There is an interface named ActivationFunction. Any class that implements the ActivationFunction interface can serve as an activation function. The three activation functions provided will be discussed in the following sections.

Using a Sigmoid Activation Function

    A sigmoid activation function uses the sigmoid function to determine its activation. The sigmoid function is defined as follows:

Equation 5.1: The Sigmoid Function

$f(x) = \frac{1}{1 + e^{-x}}$

    The term sigmoid means curved in two directions, like the letter “S.” You can see the sigmoid function in Figure 5.2.

Figure 5.2: The Sigmoid function.

    One important thing to note about the sigmoid activation function is that it only returns values between 0 and 1; it never returns negative numbers. If you need the neural network to return negative numbers, the sigmoid function will be unsuitable. The sigmoid activation function is implemented in the ActivationSigmoid class. This class is shown in Listing 5.2.

Listing 5.2: The Sigmoid Activation Function Class

/**
 * Introduction to Neural Networks with Java, 2nd Edition
 * Copyright 2008 by Heaton Research, Inc. 
 * http://www.heatonresearch.com/books/java-neural-2/
 * 
 * ISBN13: 978-1-60439-008-7  	 
 * ISBN:   1-60439-008-5
 *   
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 */
package com.heatonresearch.book.introneuralnet.neural.activation;

/**
 * ActivationSigmoid: The sigmoid activation function takes on a
 * sigmoidal shape.  Only positive numbers are generated.  Do not
 * use this activation function if negative number output is desired.
 * 
 * @author Jeff Heaton
 * @version 2.1
 */
public class ActivationSigmoid implements ActivationFunction {
	/**
	 * Serial id for this class.
	 */
	private static final long serialVersionUID = 5622349801036468572L;

	/**
	 * A threshold function for a neural network.
	 * @param d The input to the function.
	 * @return The output from the function.
	 */
	public double activationFunction(final double d) {
		return 1.0 / (1 + Math.exp(-1.0 * d));
	}
	
	/**
	 * Some training methods require the derivative.
	 * @param d The output of the sigmoid function, f(x).
	 * @return The derivative, f'(x) = f(x) * (1 - f(x)), computed
	 * from the function's output value.
	 */
	public double derivativeFunction(double d) {
		return d * (1.0 - d);
	}

}

    As you can see, the sigmoid function is defined inside the activationFunction method. This method was defined by the ActivationFunction interface. If you would like to create your own activation function, it is as simple as creating a class that implements the ActivationFunction interface and providing an activationFunction method.
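
    As an illustration, here is a sketch of what such a class might look like. The class name ActivationSteepSigmoid is hypothetical and is not part of the book's library; it simply implements the same two methods seen in Listing 5.2 (the role of derivativeFunction is discussed below), adding an adjustable steepness parameter.

package com.heatonresearch.book.introneuralnet.neural.activation;

/**
 * ActivationSteepSigmoid: a hypothetical custom activation function,
 * shown only to illustrate implementing the ActivationFunction
 * interface. It is a sigmoid whose steepness can be adjusted.
 */
public class ActivationSteepSigmoid implements ActivationFunction {

	/**
	 * Serial id for this class (the provided activation classes
	 * declare one as well).
	 */
	private static final long serialVersionUID = 1L;

	/**
	 * Controls how sharply the curve rises; 1.0 gives the
	 * standard sigmoid.
	 */
	private final double steepness;

	public ActivationSteepSigmoid(final double steepness) {
		this.steepness = steepness;
	}

	public double activationFunction(final double d) {
		return 1.0 / (1.0 + Math.exp(-this.steepness * d));
	}

	public double derivativeFunction(final double d) {
		// As in ActivationSigmoid, d is the function's output:
		// f'(x) = steepness * f(x) * (1 - f(x)).
		return this.steepness * d * (1.0 - d);
	}
}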

    The ActivationFunction interface also defines a method named derivativeFunction that implements the derivative of the main activation function. Certain training methods require the derivative of the activation function. Backpropagation is one such method. Backpropagation cannot be used on a neural network that uses an activation function that does not have a derivative. However, a genetic algorithm or simulated annealing could still be used. These two techniques will be covered in the next two chapters.
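
    It is worth noting why the derivativeFunction in Listing 5.2 can work from the sigmoid's output alone. Differentiating the sigmoid gives an expression that simplifies entirely in terms of the function's value:

$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)(1 - f(x))$

    With d holding the output f(x), the expression d * (1.0 - d) is exactly this derivative.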

Using a Hyperbolic Tangent Activation Function

    As previously mentioned, the sigmoid activation function does not return values less than zero. However, it is possible to scale and shift the sigmoid curve so that it does produce negative numbers. The hyperbolic tangent function accomplishes this. The equation for the hyperbolic tangent activation function is shown in Equation 5.2.

Equation 5.2: The TANH Function

$\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$

    Although this looks considerably more complex than the sigmoid function, you can safely think of it as a positive and negative compatible version of the sigmoid function. The graph for the hyperbolic tangent function is provided in Figure 5.3.

Figure 5.3: The hyperbolic tangent function.
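
    In fact, the hyperbolic tangent is a rescaled sigmoid, a relationship that is easy to verify algebraically:

$\tanh(x) = 2f(2x) - 1$

    Here f is the sigmoid function of Equation 5.1; its output range of 0 to 1 is stretched and shifted to the range of -1 to 1.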

    One important thing to note about the hyperbolic tangent activation function is that it returns both positive and negative values. If you need the neural network to return negative numbers, this is the activation function to use. The hyperbolic tangent activation function is implemented in the ActivationTANH class. This class is shown in Listing 5.3.

Listing 5.3: The Hyperbolic Tangent Function Class

/**
 * Introduction to Neural Networks with Java, 2nd Edition
 * Copyright 2008 by Heaton Research, Inc. 
 * http://www.heatonresearch.com/books/java-neural-2/
 * 
 * ISBN13: 978-1-60439-008-7  	 
 * ISBN:   1-60439-008-5
 *   
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 */
package com.heatonresearch.book.introneuralnet.neural.activation;

/**
 * ActivationTANH: The hyperbolic tangent activation function takes the
 * curved shape of the hyperbolic tangent.  This activation function produces
 * both positive and negative output.  Use this activation function if 
 * both negative and positive output is desired.
 * 
 * @author Jeff Heaton
 * @version 2.1
 */
public class ActivationTANH implements ActivationFunction {

	/**
	 * Serial id for this class.
	 */
	private static final long serialVersionUID = 9121998892720207643L;

	/**
	 * A threshold function for a neural network.
	 * @param d The input to the function.
	 * @return The output from the function.
	 */
	public double activationFunction(double d) {
		// Equivalent to Math.tanh(d): (e^(2d) - 1) / (e^(2d) + 1).
		final double result = (Math.exp(d * 2.0) - 1.0) / (Math.exp(d * 2.0) + 1.0);
		return result;
	}
	
	/**
	 * Some training methods require the derivative.
	 * @param d The input to the function.
	 * @return The derivative of the hyperbolic tangent,
	 * f'(x) = 1 - tanh(x)^2.
	 */
	public double derivativeFunction(double d) {
		return 1.0 - Math.pow(activationFunction(d), 2.0);
	}

}

    As you can see, the hyperbolic tangent function is defined inside the activationFunction method, as required by the ActivationFunction interface. The derivativeFunction method is also defined; it returns 1 - tanh(x)^2, the derivative of the hyperbolic tangent.

Using a Linear Activation Function

    The linear activation function is essentially no activation function at all. It is probably the least commonly used of the activation functions. The linear activation function does not modify a pattern before outputting it. The function for the linear layer is given in Equation 5.3.

Equation 5.3: A Linear Function

$f(x) = x$

    The linear activation function might be useful in situations when you need the entire range of numbers to be output. Usually, you will want to think of your neurons as active or non-active. Because the hyperbolic tangent and sigmoid activation functions both have established upper and lower bounds, they tend to be used more for Boolean (on or off) type operations. The linear activation function is useful for presenting a range. A graph of the linear activation function is provided in Figure 5.4.

Figure 5.4: The linear activation function.

    The implementation for the linear activation function is fairly simple. It is shown in Listing 5.4.

Listing 5.4: The Linear Activation Function

/**
 * Introduction to Neural Networks with Java, 2nd Edition
 * Copyright 2008 by Heaton Research, Inc. 
 * http://www.heatonresearch.com/books/java-neural-2/
 * 
 * ISBN13: 978-1-60439-008-7  	 
 * ISBN:   1-60439-008-5
 *   
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 */
package com.heatonresearch.book.introneuralnet.neural.activation;

import com.heatonresearch.book.introneuralnet.neural.exception.NeuralNetworkError;

/**
 * ActivationLinear: The Linear layer is really not an activation function 
 * at all.  The input is simply passed on, unmodified, to the output.
 * This activation function is primarily theoretical and of little actual
 * use.  Usually an activation function that scales between 0 and 1 or
 * -1 and 1 should be used.
 * 
 * @author Jeff Heaton
 * @version 2.1
 */
public class ActivationLinear implements ActivationFunction {

	/**
	 * Serial id for this class.
	 */
	private static final long serialVersionUID = -5356580554235104944L;

	/**
	 * A threshold function for a neural network.
	 * @param d The input to the function.
	 * @return The output from the function: the input, unmodified.
	 */
	public double activationFunction(final double d) {
		return d;
	}

	/**
	 * Some training methods require the derivative.
	 * @param d The input to the function.
	 * @return This method never returns; the derivative is not
	 * supported for the linear function, so an exception is thrown.
	 */
	public double derivativeFunction(double d) {
		throw new NeuralNetworkError(
				"Can't use the linear activation function where a derivative is required.");
	}

}

    As you can see, the linear activation function does no more than return what it is given. The derivative of the linear function is the constant 1, which is not useful for training; thus, the derivativeFunction for the linear activation function throws an exception. You cannot use backpropagation to train a neural network that makes use of the linear activation function.
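
    For example, a network intended to produce an unbounded output might use the linear function on its output layer only. The following sketch reuses the addLayer calls from the beginning of this section and is illustrative only; such a network would have to be trained with a technique that does not require derivatives, such as a genetic algorithm or simulated annealing.

network.addLayer(new FeedforwardLayer(new ActivationTANH(), 2));
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 3));
// The output neuron passes its weighted sum through unchanged.
network.addLayer(new FeedforwardLayer(new ActivationLinear(), 1));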
