The next sections will explain each of the activation functions supported by Encog. There are several factors to consider when choosing an activation function. Firstly, the type of neural network you are using may dictate the activation function you must use. Secondly, you should consider if you would like to train the neural network using propagation. Propagation training requires an activation function that provides a derivative. You must also consider the range of numbers you will be dealing with.

The **ActivationBiPolar** activation function is used with neural networks that require bipolar numbers. Bipolar numbers are either **true** or **false**. A **true** value is represented by a bipolar value of 1; a **false** value is represented by a bipolar value of -1. The bipolar activation function ensures that any numbers passed to it are either -1 or 1. The **ActivationBiPolar** function does this with the following code:

if (d[i] > 0) { d[i] = 1; } else { d[i] = -1; }

As you can see, the output from this activation function is limited to either -1 or 1. Note that an input of exactly zero maps to -1. This sort of activation function is used with neural networks that require bipolar output from one layer to the next. The bipolar function has no derivative, so this activation function cannot be used with propagation training.
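The fragment above can be wrapped in a loop over the whole array. The following is a minimal sketch, not Encog's own class: the name `BipolarSketch` and the method `activate` are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of the Encog API.
final class BipolarSketch {

    /** Replaces each element in place with 1 if it is positive, otherwise -1. */
    static void activate(final double[] d) {
        for (int i = 0; i < d.length; i++) {
            if (d[i] > 0) {
                d[i] = 1;
            } else {
                d[i] = -1;
            }
        }
    }

    public static void main(String[] args) {
        double[] d = {0.7, -0.2, 0.0, 3.5};
        activate(d);
        System.out.println(Arrays.toString(d)); // [1.0, -1.0, -1.0, 1.0]
    }
}
```

The zero input in the example shows the boundary case: because the test is `d[i] > 0`, zero is treated as **false**.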

The **ActivationCompetitive** function is used to force only a select group of neurons to win. The winners are the neurons that have the highest outputs. The outputs of each of these neurons are held in the array passed to this function. The size of the winning group of neurons is definable. The function will first determine the winners. All non-winning neurons will be set to zero. Each winner's output is then divided by the sum of the winning outputs, so the winners' values sum to one.

This function begins by creating an array that will track whether each neuron has already been selected as one of the winners. We also keep a running sum of the winners' outputs.

final boolean[] winners = new boolean[d.length]; double sumWinners = 0;

First, we loop **maxWinners** times, finding one winner on each pass.

for (int i = 0; i < this.maxWinners; i++) { double maxFound = Double.NEGATIVE_INFINITY; int winner = -1;

Now, we must find one winner. We will loop over all of the neuron outputs and find the one with the highest output.

for (int j = 0; j < d.length; j++) {

If this neuron has not already won and its output is the highest found so far, it becomes the current candidate. It will be the winner of this pass unless a later neuron has a higher output.

if (!winners[j] && (d[j] > maxFound)) { winner = j; maxFound = d[j]; } }

Keep the sum of the winners that were found, and mark this neuron as a winner. Marking it a winner will prevent it from being chosen again. The sum of the winning outputs will ultimately be used to scale the winners.

sumWinners += maxFound; winners[winner] = true; }

Now that we have the correct number of winners, we must adjust the values for winners and non-winners. The non-winners will all be set to zero. Each winner is divided by the sum of the values held by all winners.

for (int i = 0; i < d.length; i++) { if (winners[i]) { d[i] = d[i] / sumWinners; } else { d[i] = 0.0; } }

This sort of activation function is used with competitive learning neural networks, such as the self-organizing map. This activation function has no derivative, so it cannot be used with propagation training.
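Assembled into a single method, the fragments above might look like the following sketch. The class name `CompetitiveSketch` is hypothetical, and `maxWinners` is passed as a parameter here rather than read from a field. The sketch assumes `maxWinners` is no larger than the array length; otherwise no candidate is found and the index -1 would be used.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of the Encog API.
final class CompetitiveSketch {

    /**
     * Keeps the maxWinners largest outputs, divides each by their sum,
     * and sets every other output to zero. Assumes maxWinners <= d.length.
     */
    static void activate(final double[] d, final int maxWinners) {
        final boolean[] winners = new boolean[d.length];
        double sumWinners = 0;

        // Find the desired number of winners, one per pass.
        for (int i = 0; i < maxWinners; i++) {
            double maxFound = Double.NEGATIVE_INFINITY;
            int winner = -1;
            for (int j = 0; j < d.length; j++) {
                if (!winners[j] && (d[j] > maxFound)) {
                    winner = j;
                    maxFound = d[j];
                }
            }
            sumWinners += maxFound;
            winners[winner] = true;
        }

        // Scale the winners and zero out everything else.
        for (int i = 0; i < d.length; i++) {
            if (winners[i]) {
                d[i] = d[i] / sumWinners;
            } else {
                d[i] = 0.0;
            }
        }
    }

    public static void main(String[] args) {
        double[] d = {0.5, 3.0, 1.0, 0.25};
        activate(d, 2);
        System.out.println(Arrays.toString(d)); // [0.0, 0.75, 0.25, 0.0]
    }
}
```

In the example the two winners are 3.0 and 1.0, whose sum is 4.0, so they become 0.75 and 0.25. The winners keep their relative proportions rather than receiving identical values.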

The **ActivationGaussian** function is based on the Gaussian function. The Gaussian function produces the familiar bell-shaped curve. The equation for the Gaussian function is shown in Equation 3.1.

**Equation 3.1: The Gaussian Function**

$$f(x) = a \, e^{-\frac{(x - b)^2}{2c^2}}$$

There are three different constants that are fed into the Gaussian function. The constant a represents the curve’s peak. The constant b represents the position of the center of the curve. The constant c represents the width of the curve.

**Figure 3.1: The Graph of the Gaussian Function**

The Gaussian function is implemented in Java as follows.

return this.peak * Math.exp(-Math.pow(x - this.center, 2) / (2.0 * this.width * this.width));

The Gaussian activation function is not a commonly used activation function. However, it can be used when finer control is needed over the activation range. The curve can be aligned to somewhat approximate certain functions. The radial basis function layer provides an even finer degree of control, as it can be used with multiple Gaussian functions. There is a valid derivative of the Gaussian function; therefore, the Gaussian function can be used with propagation training.
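To make the role of the three constants concrete, here is a small sketch of the function together with its derivative. The derivative shown is the standard calculus result for this curve, not necessarily the exact code Encog uses, and the class name `GaussianSketch` is hypothetical.

```java
// Hypothetical stand-alone sketch; not part of the Encog API.
final class GaussianSketch {
    private final double peak;   // height of the curve at its center
    private final double center; // x position of the peak
    private final double width;  // controls how wide the bell is

    GaussianSketch(double peak, double center, double width) {
        this.peak = peak;
        this.center = center;
        this.width = width;
    }

    /** peak * exp(-(x - center)^2 / (2 * width^2)), as in the code above. */
    double activate(double x) {
        return this.peak * Math.exp(-Math.pow(x - this.center, 2)
                / (2.0 * this.width * this.width));
    }

    /** Derivative by the chain rule: -(x - center) / width^2 * f(x). */
    double derivative(double x) {
        return -(x - this.center) / (this.width * this.width) * activate(x);
    }

    public static void main(String[] args) {
        GaussianSketch g = new GaussianSketch(1.0, 0.0, 1.0);
        System.out.println(g.activate(0.0));   // 1.0, the peak
        System.out.println(g.activate(1.0));   // one width away from the center
        System.out.println(g.derivative(1.0)); // negative: the curve is falling
    }
}
```

The derivative is zero at the center and changes sign on either side of it, which is the bell shape expressed as a slope.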

The **ActivationLinear** function is really no activation function at all. It simply implements the linear function. The linear function can be seen in Equation 3.2.

**Equation 3.2: The Linear Activation Function**

$$f(x) = x$$

The graph of the linear function is a simple line, as seen in Figure 3.2.

**Figure 3.2: Graph of the Linear Activation Function**

The Java implementation for the linear activation function is very simple. It does nothing. The input is returned as it was passed.

public void activationFunction(final double[] d) { }

The linear function is used primarily for specific types of neural networks that have no activation function, such as the self-organizing map. The linear activation function does not have a derivative, so it cannot be used with propagation training.

The **ActivationLog** activation function uses an algorithm based on the log function. The following Java code shows how this is calculated.

if (d[i] >= 0) { d[i] = BoundMath.log(1 + d[i]); } else { d[i] = -BoundMath.log(1 - d[i]); }

This produces a curve similar to the hyperbolic tangent activation function, which will be discussed later in this chapter. You can see the graph for the logarithmic activation function in Figure 3.3.

**Figure 3.3: Graph of the Logarithmic Activation Function**

The logarithmic activation function can be useful to prevent saturation. A hidden node of a neural network is considered saturated when, on a given set of inputs, the output is approximately 1 or -1 in most cases. This can slow training significantly. This makes the logarithmic activation function a possible choice when training is not successful using the hyperbolic tangent activation function.

As illustrated in Figure 3.3, the logarithmic activation function spans both positive and negative numbers. This means it can be used with neural networks where negative number output is desired. Some activation functions, such as the sigmoid activation function, will only produce positive output. The logarithmic activation function does have a derivative, so it can be used with propagation training.
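The following sketch applies the same two-branch formula to a single value and adds the derivative obtained by differentiating each branch, which works out to 1 / (1 + |x|). It uses `Math.log` in place of Encog's `BoundMath.log` so that it stands alone; the class name `LogSketch` is hypothetical.

```java
// Hypothetical stand-alone sketch; uses Math.log instead of Encog's BoundMath.log.
final class LogSketch {

    /** log(1 + x) for x >= 0, and -log(1 - x) for x < 0. */
    static double activate(double x) {
        if (x >= 0) {
            return Math.log(1 + x);
        } else {
            return -Math.log(1 - x);
        }
    }

    /** Derivative of both branches: 1 / (1 + |x|). */
    static double derivative(double x) {
        return 1.0 / (1.0 + Math.abs(x));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-100.0, -1.0, 0.0, 1.0, 100.0}) {
            System.out.println(x + " -> " + activate(x)
                    + " (slope " + derivative(x) + ")");
        }
    }
}
```

Unlike the hyperbolic tangent, the output keeps growing slowly as the input grows, and the slope shrinks only as 1 / (1 + |x|). This is why the function resists saturation.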

The **ActivationSigmoid** activation function should only be used when positive number output is expected. The output of the **ActivationSigmoid** function always lies between 0 and 1, so it never produces negative numbers. The equation for the **ActivationSigmoid** function can be seen in Equation 3.3.

**Equation 3.3: The ActivationSigmoid Function**

$$f(x) = \frac{1}{1 + e^{-x}}$$

The fact that the **ActivationSigmoid** function never produces negative numbers can be seen in Figure 3.4, which shows the graph of the sigmoid function.

**Figure 3.4: Graph of the ActivationSigmoid Function**

The **ActivationSigmoid** function is a very common choice for feedforward and simple recurrent neural networks. However, you must be sure that the training data does not expect negative output numbers. If negative numbers are required, consider using the hyperbolic tangent activation function.
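As a quick check of the output range, the sketch below implements the standard logistic sigmoid, 1 / (1 + e^-x), along with its well-known derivative f(x)(1 - f(x)). It uses `Math.exp` and a hypothetical class name `SigmoidSketch`; it is an illustration rather than Encog's own implementation.

```java
// Hypothetical stand-alone sketch; not part of the Encog API.
final class SigmoidSketch {

    /** Standard logistic sigmoid: 1 / (1 + e^-x). */
    static double activate(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Derivative expressed through the function value: f(x) * (1 - f(x)). */
    static double derivative(double x) {
        double f = activate(x);
        return f * (1.0 - f);
    }

    public static void main(String[] args) {
        // Even strongly negative inputs give a small positive output.
        for (double x : new double[] {-10.0, -1.0, 0.0, 1.0, 10.0}) {
            System.out.println(x + " -> " + activate(x));
        }
    }
}
```

An input of zero yields exactly 0.5, and the outputs approach 0 and 1 at the extremes without going negative.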

The **ActivationSIN** activation function is based on the sine function. It is not a commonly used activation function. However, it is sometimes useful for certain data that periodically changes over time. The graph for the **ActivationSIN** function is shown in Figure 3.5.

**Figure 3.5: Graph of the SIN Activation Function**

The **ActivationSIN** function works with both negative and positive values. Additionally, the **ActivationSIN** function has a derivative and can be used with propagation training.

The **ActivationSoftMax** activation function is an activation that will scale all of the input values so that their sum will equal one. The **ActivationSoftMax** activation function is sometimes used as a hidden layer activation function.

The activation function begins by replacing each neuron output with its natural exponent and summing the results.

double sum = 0; for (int i = 0; i < d.length; i++) { d[i] = BoundMath.exp(d[i]); sum += d[i]; }

The output from each of the neurons is then scaled according to this sum. This produces outputs that will sum to 1.

for (int i = 0; i < d.length; i++) { d[i] = d[i] / sum; }

The **ActivationSoftMax** is generally used in the hidden layer of a neural network or a classification neural network.
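Putting the two loops together gives the sketch below. It substitutes `Math.exp` for Encog's `BoundMath.exp` so that it runs on its own, and the class name `SoftMaxSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; uses Math.exp instead of Encog's BoundMath.exp.
final class SoftMaxSketch {

    /** Exponentiates each element in place, then divides by the total. */
    static void activate(final double[] d) {
        double sum = 0;
        for (int i = 0; i < d.length; i++) {
            d[i] = Math.exp(d[i]);
            sum += d[i];
        }
        for (int i = 0; i < d.length; i++) {
            d[i] = d[i] / sum;
        }
    }

    public static void main(String[] args) {
        double[] d = {1.0, 2.0, 3.0};
        activate(d);
        // The largest input receives the largest share of the total.
        System.out.println(Arrays.toString(d));
    }
}
```

Because every exponential is positive, all outputs are positive and the ordering of the inputs is preserved. Very large inputs can overflow `Math.exp`, which is presumably one reason the original code goes through `BoundMath`.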

The **ActivationTANH** activation function is an activation function that uses the hyperbolic tangent function. The hyperbolic tangent activation function is probably the most commonly used activation function, as it works with both negative and positive numbers. The hyperbolic tangent function is the default activation function for Encog. The equation for the hyperbolic tangent activation function can be seen in Equation 3.4.

**Equation 3.4: The Hyperbolic Tangent Activation Function**

$$f(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$$

The fact that the hyperbolic tangent activation function produces both negative and positive numbers can be seen in Figure 3.6, which shows the graph of the hyperbolic tangent function.

**Figure 3.6: Graph of the Hyperbolic Tangent Activation Function**

The hyperbolic tangent function that you see above calls the natural exponent function twice. This is an expensive function call. Even Java's built-in **Math.tanh** is fairly slow. The following code uses an algebraically equivalent form of the hyperbolic tangent that calls the exponent function only once.

private double activationFunction(final double d) { return -1 + (2 / (1 + BoundMath.exp(-2 * d))); }

The hyperbolic tangent function is a very common choice for feedforward and simple recurrent neural networks. The hyperbolic tangent function has a derivative, so it can be used with propagation training.
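The single-exponent form used in the code above can be compared against `Math.tanh` directly. The sketch below does this with `Math.exp` in place of Encog's `BoundMath.exp`; the class and method names are hypothetical.

```java
// Hypothetical stand-alone sketch; uses Math.exp instead of Encog's BoundMath.exp.
final class TanhSketch {

    /** tanh written with a single exponent call: -1 + 2 / (1 + e^(-2d)). */
    static double tanhOneExp(final double d) {
        return -1 + (2 / (1 + Math.exp(-2 * d)));
    }

    public static void main(String[] args) {
        // Print both versions side by side for a few inputs.
        for (double x = -3.0; x <= 3.0; x += 1.5) {
            System.out.println(x + ": " + tanhOneExp(x) + " vs " + Math.tanh(x));
        }
    }
}
```

With `Math.exp`, the two versions agree to within floating-point rounding across the range, including for inputs large enough that the exponent overflows to infinity or underflows to zero.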
