Elliott Activation Function
The Elliott Activation Function is a higher-speed approximation of the Hyperbolic Tangent and Sigmoid activation functions. Most neural network training tasks bottleneck at the calculation of the activation functions and their derivatives. The Elliott function takes considerably less time to compute than the traditional hyperbolic tangent and sigmoid because it does not need to evaluate the exp function. Two versions are available:
- ActivationElliott (can be used in place of ActivationSigmoid)
- ActivationElliottSymmetric (can be used in place of ActivationTANH)
The equation for ActivationElliott (range 0 to 1) is:
$$f(x) = \frac{0.5\,x}{1 + |x|} + 0.5$$
The equation for ActivationElliottSymmetric (range -1 to 1) is:
$$f(x) = \frac{x}{1 + |x|}$$
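The two functions and their derivatives can be sketched in a few lines of Python. This is a minimal illustration and not the Encog source: the function names and the `slope` parameter are assumptions made for this example, and the formulas are the standard Elliott form x / (1 + |x|) and its rescaling to the range 0 to 1.

```python
def elliott(x: float, slope: float = 1.0) -> float:
    """Elliott function rescaled to the range (0, 1); a stand-in for the sigmoid."""
    sx = slope * x
    return 0.5 * sx / (1.0 + abs(sx)) + 0.5


def elliott_symmetric(x: float, slope: float = 1.0) -> float:
    """Symmetric Elliott function with range (-1, 1); a stand-in for tanh."""
    sx = slope * x
    return sx / (1.0 + abs(sx))


def elliott_derivative(x: float, slope: float = 1.0) -> float:
    """Derivative of elliott() with respect to x."""
    d = 1.0 + abs(slope * x)
    return 0.5 * slope / (d * d)


def elliott_symmetric_derivative(x: float, slope: float = 1.0) -> float:
    """Derivative of elliott_symmetric() with respect to x."""
    d = 1.0 + abs(slope * x)
    return slope / (d * d)
```

Neither the functions nor their derivatives call exp; each needs only an absolute value, an addition, and a division or two, which is where the speed advantage comes from.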
Comparing Elliott Activation Function to Sigmoid, Hyperbolic Tangent & Logistic
The following chart shows how the blue sigmoid (logistic) function compares to the Elliott activation function.
The following chart shows how the blue hyperbolic tangent compares to the Symmetric Elliott activation function.
The Elliott activation function, in both cases, is slower to approach its asymptote on both sides.
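The slower approach to the asymptote can be checked numerically. The sketch below (helper names are hypothetical, chosen for this example) measures how far each function is from its upper asymptote of 1 at a few positive inputs. For x > 0 the symmetric Elliott gap is exactly 1 / (1 + x), which shrinks only in proportion to 1/x, while the tanh gap shrinks exponentially.

```python
import math


def elliott_symmetric(x: float) -> float:
    """Symmetric Elliott function, range (-1, 1)."""
    return x / (1.0 + abs(x))


def asymptote_gaps(xs):
    """Return (x, 1 - tanh(x), 1 - elliott_symmetric(x)) for each x in xs."""
    return [(x, 1.0 - math.tanh(x), 1.0 - elliott_symmetric(x)) for x in xs]


if __name__ == "__main__":
    for x, gap_tanh, gap_elliott in asymptote_gaps([1.0, 2.0, 5.0, 10.0]):
        print(f"x={x:5.1f}  tanh gap={gap_tanh:.3e}  Elliott gap={gap_elliott:.3e}")
```

At x = 10, for instance, the Elliott gap is 1/11, whereas tanh is already within about 1e-8 of 1.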
Performance of the Elliott Function vs Sigmoid, Hyperbolic Tangent & Logistic
The entire purpose of Elliott is to be a faster approximation of other activation functions. However, performance has two sides. Elliott takes a fraction of the computation time of the more complex activation functions, but its curve differs slightly from those of TANH and Sigmoid. The real question is how quickly the neural network converges. The Encog examples contain a program named ElliottBenchmark that attempts to answer that. This simple benchmark uses a Fahlman Encoder to evaluate Elliott Symmetric against TANH. We consider both the time to train and the total iterations used. The results are given here.
| Activation Function | Total Training Time | Avg Iterations Needed |
Two things should stand out from the table above. First, TANH usually needed fewer iterations to train than Elliott, so for an encoder, Elliott makes less progress per iteration. However, notice the training times. Even with the extra iterations, Elliott completed its entire task in half the time of TANH. In this case Elliott cuts the training time in half and delivers the same final training error: it takes more training iterations to get there, but each iteration is so much faster that the total training time is still halved.
For any benchmark, it is important to state the setup. This example used a 25:5:25 encoder trained with Resilient Propagation. Exactly the same starting weights were used for all trials; they were drawn at random from the range -1 to +1. The total training time is the time it took to complete 50 trials. Each trial trained to an error of 1%, measured as Mean Square Error. The average number of iterations per trial is reported.
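To make the setup concrete, here is a rough sketch of the training data for a 25:5:25 encoder and of the mean square error used as the stopping criterion. It is not taken from ElliottBenchmark: the function names are hypothetical, and the one-hot 0/1 encoding is an assumption based on the usual form of the encoder problem, in which each target pattern equals its input pattern and the network must squeeze it through the 5 hidden neurons.

```python
def make_encoder_dataset(n: int = 25):
    """Build n one-hot input patterns; each target is a copy of its input."""
    inputs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    targets = [row[:] for row in inputs]
    return inputs, targets


def mean_squared_error(outputs, targets) -> float:
    """Average squared difference over every output value of every pattern."""
    total = 0.0
    count = 0
    for out_row, tgt_row in zip(outputs, targets):
        for o, t in zip(out_row, tgt_row):
            total += (o - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    inputs, targets = make_encoder_dataset(25)
    # A network that always outputs zeros misses the single 1 in each pattern.
    zeros = [[0.0] * 25 for _ in range(25)]
    print(mean_squared_error(zeros, targets))
```

Under these assumptions, a trial would stop once `mean_squared_error` on the network's outputs falls to 0.01 or below.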
- David L. Elliott, "A Better Activation Function for Artificial Neural Networks" (1993)