Flat Spot Problem (Sigmoid/Logistic Activation Functions)

Flat Spot Problem

The Flat Spot Problem is an issue described by Scott Fahlman in his paper on the Quickprop
training method. The flat spot occurs in certain activation functions, particularly the
Sigmoid/Logistic Activation Function; Encog does not apply the flat spot fix to the
Hyperbolic Tangent or ReLU Activation Functions. The flat spot can make it very difficult
for a neural network to train properly with propagation training. Because of it, certain
hidden neurons can be rendered effectively useless, which greatly increases training time
and decreases the overall efficiency of the network. For small neural networks with just a
few hidden neurons, if enough of the random initial weights fall into the flat spot range,
the network may never converge.

To see why the flat spot exists, recall that all propagation training methods require the
derivative of the activation function. The derivative of the sigmoid activation function is:

$$ o_j(1 − o_j) $$

Where $o_j$ is the sigmoid output from unit $j$. This derivative approaches zero whenever
$o_j$ is near 1.0 or 0.0. These extremes are the flat spots: a plot of the derivative is
nearly flat at both ends, with values close to zero.
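
For example, comparing a mid-range output with a nearly saturated one shows how quickly the
gradient collapses inside the flat spot:

$$ o_j = 0.5 \Rightarrow o_j(1 - o_j) = 0.25, \qquad o_j = 0.99 \Rightarrow o_j(1 - o_j) = 0.0099 $$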

Eliminating the flat spot is as simple as adding a small constant, such as 0.1, to the
derivative function. This results in:

$$ o_j(1 - o_j) + 0.1 $$

This generally has a very positive effect on all propagation training. No change is made
to the activation function itself, so the flat spot modification is only needed at
training time.
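
The idea can be sketched in a few lines of Java. This is an illustration of the technique
only, not the actual Encog implementation; the class, method, and constant names below are
invented for the example, and 0.1 is the constant value suggested above.

```java
public final class SigmoidFlatSpotSketch {

    // Illustrative constant; the text suggests a value such as 0.1.
    private static final double FLAT_SPOT_CONST = 0.1;

    // Standard sigmoid derivative, written in terms of the sigmoid output o.
    static double sigmoidDerivative(double o) {
        return o * (1.0 - o);
    }

    // Derivative with the flat spot correction, used only during training.
    static double sigmoidDerivativeFixed(double o) {
        return sigmoidDerivative(o) + FLAT_SPOT_CONST;
    }

    public static void main(String[] args) {
        // Near saturation the plain derivative nearly vanishes,
        // while the corrected derivative stays usefully large.
        for (double o : new double[] {0.5, 0.9, 0.99}) {
            System.out.printf("o=%.2f  plain=%.4f  fixed=%.4f%n",
                    o, sigmoidDerivative(o), sigmoidDerivativeFixed(o));
        }
    }
}
```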

Encog Handling of the Flat Spot

By default, Encog addresses the flat spot; this has been shown to improve Encog training.
However, you can disable the flat spot processing by setting the FixFlatSpot property on
any propagation training object to false.
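
A minimal sketch of disabling the fix in Java follows. It assumes an already constructed
BasicNetwork and MLDataSet, and that the FixFlatSpot property mentioned above is exposed on
the trainer through a setFixFlatSpot setter; check your Encog version's API for the exact
name. The error threshold of 0.01 is arbitrary.

```java
import org.encog.ml.data.MLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class DisableFlatSpotSketch {

    public static void train(BasicNetwork network, MLDataSet trainingSet) {
        ResilientPropagation train = new ResilientPropagation(network, trainingSet);

        // Turn off the flat spot correction (enabled by default).
        // Setter name assumed from the FixFlatSpot property described in the text.
        train.setFixFlatSpot(false);

        do {
            train.iteration();
        } while (train.getError() > 0.01);

        train.finishTraining();
    }
}
```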

References

  • “An Empirical Study of Learning Speed in Back-Propagation Networks” (Scott E. Fahlman, 1988)