Resilient Propagation

Resilient Propagation (RPROP) is one of the best general-purpose training methods that Encog provides for neural networks. While RPROP is not the best training method for every problem, it is a strong default choice in most cases. RPROP can be used for feedforward neural networks and simple recurrent neural networks. Resilient propagation is a type of propagation training, and for certain activation functions it is susceptible to the Flat Spot Problem.

Resilient propagation will typically outperform backpropagation by a considerable factor. Additionally, RPROP has no parameters that must be set. Backpropagation requires that a learning rate and momentum value be specified. Finding an optimal learning rate and momentum value for backpropagation can be difficult. This is not necessary with resilient propagation.

Versions of Encog prior to 3.0 used the RPROP technique described by M. Riedmiller [1]. Encog 3.0 switched to the four RPROP variants described in a paper by C. Igel. [2]

Currently Encog makes use of the following four RPROP algorithms:

• RPROP+
• RPROP-
• iRPROP+
• iRPROP-

Usage

The Encog RPROP trainer is used much like any other trainer in Encog. It implements the IMLTrain (C#) or the MLTrain (Java) interface. The following code shows how to create an RPROP object. For a complete example see the Hello World Example.

Java Usage

The following code creates an RPROP trainer in Java.

MLTrain train = new ResilientPropagation(network, trainingSet);

C# Usage

The following code creates an RPROP trainer in C#.

IMLTrain train = new ResilientPropagation(network, trainingSet);

Calculation

In this section we will take a look at how resilient propagation actually functions. The symbols used in the calculations below, and the pseudo-code variables that correspond to them, are summarized in the following table.

Meaning of RPROP Clauses

Clause | Meaning | Pseudo-code Variable
$\Delta_{ij}^{(t)}$ | Update value for the current iteration t. | delta
$\Delta_{ij}^{(t-1)}$ | Update value for iteration t-1. | lastDelta
$\frac{\partial E}{\partial w_{ij}}^{(t)}$ | Gradient of the error with respect to the weight from i to j, for iteration t. | gradient
$\frac{\partial E}{\partial w_{ij}}^{(t-1)}$ | Gradient of the error with respect to the weight from i to j, for iteration t-1. | lastGradient
$E^{(t)}$ | The training error for the current iteration t. | error
$E^{(t-1)}$ | The training error for iteration t-1. | lastError
$\Delta w_{ij}^{(t)}$ | The change in the weight from i to j made by the current iteration t. | weightChange
$\Delta w_{ij}^{(t-1)}$ | The change in the weight from i to j made by iteration t-1. | lastWeightChange
$\eta^{+}$ | Positive step factor. Typically 1.2. | positiveStep
$\eta^{-}$ | Negative step factor. Typically 0.5. | negativeStep
$\Delta_{max}$ | Upper limit on the update value. | maxStep
$\Delta_{min}$ | Lower limit on the update value. | minStep

Riedmiller Implementation

The original paper describes RPROP as a two-step process. First, we calculate the change for each weight. This is done with the following formula.

$\Delta{w_{ij}^{(t)}}=\begin{cases} -\Delta_{ij}^{(t)} & \mbox{, if } { \frac{\partial E}{\partial w_{ij}}^{(t)} > 0} \\ +\Delta_{ij}^{(t)} & \mbox{, if } { \frac{\partial E}{\partial w_{ij}}^{(t)} < 0} \\ 0 & \mbox{, otherwise } \end{cases}$

Here the sign of the gradient determines the direction of the weight change (delta w), while the update value (delta) supplies its magnitude.

Once the weight changes have been applied, we determine the new update value. This is done with the following formula.

$\Delta_{ij}^{(t)}=\begin{cases} \eta^+ \cdot \Delta_{ij}^{(t-1)} & \mbox{, if } { \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} > 0} \\ \eta^- \cdot \Delta_{ij}^{(t-1)} & \mbox{, if } { \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0} \\ \Delta_{ij}^{(t-1)} & \mbox{, otherwise } \end{cases}$

Now let's see how this is actually done, broken into a series of smaller steps. First we must determine the sign of the change in the derivative. This requires a sgn function, defined as follows.

$\operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases}$
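In code, the sgn function maps directly to a small helper. The sketch below is a minimal Java version (note that Java's built-in Math.signum returns a double rather than an int, but behaves the same way for finite inputs):

```java
public class Sgn {

    // sgn(x): -1 for negative x, 0 for zero, 1 for positive x
    static int sgn(double x) {
        if (x > 0) return 1;
        if (x < 0) return -1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(sgn(-3.5)); // prints -1
        System.out.println(sgn(0.0));  // prints 0
        System.out.println(sgn(2.0));  // prints 1
    }
}
```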

The actual change in sign is calculated as follows.

$c = \frac{\partial E}{\partial w_{ij}}^{(t)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t-1)}$

What we do now depends on the sign of c.

If c>0 Then

If this is the case, then the sign has not changed. This is good, try to accelerate with a larger update value.

$\Delta_{ij}^{(t)} = \min\left(\Delta_{ij}^{(t-1)} \cdot \eta^{+}, \Delta_{max}\right)$

$\Delta w_{ij}^{(t)} = -\operatorname{sgn}\left(\frac{\partial E}{\partial w_{ij}}^{(t)}\right) \cdot \Delta_{ij}^{(t)}$

$w_{ij}^{(t+1)} = w_{ij}^{(t)} + \Delta w_{ij}^{(t)}$

$\frac{\partial E}{\partial w_{ij}}^{(t-1)} = \frac{\partial E}{\partial w_{ij}}^{(t)}$

Else If c<0 Then

If this is the case, then the sign has changed: the last update was too big and we jumped over a local minimum. The update value is reduced, no weight change is applied, and the stored gradient is set to zero so that the next iteration falls into the c = 0 case.

$\Delta_{ij}^{(t)} = \max\left(\Delta_{ij}^{(t-1)} \cdot \eta^{-}, \Delta_{min}\right)$

$\frac{\partial E}{\partial w_{ij}}^{(t-1)} = 0$

Else (c = 0) Then

In this case the product of the gradients is zero, usually because the previous gradient was reset to zero after a sign change (or because this is the first iteration). We continue to apply the weight update, but do not change the update value.

$\Delta w_{ij}^{(t)} = -\operatorname{sgn}\left(\frac{\partial E}{\partial w_{ij}}^{(t)}\right) \cdot \Delta_{ij}^{(t)}$

$w_{ij}^{(t+1)} = w_{ij}^{(t)} + \Delta w_{ij}^{(t)}$

$\frac{\partial E}{\partial w_{ij}}^{(t-1)} = \frac{\partial E}{\partial w_{ij}}^{(t)}$

End If
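The three cases above can be collected into a single per-weight update routine. The following Java sketch is an illustration of these steps for one weight, not Encog's internal code; the constants (initial update value 0.1, maximum 50, minimum 1e-6) are the usual defaults from the RPROP literature.

```java
public class RpropStep {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0, DELTA_MIN = 1e-6;

    double weight, delta = 0.1, lastGradient;

    static int sgn(double x) { return x > 0 ? 1 : (x < 0 ? -1 : 0); }

    // One update step for a single weight, given the current gradient dE/dw.
    void step(double gradient) {
        double c = gradient * lastGradient;
        if (c > 0) {
            // sign unchanged: accelerate with a larger update value
            delta = Math.min(delta * ETA_PLUS, DELTA_MAX);
            weight += -sgn(gradient) * delta;
            lastGradient = gradient;
        } else if (c < 0) {
            // sign flipped: the last step was too big; shrink the update
            // value and zero the stored gradient (forces c == 0 next time)
            delta = Math.max(delta * ETA_MINUS, DELTA_MIN);
            lastGradient = 0;
        } else {
            // c == 0: apply the update without changing the update value
            weight += -sgn(gradient) * delta;
            lastGradient = gradient;
        }
    }

    public static void main(String[] args) {
        // Minimize E(w) = w^2, whose gradient is 2w, starting from w = 4.
        RpropStep r = new RpropStep();
        r.weight = 4.0;
        for (int i = 0; i < 500; i++) r.step(2 * r.weight);
        System.out.println(Math.abs(r.weight) < 0.1);
    }
}
```

Note that no learning rate appears anywhere: the step size delta adapts itself from the sign of the gradient alone.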

Implementing RPROP+

RPROP+ is very similar to the original Riedmiller implementation. The main difference is that we revert the previous iteration's weight change if the sign of the gradient changes in the current iteration. This is sometimes called "weight backtracking". The basic programmatic implementation for calculating each weight change is shown here.

// multiply the current and previous gradient, and take the
// sign. We want to see if the gradient has changed its sign.

change = sign( gradient * lastGradient )
weightChange = 0

// if the gradient has retained its sign, then we increase the
// delta so that it will converge faster

if change > 0

delta = min( delta * positiveStep , maxStep)
weightChange = -sign( gradient ) * delta

// if change<0, then the sign has changed, and the last
// delta was too big

else if change < 0

delta = max( delta * negativeStep , minStep)
weightChange = -lastWeightChange

// set the gradient to zero so that the change==0 case
// applies on the next iteration
gradient = 0

// if change==0 then there is no change to the delta

else if change == 0

weightChange = -sign( gradient ) * delta

end if

lastDelta = delta
lastGradient = gradient
lastWeightChange = weightChange
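The pseudo-code above translates to runnable Java along the following lines. This is a sketch of the per-weight RPROP+ rule for illustration, not Encog's ResilientPropagation class; the constants are common defaults from the RPROP literature.

```java
public class RpropPlus {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0, DELTA_MIN = 1e-6;

    double weight, delta = 0.1, lastGradient, lastWeightChange;

    static int sgn(double x) { return x > 0 ? 1 : (x < 0 ? -1 : 0); }

    void step(double gradient) {
        double change = gradient * lastGradient;
        double weightChange = 0;
        if (change > 0) {
            // same sign: accelerate
            delta = Math.min(delta * ETA_PLUS, DELTA_MAX);
            weightChange = -sgn(gradient) * delta;
        } else if (change < 0) {
            // sign flip: shrink the update value and undo the last step
            delta = Math.max(delta * ETA_MINUS, DELTA_MIN);
            weightChange = -lastWeightChange;
            gradient = 0; // forces the change == 0 case next iteration
        } else {
            weightChange = -sgn(gradient) * delta;
        }
        weight += weightChange;
        lastGradient = gradient;
        lastWeightChange = weightChange;
    }

    public static void main(String[] args) {
        // Minimize E(w) = w^2 (gradient 2w) starting from w = 4.
        RpropPlus r = new RpropPlus();
        r.weight = 4.0;
        for (int i = 0; i < 500; i++) r.step(2 * r.weight);
        System.out.println(Math.abs(r.weight) < 0.1);
    }
}
```

The backtracking line `weightChange = -lastWeightChange` is what distinguishes RPROP+ from the other variants: an overshoot is undone rather than merely slowed.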

Implementing RPROP-

RPROP- questions how important the "weight backtracking" of RPROP+ actually is. RPROP- removes this and simplifies the algorithm.

// multiply the current and previous gradient, and take the
// sign. We want to see if the gradient has changed its sign.

change = sign( gradient * lastGradient )

// if the gradient has retained its sign, then we increase the
// delta so that it will converge faster

if change > 0

delta = min( delta * positiveStep , maxStep)

// if change<0, then the sign has changed, and the last
// delta was too big

else if change < 0

delta = max( delta * negativeStep , minStep)

end if

// the update is always applied; there is no backtracking

weightChange = -sign( gradient ) * delta
lastGradient = gradient
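A runnable per-weight sketch (again an illustration rather than Encog's implementation) shows how little state RPROP- needs: only the previous gradient and the update value.

```java
public class RpropMinus {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0, DELTA_MIN = 1e-6;

    double weight, delta = 0.1, lastGradient;

    static int sgn(double x) { return x > 0 ? 1 : (x < 0 ? -1 : 0); }

    void step(double gradient) {
        double change = gradient * lastGradient;
        if (change > 0) {
            delta = Math.min(delta * ETA_PLUS, DELTA_MAX);  // same sign: grow
        } else if (change < 0) {
            delta = Math.max(delta * ETA_MINUS, DELTA_MIN); // sign flip: shrink
        }
        // the update is always applied; there is no backtracking
        weight += -sgn(gradient) * delta;
        lastGradient = gradient;
    }

    public static void main(String[] args) {
        // Minimize E(w) = w^2 (gradient 2w) starting from w = 4.
        RpropMinus r = new RpropMinus();
        r.weight = 4.0;
        for (int i = 0; i < 500; i++) r.step(2 * r.weight);
        System.out.println(Math.abs(r.weight) < 0.1);
    }
}
```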

Implementing iRPROP+

The iRPROP+ algorithm revisits the "weight backtracking" seen in RPROP+: the previous weight change is reverted only if the error increased in the last iteration. Some research suggests that iRPROP+ is the optimum RPROP algorithm.

// multiply the current and previous gradient, and take the
// sign. We want to see if the gradient has changed its sign.

change = sign( gradient * lastGradient )
weightChange = 0

// if the gradient has retained its sign, then we increase the
// delta so that it will converge faster

if change > 0

delta = min( delta * positiveStep , maxStep)
weightChange = -sign( gradient ) * delta

// if change<0, then the sign has changed, and the last
// delta was too big

else if change < 0

delta = max( delta * negativeStep , minStep)

// undo the previous weight change, but only if the error got worse
if currentError > lastError then weightChange = -lastWeightChange

// set the gradient to zero so that the change==0 case
// applies on the next iteration
gradient = 0

// if change==0 then there is no change to the delta

else if change == 0

weightChange = -sign( gradient ) * delta

end if

lastDelta = delta
lastGradient = gradient
lastWeightChange = weightChange
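As with the other variants, the rule can be sketched as runnable Java for a single weight. This is an illustration of the iRPROP+ idea under the usual default constants, not Encog's internal code; the caller supplies the current error so the algorithm can decide whether to backtrack.

```java
public class IRpropPlus {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0, DELTA_MIN = 1e-6;

    double weight, delta = 0.1, lastGradient, lastWeightChange;
    double lastError = Double.MAX_VALUE;

    static int sgn(double x) { return x > 0 ? 1 : (x < 0 ? -1 : 0); }

    void step(double gradient, double currentError) {
        double change = gradient * lastGradient;
        double weightChange = 0;
        if (change > 0) {
            delta = Math.min(delta * ETA_PLUS, DELTA_MAX);
            weightChange = -sgn(gradient) * delta;
        } else if (change < 0) {
            delta = Math.max(delta * ETA_MINUS, DELTA_MIN);
            // backtrack only if the error actually got worse
            if (currentError > lastError) weightChange = -lastWeightChange;
            gradient = 0; // forces the change == 0 case next iteration
        } else {
            weightChange = -sgn(gradient) * delta;
        }
        weight += weightChange;
        lastGradient = gradient;
        lastWeightChange = weightChange;
        lastError = currentError;
    }

    public static void main(String[] args) {
        // Minimize E(w) = w^2 (gradient 2w) starting from w = 4.
        IRpropPlus r = new IRpropPlus();
        r.weight = 4.0;
        for (int i = 0; i < 500; i++) r.step(2 * r.weight, r.weight * r.weight);
        System.out.println(Math.abs(r.weight) < 0.1);
    }
}
```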

Implementing iRPROP-

iRPROP- is very similar to RPROP-. No weight backtracking is used. However, the gradient is set to zero when it changes sign, which suppresses the weight update for that iteration.

// multiply the current and previous gradient, and take the
// sign. We want to see if the gradient has changed its sign.

change = sign( gradient * lastGradient )

// if the gradient has retained its sign, then we increase the
// delta so that it will converge faster

if change > 0

delta = min( delta * positiveStep , maxStep)

// if change<0, then the sign has changed, and the last
// delta was too big

else if change < 0

delta = max( delta * negativeStep , minStep)

// zero the gradient so that no weight change is applied
// this iteration (sign(0) is 0)
gradient = 0

end if

weightChange = -sign( gradient ) * delta
lastGradient = gradient