Very slow training
Hi,
I am training a feed-forward NN using SCG and SA (hybrid strategy), and I also added the RequiredImprovementStrategy as suggested in a previous thread. Still, I get the following results:
Error after epoch 100: 36.61838072764125
Error after epoch 200: 14.347653165186989
Error after epoch 300: 14.347653164862063
Error after epoch 400: 13.62807281946011
Error after epoch 500: 13.62807281946011
Error after epoch 600: 12.59441989294264
Error after epoch 700: 12.318382684206615
Error after epoch 800: 12.318382684206593
Error after epoch 900: 12.034959935828116
Error after epoch 1000: 12.034959935828116
Error after epoch 1100: 11.845618148062027
Error after epoch 1200: 11.764543105108851
Error after epoch 1300: 11.764543105108848
Error after epoch 1400: 11.69735220500957
Error after epoch 1500: 11.69735220500957
Error after epoch 1600: 11.673416910003713
Error after epoch 1700: 11.672358891917776
Error after epoch 1800: 11.672358891795284
Error after epoch 1900: 11.647973867560914
Error after epoch 2000: 11.647973867560914
Error after epoch 2100: 11.647973867560914
Error after epoch 2200: 11.647973867560914
.....
Error after epoch 4700: 11.647973867560914
Error after epoch 4800: 11.647973867560914
Error after epoch 4900: 11.647973867560914
Error after epoch 5000: 11.647973867560914
As you can see I am checking the error every 100 cycles. Sometimes the error remains constant over 100 cycles and later it changes. In the end there is no improvement at all for more than 3000 cycles. How can this be explained? And more importantly, do you have any suggestions for improving the training process?
Thanks in advance




This is actually a common occurrence, and there are several ways to approach it.
1. Your data might simply not support training to a lower error rate. This could be down to the way you normalized or collected it, or the data might just be too divergent. For example, consider these rows, each with two inputs and one ideal output:
0,0,1
0,1,0
0,0,0
0,1,1
Okay, the above will drive a neural network crazy in a real hurry, at least a feedforward one. You are saying that the output for 0,0 is sometimes one and sometimes zero (and the same goes for 0,1). A feedforward network sees each row on its own, so this will never train to a good error rate. A temporal network, such as an Elman network, would instead try to learn the order in which you present the training elements.
2. You might be in a local minimum, though SCG is usually pretty good at getting out of them. One option here, which I know Jeff uses a lot, is a genetic algorithm (GA). You may or may not get a really low error rate from the GA alone, but that is not really the idea. The idea is to get a population of, say, 10 or 100 decent neural networks with very different weights. Then take them one by one and fine-tune each with iterative training such as RPROP or SCG.
3. An even more advanced hybrid: Jeff actually uses a GA where he keeps taking the top ten organisms from the population, training them with RPROP, and then putting them back into the population. It's kind of like sending the top organisms to "university". I know this technique will be introduced into Encog soon, but the building blocks are all there already.
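To make point 1 concrete, here is a rough sketch of how you could scan a training set for rows that share the same inputs but disagree on the ideal output. It is plain Python, not Encog code; the function names `find_conflicts` and `mse_floor` are made up for illustration. The second function estimates the lowest mean squared error that any one-answer-per-input mapping could reach on the data, which is what a feedforward network is limited to. How that floor relates to the error figure your trainer prints depends on how the trainer computes its error, so treat it only as a sanity check.

```python
from collections import defaultdict

def find_conflicts(rows, n_inputs):
    """Return each input vector that appears with more than one distinct ideal output."""
    ideals_by_input = defaultdict(list)
    for row in rows:
        inputs, ideal = tuple(row[:n_inputs]), tuple(row[n_inputs:])
        ideals_by_input[inputs].append(ideal)
    return {inp: ideals for inp, ideals in ideals_by_input.items()
            if len(set(ideals)) > 1}

def mse_floor(rows, n_inputs):
    """Lowest MSE reachable when every distinct input must map to a single output.

    The best single answer for an input is the mean of its ideal outputs,
    so the floor is the leftover variance inside each group of equal inputs.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[:n_inputs])].append(row[n_inputs:])
    total, count = 0.0, 0
    for ideals in groups.values():
        n_out = len(ideals[0])
        means = [sum(i[k] for i in ideals) / len(ideals) for k in range(n_out)]
        for ideal in ideals:
            for k in range(n_out):
                total += (ideal[k] - means[k]) ** 2
                count += 1
    return total / count

# The four rows from point 1: two inputs, one ideal output each.
rows = [(0, 0, 1), (0, 1, 0), (0, 0, 0), (0, 1, 1)]
conflicts = find_conflicts(rows, n_inputs=2)
floor = mse_floor(rows, n_inputs=2)
print(conflicts)
print(floor)
```

If `conflicts` comes back non-empty on your real data, more epochs will not help; the fix is in how the data is collected, normalized, or encoded.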
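The hybrid from points 2 and 3 can be sketched roughly as follows. This is only an illustration under some loud assumptions: it is plain Python rather than Encog, the "network" is just a short weight vector scored on a toy error surface (the Rastrigin function, which has many local minima), and plain gradient descent stands in for RPROP/SCG. The names `local_train`, `breed` and `university_ga`, as well as all the parameter values, are made up here and are not Jeff's actual implementation.

```python
import math
import random

def error(w):
    """Toy error surface (Rastrigin): many local minima, global minimum 0 at the origin."""
    return 10 * len(w) + sum(x * x - 10 * math.cos(2 * math.pi * x) for x in w)

def gradient(w):
    """Analytic gradient of the toy error surface."""
    return [2 * x + 20 * math.pi * math.sin(2 * math.pi * x) for x in w]

def local_train(w, steps=200, lr=0.002):
    """Stand-in for RPROP/SCG: plain gradient descent into the nearest minimum."""
    trained = list(w)
    for _ in range(steps):
        trained = [x - lr * g for x, g in zip(trained, gradient(trained))]
    # Never hand back something worse than what we were given.
    return trained if error(trained) <= error(w) else list(w)

def breed(a, b, rng, sigma=0.3):
    """Uniform crossover of two parents followed by Gaussian mutation."""
    return [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]

def university_ga(pop_size=30, dims=2, generations=20, top_k=10, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dims)] for _ in range(pop_size)]
    best_start = min(error(w) for w in pop)
    for _ in range(generations):
        pop.sort(key=error)
        # Send the top organisms to "university", then put them back.
        pop[:top_k] = [local_train(w) for w in pop[:top_k]]
        pop.sort(key=error)
        # Keep the better half unchanged and refill the rest with offspring.
        survivors = pop[: pop_size // 2]
        children = [breed(rng.choice(survivors), rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = min(pop, key=error)
    return best_start, best, error(best)

best_start, best_weights, best_error = university_ga()
print(f"best error before: {best_start:.4f}, after: {best_error:.4f}")
```

Setting `generations=1` and training every member instead of only the top ten gives roughly the simpler variant from point 2: a spread of different starting weights, each fine-tuned on its own. The local trainer by itself only slides into whichever minimum is nearest, which is why the GA is there to keep supplying different starting points.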