Current Encog 2.0 Build Training Problem?
Hi, I've been using Encog 1.1 to train Feed Forward networks with backpropagation. I've been wanting to try out resilient backpropagation and eventually recurrent networks so I decided to download the latest 2.0 code from the trunk directory.
I'm pretty sure I've made all the necessary code architecture changes to work with Encog 2.0, but when I run resilient or backprop on the same network structure and training data as I do with 1.1, the training error is much higher and drops much more slowly (60% as opposed to 10% with v1.1). So I am wondering if I am missing some network setup code that doesn't exist in v1.1, or if the current 2.0 build has training bugs?
Below is the code I use to create a new feed forward network
private static BasicNetwork getBaseNetwork()
{
    BasicNetwork result = new BasicNetwork();
    result.addLayer(new BasicLayer(new ActivationTANH(), true, NeuralDbSetupCommand.NEURAL_INPUTS.size()));
    result.addLayer(new BasicLayer(new ActivationTANH(), true, 7));
    result.addLayer(new BasicLayer(new ActivationTANH(), true, NeuralDbSetupCommand.NEURAL_OUTPUTS.size()));
    result.getStructure().finalizeStructure();
    result.reset();
    return result;
}
Below is the code I use to train the network generated from the above code.
// startTime (a Calendar), done (a boolean) and TRAIN_TIME (milliseconds) are declared elsewhere.
Train train = new ResilientPropagation(network, trainingSet);
int epoch = 0;
do
{
    train.iteration();
    // Stop once the time budget has been used up.
    if (Calendar.getInstance().getTimeInMillis() - startTime.getTimeInMillis() > TRAIN_TIME)
        done = true;
    System.out.println("Epoch #" + epoch + " Error:" + (train.getError()*100.0)+"%");
    epoch++;
} while (train.getError() > 0.005 && !done && !Thread.currentThread().isInterrupted());
The only code I changed for the conversion from 1.1 to 2.0 was for network generation and training, so I have left out the code for data set loading and anything else. If anyone has ideas as to why I get a much higher training error on v2.0, I would appreciate your input.
On another note, when I was trying to get specific network layers or synapses using the network structure get methods, I noticed that they return a list of the layers/synapses but that their ordering still wasn't definite. For example, network.getStructure().getSynapses().get(0) would not necessarily be the synapse between the input and first hidden layer. I thought this was weird seeing as getSynapses() returns a List and so I'm wondering if this unspecified order is a bug. I fixed the ordering issue on my downloaded copy by changing the Sets used in the finalizeStructure() to Lists.
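To illustrate the ordering point, here is a minimal sketch using plain Java collections. It is not Encog's finalizeStructure() code; the Synapse class is a hypothetical stand-in. A HashSet of objects that rely on the default hashCode() iterates in an order that can differ from run to run, while a List or a LinkedHashSet keeps insertion order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderingSketch {
    // Hypothetical stand-in for a synapse; this is NOT the Encog class.
    static final class Synapse {
        final String name;
        Synapse(String name) { this.name = name; }
    }

    // Copies the names out of any collection, in its iteration order.
    static List<String> names(Iterable<Synapse> synapses) {
        List<String> out = new ArrayList<>();
        for (Synapse s : synapses) {
            out.add(s.name);
        }
        return out;
    }

    // Two synapses, added in input-to-output order.
    static List<Synapse> sample() {
        List<Synapse> list = new ArrayList<>();
        list.add(new Synapse("input->hidden"));
        list.add(new Synapse("hidden->output"));
        return list;
    }

    public static void main(String[] args) {
        List<Synapse> inOrder = sample();
        Set<Synapse> linked = new LinkedHashSet<>(inOrder); // keeps insertion order
        Set<Synapse> hashed = new HashSet<>(inOrder);       // order follows identity hash codes
        System.out.println("List:          " + names(inOrder));
        System.out.println("LinkedHashSet: " + names(linked));
        System.out.println("HashSet:       " + names(hashed)); // may vary between runs
    }
}
```

If set semantics are still wanted, LinkedHashSet is one way to get a stable order without switching to a List.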




Something is strange with the training. That and a few issues with the SOM training are the main things to address before 2.0 goes beta. It seems to be something new; the earlier builds seemed to do better. So I am currently looking into training. But thanks for the report, and let me know if you run into any other issues.
As to the ordering: there is no really good way to order synapses or layers, so the order is not guaranteed between calls to finalizeStructure. Once it's finalized, the structure should stay the same. This lets genetic algorithms and simulated annealing work. At least that is the idea. Are you noticing that two calls to get return two different results in a row? That should not happen. But between saves, the order could change.
Jeff
The two calls may return a different synapse/layer if I call it for one network, then generate a new network via the getBaseNetwork() function I posted earlier, and make the call again.
In v1.1, I use the getLayers() function of the network class to get the weight matrices of each layer so that I can save them into my database and later load them into a network when needed. This worked well for my saving and loading of networks because getLayers().get(0) always represented the same layer when I was saving and when I was loading.
In v2.0, when I save getSynapses().get(0), it may be saving the synapse between the hidden and output layers, while when loading into the same getSynapses().get(0), it may be loading the weights into the synapse between the input and hidden layers instead.
Is there some other way in v2.0 to traverse layers/synapses in an ordering that is always the same across networks with the same topology other than having to do a depth first search like the finalize does?
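One possible approach is to walk the network from the input layer toward the output instead of relying on the order of a collection. The sketch below shows the idea with hypothetical Layer and Synapse types; they are not the Encog classes, and the field names are assumptions made for illustration. It also assumes a simple feed forward chain with one outgoing synapse per layer and no recurrent links.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalSketch {
    // Hypothetical minimal types; these are NOT the Encog classes.
    static final class Layer {
        final String name;
        Synapse next; // outgoing synapse, or null for the output layer
        Layer(String name) { this.name = name; }
    }

    static final class Synapse {
        final Layer from;
        final Layer to;
        Synapse(Layer from, Layer to) {
            this.from = from;
            this.to = to;
            from.next = this; // link the source layer to this synapse
        }
        String label() { return from.name + "->" + to.name; }
    }

    // Follows the chain from the input layer, so index 0 is always the
    // input-side synapse for any network with the same topology.
    static List<Synapse> synapsesInOrder(Layer input) {
        List<Synapse> ordered = new ArrayList<>();
        for (Synapse s = input.next; s != null; s = s.to.next) {
            ordered.add(s);
        }
        return ordered;
    }

    public static void main(String[] args) {
        Layer in = new Layer("input");
        Layer hidden = new Layer("hidden");
        Layer out = new Layer("output");
        // Created out of order on purpose; the traversal order is unaffected.
        new Synapse(hidden, out);
        new Synapse(in, hidden);
        for (Synapse s : synapsesInOrder(in)) {
            System.out.println(s.label());
        }
    }
}
```

Saving and loading weights in this traversal order would pair each weight matrix with the same position in the chain, regardless of how the synapses were stored internally.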
It seems that 2.0 was not using the thresholds correctly. This was having an adverse effect on ALL training methods. I checked in a fix for this; it is in the latest core CruiseControl build. I did not propagate the JAR file out to the other builds (e.g. workbench). I will let Jeff do this once he checks my fix. But I think I fixed it.
Another thing to note: Encog 1.x used Sigmoid as the default activation function, whereas 2.x uses TANH. I assume that is because it's more general and works with both positive and negative values. But if you are using positive-only data and results, I would suggest sigmoid. I updated the XOR example to use sigmoid, and it works better that way!
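The difference in output range is easy to see with the standard formulas. This is a small standalone sketch, not the Encog activation classes: sigmoid maps into the range 0 to 1, while tanh maps into the range -1 to 1.

```java
class ActivationRangeSketch {
    // Logistic sigmoid: outputs fall in the range 0 to 1.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Hyperbolic tangent: outputs fall in the range -1 to 1.
    static double tanh(double x) {
        return Math.tanh(x);
    }

    public static void main(String[] args) {
        double[] inputs = {-4.0, -1.0, 0.0, 1.0, 4.0};
        for (double x : inputs) {
            System.out.printf("x=%5.1f  sigmoid=%.4f  tanh=%.4f%n", x, sigmoid(x), tanh(x));
        }
    }
}
```

With targets such as the 0/1 outputs of XOR, sigmoid can use its whole output range, whereas a tanh output only ever uses its non-negative half.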
I found a big article on neural nets a few weeks back that told me TANH was the best to use on my hidden layers and inputs. I normalize my inputs so that the mean is 0 and values of 1 and -1 lie one standard deviation from the mean. That seems to work pretty well from the results I've gotten so far.
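For reference, that kind of normalization (often called a z-score) can be sketched as below. This is only an illustration, not the poster's actual code; in particular, dividing by the population standard deviation rather than the sample standard deviation is an assumption.

```java
class ZScoreSketch {
    // Returns a copy of the values shifted to mean 0 and scaled so that
    // one (population) standard deviation maps to a distance of 1.
    static double[] normalize(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;

        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(squares / values.length);

        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Guard against a constant column, where the deviation is 0.
            out[i] = std == 0.0 ? 0.0 : (values[i] - mean) / std;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {2, 4, 4, 4, 5, 5, 7, 9};
        for (double z : normalize(column)) {
            System.out.println(z);
        }
    }
}
```

The mean and standard deviation computed on the training data would normally be stored and reused for any new inputs.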
I've re-downloaded the trunk code and everything seems to be working perfectly. I am amazed at how much faster resilient prop trains than back prop. This is going to allow me to attempt bigger and better things. I will be experimenting with recurrent networks most likely this weekend.
Thanks for all the hard work in making this project great!
Seema, thanks for fixing that bug. I checked out the latest and it seems to be working great. Cfraser, thanks for testing this out as well, and pointing out the bug.
I also fixed a persistence bug yesterday. I think we are getting close to being able to release 2.0-beta1. There are still some problems with the competitive training though. The OCR example is really inaccurate, much worse than 1.1. Though the SOM code that it is based on was a total rewrite from 1.1.
RPROP is pretty cool. It does so much more with an iteration than backprop, and you do not have to figure out optimal learning rates and momentum. I use it as my primary training algorithm, at this point.
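To show why no learning rate or momentum is needed, here is a rough single-weight sketch of a sign-based update in the spirit of RPROP (the variant without weight backtracking). It is not Encog's implementation; the class name is made up, and the constants are the commonly cited defaults, used here as assumptions. Each weight keeps its own step size, which grows while the gradient keeps its sign and shrinks when the sign flips.

```java
import java.util.function.DoubleUnaryOperator;

class RpropSketch {
    static final double ETA_PLUS = 1.2;    // step growth factor
    static final double ETA_MINUS = 0.5;   // step shrink factor
    static final double DELTA_MIN = 1e-6;  // smallest allowed step
    static final double DELTA_MAX = 50.0;  // largest allowed step

    // Minimizes a one-dimensional function given only its gradient.
    static double minimize(DoubleUnaryOperator gradient, double start, int iterations) {
        double w = start;
        double delta = 0.1;     // initial step size
        double prevGrad = 0.0;  // gradient from the previous iteration
        for (int i = 0; i < iterations; i++) {
            double grad = gradient.applyAsDouble(w);
            double product = prevGrad * grad;
            if (product > 0) {
                // Same sign as last time: take a bigger step.
                delta = Math.min(delta * ETA_PLUS, DELTA_MAX);
            } else if (product < 0) {
                // Sign flipped, so a minimum was overshot: shrink the
                // step and skip the update for this iteration.
                delta = Math.max(delta * ETA_MINUS, DELTA_MIN);
                grad = 0.0;
            }
            // Only the sign of the gradient is used, never its magnitude.
            w -= Math.signum(grad) * delta;
            prevGrad = grad;
        }
        return w;
    }

    public static void main(String[] args) {
        // f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
        double w = minimize(x -> 2.0 * (x - 3.0), 0.0, 200);
        System.out.println("w after 200 iterations: " + w);
    }
}
```

In a real network the same rule would be applied independently to every weight and threshold, using the gradient summed over the whole training set.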
One of the tasks for 2.1 is to introduce threading into the propagation algorithms. This will allow multicore machines to train even faster, particularly on large training sets. Additionally, there are numerous optimizations we can make that will help all of the propagation methods. So things should get even faster with 2.1.
Jeff
Hi Jeff,
First of all, congratulations on your excellent work... after trying Joone, I've to say that I'm definitely happy with Encog.
I have a question about how to use RPROP properly. I've tried to replace the original 'train' with the new one as follows:
//Train train = new Backpropagation(network, trainingSet, 0.0000001, 0.9);
Train train = new ResilientPropagation(network, trainingSet);
But I get this error: "Type mismatch: cannot convert from ResilientPropagation to Train".
How can I use it properly? I'm working with image recognition, so I think that RPROP is the best option.
Looking forward to your answer.
Liboh
That should work, as Train is the interface that ResilientPropagation implements. I suppose you could always use:
ResilientPropagation train = new ResilientPropagation(network, trainingSet);
Though the XORResilient program uses exactly the same line as you do. Does the XORResilient example program work? Make sure you have the latest 2.0 build; grab it from http://build.heatonresearch.com if you want.
Thanks, glad you like Encog so far. Many additions planned once 2.0 is released.
Jeff
I want to get help with the main vendor script in order to maintain my backup software; just upload the code.
thanks
phantomguy