difference between calculateError & getError, Bug?

securelinprog's picture

I created a Backpropagation object (bp) to train a feedforward MLP with one hidden layer. The network (basicnetwork) is trained until bp.getError() is less than 0.02.

After training stopped, I called basicnetwork.calculateError to see the error on the training set, and it gave me: 1.152316933510082

Why is there a difference? I set training to stop at an error below 0.02, but when I test on the same data set that was used for training I get 1.152316933510082!

Is there a bug in Encog?

Please help me. Thanks.

jeffheaton's picture

There will be a difference, but generally not a huge one. The reason for this is as follows.

Propagation training works by calculating the error. To calculate the error I must iterate over every training set element, which takes time. So I don't want to do this any more than I have to.

To calculate the training adjustment I need to calculate the error for each training set element, and as I do this I also calculate the overall error. Then, once I've processed the entire training set, I modify the weights in a way that usually produces a better result (backpropagation). The catch is that the error reported by train.getError comes from the loop where I was calculating the differences, BEFORE the adjustment is made.

When you go to call basicnetwork.calculateError, it is working on the modified neural network, so it usually should be slightly better than the .getError reported by the last training iteration.
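To make the timing concrete, here is a minimal, self-contained sketch in plain Java. It does not use Encog; the one-weight model, the toy data, and the names iteration, reportedError, and calculateError are all assumptions chosen to mirror the description above. The error is accumulated in the same pass that computes the adjustment, so it describes the weights as they were before the update.

```java
class PreUpdateErrorSketch {
    // Toy training set for y = 2x (hypothetical data, not from this thread).
    static final double[] X = {1.0, 2.0, 3.0};
    static final double[] Y = {2.0, 4.0, 6.0};

    static double weight = 0.0;   // the single trainable weight
    static double reportedError;  // plays the role of train.getError()

    // Mean squared error of the CURRENT weight; plays the role of calculateError().
    static double calculateError() {
        double sum = 0.0;
        for (int i = 0; i < X.length; i++) {
            double diff = weight * X[i] - Y[i];
            sum += diff * diff;
        }
        return sum / X.length;
    }

    // One batch iteration: error and gradient come from the same pass,
    // and the weight is only changed afterwards.
    static void iteration(double learningRate) {
        double sum = 0.0;
        double gradient = 0.0;
        for (int i = 0; i < X.length; i++) {
            double diff = weight * X[i] - Y[i];
            sum += diff * diff;
            gradient += 2.0 * diff * X[i];
        }
        reportedError = sum / X.length;               // measured BEFORE the update
        weight -= learningRate * gradient / X.length; // the update happens last
    }

    public static void main(String[] args) {
        iteration(0.05);
        System.out.println("Reported by iteration 1:     " + reportedError);
        System.out.println("Evaluated after iteration 1: " + calculateError());
        iteration(0.05);
        System.out.println("Reported by iteration 2:     " + reportedError);
    }
}
```

In this sketch the error reported by the second iteration is the same number as the error evaluated after the first, because both are measured on the same weights.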

To test this I modified the XOR backprop example to display the final training iteration error, as well as the final error of the network, as reported by basicNetwork.calculateError.

I get the following results:

Epoch #3924 Error:0.009999918617581394
Final network error:0.009998509740975152

If it's a really big difference, there is something else going on. Make sure it's the same training set that the network was trained with.

Jeff

securelinprog's picture

I am sure that I am using the same training set that the network was trained with.

But this is surprising! I set the maximum error to 0.01. When training stopped, I tested on the training set, but the output of the network is wrong, very wrong. I did not expect this. Maybe there is a bug in Encog's error calculation. If you want, I can send my source code.

Is it possible that my network is set up wrong? The network is a feedforward MLP with a hidden layer; I set all activation functions to ActivationSigmoid(), the threshold parameter is true for the layers, and the training method is Backpropagation.

I think there is a bug in Encog. If you want, I can send my source code.

jeffheaton's picture

I still can't reproduce this. This is also an area I use quite a bit, since I train networks and then turn right around and use them.

Use the following code for your training loop.

int epoch = 1;

do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:" + Format.formatPercent(train.getError()));
    System.out.println("Evaluated error: " + Format.formatPercent(train.getNetwork().calculateError(trainingSet)));
    epoch++;
} while (train.getError() > 0.01);

System.out.println("Final evaluated error: " + Format.formatPercent(train.getNetwork().calculateError(trainingSet)));

At each loop iteration, the code above displays the error reported by the training function, followed by the "evaluated error" calculated from the network just after that iteration. The final evaluated error is displayed once the training loop is done. Notice that the two are not exactly the same, which is normal.

They actually move together in pairs, as each epoch's evaluated error is the same as the next epoch's training error. Try this technique with your data and see how close they stay together.

Epoch #1 Error:56.997184%
Evaluated error: 56.997184%
Epoch #2 Error:56.997184%
Evaluated error: 52.843587%
Epoch #3 Error:52.843587%
Evaluated error: 50.120023%
Epoch #4 Error:50.120023%
Evaluated error: 51.324022%
Epoch #5 Error:51.324022%
Evaluated error: 51.296656%
Epoch #6 Error:51.296656%
Evaluated error: 51.270567%
Epoch #7 Error:51.270567%
Evaluated error: 50.072441%
Epoch #8 Error:50.072441%
Evaluated error: 50.634920%
Epoch #9 Error:50.634920%
Evaluated error: 50.627715%
Epoch #10 Error:50.627715%
Evaluated error: 50.614936%
...
Epoch #68 Error:4.308266%
Evaluated error: 2.846251%
Epoch #69 Error:2.846251%
Evaluated error: 1.700173%
Epoch #70 Error:1.700173%
Evaluated error: 0.883071%
Epoch #71 Error:0.883071%
Evaluated error: 0.391167%
Final evaluated error: 0.391167%
Neural Network Results:
0.0,0.0, actual=2.1195601254130874E-5,ideal=0.0
1.0,0.0, actual=0.9961354019551435,ideal=1.0
0.0,1.0, actual=0.995249927137771,ideal=1.0
1.0,1.0, actual=0.004868884513333364,ideal=0.0

securelinprog's picture

This is the result. Only to make it run faster, I changed 0.01 to 0.02.

run:
Epoch #1 Error:2.392933%
Evaluated error: 113.232199%
Epoch #2 Error:1.769253%
Evaluated error: 104.223779%
Final evaluated error: 104.223779%

As you can see, there is a large difference between train.getError() and network.calculateError()!
The outputs of the network also differ greatly from the ideal outputs (I have not included the outputs above; I can if you want)!

jeffheaton's picture

Okay, my last idea. Are you using a ContextLayer? If so, it may be holding context and be out of sync between training and error evaluation. If that is the case, try calling network.clearContext() just before you call calculateError.

Failing that, I would almost need to see source code to verify if it is an Encog bug.

securelinprog's picture

Please give me an email address so I can send you the source code, or send me a mail to get it:
[[email edited]]

jeffheaton's picture

I also edited your email address from the post so it won't be out in the public.

jeffheaton's picture

I analyzed the code and found that if you specify a number of threads greater than the number of training set elements, you will get an "incorrect" and very low error rate. Generally it is best to set the thread count to zero (the default) and allow Encog to choose the number of threads to use, based on your processor count and training set size.
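If you do want to set the thread count yourself on an affected version, one way to stay clear of this is to clamp the requested count to the size of the training set before handing it to the trainer. The helper below is a hypothetical sketch in plain Java and is not part of Encog. It passes zero through unchanged so the library can still choose for itself, and it does not show how the value is given to the trainer, since that call is not named in this thread.

```java
class ThreadCountSketch {
    /**
     * Hypothetical helper (not an Encog API).
     * Returns 0 unchanged, meaning "let the library decide"; otherwise
     * never returns more threads than there are training set elements.
     */
    static int clampThreadCount(int requested, int trainingSetSize) {
        if (requested < 0) {
            throw new IllegalArgumentException("requested must be >= 0");
        }
        if (trainingSetSize < 1) {
            throw new IllegalArgumentException("trainingSetSize must be >= 1");
        }
        if (requested == 0) {
            return 0; // default: automatic selection
        }
        return Math.min(requested, trainingSetSize);
    }

    public static void main(String[] args) {
        // XOR has 4 training elements, so a request for 8 threads is reduced.
        System.out.println(clampThreadCount(8, 4));
        System.out.println(clampThreadCount(0, 4));
    }
}
```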

This issue has been fixed in the beta version of 2.4, and has been checked in.

Thanks for the bug report!

MAMA's picture

Hi

I am using version 3.0.1 and I have exactly the same problem as described above.

I train in a loop until getError() is below a certain value. When I stop the training, getError() stays the same and calculateError() gives me a completely different number. Testing on the training set itself also gives me the calculateError() result and not the getError() result (I compare the output for each item in the training set and compute my own MSE).
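For anyone who wants to repeat this kind of independent check, here is a minimal sketch of a hand-computed mean squared error in plain Java. The class and method names are hypothetical, and the arrays stand in for the network outputs and ideal values you would collect by running each training item through the network. It averages over every individual output value; Encog's own error calculation may normalize differently or use a different measure (for example a root mean square), so make sure you compare like with like.

```java
class ManualMseSketch {
    /**
     * Mean squared error over all output values.
     * actual[i][j] is output j produced for training item i,
     * ideal[i][j] is the expected value for that output.
     */
    static double meanSquaredError(double[][] actual, double[][] ideal) {
        if (actual.length != ideal.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].length != ideal[i].length) {
                throw new IllegalArgumentException("output counts differ in row " + i);
            }
            for (int j = 0; j < actual[i].length; j++) {
                double diff = actual[i][j] - ideal[i][j];
                sum += diff * diff;
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no output values to compare");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Toy values: one output is off by 1.0, the other is exact.
        double[][] actual = {{0.0}, {1.0}};
        double[][] ideal = {{0.0}, {0.0}};
        System.out.println("MSE: " + meanSquaredError(actual, ideal));
    }
}
```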

For example, my program prints this line after training completes:
==============================
Train results: Epoch #6 Error:2.4392628789510988E-5 calculated 0.5864413691841887
The calculated error is always much higher, which renders the freshly trained network useless.

I have not set any thread count, so I guess I am using 1 thread.

I would appreciate a suggestion. Am I doing anything wrong?

MaMa

P.S. I can analyze the problem myself and maybe propose a bug fix if you can give me an Eclipse environment for the Encog 3.0.1 source code (which I have downloaded).

jeffheaton's picture

I have not seen that. What type of network are you using? This is what I get:

Epoch #1 Error:0.26775822894465584
Epoch #2 Error:0.257262863962477
Epoch #3 Error:0.2512153294004623
Epoch #4 Error:0.248852631892871
Epoch #5 Error:0.24790764184957806
Epoch #6 Error:0.2457956130135336
Epoch #7 Error:0.24463221766519644
Epoch #8 Error:0.2407044638402997
Epoch #9 Error:0.23575270874240695
Epoch #10 Error:0.22939899854035473
...
Epoch #114 Error:0.010428424792723556
Epoch #115 Error:0.008624079357125608
CalculateError: 0.006926492267455526

CalculateError and train.getError will never be exactly the same, for reasons described earlier in this thread. But they should not vary by much.

This is the code I used.

final ResilientPropagation train = new ResilientPropagation(network, trainingSet);

int epoch = 1;

do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:" + train.getError());
    epoch++;
} while (train.getError() > 0.01);

SeemaSingh's picture

Another thing to consider: a .5xxx error rate seems like an untrained network.

Some of the trainers modify the network that they were passed, others do not. For example, genetic algorithm and PSO do not; they create a new trained neural network. Therefore, you need to make sure that you call getMethod() on the trainer so that you have the new trained network. Most trainers, like the RPROP that Jeff is using above, will modify the network provided to them.
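The difference between the two kinds of trainer can be sketched in plain Java as follows. Everything here is a hypothetical stand-in, not Encog code: Model plays the role of the network, and the two trainer classes only imitate the two behaviours described above. The method is named getMethod() to match the call mentioned in this post.

```java
class TrainerResultSketch {
    // Hypothetical stand-in for a network: a single weight.
    static class Model {
        double weight;
        Model(double weight) { this.weight = weight; }
    }

    interface Trainer {
        void iteration();
        Model getMethod(); // returns the trained model
    }

    // Imitates trainers that modify the network they were given.
    static class InPlaceTrainer implements Trainer {
        private final Model model;
        InPlaceTrainer(Model model) { this.model = model; }
        public void iteration() { model.weight += 1.0; }
        public Model getMethod() { return model; }
    }

    // Imitates trainers that leave the original untouched and build a new model.
    static class CopyingTrainer implements Trainer {
        private Model best;
        CopyingTrainer(Model model) { this.best = new Model(model.weight); }
        public void iteration() { best = new Model(best.weight + 1.0); }
        public Model getMethod() { return best; }
    }

    public static void main(String[] args) {
        Model original = new Model(0.0);
        Trainer trainer = new CopyingTrainer(original);
        trainer.iteration();
        // Evaluating 'original' here would measure an untrained model.
        System.out.println("original weight: " + original.weight);
        System.out.println("trained weight:  " + trainer.getMethod().weight);
    }
}
```

Evaluating the object returned by getMethod() works for both kinds of trainer in this sketch, which is why it is the safer habit.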


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.