
LevenbergMarquardtTraining

I started looking at Encog recently when the ConsoleApp example named "Forex" attracted my attention. I tried running it and it threw an exception.

I traced the exception: it resulted from _lambda being divided by 10 until it reached zero. I tried a dirty fix, skipping the Iteration method completely when lambda is zero, but realized this is not right.

I heard later that the C# classes are converted from Java, so I went to check the Java class. It appears that the Java class is not an exact replica of the C# class. On the forum, César Souza also mentioned that the class is a clone of his, so I checked his Accord.NET framework as well.

Conclusion: the Encog Java class is similar to the Accord.NET implementation; the C# class implementation is not.

Recommended change:

                if (decomposition.IsNonsingular)
                {
                    _deltas = decomposition.Solve(_hessian.Gradients);

                    UpdateWeights();
                    currentError = CalculateError();
                }

to:

                if (!decomposition.IsNonsingular)
                    continue;
                
                _deltas = decomposition.Solve(_hessian.Gradients);

                UpdateWeights();

With this change the code should behave closer to the Java version. I'm not sure this single change is enough to match the Java code exactly; I'll keep looking and post again once I have a complete list of the changes needed in the C# class to match the Java implementation.

I would ask that a member of the Encog team check that the code is as intended; if the association with César Souza's Accord.NET is accurate, the code should be re-checked against that as well.

Please apply the patch to the Git repository, or tell me what the protocol for check-in is.

Regards,

John

SeemaSingh's picture

That will actually be addressed in 3.1. We originally used the Accord.NET implementation of LMA, but it was pretty limited, so it was rewritten on the Java side. The main issues we had with the Accord version were a few calculation bugs, the limitation to a single layer, huge memory requirements, single-threaded execution, and the limitation to a single output neuron. Jeff rewrote the Java side of LMA to address all of these issues with the Accord version, so checking against it would not be useful at this point. I think they have since fixed a few of the bugs.

However, we still need to port the Java side to .NET; that is on the list for 3.1 and will solve the issue you are referring to.

LSI's picture

If the Java implementation is done, please let me know where it is and I'll be happy to try porting it to C#.

If this goes well, what is the procedure to check in the ported C# version?

jeffheaton's picture

First of all, the typical path for checking a change into Encog is to fork the GitHub repository and then push your change to us, which is typical for Git projects. Then we can review the change, and if it is accepted you are credited as the one who made it. The following is a good summary of how to contribute.

http://www.lornajane.net/posts/2010/contributing-to-projects-on-github

Seema, I have ported the Java code to C#, so that step is not needed.

LSI, in what version of Encog are you comparing the Java and C# implementations? In 3.1, which we are just about ready to release, they are the same.

https://github.com/encog/encog-java-core/blob/master/src/main/java/org/e...

vs

https://github.com/encog/encog-dotnet-core/blob/master/encog-core-cs/Neu...

These are both based on the newer version of LMA that I created after removing the Accord version (Encog 3.1). The biggest problem I had with the Accord version is that it makes use of a very large Jacobian matrix that has a row for each training set element. On a large training set with a large neural network, that blows out the memory very fast.
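
To give a rough sense of the scale, here is a back-of-envelope sketch (the sample and weight counts are assumptions for illustration, not actual Encog or Accord code):

    using System;

    // Back-of-envelope estimate of the full-Jacobian memory cost: one row per
    // training element, one column per network weight. Numbers are assumptions.
    class JacobianMemoryEstimate
    {
        static void Main()
        {
            long samples = 100000;   // training set elements (assumed)
            long weights = 2550;     // weights + biases in the network (assumed)
            long bytes = samples * weights * sizeof(double);
            Console.WriteLine("Full Jacobian: {0:F1} GB", bytes / (1024.0 * 1024 * 1024));
            // ~1.9 GB for this one matrix, before J^T * J is even formed.
        }
    }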

I will take a look at the change you suggested and implement it if all checks out okay.

cesarsouza's picture

Hello Jeff, John,

I just wanted to mention that the current version of the Levenberg-Marquardt learning in the Accord.NET Framework has also been updated and does not suffer from those limitations anymore. The code is now multi-threaded, supports multiple outputs and multiple hidden layers, and the gradient calculation bug has been fixed. Accord.NET still computes the Jacobian, but now uses an optional gradual construction approach that can be used to trade speed for memory and vice versa.

By the way, Jeff, I also implemented an in-place version of the Cholesky decomposition and an in-place inverse-trace calculation, which could also be useful in Encog. It increases speed nicely, as it is not necessary to compute the whole inverse matrix just to update the Bayesian regularization hyperparameters. If Encog doesn't already do that, feel free to use the same technique; it improves performance a lot when using Bayesian regularization.
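
A minimal sketch of that idea (my own illustration, not the actual Accord.NET or Encog code): if A = L*L^T is the Cholesky factorization of the Hessian, then trace(A^-1) is just the sum of squares of the entries of L^-1, so the trace can be obtained by inverting the triangular factor in place, without ever forming the full inverse of A.

    // Sketch only: given the lower-triangular Cholesky factor L of a symmetric
    // positive-definite matrix A = L * L^T, compute trace(A^-1) by inverting L
    // in place and summing squares -- no full inverse of A needed.
    static class CholeskyTrace
    {
        public static double TraceOfInverse(double[,] L)
        {
            int n = L.GetLength(0);

            // Invert the lower-triangular factor in place, column by column.
            for (int j = 0; j < n; j++)
            {
                L[j, j] = 1.0 / L[j, j];
                for (int i = j + 1; i < n; i++)
                {
                    double sum = 0.0;
                    for (int k = j; k < i; k++)
                        sum += L[i, k] * L[k, j];   // L[k, j] already holds inv(L)[k, j]
                    L[i, j] = -sum / L[i, i];       // L[i, i] is still the original diagonal
                }
            }

            // trace(A^-1) = trace(inv(L)^T * inv(L)) = sum of squared entries of inv(L).
            double trace = 0.0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j <= i; j++)
                    trace += L[i, j] * L[i, j];
            return trace;
        }
    }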

Best regards,
Cesar

jeffheaton's picture

Cesar, thanks! Will take a look.

Also, I just checked in a change (Java and C#) that should prevent the lambda-zero issue on a non-singular matrix.

LSI's picture

Hi Jeff,

I've taken the latest LMA and tested it. On the forex example, maketrain.cs, the lambda still hits zero.

I have inserted a few lines into the method LevenbergMarquardtTraining.Iteration(), as follows:

            LUDecomposition decomposition;
            PreIteration();
            // insertion: skip the whole iteration if lambda has effectively reached zero
            if (_lambda < 100 * double.Epsilon)
            {
                PostIteration();
                return;
            }
            // end insertion
            _hessian.Clear();
            _weights = NetworkCODEC.NetworkToArray(_network);

and the return statement has been hit.

I don't understand LMA well enough to figure out why the lambda gets to zero.

Currently I'm playing with other training methods and have applied what you described in another post, where you use genetic training and then train the first 100 networks with RPROP; I applied the same method with PSO, then trained the top 100 networks.

PSO actually has no method for exposing the networks, so I've added one (not checked in).

* I would say that the code that applies genetic training and then RPROP, or PSO and then RPROP, should be shown in some example, as a sample implementation may benefit others.

* Another idea: looking at the Encog code, it feels to me that all your development is done in Java and then ported, as the C# code uses none of the advancements of .NET 4.0 that make code a lot smaller and more concise. As a side effect, .NET 4.0 also makes threading optimisations easier.

Should I use the Java code to get quicker access to your latest development?

* I've been reading about your idea of a C++/GPU client that would work over sockets. I was thinking the same thing, for faster performance. Also, I believe that in Encog, C# or Java, it is easy to develop a distributed, multiple-computer setup for doing the training, which may speed things up a lot, even without the C++ client.

Would a sample of multi process/multi machine training benefit the community?

Regards,
John

jeffheaton's picture

Since I am not reproducing the issue you are seeing, I am mostly guessing at what you are encountering. LMA works by using lambda to interpolate between Gauss-Newton (GNA) and gradient descent. A low (or zero) lambda happens if training is successful each time and lambda just keeps getting pushed to the GNA side of things.
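
To make that concrete, here is a tiny standalone toy of the classic damping rule (my own sketch, not the actual Encog code): each successful step divides lambda by 10, moving toward Gauss-Newton, and each failed step multiplies it by 10, moving toward gradient descent. When the steps keep succeeding, lambda keeps shrinking toward zero.

    using System;

    // Standalone toy of the classic LM damping rule (illustrative, not Encog code).
    // Fit y = a * x to a few points: solve (H + lambda) * delta = -g, accept the step
    // and divide lambda by 10 when the error drops, otherwise reject it and multiply
    // lambda by 10. Repeated successes keep pushing lambda toward zero.
    class LmDampingDemo
    {
        static readonly double[] X = { 1, 2, 3, 4, 5 };
        static readonly double[] Y = { 2.1, 3.9, 6.2, 7.8, 10.1 };   // roughly y = 2x

        static double Error(double a)
        {
            double sum = 0;
            for (int i = 0; i < X.Length; i++)
            {
                double r = a * X[i] - Y[i];
                sum += r * r;
            }
            return sum;
        }

        static void Main()
        {
            double a = 0.0, lambda = 0.1, error = Error(a);
            for (int iter = 0; iter < 10; iter++)
            {
                // One parameter, so the Jacobian is just X: H = sum(x^2), g = sum(x * residual).
                double h = 0, g = 0;
                for (int i = 0; i < X.Length; i++)
                {
                    h += X[i] * X[i];
                    g += X[i] * (a * X[i] - Y[i]);
                }

                double delta = -g / (h + lambda);
                double trialError = Error(a + delta);

                if (trialError < error)
                {
                    a += delta;
                    error = trialError;
                    lambda /= 10.0;   // success: interpolate toward Gauss-Newton
                }
                else
                {
                    lambda *= 10.0;   // failure: interpolate toward gradient descent
                }
                Console.WriteLine("iter {0}: a={1:F4} error={2:F4} lambda={3:E1}",
                                  iter, a, error, lambda);
            }
        }
    }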

But even if I fix lambda at zero, I am not getting any sort of exception. I can easily impose a hard minimum on lambda, but I still can't see why that is necessary. What is the stack trace you run into when lambda hits zero? If I can see why that is happening, then I will put in a minimum on lambda... but I just want to be sure that is really the issue.

Sure, any sort of distributed training example would be useful.

The C++ client mainly gives me direct access to the GPU API, in this case CUDA. It is also one less layer to go through. The problem on the GPU is that the transfer time between the host and the GPU often eats up any performance gain the GPU gave you. So while I experiment with getting the best performance from the GPU, pure C/C++ is the best environment to work in.

Encog releases are typically a mix of C# and Java improvements, depending on which contributor is working on what; then they get translated to the other side. I typically work on the Java side first. I do like the new C# improvements for threaded loops, and will probably move some of the code over to that at some point. The C# code reminds me of C/C++ OpenMP, which is pretty nice! The Hessian calculation is a natural place where the 4.0 threading improvements could be added, rather than the thread pool it uses now.
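
For what it's worth, here is a minimal example of what such a threaded loop looks like in .NET 4.0 (illustrative only, not the actual Hessian code):

    using System;
    using System.Threading.Tasks;

    // Minimal illustration of a .NET 4.0 threaded loop: Parallel.For splits the
    // per-element work across cores, much like an OpenMP "parallel for", without
    // managing the thread pool by hand.
    class ParallelLoopSketch
    {
        static void Main()
        {
            int samples = 1000;
            double[] partial = new double[samples];

            Parallel.For(0, samples, i =>
            {
                // stand-in for the per-training-element gradient/Hessian contribution
                partial[i] = Math.Sin(i) * Math.Sin(i);
            });

            double total = 0;
            foreach (double p in partial) total += p;
            Console.WriteLine("accumulated contribution: {0:F3}", total);
        }
    }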

mancer's picture

I tried imposing a floor on lambda, and I actually got worse results. I really suggest you DO NOT make that change. I also tried hard-fixing lambda at zero and got no exception.
