Threading Problem
If I'm not mistaken, as of version 2.3, Encog now performs training in a multi-threaded fashion. This makes sense, because just a moment ago I was getting around 20 epochs of training per second with 2.3 and now I am getting 7 with 2.2.
So here's the problem. Training will be progressing just fine, and then just flat out stop. No error. No nothing. It just stops. If I am running the training through a C# Windows Forms application, the application stops responding. If I attempt to pause execution through Visual Studio, that fails. Nothing responds until I stop the application. The thing that makes me think this is related to threading is that the "error" (if you can call it that) happens at completely random points during training. I've had it happen after 500 epochs of training, and after 20,000 epochs of training. Something that random looks like two threads that finally tried to do the same thing at the exact same moment.
Also, the error happens whether I am executing the code in a Windows Forms app or just a console app. It always happens after my own code has executed and compiled the data for the network, and always during training. No exceptions are generated; everything just stops responding. It occurs inside the loop responsible for iteratively training the network.
Seems like a 2.3 bug to me, but I could be wrong.
Ideas?




That does sound like a possible threading problem to me as well. And I think I actually saw it happen once while I was trying a long C# train. I noticed the same thing: Visual Studio would not pause either. I will try to reproduce this.
For 2.4 the threading is going to go through the C# thread pool, rather than using "joins", so there is a chance that might fix it as well. But I will look further into this.
You can disable threading too. There is a property on all of the propagation training methods that sets the number of threads. If you set it to zero, then Encog will pick a thread count based on your number of processors (this is the default). If you set it to 1, then threading is disabled.
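To make the zero-versus-one convention concrete, here is a minimal Java sketch of how such a setting could be resolved. It is not Encog's code: `resolveThreadCount` is a hypothetical helper, and the only real API used is `Runtime.availableProcessors()`.

```java
class ThreadCountSketch {

    // Hypothetical helper (not part of Encog). It mirrors the convention
    // described above: 0 means "pick a count from the processor count",
    // 1 means "no extra threads", any other positive value is used as given.
    public static int resolveThreadCount(int requested) {
        if (requested < 0) {
            throw new IllegalArgumentException("thread count must be >= 0");
        }
        if (requested == 0) {
            // Always at least one worker, even on a single-core machine.
            return Math.max(1, Runtime.getRuntime().availableProcessors());
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println("auto:   " + resolveThreadCount(0));
        System.out.println("single: " + resolveThreadCount(1));
    }
}
```

If the lockup really is a threading issue, forcing the single-threaded setting should make it disappear, at the cost of slower epochs.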
Jeff
Thanks for the response. You rock Jeff.
A few questions.
1. Have you seen the lockup in conjunction with a console mode train?
2. Were you using the built-in Encog training dialog, or one of your own? (I've seen it happen once with the built-in Encog dialog, after about 1 day of non-stop training.)
I have a dialog-based train running right now. It has been going about 16 hours with no problem. I did see this once after a straight day of training, so I am going to let this one go up to 3 days, if it can make it that far.
I am also curious if console mode has this problem or not. I may assign a second computer to the same training in console to see if that happens or not.
The method by which Encog 2.3 does multithreading is by using one thread and then joining the other threads. This is less than ideal, as it creates quite a few threads that are constantly dying and being recreated. This is something I am nearly done fixing in 2.4, so I will also check to see if this issue exists in 2.4 once I am done revamping the threading stuff. It will now use the built-in thread pool in Java and C#. I started this process on the C# side, so it will be in Encog .Net 2.4 very soon.
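For anyone following along, the difference between the two styles can be sketched in plain Java. This is not Encog's actual implementation: the class and method names are made up for illustration, and the "work" is just summing slices of an array as a stand-in for one epoch of training.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EpochThreadingSketch {

    // Stand-in for one worker's share of an epoch: sums data[from, to).
    static double sliceSum(double[] data, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Style 1: create fresh threads for every epoch, then join them.
    // Each thread dies at the end of the epoch and is recreated next time.
    public static double epochWithJoin(double[] data, int workers)
            throws InterruptedException {
        double[] partial = new double[workers];
        Thread[] threads = new Thread[workers];
        int chunk = (data.length + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            final int id = w;
            final int from = Math.min(data.length, w * chunk);
            final int to = Math.min(data.length, from + chunk);
            threads[w] = new Thread(() -> partial[id] = sliceSum(data, from, to));
            threads[w].start();
        }
        double total = 0;
        for (int w = 0; w < workers; w++) {
            threads[w].join(); // wait for the worker; its result is visible after join
            total += partial[w];
        }
        return total;
    }

    // Style 2: hand the same slices to a long-lived pool that is reused
    // across epochs, so no threads are created or destroyed per epoch.
    public static double epochWithPool(ExecutorService pool, double[] data, int workers)
            throws Exception {
        List<Callable<Double>> tasks = new ArrayList<>();
        int chunk = (data.length + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            final int from = Math.min(data.length, w * chunk);
            final int to = Math.min(data.length, from + chunk);
            tasks.add(() -> sliceSum(data, from, to));
        }
        double total = 0;
        for (Future<Double> f : pool.invokeAll(tasks)) { // blocks until all tasks finish
            total += f.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        double[] data = new double[1000];
        Arrays.fill(data, 1.0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int epoch = 0; epoch < 3; epoch++) {
                System.out.println(epochWithJoin(data, 4) + " " + epochWithPool(pool, data, 4));
            }
        } finally {
            pool.shutdown(); // let the JVM exit once the work is done
        }
    }
}
```

Both versions compute the same result. The pooled one simply avoids the constant thread churn, which is the part being changed for 2.4.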
1. Yes I have had a console app fail in training.
2. One of my own... For the forms-based app I wrote, I've used 3 different things: a thread, a BackgroundWorker object, and a timer, all of which had a bit of code that updated several points on the user interface informing me of the current status of the training. All of those failed at some point fairly early in training.
Though, I swear I did see it once, but dismissed it as something else locking my computer up. But I recall, I was not able to pause.
What are the stats on your network? i.e. input count, output count, hidden layers and number of training elements?
I am hoping the more robust thread-pool-based threading used in Encog 2.4 will eliminate any likelihood of this happening. I think it's nearly done; I have a test training right now with a fairly large neural network.
I have not seen this issue in some lengthy training I was doing. I also greatly revamped the thread processing; there was a very big error with the way the thread pool was being handled.