Looking for volunteers/help for a reasonably easy Encog extension... PROBEN1
Okay, if you've been looking for a chance to contribute to Encog, this might be it. This falls very much under the domain of traditional programming, and not so much under the more "advanced mathematical" extensions we've been doing with Encog lately. It is fine to help in C# or Java; this is very much a "parallel" effort. I will guide/help this effort and write the "core" of it.
Here is the idea. Encog has been benchmarked fairly extensively, but mostly in terms of how fast Encog can calculate a backpropagation iteration compared to some of the other frameworks out there. We have very little in the way of benchmarking how well a training iteration does. This is really more useful for benchmarking Encog against itself! Why do such a thing? Maybe you want to know what network structure or activation function works best. For me, I am doing quite a bit with extending Encog NEAT, and there is a whole "witches' brew" of extensions to NEAT to consider, and who knows which ones actually help vs harm. It would be nice to have a very standard Encog benchmark that measures both time AND error accuracy.
Encog does benchmark accuracy in some areas, like weight initialization. But almost always we use a Fahlman Encoder, which is a good enough way I suppose, but an encoder is FAR from representative of real data.
There is actually a standard benchmark. It is called PROBEN1. I would like to add a full PROBEN1 evaluation to Encog, one that works with ANY ML method, i.e. we could pit NEAT vs ANN, or ANN vs SVM, or anything. There are 15 tests in PROBEN1. I will take a few and create the core of this "example", then others can extend this and add the remaining ones. I will assign the tests and answer any questions. This is actually a very good introduction to how to get data into and out of Encog.
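To make the "time AND error, for ANY ML method" idea concrete, here is a rough, language-neutral sketch in Python. It is not Encog code and not the planned design: the names `benchmark`, `mean_squared_error` and `train_mean_baseline` are made up for illustration, and a "method" is assumed to be nothing more than a function that takes training samples and returns a predictor.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# A sample is (inputs, ideal outputs). These aliases are illustrative only.
Sample = Tuple[Sequence[float], Sequence[float]]
Predictor = Callable[[Sequence[float]], Sequence[float]]
Trainer = Callable[[List[Sample]], Predictor]


def mean_squared_error(predict: Predictor, data: List[Sample]) -> float:
    """Average squared difference between predicted and ideal outputs."""
    total, count = 0.0, 0
    for inputs, ideal in data:
        actual = predict(inputs)
        for a, i in zip(actual, ideal):
            total += (a - i) ** 2
            count += 1
    return total / count


def benchmark(name: str, train: Trainer,
              training_set: List[Sample],
              test_set: List[Sample]) -> Dict[str, object]:
    """Train one method and report both wall-clock time and error."""
    start = time.perf_counter()
    predict = train(training_set)  # any ML method can hide behind this call
    elapsed = time.perf_counter() - start
    return {
        "method": name,
        "seconds": elapsed,
        "train_mse": mean_squared_error(predict, training_set),
        "test_mse": mean_squared_error(predict, test_set),
    }


def train_mean_baseline(data: List[Sample]) -> Predictor:
    """Toy stand-in for a real method: always predicts the mean ideal output."""
    n_out = len(data[0][1])
    means = [sum(ideal[k] for _, ideal in data) / len(data)
             for k in range(n_out)]
    return lambda inputs: means


if __name__ == "__main__":
    training = [([0.0], [0.0]), ([1.0], [1.0])]
    testing = [([2.0], [1.0])]
    print(benchmark("mean-baseline", train_mean_baseline, training, testing))
```

Because the harness only sees a trainer function, swapping in a different method (NEAT, a feedforward network, an SVM) would not change the measurement code; only the trainer passed to `benchmark` changes.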
Would anyone be interested in helping? Advanced knowledge of AI/Math is definitely NOT needed for this one.




Thanks for organizing this! I for one will frequently use this once it is in place.
I agree, this would be a great way to learn the basics of how to set up data to be used by a machine learning method, such as a neural network, or even an SVM, etc.
I googled PROBEN1 and found the PDF (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.8960&rep=rep...) document. If one volunteers, will they be given time to study the document first?
Oh for sure, this is not something we will be doing at a really fast pace either, since I have lots of Encog 3.1 things I am doing myself. :) There will probably be a several-week pause at the beginning while I build the core code anyhow, which I have not even started on.
I need to read the doc too. :)
Thanks for posting the link, I should have done that!
I would be interested in helping with this. I haven't played around much with Encog yet, but I guess there would be no better way to learn than actually working on it.
Hi,
The results could be quite interesting, both for demonstrating the capabilities and quality of Encog's implementations and as a way to compare algorithms, control parameters and network topologies in a more scientific way. Systematic studies with many combinations are not common because they require a lot of preparation work, rigour and patience/stamina(!), and are computationally intensive. However, if the work is done over time, can be contributed to by other people, and is facilitated by a framework that automates the process, then it might prove very useful for collecting evidence to better understand the differences among algorithms and the effects of control parameters. I can see this being used for many purposes.
If you think this can help, I can send you the code of the "virtual lab" I've built on top of Encog as part of my project.
Basically it is a lightweight framework to carry out repeated and comparative experiments just like those done in PROBEN1 and other studies. I can include a PROBEN1 dataset and set up any number of training algorithms on it in just a few lines of code, and all the training and performance statistics (number of iterations, duration, accuracy, accuracy per class, MSE, ...) are collected automatically. My intention was to make it public at some point anyway, as it might be useful for other researchers or even people trying to find the best parameters for their networks.
The code is not polished, and many things could be done more simply by better exploiting Encog's own classes, but I believe that at least you might be able to reuse parts of it or find useful ideas.
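For readers wondering what "repeated experiments with automatic statistics" might look like, here is a small Python sketch. It is not the "virtual lab" code, which has not been posted; `run_experiment`, `summarize` and `fake_run` are hypothetical names, and the metrics produced by `fake_run` are random placeholders, not real results.

```python
import random
import statistics
from typing import Callable, Dict, Iterable, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Basic descriptive statistics for one metric across repeated runs."""
    return {
        "mean": statistics.mean(values),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def run_experiment(run_once: Callable[[int], Dict[str, float]],
                   seeds: Iterable[int]) -> Dict[str, Dict[str, float]]:
    """Repeat one training run per seed and aggregate every reported metric."""
    runs = [run_once(seed) for seed in seeds]
    return {metric: summarize([run[metric] for run in runs])
            for metric in runs[0]}


def fake_run(seed: int) -> Dict[str, float]:
    """Placeholder for a real training run; the numbers are random, not measured."""
    rng = random.Random(seed)
    return {
        "iterations": float(rng.randint(50, 150)),
        "accuracy": rng.uniform(0.80, 0.95),
    }


if __name__ == "__main__":
    for metric, stats in run_experiment(fake_run, range(10)).items():
        print(metric, stats)
```

Running each configuration over several seeds and reporting the spread, rather than a single number, is what makes comparisons between algorithms or parameter settings meaningful.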
Geoffroy
I'm game...count me in.
Any progress on this, Seema?
Happy to help out if I can. My focus is on C# but I can run whatever you come up with, I suppose.
My system has a 4GHz 6-core CPU, 24GB RAM, and I also have 1024 CUDA cores in the box.
Regards,
Merlin