The Future of Encog GPU/OpenCL Usage
We are currently wrapping up Encog 3.1 on the Java side and will soon begin porting the new enhancements to the C# side. I will be working with some of the other Encog developers as they port and tie up any loose ends in 3.1. However, my main focus will be pushing some of the new directions for Encog in 3.2. I recently put up a poll asking which new features are most needed in Encog. The poll is still open, so feel free to weigh in. You can see it here:
http://www.heatonresearch.com/node/2462
One feature that seems to be high on the list is GPU integration. Encog did have GPU integration in version 2.5. Encog could push a neural network off to a GPU, train it, and get consistent and useful numeric results. But did this actually improve training times? In certain rare instances, I saw marginal improvements. Overall, however, GPU training was generally slower than CPU-only training. In a nutshell, this is because GPUs are really fast at a few "purely mathematical" things, but dead slow at everything else. It also takes time to move data between the GPU and the CPU. And fundamentally, for a neural network, the data MUST come home to the CPU. For a video game, this is often not necessary: the rendering done by the GPU is sent to the monitor and never returns to the CPU.
Adding the GPU code complicated the Encog code considerably. Since it had limited effect, I removed it in 3.0. I learned quite a bit from that first attempt, and I am now going to apply those lessons to the second one. I also believe this new approach will allow me to exploit the strengths of the GPU and, hopefully, sidestep its many limitations.
I am going to begin a series of articles as I attempt to re-integrate the GPU into Encog. This will be a totally different approach than before. For one thing, it will be a totally parallel approach: I am going to create a stand-alone executable for high-speed training, which will be used by both the Java and C# sides of Encog. It will be written in either C/C99 or C++, it will be 64-bit, and it will use the GPU as well. Eventually it will use sockets to communicate with other instances of itself. I am going to use GCC, so it will work well on Windows, Linux and Mac.
You will see me reference a language called C99. C99 is the 1999 revision of the C standard. It is very similar to older C, but has some nice syntax cleanups; it picks up several conveniences that C++ programmers take for granted, minus the objects. OpenCL kernels must be written in OpenCL C, a dialect based on C99. If you can read C, you can read C99, so these articles are going to be very much C/C99/C++ based. You communicate with a GPU through a C-based API, so when Java or C# communicate with a GPU, they go through a very indirect route. That is why you need intermediaries such as CLOO, JOCL, JCUDA and the like. I am going to skip all of that and do it directly in C. I was a C/C++ programmer long before I did Java/C#.
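To make the "syntax cleanups" concrete, here is a small sketch of a few C99 features that C89 lacks: `//` comments, declarations inside `for` loops, `<stdbool.h>` and `<stdint.h>`, designated initializers and the `restrict` qualifier. The `layer_spec` struct and both functions are hypothetical names chosen for illustration. They are not part of Encog.

```c
#include <assert.h>
#include <stdbool.h>   /* C99: bool, true, false */
#include <stddef.h>
#include <stdint.h>    /* C99: fixed-width integer types */

/* Hypothetical layer description; the name and fields are illustrative only. */
typedef struct {
    int32_t input_count;
    int32_t output_count;
    bool    has_bias;
} layer_spec;

/* C99 'restrict' promises the compiler that the two arrays do not overlap. */
double dot(const double *restrict a, const double *restrict b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {   // C99: loop-scoped declaration and // comments
        sum += a[i] * b[i];
    }
    return sum;
}

layer_spec make_hidden_layer(void)
{
    // C99 designated initializer: fields are named explicitly, in any order.
    layer_spec spec = { .input_count = 2, .output_count = 2, .has_bias = true };
    assert(spec.input_count > 0);
    return spec;
}
```

None of this is exotic. It is the kind of everyday C99 a neural network trainer would lean on.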
It will also be interesting to see the performance of a C-based trainer versus a Java or C# one. I really don't know which will win. I know Java and C# have made great strides, but I want to see for myself.
Here are the articles.
Article 1: An XOR Neural Network in C99
For this one I am not even going to use C++; I am aiming for minimalism. I am going to create a multilayer neural network that will be trained using PSO. I will use C99 because MOST C compilers support this dialect. Also, my future GPU kernels will be written in C99, so this will be useful.
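A minimal sketch of what such a program might look like. It assumes a 2-2-1 network (two inputs, two hidden neurons, one output, each neuron with a bias) and a plain global-best PSO. Several details are my own assumptions and do not come from Encog's implementation: the function names, the xorshift random generator, the PSO constants (30 particles, inertia 0.729, acceleration 1.49445), and the Elliott-style activation, which is used instead of a sigmoid so the sketch needs no `exp()` or libm.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

#define IN 2
#define HID 2
#define WEIGHT_COUNT (HID * (IN + 1) + (HID + 1))  /* 6 hidden + 3 output weights, biases included */
#define PARTICLES 30

static const double xor_in[4][IN] = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
static const double xor_ideal[4] = { 0, 1, 1, 0 };

/* Elliott-style squashing function with range (0,1); avoids exp(), so no libm is needed. */
static double squash(double x)
{
    double ax = x < 0 ? -x : x;
    return 0.5 * x / (1.0 + ax) + 0.5;
}

/* Forward pass of a 2-2-1 network. Weights are stored neuron by neuron, bias last. */
double xor_compute(const double w[WEIGHT_COUNT], const double in[IN])
{
    double hidden[HID];
    int k = 0;
    for (int h = 0; h < HID; h++) {
        double sum = 0.0;
        for (int i = 0; i < IN; i++) sum += w[k++] * in[i];
        hidden[h] = squash(sum + w[k++]);          /* hidden bias */
    }
    double out = 0.0;
    for (int h = 0; h < HID; h++) out += w[k++] * hidden[h];
    return squash(out + w[k]);                     /* output bias */
}

/* Mean squared error over the four XOR training cases. */
double xor_error(const double w[WEIGHT_COUNT])
{
    double sse = 0.0;
    for (int r = 0; r < 4; r++) {
        double d = xor_compute(w, xor_in[r]) - xor_ideal[r];
        sse += d * d;
    }
    return sse / 4.0;
}

/* Deterministic xorshift64 generator returning a double in [0,1). */
static uint64_t rng_state = 88172645463325252ULL;
static double rnd(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return (rng_state >> 11) * (1.0 / 9007199254740992.0);   /* divide by 2^53 */
}

/* Global-best PSO over the 9 weights. Copies the best weights into best[] and returns their MSE. */
double pso_train(int iterations, double best[WEIGHT_COUNT])
{
    double pos[PARTICLES][WEIGHT_COUNT], vel[PARTICLES][WEIGHT_COUNT];
    double pbest[PARTICLES][WEIGHT_COUNT], pbest_err[PARTICLES];
    double gbest_err = DBL_MAX;
    const double inertia = 0.729, c1 = 1.49445, c2 = 1.49445;
    assert(iterations >= 0);

    for (int p = 0; p < PARTICLES; p++) {
        for (int j = 0; j < WEIGHT_COUNT; j++) {
            pos[p][j] = pbest[p][j] = rnd() * 2.0 - 1.0;   /* start in [-1, 1) */
            vel[p][j] = 0.0;
        }
        pbest_err[p] = xor_error(pos[p]);
        if (pbest_err[p] < gbest_err) {
            gbest_err = pbest_err[p];
            for (int j = 0; j < WEIGHT_COUNT; j++) best[j] = pos[p][j];
        }
    }
    for (int it = 0; it < iterations; it++) {
        for (int p = 0; p < PARTICLES; p++) {
            for (int j = 0; j < WEIGHT_COUNT; j++) {
                vel[p][j] = inertia * vel[p][j]
                          + c1 * rnd() * (pbest[p][j] - pos[p][j])
                          + c2 * rnd() * (best[j] - pos[p][j]);
                pos[p][j] += vel[p][j];
            }
            double err = xor_error(pos[p]);
            if (err < pbest_err[p]) {
                pbest_err[p] = err;
                for (int j = 0; j < WEIGHT_COUNT; j++) pbest[p][j] = pos[p][j];
            }
            if (err < gbest_err) {
                gbest_err = err;
                for (int j = 0; j < WEIGHT_COUNT; j++) best[j] = pos[p][j];
            }
        }
    }
    return gbest_err;
}
```

To see how close the swarm got, call `pso_train(500, w)` and then run `xor_compute(w, ...)` on the four input pairs. The iteration count and swarm size are starting points to tune, not tested settings.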
Article 2: A Particle Swarm (PSO) Neural Network in C99 with PThreads
Next, I am going to enhance the code from the first article with POSIX threads. We will then have a multi-threaded PSO trainer for a neural network, similar to what Encog Java/C# already has. I will do some speed comparisons at this point. This will be iterative training, just like Encog.
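Here is a sketch, under stated assumptions, of one way the POSIX-thread version could parallelize a single PSO iteration. The swarm's positions are split into contiguous slices, each thread scores one slice, and `pthread_join` acts as the end-of-iteration barrier. The `fitness` function is a hypothetical stand-in (the sphere function) so that the sketch compiles on its own; in the real trainer it would be the network error. `evaluate_swarm` and `eval_job` are illustrative names only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DIM 9          /* e.g. the 9 weights of a 2-2-1 XOR network */
#define MAX_THREADS 8

/* Hypothetical stand-in for the network error: the sphere function sum(x^2). */
static double fitness(const double *x)
{
    double s = 0.0;
    for (int j = 0; j < DIM; j++) s += x[j] * x[j];
    return s;
}

typedef struct {
    const double *pos;   /* all positions; particle p starts at pos[p * DIM] */
    double *err;         /* output: one error value per particle */
    int begin, end;      /* half-open slice [begin, end) owned by this thread */
} eval_job;

static void *eval_worker(void *arg)
{
    eval_job *job = arg;
    for (int p = job->begin; p < job->end; p++)
        job->err[p] = fitness(job->pos + (size_t)p * DIM);
    return NULL;
}

/* Scores every particle with up to nthreads POSIX threads. Each thread writes a
   disjoint slice of err[], so no mutex is needed. Returns 0 on success. */
int evaluate_swarm(const double *pos, double *err, int count, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    eval_job job[MAX_THREADS];
    assert(count >= 0);
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    int chunk = (count + nthreads - 1) / nthreads;
    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        int b = t * chunk;
        int e = (b + chunk < count) ? b + chunk : count;
        if (b >= e) break;
        job[t] = (eval_job){ .pos = pos, .err = err, .begin = b, .end = e };
        if (pthread_create(&tid[t], NULL, eval_worker, &job[t]) != 0) { rc = 1; break; }
        started++;
    }
    /* This join is the per-iteration barrier: nobody moves on until the slowest thread is done. */
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    return rc;
}
```

Every thread writes to its own slice of `err[]`, so no locking is needed. The price is that the whole iteration waits for the slowest slice to finish.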
Article 3: Taking Particle Swarm (PSO) Beyond Iterations (C99/Java/C#)
Iterations are the enemy of parallel programming. It is wasteful to make all of the threads stop and wait for the very last thread to finish before beginning a new iteration. The threads will NEVER all finish at the same time, so this waiting prevents you from using a high percentage of CPU power. If some of the threads are using the GPU, then that unbalances things even further. I will modify the PSO algorithm so that it does not use iterations.
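Here is a sketch of one way to drop iterations, assuming an asynchronous PSO variant. Each thread repeatedly claims one particle move from a shared evaluation budget, reads whatever global best is current, and publishes any improvement immediately, so no thread ever waits at a barrier. This is a common asynchronous-PSO idea and not necessarily the modification planned for this article. The names `swarm`, `swarm_init` and `swarm_run`, the evaluation budget, and the sphere-function `fitness` stand-in are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define DIM 9            /* e.g. the 9 weights of a 2-2-1 XOR network */
#define PARTICLES 30
#define MAX_THREADS 8

/* Hypothetical stand-in for the network error: the sphere function sum(x^2). */
static double fitness(const double *x)
{
    double s = 0.0;
    for (int j = 0; j < DIM; j++) s += x[j] * x[j];
    return s;
}

/* xorshift64 with caller-owned, nonzero state; returns a double in [0,1). */
static double rnd(uint64_t *st)
{
    *st ^= *st << 13; *st ^= *st >> 7; *st ^= *st << 17;
    return (*st >> 11) * (1.0 / 9007199254740992.0);
}

typedef struct {
    double pos[PARTICLES][DIM], vel[PARTICLES][DIM];
    double pbest[PARTICLES][DIM], pbest_err[PARTICLES];
    pthread_mutex_t particle_lock[PARTICLES]; /* one thread moves a given particle at a time */
    double gbest[DIM], gbest_err;
    long remaining;                           /* evaluation budget replaces the iteration count */
    pthread_mutex_t swarm_lock;               /* guards gbest, gbest_err and remaining */
} swarm;

void swarm_init(swarm *s, long budget, uint64_t seed)
{
    uint64_t st = seed | 1;
    s->gbest_err = 1e300;
    s->remaining = budget;
    pthread_mutex_init(&s->swarm_lock, NULL);
    for (int p = 0; p < PARTICLES; p++) {
        pthread_mutex_init(&s->particle_lock[p], NULL);
        for (int j = 0; j < DIM; j++) {
            s->pos[p][j] = s->pbest[p][j] = rnd(&st) * 2.0 - 1.0;
            s->vel[p][j] = 0.0;
        }
        s->pbest_err[p] = fitness(s->pos[p]);
        if (s->pbest_err[p] < s->gbest_err) { s->gbest_err = s->pbest_err[p]; memcpy(s->gbest, s->pos[p], sizeof s->gbest); }
    }
}

typedef struct { swarm *s; uint64_t rng; int first, stride; } worker_arg;

static void *worker(void *arg)
{
    worker_arg *w = arg;
    swarm *s = w->s;
    double g[DIM], x[DIM];
    for (int p = w->first; ; p = (p + w->stride) % PARTICLES) {
        /* Claim one evaluation and snapshot the current gbest; no iteration barrier. */
        pthread_mutex_lock(&s->swarm_lock);
        if (s->remaining <= 0) { pthread_mutex_unlock(&s->swarm_lock); break; }
        s->remaining--;
        memcpy(g, s->gbest, sizeof g);
        pthread_mutex_unlock(&s->swarm_lock);

        pthread_mutex_lock(&s->particle_lock[p]);
        for (int j = 0; j < DIM; j++) {
            s->vel[p][j] = 0.729 * s->vel[p][j]
                + 1.49445 * rnd(&w->rng) * (s->pbest[p][j] - s->pos[p][j])
                + 1.49445 * rnd(&w->rng) * (g[j] - s->pos[p][j]);
            x[j] = s->pos[p][j] += s->vel[p][j];   /* move, and keep a private copy */
        }
        double err = fitness(x);
        if (err < s->pbest_err[p]) { s->pbest_err[p] = err; memcpy(s->pbest[p], x, sizeof x); }
        pthread_mutex_unlock(&s->particle_lock[p]);

        pthread_mutex_lock(&s->swarm_lock);   /* publish an improvement right away */
        if (err < s->gbest_err) { s->gbest_err = err; memcpy(s->gbest, x, sizeof x); }
        pthread_mutex_unlock(&s->swarm_lock);
    }
    return NULL;
}

/* Spends the whole budget; threads never wait for one another between moves. */
double swarm_run(swarm *s, int nthreads, uint64_t seed)
{
    pthread_t tid[MAX_THREADS];
    worker_arg arg[MAX_THREADS];
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        arg[t] = (worker_arg){ .s = s, .rng = (seed + 0x9E3779B97F4A7C15ULL * (uint64_t)(t + 1)) | 1,
                               .first = t % PARTICLES, .stride = nthreads };
        if (pthread_create(&tid[t], NULL, worker, &arg[t]) != 0) break;
        started++;
    }
    if (started == 0) worker(&arg[0]);   /* no thread could start: run on the caller */
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    return s->gbest_err;
}
```

This design trades reproducibility for utilization. With several threads, the order of updates (and therefore the exact result) varies from run to run. Per-particle mutexes keep two threads from moving the same particle at once, and the two locks are never held together, so the sketch cannot deadlock.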
Article 4: Running OpenCL Based C99/C/C++ Applications on the Amazon Cloud
Amazon makes some really nice GPU-based cloud nodes available. Here are the current specs:
- 22 GB of memory
- 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
- 2 x NVIDIA Tesla “Fermi” M2050 GPUs
- 1690 GB of instance storage
- 64-bit platform
- I/O Performance: Very High (10 Gigabit Ethernet)
- API name: cg1.4xlarge
For more information on the GPU, see:
http://www.nvidia.com/object/preconfigured-clusters.html
This is the platform I want to test the new GPU software on. It saves me from having to build such a monster computer! I can simply rent it for the time I need it, which really should not cost that much for what will probably be a handful of test runs. I will also use it to make sure the trainer works with two GPUs.
I've never used the Amazon cloud before, so I will learn how to get Encog and my C-based trainer up and running on this machine. I will document the process in the article, in case anyone else wants to do the same.
Article 5: A Threaded OpenCL PSO Trainer
In the final article of the series, I will extend the trainer to use OpenCL. It will be multi-threaded, make use of the GPU, and be fully compatible with the HPC-type instances provided by the Amazon cloud. I will also benchmark with and without the GPU. This time it had better be faster!
Beyond the Articles
If this all works out, I will then take the trainer from article 5 and make the executable easier to use with Encog. It will read and write Encog EG and EGB files. I might even provide a means for Java/C# to call it directly.




I hope that the GPU optimization will work this time.
I've seen GPUs applied before to number crunching that is not related to graphics processing, for example on page 20 of this document: http://www.backtrack-linux.org/documents/BACKTRACK_CUDA_v2.0.pdf
In addition to that...
Is there a possibility of using custom data types in the new edition?
I've modified one of the Encog 2 builds to be able to deal with CVNN. However, I can afford to redo the modification whenever Encog is updated.
You could use templates, for example...
Hey Jeff,
This is something I've been looking into. Have not had a chance to write any code using it. Let me know what you think.
http://code.google.com/p/aparapi
Jeff E.
Another set of OpenCL bindings for Java: http://code.google.com/p/javacl/
When are you thinking of hooking Encog for C into Encog Java/C#? Would this be a 3.1 or 3.2 thing? I am thinking 3.2??