
Choosing Between nVidia CUDA, GeForce and Tesla

Over the past few weeks I've been learning more about GPU programming. With GPU programming there are decisions to make. Should you use OpenCL, CUDA or perhaps DirectCompute? Once you've picked your platform, you will need a mid- to high-end GPU card, because GPU card speeds vary greatly. The first time I attempted GPU programming, I used the GPU that came with my computer. It was compatible with CUDA and OpenCL, but it was not very fast, and I was making decisions about how to structure Encog based on obsolete hardware.

Now, a year later, I am ready to try adding GPU capabilities to Encog again, hopefully more successfully this time. I decided to start by buying a more advanced GPU. The first decision I had was nVidia or ATI. I decided to go with nVidia. Why? Because nVidia, to me anyway, seems to be at the forefront of GPGPU. GPGPU stands for general-purpose computing on graphics processing units: basically, using your graphics card similarly to how you use your CPU.

The CUDA platform, which can be used to implement GPGPU, is specific to nVidia. If you want to use CUDA, you will be using nVidia. If you want to support nVidia, ATI and Intel hardware, then you will use OpenCL. Sounds like a slam dunk for OpenCL, right? Not quite. CUDA is much more advanced than OpenCL. In OpenCL you are programming your kernels in C (actually C99, but still C). In CUDA, you are using C/C++. In OpenCL, you are dealing with a higher-level abstraction. If you are on an nVidia card, OpenCL is essentially compiling to CUDA. For these reasons, I decided to go directly with CUDA for Encog's GPU implementation. This limits me to nVidia cards, but that is something I am willing to accept, at least for now. I may add an OpenCL version at some point in the future, but that is not planned at this point.
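For readers who have not seen a kernel before, here is a minimal sketch of the shape one takes. It is plain C++ rather than CUDA, so it compiles anywhere: a small function computes one output element, selected by an index, and a "launch" runs that function once per element. In real CUDA the index would be derived from the built-in blockIdx, blockDim and threadIdx values and the loop would be replaced by a kernel launch across many threads. The names saxpy_element and launch_saxpy are made up for this illustration and are not part of Encog or CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of the "kernel": computes one output element, selected by index i.
// On a GPU, each thread would run this once with its own value of i.
inline void saxpy_element(std::size_t i, float a,
                          const std::vector<float>& x,
                          std::vector<float>& y) {
    y[i] = a * x[i] + y[i];
}

// CPU stand-in for a kernel launch (hypothetical helper): runs the body
// once per index, serially. A GPU would run these iterations in parallel.
inline void launch_saxpy(float a, const std::vector<float>& x,
                         std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        saxpy_element(i, a, x, y);
    }
}
```

Calling launch_saxpy(2.0f, x, y) updates every y[i] in place to 2 * x[i] + y[i]. The point of the structure is that no iteration depends on any other, which is what lets a GPU spread the work across hundreds of cores.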

Understanding nVidia GPU Families

Okay, at this point, I've decided on the chipset manufacturer that I will use: nVidia. Now, which nVidia card to choose? I am not a gamer, so I really do not keep up with the latest offerings from the various video card manufacturers, and I was immediately overwhelmed by the video card families offered by nVidia. Which should I buy? If you factor out notebook computers, you are left with three different families of cards you can buy for a desktop.

  • GeForce
  • Quadro
  • Tesla

I had a difficult time telling the difference between these. After many hours of poring over forum posts, reviews, nVidia's own material and other sources, I made my decision. Let me review what each of these is.

GeForce is perhaps the best known of the nVidia lines. GeForce cards are designed for gamers. They absolutely pack the greatest amount of raw processing power for the amount of money you spend. GeForce is considerably cheaper than Quadro and Tesla. If you are a gamer, this is a no-brainer. Get a GeForce.

Quadro cards are the "professional" family of nVidia cards. They are designed for CAD/graphics systems. I saw a common theme on many sites: if you create content, buy a Quadro; if you consume content, buy a GeForce. Many high-end CAD and graphics creation software packages make use of CUDA and other advanced features of GPUs. Quadro cards are tested and certified to work with these packages. The physical hardware is fairly similar between Quadro and GeForce. The difference is primarily in the drivers. Quadro is great if you are a system builder creating machines for a particular program. You can be guaranteed it will work. Most likely GeForce would work too, but who wants guesswork when you are selling a client several machines costing big bucks?

Tesla represents the high end of the nVidia line. Yet again, just like with Quadro vs GeForce, the difference gets fuzzy. Again, the difference is mainly in the drivers. GeForce and Quadro are meant for single-person computers, or workstations. Tesla is for the data center, for servers. The hardware in a Tesla is not that different from a GeForce or Quadro. One really nice feature of Tesla, at least on Windows, is that it can be used from a service. Out of the box, Quadro and GeForce cannot be used from a Windows service. The program must be run from the GUI by the user. For most single-user apps, this is fine. However, if I were building a cluster of 200 machines, I would really want to run my program as a service on each of them.

Choosing the Correct nVidia Family

Which did I choose? I chose a GeForce, though not right away. Initially, I ruled out Tesla simply on price. Also, Tesla is not really commonly available as a standalone card. Typically Tesla is used by high-end OEMs. I then read that GeForce is for gamers, and quickly ruled out GeForce as well. That left Quadro, and I even placed an order for a Quadro 2000. Then I kept researching and realized I had made a mistake. One of the best pages I saw was this one.

http://www.videocardbenchmark.net/high_end_gpus.html

I saw there was something called a GeForce 580 that I could get for about what I had paid for the Quadro 2000. And remember that slow graphics card that I mentioned earlier, that I was replacing? The Quadro 2000 is only marginally faster! This just did not feel like a good use of money.

The main difference I could find was compatibility. Compatibility with something that does not exist yet: Encog CUDA, which is what I am about to program. I also looked at forum posts on the GPUGRID project. GPUGRID is like SETI@Home, except you donate GPU power for medical research.

http://www.gpugrid.net/

GPUGRID has been doing a great deal of GPU number crunching. Surely I could learn something from their video card preferences. And I did! Overwhelmingly, their theme seems to be: don't even deal with Quadro or Tesla, just get a GeForce 580 and call it a day. This is exactly what I did.

Conclusion

I will program the first generation of Encog for CUDA using gamer hardware. I will test and develop it on a GeForce 580. I will also use Amazon EC2 and run/test it on Amazon's GPU node, which has a Tesla. I really believe this will give me a wide range of compatibility. It will also allow me to use the massive power built into some of the higher-end, yet reasonably affordable, GeForce cards.

Comments

CBrauer's picture

Hi Jeff,

If I may, I would like to offer a constructive criticism. I am sad to see that you are spending your valuable time working on GPU programming.

I, like many others, have quad-core processors. In my case, I have a dual quad-core machine. I find that there are very few applications that use all eight processors. There is one notable exception, and that is the work of Michael Schmidt. Please download Eureqa/Formulize from:

http://formulize.nutonian.com/,

and run the default demo. If you use your performance monitor while Formulize is running, you will see that all your processors are running at 100%. This is the best Machine Learning application that I have ever seen.

Just a thought….

Charles

gbabint's picture

I work with some very difficult non-linear problems that simply cannot be solved with an 8-core i7 in due time. I'm sure I'm not the only one. I don't even want to know how hard a protein folding problem is...

Jeff, I appreciate all your efforts a lot, and I can't wait to see Encog running on my Tesla GPUs.

Keep it up!!

jeffheaton's picture

Sorry, coming into this thread late. Thanks! All comments are welcome.

One note on the scaling of quadcores. Encog does actually scale pretty well to high numbers of CPU cores. Here is an example of it running with 32 "cores" on a dual quadcore machine (16 real cores, but using 32 total due to hyperthreading). Most of the cores stayed in the 90-100% range during training, and the actual training time scaled down nicely from smaller boxes.

http://www.heatonresearch.com/wiki/Amazon_EC2_Encog_Performance

GPU is interesting because it takes the multicore CPU idea to its logical extreme: what happens if I have, say, 512 cores? That is the case for my GTX580. Now memory access becomes critical, because only so many of those cores can get to the memory at once. Reads get handled by the cache reasonably well. But writes will kill performance, as the cache does nothing for you. Optimizing memory access in a CPU app is important, but not nearly as much so as on the GPU.
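One common way to act on this is to keep writes out of the inner loop. Below is a minimal sketch in plain C++, run serially as a stand-in for GPU threads and not taken from Encog: each logical worker accumulates into a private local variable and writes to the shared array exactly once at the end. The names partial_sums and sum_partials, and the strided split of the data, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: sum `data` using `workers` logical workers.
// Each worker reads many elements but performs only one shared write.
inline std::vector<double> partial_sums(const std::vector<double>& data,
                                        std::size_t workers) {
    assert(workers > 0);
    std::vector<double> partial(workers, 0.0);  // one shared slot per worker
    for (std::size_t w = 0; w < workers; ++w) {
        double local = 0.0;  // private accumulator, no shared writes in the loop
        // Strided reads: worker w handles elements w, w + workers, ...
        for (std::size_t i = w; i < data.size(); i += workers) {
            local += data[i];
        }
        partial[w] = local;  // the single shared write for this worker
    }
    return partial;
}

// Combines the per-worker results into the final total.
inline double sum_partials(const std::vector<double>& partial) {
    double sum = 0.0;
    for (double p : partial) {
        sum += p;
    }
    return sum;
}
```

With this layout the number of shared writes equals the number of workers rather than the number of data elements. On a real GPU the outer loop would be the parallel threads, and the final combine step would typically be another parallel reduction.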

Also Jeff S, thanks for the link from nVidia, very good info.

bpeikes's picture

Jeff,
You note in your research that the GeForce cards cannot be run as a Windows service "out of the box". Does that mean that you have found some way to connect to GeForce cards from a Windows service? I've been looking around on the NVidia site for TCC drivers for the GeForce cards, but have found none.
