
| From: David Mason via talk <talk@gtalug.org>
|

The short answer is: Machine Learning (and other data-mining-like
applications).

A much LONGER answer:

There has been a field of computing on GPUs for perhaps a dozen years.
GPUs have evolved into having a LOT of floating point units that can
act simultaneously, mostly in lock-step. They are nasty to program:
conventional high-level languages and programmers aren't very good at
exploiting GPUs.

NVidia's CUDA (dominant) and the industry-standard OpenCL (struggling)
are used to program the combination of the host CPU and the GPU.
Generally, a set of subroutines is written to exploit a GPU and those
subroutines get called by conventional programs. Examples of such
libraries: TensorFlow, PyTorch, OpenBLAS. The first two are for
machine learning.

Some challenges GPU programmers face:

- GPUs cannot do everything that programmers are used to. A program
  using a GPU must be composed of a host-CPU program and a GPU
  program. (Some languages let you do the split within a single
  program, but there still is a split.) Sketch 1 at the end of this
  message shows the split.

- GPU programming requires a lot of effort designing how data gets
  shuffled in and out of the GPU's dedicated memory (sketch 1 shows
  these copies too). Without care, the time eaten by this can easily
  overwhelm the time saved by using a GPU instead of just the host
  CPU. Like any performance problem, one needs to measure to get an
  accurate understanding. The result might easily suggest massive
  changes to a program.

- Each GPU links its ALUs into fixed-size groups. Problems must be
  mapped onto these groups, even if that isn't natural. A typical
  size is 64 ALUs. Each ALU in a group is either executing the same
  instruction or is idled. OpenCL and CUDA help the programmer create
  doubly-nested loops that map well onto this hardware (sketch 2 at
  the end shows the usual shape). Lots of compute-intensive
  algorithms are not easy to break down into this structure.

- GPUs are not very good at conventional control flow, and it is
  different from what most programmers expect. For example, when an
  "if" is executed, all compute elements in a group are tied up, even
  if they are not active. Think about how this applies to loops.
  (Sketch 3 at the end shows the effect.)

- Each GPU is somewhat different, so it is hard to program
  generically. This is made worse by the fact that CUDA, the most
  popular language, is proprietary to NVidia. Lots of politics here.

- GPUs are not easy to share safely amongst multiple processes. This
  is slowly improving.

- New GPUs are getting better, so one should perhaps revisit existing
  programs regularly.

- GPU memories are not virtual. If you hit the limit of memory on a
  card, you've got to change your program. Worse: there is a
  hierarchy of three or more levels of fixed-size memories within the
  GPU that needs to be explicitly managed. (Sketch 4 at the end shows
  one of those levels.)

- GPU software is oriented to performance. Compile times are long.
  Debugging is hard and different.

Setting up the hardware and software for GPU computing is stupidly
challenging. Alex gave a talk to GTALUG (video available) about his
playing with this. Here's what I remember:

- AMD is mostly open source but not part of most distros (why???).
  You need to use select distros plus out-of-distro software. Support
  for APUs (AMD processor chips with built-in GPUs) is still missing
  (dumb).

- NVidia is closed source. Alex found it easier to get going, but it
  still takes work and still requires out-of-distro software.

- He didn't try Intel. Intel GPUs are ubiquitous but not popular for
  GPU computing since they are all integrated and thus limited in
  crunch. Intel, being behind, is the nicest player.
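
Sketch 1: a minimal, made-up CUDA "vector add" (mine, not from the
original discussion or from Alex's talk) showing the host-CPU / GPU
split and the explicit copies in and out of the GPU's dedicated
memory. Names and sizes are arbitrary.

  #include <cuda_runtime.h>
  #include <stdio.h>

  /* GPU part: each thread adds one element. */
  __global__ void vec_add(const float *a, const float *b, float *c, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)                    /* the grid may be larger than n */
          c[i] = a[i] + b[i];
  }

  int main(void)
  {
      const int n = 1 << 20;
      size_t bytes = n * sizeof(float);

      /* Host part: ordinary CPU memory. */
      float *ha = (float *)malloc(bytes);
      float *hb = (float *)malloc(bytes);
      float *hc = (float *)malloc(bytes);
      for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

      /* Separate allocations in the GPU's own memory. */
      float *da, *db, *dc;
      cudaMalloc(&da, bytes);
      cudaMalloc(&db, bytes);
      cudaMalloc(&dc, bytes);

      /* The data shuffling mentioned above: host -> GPU ... */
      cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
      cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

      /* Launch the GPU program: blocks of 256 threads. */
      int threads = 256;
      int blocks = (n + threads - 1) / threads;
      vec_add<<<blocks, threads>>>(da, db, dc, n);

      /* ... and GPU -> host.  For a job this small the copies can
         easily cost more than the computation saves; measure. */
      cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

      printf("c[0] = %f\n", hc[0]);

      cudaFree(da); cudaFree(db); cudaFree(dc);
      free(ha); free(hb); free(hc);
      return 0;
  }

(Compiled with nvcc; I make no claim about which distro packaging of
nvcc is the least painful.)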
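
Sketch 2: the usual shape of the "doubly-nested loop" mapping, again a
made-up kernel of mine. The hardware fixes the group size, so an
arbitrary problem size is covered by striding.

  /* Each thread starts at its own index and then strides across the
     array, so the kernel works for any n even though the block size
     is fixed (a multiple of the 32- or 64-wide hardware groups). */
  __global__ void scale(float *x, float s, int n)
  {
      int start  = blockIdx.x * blockDim.x + threadIdx.x; /* which group, which lane */
      int stride = gridDim.x * blockDim.x;                /* total threads in flight */

      for (int i = start; i < n; i += stride)
          x[i] = s * x[i];
  }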
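
Sketch 3: what the "if" problem looks like, in another made-up kernel
of mine. Within one hardware group, the two sides of the branch run
one after the other, and the lanes not on the current side sit idle.

  __global__ void divergent(float *x, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i >= n)
          return;

      if (i % 2 == 0)
          x[i] = x[i] * x[i];   /* even lanes run this while odd lanes idle */
      else
          x[i] = x[i] + 1.0f;   /* then odd lanes run this while even lanes idle */
      /* The group pays roughly for the sum of both paths, not just the
         path each lane took. */
  }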
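
Sketch 4: one of the explicitly-managed memory levels, again a made-up
example of mine: __shared__ declares a small, fixed-size, on-chip
buffer private to one block. It assumes the block size equals TILE and
is a power of two.

  #define TILE 256
  __global__ void sum_block(const float *x, float *block_sums, int n)
  {
      __shared__ float tile[TILE];   /* on-chip, per-block, fixed size */
      int i = blockIdx.x * blockDim.x + threadIdx.x;

      /* Stage from the big (slow) global memory into the small (fast)
         shared memory. */
      tile[threadIdx.x] = (i < n) ? x[i] : 0.0f;
      __syncthreads();

      /* Tree reduction, entirely within shared memory. */
      for (int s = blockDim.x / 2; s > 0; s >>= 1) {
          if (threadIdx.x < s)
              tile[threadIdx.x] += tile[threadIdx.x + s];
          __syncthreads();
      }

      if (threadIdx.x == 0)
          block_sums[blockIdx.x] = tile[0];
  }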