Monday, August 1, 2011
Edinburgh University, James Clerk Maxwell Building (JCMB), King's Buildings, Mayfield Road

Rob Farber - GP-GPUs for high-performance computing

Monday 1st August 2011, 2pm--4pm

Abstract: GPU computing has made teraflop supercomputing available to anyone with a computer. Algorithm, application, and library developers need to be aware of the potential of GPU computing and of how it now extends into conventional multi-core x86 computing. NVIDIA introduced CUDA for GPU computing in February 2007. The rate of adoption has been remarkable, as have the improvements in application performance (10-times to 1000-times) across a variety of problem domains. NVIDIA estimates that over one-third of a billion CUDA-enabled GPUs have been sold worldwide. CUDA is now taught at 454 institutions worldwide.

This talk will discuss how simple it is to express problems in CUDA, particularly with the Thrust API. Results for a generic machine-learning data-mining problem on a single GPU show an 85-times speedup over a modern quad-core Xeon processor (341-times single-core performance) for PCA/NLPCA problems using Nelder-Mead. The parallel mapping developed by Farber at Los Alamos is generally applicable to a range of optimization problems (SVM, MDS, EM, ICS, ...) and optimization methods (Powell, Levenberg-Marquardt, Conjugate Gradient, ...). Scaling results will demonstrate that this same mapping and its CUDA implementation exhibit near-linear scaling to 500 GPUs. A CPU version scales to over 60,000 processing cores and delivers over 1/3 of a petaflop. Speedups using CUDA in a number of other problem domains, plus links to downloadable source code, will be provided. Finally, recent developments make CUDA a potential development language, like Java, FORTRAN, and C++, for all application development, including applications intended only for x86 deployment.

Rob Farber is currently a visiting scientist at the NVIDIA Center for Excellence at the Irish Center for High-End Computing (ICHEC). He has worked as a scientist in massively parallel computing at several U.S. national laboratories (LANL, Berkeley, and PNNL), as external faculty at the Santa Fe Institute, and as co-founder of several successful startups. He is the author of a popular CUDA tutorial series on the Doctor Dobb's Journal website and an OpenCL tutorial series on The Code Project, and has published in peer-reviewed journals and other venues such as Scientific Computing. Rob is currently writing a book to teach students how to program and use CUDA.