Stanford PPL


Delite
A compiler and runtime infrastructure for building new performance-oriented DSLs
Green-Marl
A DSL for graph analysis
Liszt
A DSL for solving mesh-based PDEs


Research Spotlight
The following paper was presented at MICRO 2014:

47th International Symposium on Microarchitecture
Cambridge, United Kingdom
December 13-17, 2014

  • Locality-Aware Mapping of Nested Parallel Patterns on GPUs
    HyoukJoong Lee, Kevin J. Brown, Arvind K. Sujeeth, Tiark Rompf, and Kunle Olukotun

    Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which is common in nontrivial applications. To address this issue, we present a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs. The analysis maps nested patterns onto a logical multidimensional domain and parameterizes the block size and degree of parallelism in each dimension. We then add GPU-specific hard and soft constraints to prune the space of possible mappings and select the best mapping. We also perform multiple compiler optimizations that are guided by the mapping to avoid dynamic memory allocations and automatically utilize shared memory within GPU kernels. We compare the performance of our automatically selected mappings to hand-optimized implementations on multiple benchmarks and show that the average performance gap on 7 out of 8 benchmarks is 24%. Furthermore, our mapping strategy outperforms simple 1D mappings and existing 2D mappings by up to 28.6x and 9.6x respectively.
    Paper PDF
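The nesting the paper addresses can be illustrated with ordinary Scala collections (a stand-in sketch, not the paper's implementation; the matrix and its values are illustrative):

```scala
// A nested parallel pattern: an outer Map over rows with an inner
// Reduce per row. This is the shape the paper's analysis maps onto a
// logical multidimensional domain (e.g., rows to one dimension,
// columns to another), then searches for the best block size and
// degree of parallelism per dimension.
object NestedPattern {
  def main(args: Array[String]): Unit = {
    // 4 x 8 matrix of illustrative values
    val matrix = Array.tabulate(4, 8)((i, j) => i * 8 + j)

    // Outer pattern: Map over rows; inner pattern: Reduce (sum) per row.
    // On a GPU this nesting admits many mappings (1D, 2D, ...), which is
    // exactly the space the paper's constraints prune.
    val rowSums = matrix.map(row => row.reduce(_ + _))

    println(rowSums.mkString(","))
  }
}
```

Written against the sequential collections API here, the same pattern expressed in a DSL carries parallel semantics, letting the compiler choose the GPU mapping automatically.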



Who We Are

The Stanford Pervasive Parallelism Lab brings together leading Stanford computer scientists and electrical engineers to develop the parallel computing platform for the year 2020. We are supported by a completely open industrial affiliates program.

What We Do

The core of our research agenda is to allow domain experts to develop parallel software without becoming experts in parallel programming. Our approach is a layered system: DSLs at the top, a common parallel compiler and runtime infrastructure beneath them, and an underlying architecture that provides efficient mechanisms for communication, synchronization, and performance monitoring.
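The layering can be sketched in a few lines of Scala (all names here are hypothetical illustrations, not Delite's actual API): the domain expert writes a domain-level operation, which is backed by a generic parallel pattern supplied by the shared compiler/runtime layer:

```scala
// Hypothetical sketch of the layered approach. The "common layer"
// exposes a generic zip-and-reduce parallel pattern; the "DSL layer"
// lets a domain expert write a dot product without mentioning
// threads, kernels, or synchronization.
object DslLayering {
  // Common layer: a generic zipWith-then-reduce parallel pattern.
  // A real runtime would schedule this across cores or a GPU.
  def zipReduce(a: Array[Double], b: Array[Double])
               (zip: (Double, Double) => Double)
               (red: (Double, Double) => Double): Double =
    a.zip(b).map { case (x, y) => zip(x, y) }.reduce(red)

  // DSL layer: the domain expert expresses intent, not parallelism.
  def dot(a: Array[Double], b: Array[Double]): Double =
    zipReduce(a, b)(_ * _)(_ + _)

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0, 3.0)
    val b = Array(4.0, 5.0, 6.0)
    println(dot(a, b)) // 1*4 + 2*5 + 3*6 = 32.0
  }
}
```

Because the domain-level code only names patterns, the lower layers remain free to retarget it to multicore, clusters, or GPUs without changes to the expert's program.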


Why We Do It

New heterogeneous architectures continue to raise achievable performance, but programming these devices to reach their maximum performance is not straightforward. The goal of the PPL is to make heterogeneous parallelism accessible to average software developers through domain-specific languages (DSLs), so that it can be used freely in all computationally demanding applications.


Member Companies

Oracle Huawei NVIDIA SAP
AMD HP IBM Intel NEC