
 


Matching Memory Access Patterns and Data Placement for NUMA Systems

Zoltan Majo, Thomas R. Gross. Matching Memory Access Patterns and Data Placement for NUMA Systems. Proceedings of CGO '12, ACM, March 2012. [CGO_2012.pdf]
Many recent multicore multiprocessors are based on a non-uniform memory architecture (NUMA). A mismatch between the data access patterns of programs and the mapping of data to memory incurs a high overhead, as remote accesses have higher latency and lower throughput than local accesses. This paper reports on a limit study showing that many scientific loop-parallel programs include multiple, mutually incompatible data access patterns; as a result, these programs encounter a high fraction of costly remote memory accesses. Matching the data distribution of a program to the individual data access patterns is possible; however, it is difficult to find a data distribution that matches all access patterns. Directives such as those included in OpenMP provide a way to distribute the computation, but the induced data partitioning does not take into account the placement of data into the processors' memory. To alleviate this problem, we describe a small set of language-level primitives for memory allocation and loop scheduling. Using the primitives together with simple program-level transformations eliminates mutually incompatible access patterns from OpenMP-style parallel programs. This result represents an improvement of up to 3.3X over the default setup, and the programs obtain a speedup of up to 33.6X over single-core execution (19X on average) on a 4-processor 32-core machine.
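To make the notion of mutually incompatible access patterns concrete, the following is a minimal toy model and not code from the paper. It assumes an n x n matrix whose rows are block-distributed across the processors' memories, and it counts how many element accesses are remote when a loop is partitioned by rows versus by columns. The function names `owner_of_row` and `remote_fraction` and the block-by-rows placement are illustrative assumptions.

```python
def owner_of_row(row, n, procs):
    """Processor whose local memory holds `row` (assumed block-by-rows placement)."""
    return row // (n // procs)


def remote_fraction(n, procs, partition):
    """Fraction of element accesses that are remote for one sweep over an n x n matrix.

    partition="rows": each worker processes a contiguous block of rows.
    partition="cols": each worker processes a contiguous block of columns.
    The data placement is the same in both cases: rows are block-distributed.
    """
    if n % procs != 0:
        raise ValueError("this toy model assumes n is divisible by procs")
    if partition not in ("rows", "cols"):
        raise ValueError("partition must be 'rows' or 'cols'")

    block = n // procs
    remote = 0
    for i in range(n):
        for j in range(n):
            # The worker is chosen by the loop partitioning, not by the data placement.
            worker = (i if partition == "rows" else j) // block
            if worker != owner_of_row(i, n, procs):
                remote += 1
    return remote / (n * n)


if __name__ == "__main__":
    print(remote_fraction(8, 4, "rows"))  # 0.0
    print(remote_fraction(8, 4, "cols"))  # 0.75
```

In this model, a single placement serves the row-partitioned loop perfectly but leaves three quarters of the column-partitioned loop's accesses remote on four processors. No block-by-rows or block-by-columns placement satisfies both loops at once, which is the kind of conflict the abstract describes. The paper's actual primitives for allocation and loop scheduling are not reproduced here.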