[ Publications ]
[ Research Opportunities ]
[ Partners & Supporters ]
[ Earlier Work ]
Matching Memory Access Patterns and Data Placement for NUMA Systems
|
Zoltan Majo, Thomas R. Gross, Matching Memory Access Patterns and Data Placement for NUMA Systems, Proceedings of CGO '12, ACM, March 2012.
[CGO_2012.pdf]
|
Many recent multicore multiprocessors are based on a
non-uniform memory architecture (NUMA). A mismatch between the data
access patterns of programs and the mapping of data to memory incurs a
high overhead, as remote accesses have higher latency and lower
throughput than local accesses. This paper reports on a limit study
that shows that many scientific loop-parallel programs include
multiple, mutually incompatible data access patterns; therefore, these
programs encounter a high fraction of costly remote memory
accesses. Matching the data distribution of a program to the
individual data access patterns is possible; however, it is difficult
to find a data distribution that matches all access patterns.
Directives such as those included in OpenMP provide a way to distribute
the computation, but the induced data partitioning does not take into
account the placement of data into the processors' memory. To
alleviate this problem, we describe a small set of language-level
primitives for memory allocation and loop scheduling. Using the
primitives together with simple program-level transformations
eliminates mutually incompatible access patterns from OpenMP-style
parallel programs. This result represents an improvement of up to
3.3X over the default setup, and the programs obtain a speedup of up
to 33.6X over single-core execution (19X on average) on a 4-processor,
32-core machine.
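
The notion of mutually incompatible access patterns can be illustrated with a toy model. The sketch below is not taken from the paper. It assumes a square n-by-n array whose rows are placed on NUMA nodes in equal contiguous blocks, with n divisible by the number of nodes, and it counts how many accesses are remote when a loop nest is partitioned among nodes by rows versus by columns. The names `owner` and `remote_fraction` are hypothetical helpers introduced only for this illustration.

```python
def owner(index, n, nodes):
    """Node owning `index` when n items are split into equal contiguous blocks.

    Assumption: n is divisible by nodes.
    """
    block = n // nodes
    return index // block


def remote_fraction(n, nodes, partition_by):
    """Fraction of remote accesses for an n x n array placed by row blocks.

    partition_by selects which loop dimension is split among the nodes:
    "rows" matches the data placement, "cols" does not.
    """
    remote = 0
    for i in range(n):
        for j in range(n):
            data_node = owner(i, n, nodes)  # element (i, j) lives with row i
            loop_index = i if partition_by == "rows" else j
            cpu_node = owner(loop_index, n, nodes)  # node executing (i, j)
            if data_node != cpu_node:
                remote += 1
    return remote / (n * n)


# Toy configuration (illustrative values only): 8 x 8 array, 4 nodes.
row_loop = remote_fraction(8, 4, "rows")
col_loop = remote_fraction(8, 4, "cols")
print(row_loop, col_loop)
```

In this model, the row-partitioned loop finds all of its data locally, while the column-partitioned loop over the same array finds only the fraction 1/nodes of its data locally. A program that contains both loops has no single block placement that makes both of them local, which is the situation the abstract describes.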