Wednesday, November 4, 2009

Running Multiple Kernels on a GPU

A paper discussing how to run task-parallel workloads efficiently on current CUDA. It is closely related to Morita's research topic.

Presented at a workshop held in conjunction with PACT 2009:
First Workshop on Programming Models for Emerging Architectures (PMEA)

Marisabel Guevara, Enabling Task Parallelism in the CUDA Scheduler
http://www.cs.virginia.edu/~skadron/Papers/guevera_pmea09.pdf

The abstract follows.
General purpose computing on graphics processing units (GPUs) introduces the challenge of scheduling task parallel workloads on devices designed for data parallel applications. This paper analyzes the problem, and presents a method for merging workloads such that they can be run concurrently on an NVIDIA GPU. Some kernels do not fully utilize the processing power of the GPU, and thus overall throughput will increase when running two kernels alongside one another. Our approach scans a queue of independent CUDA kernels (i.e., code segments that will run on the GPU), across processes or from within the same process, and evaluates whether merging the kernels would increase both throughput on the device and overall efficiency of the computing platform. Using kernels from microbenchmarks and a Nearest Neighbor application we show that throughput is increased in all cases where the GPU would have been underused by a single kernel. An exception is the case of memory-bound kernels, seen in a Nearest Neighbor application, for which the execution time still outperforms the same kernels executed serially by 12-20%. It can also be beneficial to choose a merged kernel that over-extends the GPU resources, as we show the worst case to be bounded by executing the kernels serially. This paper also provides an analysis of the latency penalty that can occur when two kernels with varying completion times are merged.


There is also a discussion of running multiple kernels on the NVIDIA forums:
http://forums.nvidia.com/index.php?showtopic=84740
