Sunday, November 7, 2010

[StreamGraph] Extended Library for Graph Processing

Standard Template Library for XXL Data Sets

http://www.springerlink.com/content/6drvfrkj7uu4hq0a/
We present STXXL, a software library that enables practice-oriented experimentation with huge data sets. STXXL is an implementation of the C++ Standard Template Library (STL) for external memory computations. It supports parallel disks, overlapping of I/O and computation, and a pipelining technique that can save more than half of the I/Os. STXXL has already been used for computing minimum spanning trees, connected components, breadth-first search decompositions, constructing suffix arrays, and computing social network analysis metrics.
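To make the pipelining idea concrete, here is a minimal in-memory sketch. It is not STXXL code: the classes `CounterStream` and `SquareStream` and the functions `sum_stream` and `sum_of_squares_pipelined` are hypothetical names invented for this illustration. The stages expose `empty()`, `operator*` and `operator++`, and each element flows from the source through the transform into the sink without the intermediate sequence ever being stored. In an external memory setting, that unstored intermediate sequence is the part that would otherwise have to be written to disk and read back.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical source stage: produces 1, 2, ..., n one element at a time.
class CounterStream {
public:
    using value_type = std::uint64_t;

    explicit CounterStream(value_type n) : current_(1), last_(n) {}

    bool empty() const { return current_ > last_; }
    value_type operator*() const { return current_; }

    CounterStream& operator++() {
        assert(!empty());  // advancing past the end is a caller error
        ++current_;
        return *this;
    }

private:
    value_type current_;
    value_type last_;
};

// Hypothetical transform stage: squares each element of its input on demand.
template <typename Input>
class SquareStream {
public:
    using value_type = typename Input::value_type;

    explicit SquareStream(Input& in) : in_(in) {}

    bool empty() const { return in_.empty(); }

    value_type operator*() const {
        const value_type x = *in_;
        return x * x;
    }

    SquareStream& operator++() {
        ++in_;
        return *this;
    }

private:
    Input& in_;  // the upstream stage; no buffer of squared values is kept
};

// Sink stage: consumes the stream and returns the sum of its elements.
template <typename Input>
typename Input::value_type sum_stream(Input& in) {
    typename Input::value_type total = 0;
    for (; !in.empty(); ++in) {
        total += *in;
    }
    return total;
}

// Wires source -> transform -> sink into one pipeline.
std::uint64_t sum_of_squares_pipelined(std::uint64_t n) {
    CounterStream source(n);
    SquareStream<CounterStream> squared(source);
    return sum_stream(squared);
}
```

Calling `sum_of_squares_pipelined(4)` adds 1 + 4 + 9 + 16 and returns 30, using only a constant amount of working memory regardless of `n`.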


Building a Parallel Pipelined External Memory Algorithm Library


Large and fast hard disks for little money have enabled the processing of huge amounts of data on a single machine. For this purpose, the well-established STXXL library provides a framework for external memory algorithms with an easy-to-use interface. However, the clock speed of processors cannot keep up with the increasing bandwidth of parallel disks, making many algorithms actually compute-bound. To overcome this steadily worsening limitation, we exploit today’s multi-core processors with two new approaches. First, we parallelize the internal computation of the encapsulated external memory algorithms by utilizing the MCSTL library. Second, we augment the unique pipelining feature of the STXXL to enable automatic task parallelization. We show using synthetic and practical use cases that the combination of both techniques increases performance greatly.
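The second approach, task parallelism between pipeline stages, can be illustrated with a small sketch in standard C++. This is an assumption-laden illustration and not the paper's implementation or the STXXL or MCSTL API: `BoundedBuffer` and `sum_of_squares_two_threads` are hypothetical names, and the buffer capacity of 64 is arbitrary. Two stages run in separate threads and exchange elements through a bounded buffer, so the producing stage can work on the next elements while the consuming stage processes earlier ones.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical bounded buffer that decouples two pipeline stages.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);  // a zero-capacity buffer would block forever
    }

    // Blocks while the buffer is full, then stores one element.
    void push(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(value);
        not_empty_.notify_one();
    }

    // Signals that no further elements will be pushed.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Blocks until an element is available or the buffer is closed.
    // Returns false once the buffer is closed and fully drained.
    bool pop(std::uint64_t& value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) {
            return false;
        }
        value = items_.front();
        items_.pop();
        not_full_.notify_one();
        return true;
    }

private:
    std::size_t capacity_;
    bool closed_ = false;
    std::queue<std::uint64_t> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Runs a two-stage pipeline: one thread squares 1..n, another sums the results.
std::uint64_t sum_of_squares_two_threads(std::uint64_t n) {
    BoundedBuffer buffer(64);  // arbitrary capacity chosen for the sketch
    std::uint64_t total = 0;

    std::thread producer([&buffer, n] {
        for (std::uint64_t i = 1; i <= n; ++i) {
            buffer.push(i * i);
        }
        buffer.close();
    });

    std::thread consumer([&buffer, &total] {
        std::uint64_t value = 0;
        while (buffer.pop(value)) {
            total += value;
        }
    });

    producer.join();
    consumer.join();
    return total;  // safe to read: the consumer thread has finished
}
```

The bounded capacity keeps the faster stage from running arbitrarily far ahead of the slower one, which limits memory use. On many toolchains the program has to be built with thread support enabled (for example `-pthread` with GCC or Clang).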

PDF
http://algo2.iti.kit.edu/singler/publications/parallelstxxl-ipdps2009.pdf
