Thursday, December 1, 2011

Distributed Graph Analytics

http://dl.acm.org/citation.cfm?id=958940
Δ-stepping: a parallelizable shortest path algorithm

The single source shortest path problem for arbitrary directed graphs with n nodes, m edges and nonnegative edge weights can sequentially be solved using O(n · log n + m) operations. However, no work-efficient parallel algorithm is known that runs in sublinear time for arbitrary graphs. In this paper we present a rather simple algorithm for the single source shortest path problem. Our new algorithm, which we call Δ-stepping, can be implemented very efficiently in sequential and parallel settings for a large class of graphs. For random edge weights and arbitrary graphs with maximum node degree d, sequential Δ-stepping needs O(n + m + d · L) total average-case time, where L denotes the maximum shortest path weight from the source node s to any node reachable from s. For example, this means linear time on directed graphs with constant maximum degree. Our best parallel version for a PRAM takes O(d · L · log n + log² n) time and O(n + m + d · L · log n) work on average. For random graphs, even O(log² n) time and O(n + m) work on average can be achieved. We also discuss how the algorithm can be adapted to work with nonrandom edge weights and how it can be implemented on distributed memory machines. Experiments indicate that already a simple implementation of the algorithm achieves significant speedup on real machines.
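The bucket structure the abstract describes can be sketched sequentially in a few lines. This is a minimal illustration under assumed conventions (a graph given as a dict of adjacency lists), not the paper's tuned implementation:

```python
def delta_stepping(graph, source, delta):
    """Sequential sketch of Delta-stepping.
    graph: {node: [(neighbor, weight), ...]} with nonnegative weights."""
    INF = float("inf")
    dist = {source: 0.0}
    buckets = {0: {source}}  # bucket i holds tentative distances in [i*delta, (i+1)*delta)

    def relax(v, d):
        # On improvement, move v into the bucket matching its new tentative distance.
        if d < dist.get(v, INF):
            old = dist.get(v, INF)
            if old < INF:
                buckets.get(int(old // delta), set()).discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    while buckets:
        i = min(buckets)  # smallest nonempty bucket index
        settled = set()
        # Phase 1: repeatedly relax light edges (w <= delta) inside bucket i,
        # which may re-fill the bucket, until it stays empty.
        while buckets.get(i):
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in graph.get(u, []):
                    if w <= delta:
                        relax(v, dist[u] + w)
        # Phase 2: relax heavy edges (w > delta) once per node settled from bucket i.
        for u in settled:
            for v, w in graph.get(u, []):
                if w > delta:
                    relax(v, dist[u] + w)
        buckets = {k: s for k, s in buckets.items() if s}  # prune empty slots
    return dist
```

Splitting edges into light and heavy is what exposes parallelism: all relaxations of one bucket's light edges can proceed concurrently, and heavy edges are relaxed only once per settled node.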


Single-Source Shortest Paths with the Parallel Boost Graph Library
http://www.dis.uniroma1.it/~challenge9/papers/edmonds.pdf
The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributed-memory computation on large graphs. Developed with the Generic Programming paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary vertex and edge properties, and different communication media. In this paper, we describe the implementation of two parallel variants of Dijkstra’s single-source shortest paths algorithm in the Parallel BGL. We also provide an experimental evaluation of these implementations using synthetic and real-world benchmark graphs from the 9th DIMACS Implementation Challenge.


http://eecs.wsu.edu/~cs460/cs550/parallelShortestPath.pdf
A Parallel Shortest Path Algorithm Based on Graph-Partitioning and Iterative Correcting, 2010
In this paper, we focus on satisfying the actual demands of quickly finding the shortest paths over real-road networks in an intelligent transportation system. A parallel shortest path algorithm based on graph partitioning and iterative correcting is proposed. After evaluating the algorithm using three real road networks, we conclude that our graph-partitioning and iterative correcting based parallel algorithm has good performance. In addition, it achieves more than a 15-fold speedup on 16 processors in an IBM cluster over these real road networks.
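The partition-then-correct idea can be illustrated with a small sequential sketch (an assumed structure only; the paper's actual partitioning and correcting scheme for road networks is more elaborate and runs the per-part work in parallel):

```python
def partitioned_sssp(graph, source, partition):
    """Sketch of graph-partitioning + iterative correcting.
    graph: {node: [(neighbor, weight), ...]}; partition: {node: part id}.
    Each part relaxes edges out of its own nodes to a local fixed point;
    outer iterations 'correct' parts whose boundary inputs improved."""
    INF = float("inf")
    dist = {v: INF for v in graph}
    dist[source] = 0.0
    parts = {}
    for v, p in partition.items():
        parts.setdefault(p, []).append(v)

    changed = True
    while changed:                       # outer correcting iterations
        changed = False
        for p, nodes in parts.items():   # done in parallel on the real system
            local = True
            while local:                 # local relaxation to a fixed point
                local = False
                for u in nodes:
                    if dist[u] == INF:
                        continue
                    for v, w in graph.get(u, []):
                        if dist[u] + w < dist[v]:
                            dist[v] = dist[u] + w
                            changed = True
                            if partition.get(v) == p:
                                local = True  # intra-part update: re-sweep
    return dist
```

Cross-part updates here stand in for the boundary-distance exchange that a distributed implementation would perform between iterations.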


http://scidok.sulb.uni-saarland.de/volltexte/2004/207/pdf/UlrichMeyer_ProfDrKurtMehlhorn.pdf
Design and Analysis of Sequential and Parallel Single–Source Shortest–Paths Algorithms, Ulrich Meyer

We study the performance of algorithms for the Single-Source Shortest-Paths (SSSP) problem on graphs with n nodes and m edges with nonnegative random weights. All previously known SSSP algorithms for directed graphs required superlinear time. We give the first SSSP algorithms that provably achieve linear average-case execution time on arbitrary directed graphs with random edge weights. For independent edge weights, the linear-time bound holds with high probability, too. Additionally, our result implies improved average-case bounds for the All-Pairs Shortest-Paths (APSP) problem on sparse graphs, and it yields the first theoretical average-case analysis for the “Approximate Bucket Implementation” of Dijkstra’s SSSP algorithm (ABI–Dijkstra). Furthermore, we give constructive proofs for the existence of graph classes with random edge weights on which ABI–Dijkstra and several other well-known SSSP algorithms require superlinear average-case time. Besides the classical sequential (single processor) model of computation we also consider parallel computing: we give the currently fastest average-case linear-work parallel SSSP algorithms for large graph classes with random edge weights, e.g., sparse random graphs and graphs modeling the WWW, telephone calls or social networks.


http://www.cse.psu.edu/~madduri/papers/ThorupSSSP-MTAAP07.pdf


http://www.sandia.gov/~sjplimp/papers/pc11.pdf
MapReduce in MPI for Large-scale Graph Algorithms, 2010

We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm. This means the calling program does not need to include explicit parallel code, but instead provides “map” and “reduce” functions that operate independently on elements of a data set distributed across processors. The library performs needed data movement between processors. We describe how typical MapReduce functionality can be implemented in an MPI context, and also in an out-of-core manner for data sets that do not fit within the aggregate memory of a parallel machine. Our motivation for creating this library was to enable graph algorithms to be written as MapReduce operations, allowing processing of terabyte-scale data sets on traditional MPI-based clusters. We outline MapReduce versions of several such algorithms: vertex ranking via PageRank, triangle finding, connected component identification, Luby’s algorithm for maximally independent sets, and single-source shortest-path calculation. To test the algorithms on arbitrarily large artificial graphs we generate randomized R-MAT matrices in parallel; a MapReduce version of this operation is also described. Performance and scalability results for the various algorithms are presented for varying size graphs on a distributed-memory cluster. For some cases, we compare the results with non-MapReduce algorithms, different machines, and different MapReduce software, namely Hadoop. Our open-source library is written in C++, is callable from C++, C, Fortran, or scripting languages such as Python, and can run on any parallel platform that supports MPI.
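The single-source shortest-path calculation mentioned here reduces to a repeated map/shuffle/reduce round. Below is a hypothetical in-process sketch of that pattern (the actual library distributes the key/value pairs across MPI ranks and can spill them out of core):

```python
from collections import defaultdict

def sssp_round(adjacency, dist):
    """One MapReduce-style relaxation round.
    adjacency: {node: [(neighbor, weight), ...]}; dist: {node: tentative distance}."""
    INF = float("inf")
    # Map: each node emits its own distance, plus a relaxed distance per out-edge.
    emitted = []
    for u, d in dist.items():
        emitted.append((u, d))
        if d < INF:
            for v, w in adjacency.get(u, []):
                emitted.append((v, d + w))
    # Shuffle: group values by key (the library moves these between processors).
    groups = defaultdict(list)
    for k, val in emitted:
        groups[k].append(val)
    # Reduce: keep the minimum tentative distance per node.
    return {k: min(vals) for k, vals in groups.items()}

def sssp(adjacency, source):
    """Iterate rounds until the distance map stops changing."""
    INF = float("inf")
    dist = {u: INF for u in adjacency}
    dist[source] = 0.0
    while True:
        new = sssp_round(adjacency, dist)
        if new == dist:
            return dist
        dist = new
```

Convergence takes at most as many rounds as the longest shortest path has edges, which is why the paper pairs this pattern with large clusters rather than latency-sensitive queries.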


Apache HAMA http://blogs.apache.org/hama/


http://homepages.dcc.ufmg.br/~tmacam/publications/macambira2010middleware.pdf
A middleware for parallel processing of large graphs, 2010
With the increasing “data deluge” scientists face today, the analysis and processing of large datasets of structured data is a daring task. Among such data, large graphs are gaining particular importance with the growing interest on social networks and other complex networks. Given the dimensions considered, parallel processing is essential. However, users are generally not experts in writing parallel code to handle such structures. In this work we present Rendero, a middleware that makes it possible to easily describe graph algorithms in a form adequate for parallel execution. The system is based on the Bulk-Synchronous programming model and offers a vertex-based abstraction. Our current implementation offers good speed-up and scale-up results for large graphs ranging from tens of thousands to millions of vertices and edges in some cases.
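A vertex-based Bulk-Synchronous abstraction of this kind can be sketched as follows. This is Pregel-style pseudocode in Python under assumed conventions; Rendero's real API differs:

```python
def bsp_sssp(graph, source, max_supersteps=100):
    """Vertex-centric BSP sketch of SSSP.
    Each superstep, vertices with incoming messages take the minimum,
    and on improvement send relaxed distances along their out-edges.
    A barrier separates supersteps; no messages means global halt."""
    INF = float("inf")
    value = {v: INF for v in graph}   # per-vertex state
    inbox = {source: [0.0]}           # messages delivered this superstep
    for _ in range(max_supersteps):
        outbox = {}
        for v, msgs in inbox.items():           # compute() per active vertex
            best = min(msgs)
            if best < value[v]:
                value[v] = best
                for u, w in graph.get(v, []):   # send to out-neighbors
                    outbox.setdefault(u, []).append(best + w)
        if not outbox:                          # every vertex voted to halt
            break
        inbox = outbox                          # barrier: deliver next superstep
    return value
```

The appeal of the model is that the user writes only the per-vertex compute step; message routing, barriers and distribution are the middleware's job.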


http://www.cslab.ntua.gr/~anastop/files/papers/mtaap09dijkstra.pdf
Early Experiences on Accelerating Dijkstra’s Algorithm Using Transactional Memory

In this paper we use Dijkstra’s algorithm as a challenging, hard to parallelize paradigm to test the efficacy of several parallelization techniques in a multicore architecture. We consider the application of Transactional Memory (TM) as a means of concurrent accesses to shared data and compare its performance with straightforward parallel versions of the algorithm based on traditional synchronization primitives. To increase the granularity of parallelism and avoid excessive synchronization, we combine TM with Helper Threading (HT). Our simulation results demonstrate that the straightforward parallelization of Dijkstra’s algorithm with traditional locks and barriers has, as expected, disappointing performance. On the other hand, TM by itself is able to provide some performance improvement in several cases, while the version based on TM and HT exhibits a significant performance improvement that can reach up to a speedup of 1.46.
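The “straightforward parallelization with traditional locks” that the paper finds disappointing looks roughly like the toy sketch below (my illustration, not the paper's code): worker threads share one priority queue and one distance map, and each extract-min plus relaxation is one critical section, so the threads largely serialize on the lock:

```python
import heapq
import threading

def parallel_dijkstra(graph, source, nthreads=2):
    """Coarse-grained-lock parallel Dijkstra sketch.
    graph: {node: [(neighbor, weight), ...]} with nonnegative weights."""
    INF = float("inf")
    dist = {v: INF for v in graph}
    dist[source] = 0.0
    pq = [(0.0, source)]
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # critical section: extract-min + relax all out-edges
                if not pq:
                    return  # queue mutations are atomic, so empty means done
                d, u = heapq.heappop(pq)
                if d > dist[u]:
                    continue  # stale queue entry, skip
                for v, w in graph.get(u, []):
                    if d + w < dist[v]:
                        dist[v] = d + w
                        heapq.heappush(pq, (d + w, v))

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dist
```

Because nearly all the work sits inside the single critical section, adding threads buys almost nothing; that bottleneck is what motivates the TM and Helper Threading variants in the paper.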
