2011年10月25日火曜日

2011年10月21日金曜日

GBASE: a scalable and general graph management system

GBASE: a scalable and general graph management system
http://dl.acm.org/citation.cfm?id=2020408.2020580&coll=DL&dl=ACM

Graphs appear in numerous applications including cyber-security, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are common-place. How to store such large graphs efficiently? What are the core operations/queries on those graph? How to answer the graph queries quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space, as well as accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.

Open Street Map (OSM)

世界の道路ネットワークデータ Open Street Map
http://www.openstreetmap.org/

データの形式は OSM (XML) か、以下の PBF 形式で提供
http://wiki.openstreetmap.org/wiki/PBF_Format

研究室には、/nfs/data0/roadnetwork/ にダウンロードしています。PBF フォーマットで 9GB ぐらいあります。

2011年10月10日月曜日

Solving Large, Irregular Graph Problems using Adaptive Work-stealing

X10 のワークスティーリングフレームワーク XWS。Sun Niagara 上で BFS/DFS を X10+XWS で実行し、Cilk に対する優位性を示す。

http://gee.cs.oswego.edu/dl/papers/icpp08.pdf

Solving large, irregular graph problems efficiently is challenging. Current software systems and commoditymultiprocessors do not support fine-grained, irregular parallelism well. We present XWS, the X10 Work Stealing framework, an open-source runtime for the parallel programming language X10 and a library to be used directly by application writers. XWS extends the Cilk work-stealing framework with several features necessary to efficiently implement graph algorithms, viz., support for improperly nested procedures, global termination detection, and phased computation. We also present a strategy to adaptively control the granularity of parallel tasks in the work-stealing scheme, depending on the instantaneous size of the work queue. We compare the performance of the XWS implementations of spanning tree algorithms with that of the hand-written C and Cilk
implementations using various graph inputs. We show that XWS programs (written in Java) scale and exhibit comparable or better performance.

HipG: parallel processing of large-scale graphs

汎用グラフ処理システム HipG. 強連結成分分解を実装し、PBGL (Parallel Boost Graph Library) と hand-written の C/MPI を用いて速度的な優位性を示す。

http://dl.acm.org/citation.cfm?id=2007185

Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work on graph nodes. To make the user code high-level, the framework provides a unified interface to executing methods on local and non-local graph nodes and an abstraction of exclusive execution. The graph computations are managed by logical objects called synchronizers, which we used, for example, to implement distributed divide-and-conquer decomposition into strongly connected components. The code written in HipG is independent of a particular graph representation, to the point that the graph can be created on-the-fly, i.e. by the algorithm that computes on this graph, which we used to implement a distributed model checker. HipG programs are in general short and elegant; they achieve good portability, memory utilization, and performance.

大規模グラフストリーム処理系

graph crest のプロジェクトのスコープの一つとして、大規模グラフストリーム処理系があります。インクリメンタルな汎用グラフ処理モデルは、西井君・雁瀬君が追求してもらっていますし、今それぞれ案を出してもらっている graph database for stream computing のあるべき姿もこの範疇に入ってきます。上野君と Miyuru 君のStreamScale をベースにするかどうかも再度議論する必要がありますが、オープンソースとして公開可能な汎用的な「大規模グラフストリーム処理系」を構築する必要があります。また、雁瀬君、上野君、竹野君の動的二部グラフの関連解析処理も、この処理系の上で動く必要がありますし、その他のアプリケーションでも使用可能に成り得る機能は、処理系が提供すべきものです。また、当然、バッチ処理系とのインタフェースも必須となってくるでしょう。