
Saturday, December 17, 2011

Accelerating Bipartite-Graph Relevance Analysis and System Profiling

- Acceleration with GPGPUs
- System profiling: where is the time spent, and how much?
- Make the underlying infrastructure independent of System S so it can run at scale on TSUBAME

Saturday, December 3, 2011

Spectral Clustering

http://www.csie.ntu.edu.tw/~cjlin/papers/psc08.pdf
Parallel Spectral Clustering in Distributed Systems

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863 instances, we show that our parallel algorithm can effectively handle large problems.
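To make the chosen strategy concrete, here is a minimal single-machine sketch, assuming a NumPy/SciPy setting; all function and parameter names are mine. It keeps only the t nearest neighbors per point, symmetrizes, and runs k-means on the dominant eigenvectors of the normalized affinity matrix (Ng-Jordan-Weiss style). The paper's actual contribution, distributing memory and computation across machines, is not shown.

```python
# A minimal sketch of spectral clustering with a t-nearest-neighbor
# sparsified similarity matrix. Illustrative only; a k-NN index and
# distributed eigensolver replace the dense steps at scale.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from scipy.cluster.vq import kmeans2

def sparse_spectral_clustering(X, k, t=20, sigma=1.0):
    n = X.shape[0]
    # Dense pairwise squared distances; fine for a sketch.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    rows, cols, vals = [], [], []
    for i in range(n):
        nn = np.argsort(d2[i])[1:t + 1]        # t nearest neighbors, skip self
        rows += [i] * t
        cols += list(nn)
        vals += list(np.exp(-d2[i, nn] / (2 * sigma ** 2)))
    S = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    S = S.maximum(S.T)                          # symmetrize the sparsified matrix
    d = np.asarray(S.sum(axis=1)).ravel()
    Dinv = sp.diags(1.0 / np.sqrt(d))
    L = Dinv @ S @ Dinv                         # normalized affinity D^-1/2 S D^-1/2
    _, V = eigsh(L, k=k, which='LA')            # k dominant eigenvectors
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)  # row-normalize
    _, labels = kmeans2(Y, k, minit='++')
    return labels
```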


Parallelization of spectral clustering algorithm on multi-core processors and GPGPU
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4625449

Spectral clustering is a widely-used algorithm in the fields of information retrieval, data mining, machine learning and many others. It can help to cluster large amounts of data into several categories without requiring any additional information about the dataset or the categories, so that people can find information by category easily. In this paper, we parallelize the algorithm proposed by Andrew Y. Ng, Michael I. Jordan and Yair Weiss. We provide two versions of the implementation: one is parallelized in OpenMP; the other is programmed in NVIDIA CUDA (Compute Unified Device Architecture), the environment provided by NVIDIA for programming its CUDA-enabled GPGPUs (general-purpose graphics processing units). We achieve about a three-times speedup with OpenMP and around a ten-times speedup with CUDA in our experiments.


On a strategy for spectral clustering with parallel computation
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science

Spectral Clustering is one of the most important methods based on space dimension reduction used in Pattern Recognition. This method consists in selecting dominant eigenvectors of a matrix called the affinity matrix in order to define a low-dimensional data space in which data points are easy to cluster. By exploiting properties of Spectral Clustering, we propose a method where we apply the algorithm independently on particular subdomains and gather the results to determine a global partition. Additionally, with a criterion for determining the number of clusters, the domain decomposition strategy for parallel spectral clustering is robust and efficient.

Thursday, December 1, 2011

Graph Stream Mining

http://www.utdallas.edu/~hamlen/parveen-passat11.pdf

Insider Threat Detection using Stream Mining and Graph Mining

Evidence of malicious insider activity is often buried within large data streams, such as system logs accumulated over months or years. Ensemble-based stream mining leverages multiple classification models to achieve highly accurate anomaly detection in such streams even when the stream is unbounded, evolving, and unlabeled. This makes the approach effective for identifying insider threats who attempt to conceal their activities by varying their behaviors over time. This paper applies ensemble-based stream mining, unsupervised learning, and graph-based anomaly detection to the problem of insider threat detection, demonstrating that the ensemble-based approach is significantly more effective than traditional single-model methods.

Gremlin: A Language Specialized for Graph Analysis

https://github.com/tinkerpop/gremlin/wiki
http://www.infoq.com/jp/news/2010/01/Gremlin
http://www.youtube.com/watch?v=5wpTtEBK4-E

Graph Applications - Cyber Security

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5482734&tag=1

Measure Large Scale Network Security Using Adjacency Matrix Attack Graphs

An Attack Graph capable of disclosing causal relationships between multiple vulnerabilities has become a desirable tool for administrators to analyze and locate potential risks to protect critical networked resources against internal or external multi-step attacks. However, probabilistic security metric computations, using currently applied attack graphs, have complexity problems due to their scale. It is hard or even impossible for current attack graphs to be applied to large scale networks. This paper proposes a novel approach that combines the advantages of exploit-dependency attack graphs and adjacency matrices, which results in quadratic complexity. We first give a motivating example to introduce the approach. We then define the adjacency matrix attack graphs. We show that computing probabilistic cumulative scores by means of adjacency matrix attack graphs is efficient and readily scalable.


http://research.microsoft.com/pubs/79413/botgraph.pdf
BotGraph: Large Scale Spamming Botnet Detection

Network security applications often require analyzing huge volumes of data to identify abnormal patterns or activities. The emergence of cloud-computing models opens up new opportunities to address this challenge by leveraging the power of parallel computing.

In this paper, we design and implement a novel system called BotGraph to detect a new type of botnet spamming attacks targeting major Web email providers. BotGraph uncovers the correlations among botnet activities by constructing large user-user graphs and looking for tightly connected subgraph components. This enables us to identify stealthy botnet users that are hard to detect when viewed in isolation. To deal with the huge data volume, we implement BotGraph as a distributed application on a computer cluster, and explore a number of performance optimization techniques. Applying it to two months of Hotmail log containing over 500 million users, BotGraph successfully identified over 26 million botnet-created user accounts with a low false positive rate. The running time of constructing and analyzing a 220GB Hotmail log is around 1.5 hours with 240 machines. We believe both our graph-based approach and our implementations are generally applicable to a wide class of security applications for analyzing large datasets.
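As a toy, single-machine illustration of the core construction (not the paper's distributed implementation; the thresholds and names below are invented), connect two users who log in from more than w shared IP addresses and report large connected components as candidate bot groups:

```python
# A toy sketch of BotGraph's user-user graph idea: edge weight is the
# number of shared login IPs; components above a size threshold are
# suspicious. The real system builds this graph over huge logs on a
# cluster and uses more refined edge definitions and thresholds.
from collections import defaultdict
from itertools import combinations

def candidate_bot_groups(login_events, w=2, min_size=10):
    # login_events: iterable of (user_id, ip) pairs
    users_by_ip = defaultdict(set)
    for user, ip in login_events:
        users_by_ip[ip].add(user)
    shared = defaultdict(int)                      # (u, v) -> #shared IPs
    for users in users_by_ip.values():
        for u, v in combinations(sorted(users), 2):
            shared[(u, v)] += 1
    adj = defaultdict(set)
    for (u, v), c in shared.items():
        if c > w:                                  # edge-weight threshold
            adj[u].add(v)
            adj[v].add(u)
    # Connected components above min_size are candidate botnet groups.
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        if len(comp) >= min_size:
            groups.append(comp)
    return groups
```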

Graph Applications

http://www.indiana.edu/~cortex/networks_nrn.pdf

Complex brain networks: graph theoretical analysis of structural and functional systems

Recent developments in the quantitative analysis of complex networks, based largely on graph theory, have been rapidly translated to studies of brain network organization. The brain's structural and functional systems have features of complex networks — such as small-world topology, highly connected hubs and modularity — both at the whole-brain scale of human neuroimaging and at a cellular scale in non-human animals. In this article, we review studies investigating complex brain networks in diverse experimental modalities (including structural and functional MRI, diffusion tensor imaging, magnetoencephalography and electroencephalography in humans) and provide an accessible introduction to the basic principles of graph theory. We also highlight some of the technical challenges and key questions to be addressed by future developments in this rapidly moving field.

Madduri's work

http://www.cc.gatech.edu/~bader/papers/ShortestPaths.html

Parallel Shortest Path Algorithms for Solving Large-Scale Instances

We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the Delta-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, Delta-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.

Distributed Graph Analytics

http://dl.acm.org/citation.cfm?id=958940
Δ-stepping: a parallelizable shortest path algorithm

The single source shortest path problem for arbitrary directed graphs with n nodes, m edges and nonnegative edge weights can sequentially be solved using O(n · log n + m) operations. However, no work-efficient parallel algorithm is known that runs in sublinear time for arbitrary graphs. In this paper we present a rather simple algorithm for the single source shortest path problem. Our new algorithm, which we call Delta-stepping, can be implemented very efficiently in sequential and parallel settings for a large class of graphs. For random edge weights and arbitrary graphs with maximum node degree d, sequential Δ-stepping needs O(n + m + d · L) total average-case time, where L denotes the maximum shortest path weight from the source node s to any node reachable from s. For example, this means linear time on directed graphs with constant maximum degree. Our best parallel version for a PRAM takes O(d · L · log n + log² n) time and O(n + m + d · L · log n) work on average. For random graphs, even O(log² n) time and O(n + m) work on average can be achieved. We also discuss how the algorithm can be adapted to work with nonrandom edge weights and how it can be implemented on distributed memory machines. Experiments indicate that already a simple implementation of the algorithm achieves significant speedup on real machines.
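A compact sequential Python sketch of Δ-stepping, to make the bucket structure concrete (names are mine; a parallel implementation relaxes each bucket's light edges concurrently):

```python
# Sequential Δ-stepping sketch. Vertices with tentative distance in
# [i*delta, (i+1)*delta) share bucket i. Light edges (weight <= delta)
# are relaxed repeatedly inside the bucket loop; heavy edges once the
# bucket settles, since they can only reach later buckets.
import math
from collections import defaultdict

def delta_stepping(adj, source, delta):
    # adj: {u: [(v, weight), ...]} with non-negative weights
    dist = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)

    def relax(v, d):
        if d < dist[v]:
            if dist[v] < math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0.0)
    while buckets:
        i = min(buckets)                   # smallest non-empty bucket
        settled = set()
        while buckets.get(i):              # light-edge phase may refill it
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in adj[u]:
                    if w <= delta:
                        relax(v, dist[u] + w)
        for u in settled:                  # heavy-edge phase, once per bucket
            for v, w in adj[u]:
                if w > delta:
                    relax(v, dist[u] + w)
        buckets.pop(i, None)               # drop an emptied leftover set
    return dict(dist)
```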


Single-Source Shortest Paths with the Parallel Boost Graph Library
http://www.dis.uniroma1.it/~challenge9/papers/edmonds.pdf
The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributed-memory computation on large graphs. Developed with the Generic Programming paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary vertex and edge properties, and different communication media. In this paper, we describe the implementation of two parallel variants of Dijkstra's single-source shortest paths algorithm in the Parallel BGL. We also provide an experimental evaluation of these implementations using synthetic and real-world benchmark graphs from the 9th DIMACS Implementation Challenge.


http://eecs.wsu.edu/~cs460/cs550/parallelShortestPath.pdf
A Parallel Shortest Path Algorithm Based on Graph-Partitioning and Iterative Correcting, 2010
In this paper, we focus on satisfying the actual demands of quickly finding the shortest paths over real-road networks in an intelligent transportation system. A parallel shortest path algorithm based on graph partitioning and iterative correcting is proposed. After evaluating the algorithm using three real road networks, we conclude that our graph-partitioning and iterative correcting based parallel algorithm has good performance. In addition, it achieves more than a 15-fold speedup on 16 processors in an IBM cluster over these real road networks.


http://scidok.sulb.uni-saarland.de/volltexte/2004/207/pdf/UlrichMeyer_ProfDrKurtMehlhorn.pdf
Design and Analysis of Sequential and Parallel Single–Source Shortest–Paths Algorithms, Ulrich Meyer

We study the performance of algorithms for the Single-Source Shortest-Paths (SSSP) problem on graphs with n nodes and m edges with nonnegative random weights. All previously known SSSP algorithms for directed graphs required superlinear time. We give the first SSSP algorithms that provably achieve linear average-case execution time on arbitrary directed graphs with random edge weights. For independent edge weights, the linear-time bound holds with high probability, too. Additionally, our result implies improved average-case bounds for the All-Pairs Shortest-Paths (APSP) problem on sparse graphs, and it yields the first theoretical average-case analysis for the "Approximate Bucket Implementation" of Dijkstra's SSSP algorithm (ABI-Dijkstra). Furthermore, we give constructive proofs for the existence of graph classes with random edge weights on which ABI-Dijkstra and several other well-known SSSP algorithms require superlinear average-case time. Besides the classical sequential (single processor) model of computation we also consider parallel computing: we give the currently fastest average-case linear-work parallel SSSP algorithms for large graph classes with random edge weights, e.g., sparse random graphs and graphs modeling the WWW, telephone calls or social networks.


http://www.cse.psu.edu/~madduri/papers/ThorupSSSP-MTAAP07.pdf


http://www.sandia.gov/~sjplimp/papers/pc11.pdf
MapReduce in MPI for Large-scale Graph Algorithms, 2010

We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm. This means the calling program does not need to include explicit parallel code, but instead provides "map" and "reduce" functions that operate independently on elements of a data set distributed across processors. The library performs needed data movement between processors. We describe how typical MapReduce functionality can be implemented in an MPI context, and also in an out-of-core manner for data sets that do not fit within the aggregate memory of a parallel machine. Our motivation for creating this library was to enable graph algorithms to be written as MapReduce operations, allowing processing of terabyte-scale data sets on traditional MPI-based clusters. We outline MapReduce versions of several such algorithms: vertex ranking via PageRank, triangle finding, connected component identification, Luby's algorithm for maximal independent sets, and single-source shortest-path calculation. To test the algorithms on arbitrarily large artificial graphs we generate randomized R-MAT matrices in parallel; a MapReduce version of this operation is also described. Performance and scalability results for the various algorithms are presented for varying size graphs on a distributed-memory cluster. For some cases, we compare the results with non-MapReduce algorithms, different machines, and different MapReduce software, namely Hadoop. Our open-source library is written in C++, is callable from C++, C, Fortran, or scripting languages such as Python, and can run on any parallel platform that supports MPI.
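The library itself is C++/MPI, but the programming style is easy to show in plain Python: the user supplies map and reduce callbacks and the library shuffles key-value pairs between ranks. Below is a hedged local stand-in for one PageRank iteration in that style (not the library's actual API; dangling-node mass is ignored for brevity):

```python
# One PageRank iteration expressed as "map" and "reduce" callbacks.
# Both phases run locally here, just to show the shape of the code;
# in the MPI library, the key-value shuffle crosses processors.
from collections import defaultdict

def pagerank_map(u, out_links, rank):
    # Emit (v, contribution) per out-neighbor, plus the structure so
    # reduce can rebuild the graph for the next iteration.
    share = rank / len(out_links) if out_links else 0.0
    for v in out_links:
        yield v, ('contrib', share)
    yield u, ('links', out_links)

def pagerank_reduce(v, values, n, d=0.85):
    total, links = 0.0, []
    for tag, val in values:
        if tag == 'contrib':
            total += val
        else:
            links = val
    return v, links, (1 - d) / n + d * total

def pagerank_iteration(graph, ranks):
    # graph: {u: [v, ...]}; ranks: {u: float}, e.g. 1/len(graph) initially
    kv = defaultdict(list)
    for u, links in graph.items():                 # map phase
        for key, value in pagerank_map(u, links, ranks[u]):
            kv[key].append(value)
    new_graph, new_ranks = {}, {}
    for key, values in kv.items():                 # reduce phase
        v, links, r = pagerank_reduce(key, values, len(graph))
        new_graph[v], new_ranks[v] = links, r
    return new_graph, new_ranks
```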


Apache HAMA http://blogs.apache.org/hama/


http://homepages.dcc.ufmg.br/~tmacam/publications/macambira2010middleware.pdf
A middleware for parallel processing of large graphs, 2010
With the increasing "data deluge" scientists face today, the analysis and processing of large datasets of structured data is a daring task. Among such data, large graphs are gaining particular importance with the growing interest on social networks and other complex networks. Given the dimensions considered, parallel processing is essential. However, users are generally not experts in writing parallel code to handle such structures. In this work we present Rendero, a middleware that makes it possible to easily describe graph algorithms in a form adequate for parallel execution. The system is based on the Bulk-Synchronous programming model and offers a vertex-based abstraction. Our current implementation offers good speed-up and scale-up results for large graphs ranging from tens of thousands to millions of vertices and edges in some cases.


http://www.cslab.ntua.gr/~anastop/files/papers/mtaap09dijkstra.pdf
Early Experiences on Accelerating Dijkstra’s Algorithm Using Transactional Memory

In this paper we use Dijkstra's algorithm as a challenging, hard to parallelize paradigm to test the efficacy of several parallelization techniques in a multicore architecture. We consider the application of Transactional Memory (TM) as a means of concurrent accesses to shared data and compare its performance with straightforward parallel versions of the algorithm based on traditional synchronization primitives. To increase the granularity of parallelism and avoid excessive synchronization, we combine TM with Helper Threading (HT). Our simulation results demonstrate that the straightforward parallelization of Dijkstra's algorithm with traditional locks and barriers has, as expected, disappointing performance. On the other hand, TM by itself is able to provide some performance improvement in several cases, while the version based on TM and HT exhibits a significant performance improvement that can reach up to a speedup of 1.46.

GPU-based SSSP

http://www.ijcaonline.org/volume10/number10/pxc3871992.pdf
Performance Improvement in Large Graph Algorithms on GPU using CUDA: An Overview, 2010

Basic operations on graphs with millions of vertices are common in various applications, and faster execution of such operations is essential to reduce overall computation time. Today's Graphics Processing Units (GPUs) have high computation power and low price. A GPU can be treated as an array of Single Instruction Multiple Data (SIMD) processors using the CUDA software interface by Nvidia. The massively multithreaded architecture of a CUDA device lets many threads run in parallel, making optimum use of the available computation power of the GPU. In the case of graph algorithms, vertices of the graph are processed in parallel by mapping them to threads on the device. By running thousands of threads in parallel, the computation time required for these algorithms is drastically decreased compared to their CPU implementations.
We studied different parallel algorithms for breadth-first search and all-pairs shortest path carried out on the GPU using CUDA, and compared them with respect to execution time, data structures used, input data, etc. In this paper, we present an overview of various parallel methods for BFS and APSP on the GPU, using its multithreaded architecture, by various authors.


http://ppl.stanford.edu/papers/ppopp070a-hong.pdf
Accelerating CUDA Graph Algorithms at Maximum Warp, 2011
Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbalance and is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method significantly improves the performance of applications with heavily imbalanced workloads, and enables trade-offs between workload imbalance and ALU underutilization for fine-tuning the performance.

Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single thread CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algorithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.


http://ieeexplore.ieee.org/iel5/6067545/6075092/06075214.pdf?arnumber=6075214
A modified parallel approach to Single Source Shortest Path Problem for massively dense graphs using CUDA, 2011
Today's Graphics Processing Units (GPUs) possess enormous computation power with their highly parallel and multithreaded architecture, the most attractive feature being their low cost. NVIDIA's CUDA provides an interface for developers to use GPUs for general-purpose parallel computing. In this paper, we present a modified algorithm for the Single Source Shortest Path Problem on the GPU using CUDA. First, we modify the standard Bellman-Ford algorithm to remove its drawbacks and make it suitable for parallel implementation, and then implement it using CUDA. For dense graphs, our algorithm gains a speedup of 10x-12x over the previously proposed parallel algorithm. Our algorithm also accepts graphs with negative-weight edges and can detect any reachable negative-weight cycle, which widens its scope to generalized problems.
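The paper's exact modification is behind the IEEE paywall, so the sketch below shows the usual way Bellman-Ford is made parallel-friendly, under that assumption: each round relaxes only edges leaving vertices whose distance changed in the previous round, and a non-empty frontier after n-1 rounds signals a reachable negative-weight cycle. Names are mine.

```python
# Frontier-based Bellman-Ford sketch. The inner loop over the frontier
# is embarrassingly parallel (on a GPU: one thread per frontier edge,
# with an atomic-min for the distance update).
import math

def frontier_bellman_ford(adj, source):
    # adj: {u: [(v, weight), ...]}; every vertex appears as a key;
    # weights may be negative
    n = len(adj)
    dist = {u: math.inf for u in adj}
    dist[source] = 0.0
    frontier = {source}
    for _ in range(n - 1):
        nxt = set()
        for u in frontier:                    # independent -> parallelizable
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w     # atomic-min in a parallel version
                    nxt.add(v)
        frontier = nxt
        if not frontier:
            return dist, False                # converged, no negative cycle
    return dist, bool(frontier)               # True: reachable negative cycle
```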


http://www.springerlink.com/content/a254839r565r176p/
CUDA Solutions for the SSSP Problem
We present several algorithms that solve the single-source shortest-path problem using CUDA. We have run them on a database, composed of hundreds of large graphs represented by adjacency lists and adjacency matrices, achieving high speedups relative to a CPU implementation based on Fibonacci heaps. Concerning correctness, we outline why our solutions work, and show that a previous approach [10] is incorrect.


http://ldc.usb.ve/~vtheok/cursos/ci6323/pdf/lecturas/Accelerating%20Large%20Graph%20Algorithms%20on%20the%20GPU%20Using%20CUDA.pdf
Accelerating Large Graph Algorithms on the GPU Using CUDA, Pawan Harish and P.J. Narayanan

Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations using high-end computers are reported but are accessible only to a few. Graphics Processing Units (GPUs) of today have high computation power and low price. They have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms – including breadth first search, single source shortest path, and all-pairs shortest path – using CUDA on large graphs. We can compute the single source shortest path on a 10 million vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.


----

http://sc.chat-shuffle.net/paper/uid:110006828689
Accelerating the All-Pairs Shortest Path Problem with CUDA (Application Acceleration; Hokkaido Workshop on Evaluation of High-Performance Computing and Architecture, HOKKE-2008)
This paper describes the acceleration of the All-Pairs Shortest Path (APSP) problem using a GPU (Graphics Processing Unit). The proposed method implements the GPU program using CUDA (Compute Unified Device Architecture), NVIDIA's development environment for its GPUs. The algorithm solves single-source shortest path problems repeatedly (the iterated-SSSP method). To reduce the sequential portion of the overall computation and obtain higher speedup, each SSSP problem is treated as a single task and the tasks are processed in parallel. Furthermore, shared memory is used to share data between tasks and reduce accesses to global memory. As a result, the method is 3.5 to 18 times faster than existing approaches. We also show that the performance of the iterated-SSSP method varies with the characteristics of the graph.
http://hagi-www.ics.es.osaka-u.ac.jp/research/papers/200812_t-okuyam_ispa.pdf
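A minimal CPU analogue of the iterated-SSSP scheme (the paper maps each SSSP task to the GPU and shares data between tasks through shared memory; here each task is an independent Dijkstra run farmed out to worker processes, and all names are mine):

```python
# APSP as a collection of independent SSSP tasks, one per source.
# The task-level independence is exactly what the paper parallelizes.
import heapq
from functools import partial
from concurrent.futures import ProcessPoolExecutor

def dijkstra(adj, source):
    # adj: {u: [(v, weight), ...]} with non-negative weights
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return source, dist

def apsp(adj, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(partial(dijkstra, adj), adj))

# Call apsp() under `if __name__ == "__main__":` when using processes.
```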

Saturday, November 19, 2011

[Graph500] Graph500 on Amazon EC2

On running Graph500 on Amazon EC2:

- Specs of the Amazon EC2 cluster instances
Amazon EC2 supports cluster instances:
Xeon X5570 (2.95 GHz), 23 GB of RAM, 1690 GB of local
storage, and a 10 Gbps network, all for $1.60 per hour; GPUs (Tesla Fermi) recently became available
(these are late-2010 specs, so they have likely changed)

- Research plan
- Run Graph500 in the virtualized environment and characterize its performance against a bare-metal environment
- Completing the full benchmark is not the goal: run BFS several times, skip validation
- Scale 26 fits on a single node, so the results can be compared with the performance profiles taken on TSUBAME
- Run microbenchmarks to infer the network topology (there should be related work; investigate)
- CPU time is billed by the hour, so can we reduce the number of nodes?
- Compute an optimized placement of MPI ranks based on the inferred topology to improve performance (a sketch of this idea follows the list)
- Roughly 200,000 JPY for a 12-hour run on 128 nodes
- Whether Amazon submits to Graph500 based on this work is a separate matter
- Target conference: IEEE Cloud 2012, deadline in late January 2012
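For the rank-placement item, a hypothetical sketch of the idea: given a communication-volume matrix between MPI ranks (from profiling) and a distance matrix between instances (from the topology-inference microbenchmarks), greedily place heavily communicating rank pairs on nearby instances. A real mapper would treat this as a graph-embedding problem; everything below is invented scaffolding.

```python
# Greedy MPI rank placement sketch: heaviest-communicating rank pairs
# get the closest free instance pairs. comm and dist are both (n, n)
# NumPy arrays; all names and thresholds are illustrative.
import numpy as np

def greedy_rank_placement(comm, dist):
    n = comm.shape[0]
    placement, used = {}, set()        # rank -> instance
    pairs = sorted(((comm[i, j], i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    for _, i, j in pairs:
        for r, partner in ((i, j), (j, i)):
            if r in placement:
                continue
            anchor = placement.get(partner)
            free = [a for a in range(n) if a not in used]
            if anchor is None:
                best = free[0]                               # any free instance
            else:
                best = min(free, key=lambda a: dist[anchor, a])  # near partner
            placement[r] = best
            used.add(best)
    return placement
```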

[References]
http://www.stratosphere.eu/files/TopologyInferenceEnd2End_11.pdf

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5948654

http://dl.acm.org/citation.cfm?id=1833691
The Impact of Virtualization on Network Performance of Amazon EC2 Data Center, INFOCOM 2010

Cloud computing services allow users to lease computing resources from large scale data centers operated by service providers. Using cloud services, users can deploy a wide variety of applications dynamically and on-demand. Most cloud service providers use machine virtualization to provide flexible and cost-effective resource sharing. However, few studies have investigated the impact of machine virtualization in the cloud on networking performance.

In this paper, we present a measurement study to characterize the impact of virtualization on the networking performance of the Amazon Elastic Cloud Computing (EC2) data center. We measure the processor sharing, packet delay, TCP/UDP throughput and packet loss among Amazon EC2 virtual machines. Our results show that even though the data center network is lightly utilized, virtualization can still cause significant throughput instability and abnormal delay variations. We discuss the implications of our findings on several classes of applications.

Meeting w/ Andy Yoo (LLNL)

Here are the minutes of the meeting we had with Andy:

-------------------------------------

There is no MatLab code or MPI code. Miyuru needs to create an MPI implementation of the graph generator.

The reason current methods (e.g., R-MAT) are not good is their low clustering coefficient. We need a generator that creates data sets with a very high clustering coefficient among data points.

The proposed method starts with quasi-cliques, which show up as the non-zero elements in the matrix.

Miyuru should look for the Chung-Lu model; there is a paper published around 2001-2002 (a sketch of the model follows the action list below).

The basic idea is that, given a degree distribution, the implementation should extract the model from it.
So the main activities are:

- Create a MatLab model of the generator
- Port the MatLab code to MPI
- Run experiments with the graph generator on TSUBAME
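For reference, a minimal sketch of the Chung-Lu model as usually defined: given expected degrees w, edge (i, j) appears independently with probability w_i * w_j / sum(w), capped at 1. The O(n²) loop below is the part the MatLab model and the MPI port would distribute.

```python
# Reference Chung-Lu generator sketch. Expected degree of vertex i is
# approximately w[i] when w[i] * w[j] < sum(w) for all pairs.
import random

def chung_lu(w, seed=0):
    rng = random.Random(seed)
    s = float(sum(w))
    n = len(w)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, w[i] * w[j] / s):
                edges.append((i, j))
    return edges

# Example with a heavy-tailed target degree sequence:
# edges = chung_lu([50, 30, 10, 5, 5, 2, 2, 1, 1, 1])
```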

Monday, October 10, 2011

Solving Large, Irregular Graph Problems using Adaptive Work-stealing

XWS is a work-stealing framework for X10. The authors run BFS/DFS written in X10 on XWS on a Sun Niagara and demonstrate an advantage over Cilk.

http://gee.cs.oswego.edu/dl/papers/icpp08.pdf

Solving large, irregular graph problems efficiently is challenging. Current software systems and commodity multiprocessors do not support fine-grained, irregular parallelism well. We present XWS, the X10 Work Stealing framework, an open-source runtime for the parallel programming language X10 and a library to be used directly by application writers. XWS extends the Cilk work-stealing framework with several features necessary to efficiently implement graph algorithms, viz., support for improperly nested procedures, global termination detection, and phased computation. We also present a strategy to adaptively control the granularity of parallel tasks in the work-stealing scheme, depending on the instantaneous size of the work queue. We compare the performance of the XWS implementations of spanning tree algorithms with that of the hand-written C and Cilk implementations using various graph inputs. We show that XWS programs (written in Java) scale and exhibit comparable or better performance.
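A toy single-threaded illustration of the adaptive-granularity decision described in the abstract (thresholds and names are invented, and real XWS workers also steal from each other's deques): when the local work queue runs low, expose work one vertex at a time so thieves have something to steal; otherwise process vertices in batches to cut scheduling overhead.

```python
# Granularity control sketch: batch when the deque is full, go
# fine-grained when it is nearly empty. Stealing itself is not shown.
from collections import deque

def traverse_adaptive(adj, root, low_watermark=4, batch=32):
    visited = {root}
    work = deque([root])              # local deque; thieves would take from it
    while work:
        if len(work) < low_watermark:
            units = [work.popleft()]  # fine-grained: keep work stealable
        else:
            units = [work.popleft() for _ in range(min(batch, len(work)))]
        for u in units:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    work.append(v)
    return visited
```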

HipG: parallel processing of large-scale graphs

HipG is a general-purpose graph processing system. The authors implement strongly connected component decomposition and demonstrate a speed advantage over PBGL (Parallel Boost Graph Library) and hand-written C/MPI.

http://dl.acm.org/citation.cfm?id=2007185

Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work on graph nodes. To make the user code high-level, the framework provides a unified interface to executing methods on local and non-local graph nodes and an abstraction of exclusive execution. The graph computations are managed by logical objects called synchronizers, which we used, for example, to implement distributed divide-and-conquer decomposition into strongly connected components. The code written in HipG is independent of a particular graph representation, to the point that the graph can be created on-the-fly, i.e. by the algorithm that computes on this graph, which we used to implement a distributed model checker. HipG programs are in general short and elegant; they achieve good portability, memory utilization, and performance.

Monday, August 1, 2011

Bagel: An Open-Source Implementation of Pregel

An open-source implementation of Pregel, the general-purpose graph processing model, has appeared. Anyone working on graphs should take a look. http://d.hatena.ne.jp/smly/20110730/1312022963
https://github.com/mesos/spark/pull/48

Wednesday, July 13, 2011

Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory

Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, SC2010

Tuesday, June 21, 2011

[Graph500] Towards Optimization

Optimization using Network Coding and the Lossy Counting technique from streaming algorithms.
The question is how to handle large graphs that do not fit in memory: exploit the scale-free property to spread the data across the levels of the storage hierarchy, and so on.
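For reference, a sketch of the standard Lossy Counting algorithm (Manku and Motwani) mentioned above, e.g. for approximately tracking high-frequency vertices in an edge stream. The true frequency of a surviving item lies between counts[item] and counts[item] + deltas[item], so the undercount is at most eps * N after N items.

```python
# Lossy Counting sketch: approximate item frequencies over a stream
# in O(1/eps * log(eps * N)) space.
def lossy_count(stream, eps=0.001):
    counts, deltas = {}, {}
    width = max(1, int(1 / eps))            # bucket width, roughly ceil(1/eps)
    for n, item in enumerate(stream, 1):
        bucket = (n - 1) // width + 1       # current bucket id
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1       # maximum possible undercount
        if n % width == 0:                  # bucket boundary: prune low counts
            stale = [k for k in counts if counts[k] + deltas[k] <= bucket]
            for k in stale:
                del counts[k]
                del deltas[k]
    return counts
```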

Monday, June 20, 2011

[Graph500] 90%

Aiming at the paper submission to IISWC, I kept writing for most of the weekend, and the paper is now about 90% complete. Weekdays this week are hectic with preparations for next week's business trip, so the pace will drop, but it looks like I can submit.