2011年12月31日土曜日

ANUChem

X10-based SDP など X10 関係者に関係するページ

計算化学に関係する X10 ベースの計算ライブラリ
http://cs.anu.edu.au/~Josh.Milthorpe/anuchem.html
ソースコードも公開されています。

上記に関係する論文は以下の IPDPSの論文。
X10 as a parallel language for scientific computation: practice and experience
http://cs.anu.edu.au/~Josh.Milthorpe/publications/Milthorpe2011_IPDPS.pdf

2011年12月23日金曜日

Near-term Target Conferences

IPDPS2012 併設ワークショップ
2012 International Workshop on High Performance Data Intensive Computing(HPDIC2012)
http://cloud.hdu.edu.cn/HPDIC2012/
締め切り:2012/01/08
Research Contents: StreamGPU (Matsuura)



IPDPS2012 併設ワークショップ
ParLearning 2012
https://researcher.ibm.com/researcher/view_project.php?id=2591
- Real-time solutions for learning algorithms on parallel platforms
Due: 2012/01/18
Notification: 2012/02/01
Research Contents:
- StreamGPU (Morita and Ueno)
- Bipartite Graph

SYSTOR 2012
http://www.research.ibm.com/haifa/conferences/systor2012/dates.shtml
Due: 2012/2/13
Notification: 2012/3/26


IEEE CLOUD 2012
- VCTLPerf + Linear Road Benchmark
- Graph500 on Amazon EC2
Submission Deadline: February 10, 2012
Full Paper Submission Due Date: February 15, 2012
Decision Notification (Electronic): March 15, 2012



HPDC 2012
http://www.hpdc.org/2012/home/important-dates/
- Graph500 (Ueno)
- StreamGPU : GPU Task Parallelism (Ueno)
Abstracts Due: 16 January 2012
Papers Due: 23 January 2012, 11:59 PM Pacific Standard Time
Author Notifications: 19 March 2012

SACSIS 2012 (Domestic Conference)
http://sacsis.hpcc.jp/2012/
- StreamGPU (Ueno)
- StreamAPU (Matsuura)
- I-GIMV (Ganse)
- Incremental Spectral Clustering (Nishii)
- SDP with X10 (Watanabe)
Due: 2012/1/20
Notification: 2012/3


PLDI X10 Workwhop
http://x10-lang.org/workshop
Paper submission deadline: February 28, 2012
Author Notification: March 28, 2012
- X10-based X10 : Watanabe
- Performance Evaluation of X10-based Graph Algorithms on TSUBAME: Hashikawa and Watanabe

EuroPar 2012
http://europar2012.cti.gr/
Deadline for abstracts: January 31, 2012
Deadline for full papers: February 7, 2012
Decision notification: May 11, 2012

Supercomputing 2012
http://sc12.supercomputing.org/
- Graph500 (Ueno) (It depends on the result of HPDC)

Winter Simulation Conference 2012
http://www.wintersim.org/futureconf.htm
- Due: 2012/4
- Notification: 2012/6
- X10-based Large simulation platform

ACM Middleware Conferece
http://2011.middleware-conference.org/
- Due: 2012/5

IEEE IISWC 2012
- Due: 2012/6

2011年12月19日月曜日

The Fourth Paradigm: Data-Intensive Scientific Discovery

昨今注目されている書籍
http://research.microsoft.com/en-us/collaboration/fourthparadigm/

Of Hammers and Nails: An Empirical Comparison of Three Paradigms for Processing Large Graphs

マイクロソフトの新たなグラフ処理の論文。

Of Hammers and Nails: An Empirical Comparison of Three Paradigms for Processing Large Graphs
http://research.microsoft.com/pubs/155533/paper.pdf

Many phenomena and artifacts such as road networks, social networks and the web can be modeled as large graphs and analyzed
using graph algorithms. However, given the size of the underlying
graphs, efficient implementation of basic operations such as connected component analysis, approximate shortest paths, and linkbased ranking (e.g.PageRank) becomes challenging.
This paper presents an empirical study of computations on such
large graphs in three well-studied platform models, viz., a relational
model, a data-parallel model, and a special-purpose in-memory
model. We choose a prototypical member of each platform model
and analyze the computational efficiencies and requirements for
five basic graph operations used in the analysis of real-world graphs
viz., PageRank, SALSA, Strongly Connected Components (SCC),
Weakly Connected Components (WCC), and Approximate Shortest Paths (ASP). Further, we characterize each platform in terms of
these computations using model-specific implementations of these
algorithms on a large web graph. Our experiments show that there
is no single platform that performs best across different classes of
operations on large graphs. While relational databases are powerful and flexible tools that support a wide variety of computations,
there are computations that benefit from using special-purpose storage systems and others that can exploit data-parallel platforms

2011年12月17日土曜日

2部グラフ関連性解析の高速化およびシステムプロファイリング

- GPGPU による高速化
- システムプロファイリング:どこにどのぐらい時間を費やしているのか?
- 下位の基盤をSystem S 非依存にし、TSUBAME で大規模実行できるようにする

アプリケーション屋の要件

アプリ屋の requirement および実際に使っている手法等を知るため、情報交換の場をセットしていきましょう。
候補としては、まず上がるのが、東大松尾先生、東大鹿島先生、東工大杉山先生、IBM TRLのデータマイニング屋でお客様案件を担当している研究者など。

MovieLens

土屋さんの指導教官:横浜国立大学の長尾先生
http://www.nlab.sogo1.ynu.ac.jp/ynu/members/nagao.html
ネット企業との共同研究

MovieLens (映画に対する Rating) というデータセットを使って協調フィルタリング+ベイジアンネットを使って解析
http://www.grouplens.org/node/73

2011年12月14日水曜日

Random Walks on Directed and Undirected Graphs

http://www.math.ucsd.edu/~phorn/math261/

Random walk with restart に対する高速な Top-k 検索

Random walk with restart に対する高速な Top-k 検索
db-event.jpn.org/deim2011/proceedings/pdf/d3-1.pdf

2011年12月13日火曜日

An In-Depth Study of Stochastic Kronecker Graphs

ICDM に参加中の村田先生からのメッセージ。

"An In-Depth Study of Stochastic Kronecker Graphs"
C. Seshadhri, Ali Pinar, and Tamara G. Kolda,
Proceedings of the 11th IEEE International Conference on
Data Mining, pp.587-596, 2011.
http://arxiv.org/abs/1102.5046

タイトルの通り、(Graph500で用いられる)Kronecker Graphの研究です。発表を聞いただけで論文はまだ読んでいないのですが、
・次数分布はheavy tailでもpower lawでもなく、論文中の図の
ように何度もバウンドしたような曲線になる
・ノイズを入れたnoisy SKGになると次数分布が直線に近づく
・孤立点が多い。k=42で74%が孤立点
・coreのサイズは実ネットワークのそれより小さい

MLG 2010 Workshop

http://www.cs.purdue.edu/mlg2011/speakers.html

RAMGRAPH: large scale graph processing in RAMCloud

http://code.google.com/p/ramgraph/

Recently, the pay-as-you-go computing paradigm in public clouds such as Amazon EC2 offers users to execute their computational tasks on the rented virtualized computation resources. Due to the lower cost than hosting a private cloud, this paradigm has become popular for medium- and small-sized Web companies such as Alexa and SmugMug. Mining and processing the large web and social networks is one of the common regular tasks for those Web companies. For example, a search engine typically uses a graph-based ranking scheme such as PageRank to give an order to the pages. Moreover, those mining and processing tasks are highly customized with various user-defined logics applied on the entire graph. The requirement of high efficiency in large scale graph processing challenges existing tools. This is because, most of current tools such as MapReduce and Hadoop are disk-based, where the hard disk is typically 100-1000x slower than the main memory. The random access nature of graph processing further harnesses the performance of disk-based graph processing. The low efficiency of current disk-based systems limits the popularity of graph mining and business intelligence in Web companies, and can result in low utilization of existing investment and lose potential business opportunities.

To unleash the computation power of current cloud offerings, we propose Thunder, a large graph processing system in the main memories of hundreds or thousands of machines. Thunder stores and processes graphs entirely in the main memory, and uses hard disks only for backup and archrivals. The system provides APIs similar to Map and Reduce functions in MapReduce for users to implement their user-defined logics. These logics are automatically executed on the graph in a distributed manner.

The goal of designing and implementing Thunder is to exploit the advantage of in-memory processing, while remaining all the merits of conventional disk-based tools, namely excellent scalability, good fault-tolerance and ability of expressing arbitrary and complex customized logic. In particular, we are facing challenging issues like scalability, availability, complex memory management, network traffic reducing and so on.

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection

http://siam.omnibooksonline.com/2011datamining/data/papers/052.pdf

We present Polonium, a novel Symantec technology
that detects malware through large-scale graph infer-
ence. Based on the scalable Belief Propagation algo-
rithm, Polonium infers every le's reputation,
agging
les with low reputation as malware. We evaluated
Polonium with a billion-node graph constructed from
the largest le submissions dataset ever published (60
terabytes). Polonium attained a high true positive rate
of 87% in detecting malware; in the eld, Polonium
lifted the detection rate of existing methods by 10 ab-
solute percentage points. We detail Polonium's design
and implementation features instrumental to its success.
Polonium has served 120 million people and helped an-
swer more than one trillion queries for le reputation.

Router: A Message Passing Model for Large-Scale Graph Mining

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6079403&tag=1

ABSTRACT

many parallel computational models have been employed in many papers to process the large-scale graph. In this paper, we propose a message passing model Router which could be invoked by most of current parallel computational models to process the large graph. The model is good at solving the multi-source traversal problem which often occurs in many complex graph algorithms. As the model can traverse the graph from different source at the same time, the multi-source traversal will finish in much less iteration than before. In this way, the total time of the algorithm involves multi-source traversal will be reduced in a large scale. Besides, the Router model is flexible enough to express a broad set of algorithms by implementing the Router's abstract method. Finally, the experiment shows the efficiency and scalability of the model.

Router: A Message Passing Model for Large-Scale Graph Mining

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6079403&tag=1

Large-Scale Graph Mining Using Backbone Refinement Classes

We present a new approach to large-scale graph mining
based on so-called backbone refinement classes. The method
efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes
of fragments to reduce feature set size and running times.
The classes are defined in terms of fragments sharing a common backbone. The method is able to optimize structural
inter-feature entropy as opposed to occurrences, which is
characteristic for open or closed fragment mining. In the
experiments, the proposed method reduces feature set sizes
by >90 % and >30 % compared to complete tree mining and
open tree mining, respectively. Evaluation using crossvalidation runs shows that their classification accuracy is similar to the complete set of trees but significantly better than
that of open trees. Compared to open or closed fragment
mining, a large part of the search space can be pruned due
to an improved statistical constraint (dynamic upper bound
adjustment), which is also confirmed in the experiments in
lower running times compared to ordinary (static) upper
bound pruning. Further analysis using large-scale datasets
yields insight into important properties of the proposed descriptors, such as the dataset coverage and the class size
represented by each descriptor. A final cross-validation run
confirms that the novel descriptors render large training sets
feasible which previously might have been intractable.

Tools for Large Graph Mining

http://www.cs.cmu.edu/~deepay/thesis.pdf

Graphs show up in a surprisingly diverse set of disciplines, ranging from computer networks to sociology, biology, ecology and many more. How do such “normal” graphs look like? How can we spot abnormal subgraphs within them? Which
nodes/edges are “suspicious?” How does a virus spread over a graph? Answering
these questions is vital for outlier detection (such as terrorist cells, money laundering rings), forecasting, simulations (how well will a new protocol work on a realistic
computer network?), immunization campaigns and many other applications.
We attempt to answer these questions in two parts. First, we answer questions
targeted at applications: what patterns/properties of a graph are important for solving
specific problems? Here, we investigate the propagation behavior of a computer virus
over a network, and find a simple formula for the epidemic threshold (beyond which
any viral outbreak might become an epidemic). We find an “information survival
threshold” which determines whether, in a sensor or P2P network with failing nodes
and links, a piece of information will survive or not. We also develop a scalable,
parameter-free method for finding groups of “similar” nodes in a graph, corresponding
to homogeneous regions (or CrossAssociations) in the binary adjacency matrix of the
graph. This can help navigate the structure of the graph, and find un-obvious patterns.
In the second part of our work, we investigate recurring patterns in real-world
graphs, to gain a deeper understanding of their structure. This leads to the development
of the R-MAT model of graph generation for creating synthetic but “realistic” graphs,
which match many of the patterns found in real-world graphs, including power-law
and lognormal degree distributions, small diameter and “community” effects.

Workshop on Graph Mining

MLG 2011 : Ninth Workshop on Mining and Learning with Graphs
http://www.cs.purdue.edu/mlg2011/speakers.html


http://www.ourglocal.com/event/?eventid=10468%2C1
http://www.cs.umd.edu/mlg2010/
MLG2009

Temporal analysis of semantic graphs using ASALSAN ∗

http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/ICDM07.pdf

ASALSAN is a new algorithm for computing three-way
DEDICOM, which is a linear algebra model for analyzing
intrinsically asymmetric relationships, such as trade among
nations or the exchange of emails among individuals, that
incorporates a third mode of the data, such as time. ASALSAN is unique because it enables computing the three-way
DEDICOM model on large, sparse data. A nonnegative version of ASALSAN is described as well. When we apply these
techniques to adjacency arrays arising from directed graphs
with edges labeled by time, we obtain a smaller graph on latent semantic dimensions and gain additional information
about their changing relationships over time. We demonstrate these techniques on international trade data and the
Enron email corpus to uncover latent components and their
transient behavior. The mixture of roles assigned to individuals by ASALSAN showed strong correspondence with
known job classifications and revealed the patterns of communication between these roles. Changes in the communication pattern over time, e.g., between top executives and
the legal department, were also apparent in the solutions.

Matching Structure and Semantics: A Survey on Graph-Based Pattern Matching

http://briangallagher.net/pubs/gallagher-aaaifall-2006.pdf

Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity

http://cameronmarlow.com/media/backstrom-geographical-prediction_0.pdf

Geography and social relationships are inextricably inter-
twined; the people we interact with on a daily basis almost
always live near us. As people spend more time online,
data regarding these two dimensions { geography and so-
cial relationships { are becoming increasingly precise, allow-
ing us to build reliable models to describe their interaction.
These models have important implications in the design of
location-based services, security intrusion detection, and so-
cial media supporting local communities.
Using user-supplied address data and the network of asso-
ciations between members of the Facebook social network,
we can directly observe and measure the relationship be-
tween geography and friendship. Using these measurements,
we introduce an algorithm that predicts the location of an
individual from a sparse set of located users with perfor-
mance that exceeds IP-based geolocation. This algorithm
is e cient and scalable, and could be run on hundreds of
millions of users.

SIAM AN10 Minisymposium on Analyzing Massive Real-World Graphs

http://www.graphanalysis.org/workshop2010.html

10:30-10:55 People You May Know
Lars Backstrom, Facebook
Facebook's friend recommendation system helps people connect with their friends. Our system, called People You May Know, uses a combination of results from sociology and machine learning to make the best suggestions possible. We will look at some of the challenges involved in building a system that can handle the scale of Facebook and provide high quality recommendations. In this talk I will discuss both the algorithmic and machine learning challenges that we have faced and overcome in building this system.


11:00-11:25 Modularity and Graph Algorithms
Joe McCloskey, National Security Agency; David A. Bader, Georgia Institute of Technology
A number of graph partitioning algorithms are based on the concept of modularity. In particular Clauset, Newman and Moore (CNM) have developed a greedy agglomerative graph partitioning algorithm that scales well but is known to have several flaws. Fortunato and Barthelemy have performed a rigorous analysis of the CNM algorithm that elucidates it problems. More recently Berry, Hendrickson, Laviolette, and Phillips have derived a weighted variant of CNM that performs much better in practice. This talk will focus on a different version of the parent CNM algorithm based on a statistical re-interpretation of CNM that also addresses some of the issues with the original algorithm.
11:30-11:55 Exploiting Sparsity in the Statistical Analysis of Gene Expression Data
Padma Raghavan, Anirban Chatterjee, and Francesca Chiaromonte, Pennsylvania State University

12:00-12:25 Scalable Methods for Representing, Characterizing, and Generating Large Graphs
Ali Pinar, Sandia National Laboratories

Monday, July 12
4:00 PM - 6:00 PM
Room: Spirit of Pittsburgh B - Level 3

4:00-4:25 Hybrid Parallel Programming for Massive Graph Analysis
Kamesh Madduri, Lawrence Berkeley National Laboratory

4:30-4:55 Tools and Primitives for High-performance Graph Computation
John Gilbert, University of California, Santa Barbara

5:00-5:25 Practical Heuristics for Inexact Subgraph Isomorphism
Jon Berry, Sandia National Laboratories

5:30-5:55 Spectral Methods for Subgraph Detection
Nadya Bliss and Benjamin A. Miller, Massachusetts Institute of Technology; Patrick J. Wolfe, Harvard University
We describe a statistical test for subgraph detection and localization using spectral properties of the so-called modularity matrix, a type of residual under the Chung-Lu random graph model. We show that the resultant algorithmic procedure can be applied to very large graphs ($< 10^6$ vertices), with complexity dominated by that of standard sparse eigensolver methods, and can successfully isolate anomalous vertices in real data examples.
Workshop Organizer:

Graph-based technologies for intelligence analysis

Cyber Security

http://dl.acm.org/citation.cfm?id=971643
http://ngc.sandia.gov/assets/documents/IAProblemsNetworkFormulations_JICRD2010.pdf

A graph-theoretic analysis of the human protein-interaction network using multicore parallel algorithms

http://dl.acm.org/citation.cfm?id=1456988&CFID=73076166&CFTOKEN=19740521


Due to fundamental physical limitations and power constraints, we are witnessing a paradigm shift in commodity microprocessor architecture to multicore designs. Continued performance now requires the exploitation of concurrency at the algorithm level. In this article, we demonstrate the application of high performance computing techniques in systems biology and present multicore algorithms for the important research problem of protein-interaction network (PIN) analysis. PINs play an important role in understanding the functional and organizational principles of biological processes. Promising computational techniques for key systems biology research problems such as identification of signaling pathways, novel protein function prediction, and the study of disease mechanisms, are based on topological characteristics of the protein interactome. Several complex network models have been proposed to explain the evolution of protein networks, and these models primarily try to reproduce the topological features observed in yeast, the model eukaryote interactome. In this article, we study the structural properties of a high-confidence human interaction network, constructed by assimilating recent experimentally derived interaction data. We identify topological properties common to the yeast and human protein networks. Betweenness is a quantitative measure of centrality of an entity in a complex network, and is based on computing all-pairs shortest paths in the graph. A novel contribution of our work is the analysis of the degree-betweenness centrality correlation in the human PIN. Jeong et al. empirically showed that betweenness is positively correlated with the essentiality and evolutionary age of a protein. We observe that proteins with high betweenness, but low degree (or connectivity) are abundant in the human PIN. We have designed efficient and portable parallel implementations for the exact calculation of betweenness and other compute-intensive centrality metrics relevant to interactome analysis. For example, on the Sun Fire T2000 server with the UltraSparc T1 (Niagara) processor, we achieve a relative speedup of about 16 using 32 threads for a typical instance of betweenness centrality on a PIN, reducing the running time from nearly 312min to 13s.

Andy's work on dataflow language

http://datasys.cs.iit.edu/events/MTAGS09/a05-yoo-slides.pdf

Survey (2)

Closeness Centralityの高いノードを発見する高速アルゴリズム
http://www.ieice.org/ken/paper/20111021b0j3/

グラフマイニングとその統計的モデリングへの応用

グラフマイニングとその統計的モデリングへの応用

http://www.ism.ac.jp/editsec/toukei/pdf/54-2-315.pdf

逆探索法によるグラフ系列マイニングの高速化
http://db-event.jpn.org/deim2011/proceedings/pdf/b10-2.pdf


グラフ系列マイニング
http://latent-dynamics.net/02/13_Inokuchi.ppt.pdf

頂点により誘導される頻出グラフ系列パターンのマイニング
http://www-erato.ist.hokudai.ac.jp/html/php/seminar5_docs/inokuchi.pdf

多頻度グラフマイニングを利用した動画の解析
http://www.ieice.org/ken/paper/20081218kahU/
多頻度グラフマイニングとは,大量のグラフ構造から,頻出するグラフパターンを意味のあるものとして抽出する手法である.本研究では,多頻度グラフマイニングを動画像の解析に用いる.本稿では,監視カメラの前を移動物体が通過するような動画を対象に多頻度グラフマイニングを用いてグラフベースの背景除去手法を提案する.具体的には,各ビデオフレームを領域分割し,各領域を節点,各領域間の隣接関係を辺としてグラフ構造を作る.この時,背景部分が頻出パターンとなることを利用して背景除去を行
う.

非整列RNA配列群からの頻出パターンのマイニング
http://www.mizuho-ir.co.jp/publication/giho/pdf/001_10.pdf

ページ閲覧時間を考慮した Web ログマイニング手法の提案
http://www.aiet.ecei.tohoku.ac.jp/~mihara/reference/refICS148.pdf
近年,Web サイトの複雑化・大規模化に伴い,Web サイト管理者がユーザのニーズ
に合わせて Web サイトを適宜改善していく必要が高まっている.このユーザのニーズを
知る手がかりの一つとして有用なのが Web アクセスログである.Web アクセスログを解
析することで,ユーザの Web サイト内での行動の様子を知ることができる.本稿では,
Web アクセスログ解析にデータマイニングを応用する Web ログマイニングに関連して,
ユーザが各ページを閲覧した時間を考慮したアクセスパターンを抽出する手法を提案す
る.

グラフ分析を利用した文書集合からの話題構造マイニング
http://sc.chat-shuffle.net/paper/uid:110007380661

グラフマイニングを応用した系列データ解析
http://www.ai-gakkai.or.jp/jsai/conf/2008/program/pdf/100318.pdf

メタデータ付き推薦のためのグラフマイニング
https://kaigi.org/jsai/webprogram/2011/pdf/231.pdf


時系列データを用いたWebグラフマイニング
http://db-event.jpn.org/deim2009/proceedings/files/B9-2.pdf

背景の分割に対応したグラフマイニングベースの動画像からの背景除去
Graph Mining based Background Removal from Videos Solving Background Separation

グラフマイニングアルゴリズムを用いたギャップを含むコードクローン情報の生成
http://iss.ndl.go.jp/books/R000000004-I10809124-00
http://ci.nii.ac.jp/naid/110008584014


グラフマイニングを用いたブロードバンドサービスエリア分類手法の一考察
https://kaigi.org/jsai/webprogram/2010/paper-422.html

多段GBIツールを利用した化学物質の部分構造情報の有効利用
http://www.sccj.net/event/nenkai/2005au/program/abstract-pdf/2O-02.pdf

線形グラフのマイニングアルゴリズム
http://www-erato.ist.hokudai.ac.jp/html/php/symposium3_docs/tabei-preview-20101129.pdf

多頻度グラフマイニングの一般化

阪大・猪口先生 (元TRL) のご研究
http://www.ar.sanken.osaka-u.ac.jp/papers/2006-12/jsai_v19_n5.pdf

大規模社会ネットワーク分析の事例と展望

http://crev.jp/zinbun/zinbun_v094.pdf

2011年12月12日月曜日

TextGraphs-4: Graph-based Methods for Natural Language Processing

TextGraphs-4: Graph-based Methods for Natural Language Processing
http://www.textgraphs.org/ws09/index.html

Graph Crest Meeting @ 東工大

以下、渡部君議事録
--
CREST meeting

uno先生
情報学研究所
グラフネットワーク、離散構造アルゴリズムの研究
計算量(オーダ)、ビッグデータの高速処理


計算量 ( n=1億(現在)→10億(将来) )
|NP-comp
|n^3 ( 藤澤先生 )

|【n^2】 ← 重要 !!
| クラスタリング,etc..
| m*n^1/2
|n log n ( Google )
|n
|log n
| web-service


データと問題
・系列データ
→具体的なタスクはなにか?定まっていない。
・離散&グラフ (中心性やクラスタリング、直径)
→過去、最大流からクラスタリング、最小カットを求めるモデルもあったが…
→得られる解が求めたい解と食い違っていることが多々ある(グラフ全体に関して)。
→グラフ全体の評価よりも局所的な分析の方が重要性が高い。

藤沢先生、避難シュミレーションでは局所的なものよりグラフ全体によるものの影響が大きい?
→京都市などのグラフはグラフ全体が関与する(グラフの性質によるものである)
→全部が全部ではないが最大流、最小費用流がネットワークに役立ちにくい?

・マッチング問題(1対1の暗号化、匿名化 anonimization)(一対一)
→アルゴリズムの開発はされているがスケールが大きいと動かせない(せいぜい1万件程度?)
→暗号化をしても情報を公開してもよいのか?政治的問題。
鈴村先生、グラフの分割性は高いのでは?
→それほど並列性は高くない?
→sensitiveなものなのであまりデータがない

・最小費用流(マッチングの拡張)割り当てが解ける(多対多)   行列の最小固有値問題

→大規模スケールで有用な単純なアルゴリズムしかモデル化に利用されない(KMeans,BFS,etc…)
→目的解を得るためための様々なアルゴリズムの構築が必要
→高度なクラスタリングアルゴリズムなら目的解(グラフの性質、局所性)を見つけれる
→がオーダーが問題で大規模データに対応できない
→KMeansなら動くは動くが…十分な目的解は得られない...

脇田先生、スーパーノードを除外して考えてみるのは?

・大規模ストリーム処理
データ解析に時間の概念をいれる、
データを変化するものとして考えなければならない
データを捨てる。genom generator?爆発的なデータは?
→メモリ上にのらないデータはSSDなど。
→モデルによる。場合によりけり。

バッチならできる処理も(定数メモリに対して無限(高速)に流れてくるデータ)には対応できない
→ストリーム処理
誰が誰と通信を行っているか解析(sparseな行列ならば可能になる)
→あるデータにメモリで処理できる枠を決め、時間軸をずらせば実現できるので工学的にストリームと違いがない?
→ストリーミングアルゴリズムはあまり将来性がない? ストリーミングでは粒度が細かすぎる、
問い合わせの頻度が高いものはインクリメンタルなアルゴリズムはおもしろい。

・大規模ストレージ
cps (cyber phisical system)
実システムにたくさんのセンサーを置いて、人間や経済の動向などを分析するシステム
アプリケーションによるが系列データとは限らない。
今まで見えなかったデータを見たいという要望(例:人の動向からどの程度運動不足の人がいるか)
事故が起きる→過去のデータは使わないが→cpsでどこの交差点で事故が起きやすいかがわかるようになる
時間の軸は

2次記憶を利用すると大規模データは手に負えない。
メモリとSSDの階層化はアルゴリズム屋は考えてない。理論の方向から難しいのでは?
メモリに収まっても時間がかかりすぎる場合も。
ネットワーク型のアルゴリズムは並列化がむずかしい

- - - - - - - - - -
集合の類似性を調べるアプリケーションについて
たくさん(1億)のデータから前後のつながり?から単語の意味が似ているなど。
頻度の高いデータを外すことで計算量を減らす→3,4日かかる
|A∩B| / |A∪B|
このような考えはグラフ解析そのものに用いるのではなくグラフに関するものの高速化アルゴリズムに使える?
- - - - - - - - - -

・グラフ視覚化
大規模なグラフに関してはあまり議論されていない(バネモデル)
ゲノムではドットプロット
上の類似性が調べられれば十分できる
大規模なデータを可視化しても人間には直感的に理解出来ないので
うまく表現できるようなモデル、方法を考える必要があるのではないか?

藤澤先生、グラフが理解できるような技術者を養成!訓練によって見えるようになるかも!
脇田先生、26日にitohさんがくる?

クリークぐらい密な情報は必要ない。密な情報は求められていない。


ネットフリックス
想像技術、作り込み
個々のデータを集めてきたパラメータを変えてチューニング→動いた人が勝ち?

雁瀬さん
グラフのデータを抜き出してきたときにそれが有用かそうでないかの判定は?
少なくとも、といった条件を抜き出しておく。主観の問題がある?
グラフ処理は並列化しにくいが、グラフに対して前処理を行う場合は?
→前処理を行っても2転3転計算し直し、速度があがらない場合がおおい

遠藤先生
コンピュータが大きくなってグラフ解析は有用になるか?
知識抽出の世界の人にはあまり全体を意識していない(局所的なものだけ)
グラフアルゴリズムは並列化しにくいものが多い
並列化によって精度が下がってしまうものは許容されるか?
アプリケーションによる。クラスタリングなら許容される?
近似は?
近似によって受け入れられないものもある(バイオ)一方ヒューリスティックな解法を使っていたり。
自然言語処理分野では結果のみを意識するので途中の近似による
TSUBAMEの運用、ネットワークの可視化

Stream Computing を用いたリアルタイム交通管理

http://sites.computer.org/debull/A10june/Anand.pdf

交通量予測アルゴリズム

TRL の加藤さんが書いた論文 http://www-06.ibm.com/ibm/jp/provision/no64/pdf/64_report.pdf
の 5.2 章参照

予測アルゴリズムは IBM ワトソンの Wanli が以下の論文で提唱。
http://transp-or2.epfl.ch/tristan/FullPapers/010wynter.pdf

2011年12月9日金曜日

Google Map 上のリアルタイム交通状況可視化

Android 携帯でユーザーが現在位置情報の機能を有効しているのみ Google のサーバーに送信。位置データと速度から交通状況を把握
http://googlejapan.blogspot.com/2011/12/google_09.html

グラフ可視化

http://www.kecl.ntt.co.jp/as/members/tatsushi/graphdrawing.html

2011年12月8日木曜日

IBM Parallel Debugger for X10 Programming

https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=e3fb8100-3ee3-4402-bf67-bf66b29797ea

NVIDIA ブースでの SC11 でのトーク

http://www.nvidia.com/object/sc11.html

Flume + Cassandra

http://www.slideshare.net/geminimobile/flume-cassandra-real-time-log-processing

2011年12月3日土曜日

Spectral Clustering

http://www.csie.ntu.edu.tw/~cjlin/papers/psc08.pdf
Parallel Spectral Clustering in Distributed Systems

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional
algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and
computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two
representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix
with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors
and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through
an empirical study on a document data set of 193; 844 instances and a photo data set of 2; 121; 863, we show that
our parallel algorithm can effectively handle large problems.


Parallelization of spectral clustering algorithm on multi-core processors and GPGPU
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4625449

Spectral clustering is a widely-used algorithm in the field of information retrieval, data mining, machine learning and many others. It can help to cluster a large number of data into several categories without requiring any additional information about the dataset or the categories, so that people can find information by categories easily. In this paper, we parallelize the algorithm proposed by Andrew Y. Ng, Michael I. Jordan and Yair Weiss. We provide two versions of implementation: one is parallelized in OpenMP; the other is programmed in the NVIDIA CUDA (compute unified device architecture), which is the environment provided by NVIDIA to program on its CUDA-Enabled GPGPUs (general-purpose graphic processing unit). We can achieve about three times speedup in OpenMP and around ten times speedup using CUDA in our experiments.


On a strategy for spectral clustering with parallel computation
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science

pectral Clustering is one of the most important method based on space dimension reduction used in Pattern Recognition. This method consists in selecting dominant eigenvectors of a matrix called affinity matrix in order to define a low-dimensional data space in which data points are easy to cluster. By exploiting properties of Spectral Clustering, we propose a method where we apply independently the algorithm on particular subdomains and gather the results to determine a global partition. Additionally, with a criterion for determining the number of clusters, the domain decomposition strategy for parallel spectral clustering is robust and efficient.

2011年12月1日木曜日

Graph500 on Amazon EC2 (Cluster Computer Instances)

http://aws.amazon.com/jp/hpc-applications/

23 GB メモリ
33.5 EC2 Compute Unit(2 x Intel Xeon X5570、quad-core「Nehalem」アーキテクチャ)
1690 GB インスタンスストレージ
64ビット プラットフォーム
I/O 性能: 超高速(10 ギガビットイーサネット)
API 名: cc1.4xlarge

Cluster Compute Eight Extra Large を使用すると、1時間 2.40 $.
128 ノードを1時間使用すると、1ドル80円計算で 約 25000円.

Graph Stream Analytics : Pattern Matching

http://hpcdb.org/sites/hpcdb.org/files/choudhury_graphstreams.pdf
Large-Scale Continuous Subgraph Queries on Streams

Graph pattern matching involves finding exact or approximate matches for a query subgraph in a larger graph. It has been studied extensively and has strong applications in domains such as computer vision, computational biology, social networks, security and finance.The problem of exact graph pattern matching is often described in terms of subgraph isomorphism which is NP-complete. The exponential growth in streaming data from online social networks, news and video streams and the continual need for situational awareness motivates a solution for finding patterns in streaming updates. This
is also the prime driver for the real-time analytics market. Development of incremental algorithms for graph pattern matching on streaming inputs to a continually evolving graph is a nascent area of research. Some of the challenges associated with this problem are the same as found in continuous query (CQ) evaluation on streaming databases. This paper reviews some of the representative work from the exhaustively researched field of CQ systems and identifies important semantics, constraints and architectural features that are also appropriate for HPC systems performing real-time graph analytics. For each of these features we present a brief discussion of the challenge encountered in the database realm, the approach to the
solution and state their relevance in a high-performance, streaming
graph processing framework.

グラフマイニング

http://www.db.soc.i.kyoto-u.ac.jp/index.php/ja/component/content/article/104

GraphStream: A Dynamic Graph Library

http://graphstream-project.org/

静的・動的グラフの可視化ライブラリ. Version 1.1. YouTube のビデオあり
http://graphstream-project.org/doc/Gallery/

X10 ベースの大規模交通シミュレーション実行基盤

道路ネットワークを領域分割することによって複数ノードで高速化されることは明らかであり、TSUBAME 上での妥当な結果が出つつある。ただ、一部の領域においては非常に時間がかかっており、これは車両の経路が通る重要な道路の許容量が少ないから。現在出ている性能分析により、今後は、より研究としての落としどころをつめていかないといけませんが、「Work Stealing 機構によって、動的に混雑度の高い領域に X10 アクティビティを生成する方式」を実装することにより、研究としての価値が生まれるでしょう。

Graph Stream Mining

http://www.utdallas.edu/~hamlen/parveen-passat11.pdf

Insider Threat Detection using Stream Mining and Graph Mining

Evidence of malicious insider activity is often
buried within large data streams, such as system logs
accumulated over months or years. Ensemble-based stream
mining leverages multiple classification models to achieve
highly accurate anomaly detection in such streams even
when the stream is unbounded, evolving, and unlabeled.
This makes the approach effective for identifying insider threats who attempt to conceal their activities by
varying their behaviors over time. This paper applies
ensemble-based stream mining, unsupervised learning, and
graph-based anomaly detection to the problem of insider
threat detection, demonstrating that the ensemble-based
approach is significantly more effective than traditional
single-model methods.
Index Terms—anomaly d

Gremlin: グラフ解析特化型言語

https://github.com/tinkerpop/gremlin/wiki
http://www.infoq.com/jp/news/2010/01/Gremlin
http://www.youtube.com/watch?v=5wpTtEBK4-E

Graph database

http://blog.infinitegraph.com/2011/09/02/real-time-relationship-analytics-from-large-scale-graph-processing/


Neo4J
http://www.youtube.com/watch?v=UodTzseLh04&feature=related

Neo4J and Real Scenario
http://www.youtube.com/watch?v=UNH0JC0grOQ&feature=related

グラフ応用 - Cyber Security

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5482734&tag=1

Measure Large Scale Network Security Using Adjacency Matrix Attack Graphs

An Attack Graph capable of disclosing causal relationships between multiple vulnerabilities has become a desirable tool for administrators to analyze and locate potential risks to protect critical networked resources against internal or external multi-step attacks. However, probabilistic security metric computations, using currently applied attack graphs, have complexity problems due to their scale. It is hard or even impossible for current attack graphs to be applied to large scale networks. This paper proposes a novel approach that combines the advantages of exploit-dependency attack graphs and adjacency matrices, which results in quadratic complexity. We first give a motivating example to introduce the approach. We then define the adjacency matrix attack graphs. We show that computing probabilistic cumulative scores by means of adjacency matrix attack graphs is efficient and readily scalable.


http://research.microsoft.com/pubs/79413/botgraph.pdf
BotGraph: Large Scale Spamming Botnet Detection

Network security applications often require analyzing
huge volumes of data to identify abnormal patterns or
activities. The emergence of cloud-computing models
opens up new opportunities to address this challenge by
leveraging the power of parallel computing.
In this paper, we design and implement a novel system
called BotGraph to detect a new type of botnet spamming
attacks targeting major Web email providers. Bot-
Graph uncovers the correlations among botnet activities
by constructing large user-user graphs and looking for
tightly connected subgraph components. This enables us
to identify stealthy botnet users that are hard to detect
when viewed in isolation. To deal with the huge data
volume, we implement BotGraph as a distributed application
on a computer cluster, and explore a number of
performance optimization techniques. Applying it to two
months of Hotmail log containing over 500 million users,
BotGraph successfully identified over 26 million botnetcreated
user accounts with a low false positive rate. The
running time of constructing and analyzing a 220GB Hotmail
log is around 1.5 hours with 240 machines. We believe
both our graph-based approach and our implementations
are generally applicable to a wide class of security
applications for analyzing large datasets.

グラフ応用

http://www.indiana.edu/~cortex/networks_nrn.pdf

Complex brain networks: graph theoretical analysis of structural and functional systems

Recent developments in the quantitative analysis of complex networks, based
largely on graph theory, have been rapidly translated to studies of brain network organization.
The brain’s structural and functional systems have features of complex networks — such as
small-world topology, highly connected hubs and modularity — both at the whole-brain
scale of human neuroimaging and at a cellular scale in non-human animals. In this article, we
review studies investigating complex brain networks in diverse experimental modalities
(including structural and functional MRI, diffusion tensor imaging, magnetoencephalography
and electroencephalography in humans) and provide an accessible introduction to the
basic principles of graph theory. We also highlight some of the technical challenges and key
questions to be addressed by future developments in this rapidly moving field.

Madduri's work

http://www.cc.gatech.edu/~bader/papers/ShortestPaths.html

Parallel Shortest Path Algorithms for Solving Large-Scale Instances

We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on largescale graphs using the Delta-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, Delta-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.

HPC Graph Analytics

http://www.graphanalysis.org/benchmark/GraphAnalysisBenchmark-v1.0.pdf

Distributed Graph Analytics

http://dl.acm.org/citation.cfm?id=958940
Δ-stepping: a parallelizable shortest path algorithm

The single source shortest path problem for arbitrary directed graphs with n nodes, m edges and nonnegative edge weights can sequentially be solved using O(n ċ log n + m) operations. However, no work-efficient parallel algorithm is known that runs in sublinear time for arbitrary graphs. In this paper we present a rather simple algorithm for the single source shortest path problem. Our new algorithm, which we call Delta-stepping, can be implemented very efficiently in sequential and parallel setting for a large class of graphs. For random edge weights and arbitrary graphs with maximum node degree d, sequential Δ-stepping needs O(n + m + d ċ L) total average-case time, where L denotes the maximum shortest path weight from the source node s to any node reachable from s. For example, this means linear time on directed graphs with constant maximum degree. Our best parallel version for a PRAM takes O(d ċ L ċ log n + log2 n) time and O(n + m + d ċ L ċ log n) work on average. For random graphs, even O(log2 n) time and O(n + m) work on average can be achieved. We also discuss how the algorithm can be adapted to work with nonrandom edge weights and how it can be implemented on distributed memory machines. Experiments indicate that already a simple implementation of the algorithm achieves significant speedup on real machines.


Single-Source Shortest Paths with the Parallel Boost Graph Library
http://www.dis.uniroma1.it/~challenge9/papers/edmonds.pdf
The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributed-memory computation on large graphs. Developed with the Generic Programming
paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary
vertex and edge properties, and different communication media. In this paper, we describe the implementation of two parallel variants of Dijkstra’s single-source shortest paths algorithm in the Parallel BGL.We also provide an experimental evaluation of these implementations using synthetic and real-world benchmark graphs from the 9th DIMACS Implementation Challenge.


http://eecs.wsu.edu/~cs460/cs550/parallelShortestPath.pdf
A Parallel Shortest Path Algorithm Based on Graph-Partitioning and Iterative Correcting, 2010
In this paper, we focus on satisfying the actual demands of quickly finding the shortest paths over real-road networks in an intelligent transportation system. A parallel shortest path algorithm based on graph partitioning and iterative correcting is proposed. After evaluating the algorithm using three real road networks, we conclude that our graph- partitioning and iterative correcting based parallel algorithm has good performance. In addition, it achieves more than a 15-fold speedup on 16 processors in an IBM cluster over these real road networks.


http://scidok.sulb.uni-saarland.de/volltexte/2004/207/pdf/UlrichMeyer_ProfDrKurtMehlhorn.pdf
Design and Analysis of Sequential and Parallel Single–Source Shortest–Paths Algorithms, Ulrich Meyer

e study the performance of algorithms for the Single-Source Shortest-Paths (SSSP) problem on graphs with nodes and edges with nonnegative random weights. All previously known SSSP algorithms for directed graphs required superlinear time. We give the first SSSP algorithms that provably achieve linear average-case execution time on arbitrary directed graphs with random edge weights. For independent edge weights, the linear-time bound holds with high probability, too. Additionally, our result implies im- proved average-case bounds for the All-Pairs Shortest-Paths (APSP) problem on sparse graphs, and it yields the first theoretical average-case analysis for the “Approximate Bucket Implementation” of Dijkstra’s SSSP algorithm (ABI–Dijkstra). Furthermore, we give con- structive proofs for the existence of graph classes with random edge weights on which ABI–Dijkstra and several other well-known SSSP algorithms require superlinear average- case time. Besides the classical sequential (single processor) model of computation we also consider parallel computing: we give the currently fastest average-case linear-work parallel SSSP algorithms for large graph classes with random edge weights, e.g., sparse random graphs and graphs modeling the WWW, telephone calls or social networks.


http://www.cse.psu.edu/~madduri/papers/ThorupSSSP-MTAAP07.pdf


http://www.sandia.gov/~sjplimp/papers/pc11.pdf
MapReduce in MPI for Large-scale Graph Algorithms, 2010

We describe a parallel library written with message-passing (MPI) calls that allows algo- rithms to be expressed in the MapReduce paradigm. This means the calling program does not need to include explicit parallel code, but instead provides “map” and “reduce” functions that operate independently on elements of a data set distributed across processors. The library performs needed data movement between processors. We describe how typical MapReduce functionality can be implemented in an MPI context, and also in an out-of-core manner for data sets that do not fit within the aggregate memory of a parallel machine. Our motivation for creating this library was to enable graph algorithms to be written as MapReduce opera- tions, allowing processing of terabyte-scale data sets on traditional MPI-based clusters. We outline MapReduce versions of several such algorithms: vertex ranking via PageRank, triangle finding, connected component identification, Luby’s algorithm for maximally independent sets, and single-source shortest-path calculation. To test the algorithms on arbitrarily large artifi- cial graphs we generate randomized R-MAT matrices in parallel; a MapReduce version of this operation is also described. Performance and scalability results for the various algorithms are presented for varying size graphs on a distributed-memory cluster. For some cases, we com- pare the results with non-MapReduce algorithms, different machines, and different MapReduce software, namely Hadoop. Our open-source library is written in C++, is callable from C++, C, Fortran, or scripting languages such as Python, and can run on any parallel platform that supports MPI.


Apache HAMA http://blogs.apache.org/hama/


http://homepages.dcc.ufmg.br/~tmacam/publications/macambira2010middleware.pdf
A middleware for parallel processing of large graphs, 2010
With the increasing “data deluge” scientists face today, the analysis and processing of large datasets of structured data is a daring task. Among such data, large graphs are gaining particular importance with the growing interest on social networks and other complex networks. Given the dimensions considered, parallel pro- cessing is essential. However, users are generally not experts in writing parallel code to handle such structures. In this work we present Rendero, a middleware that makes it possible to easily de- scribe graph algorithms in a form adequate for parallel execution. The system is based on the Bulk-Synchronous programming model and offers a vertex-based abstraction. Our current implementation offers good speed-up and scale-up results for large graphs ranging from tens of thousands to millions of vertices and edges in some cases.


http://www.cslab.ntua.gr/~anastop/files/papers/mtaap09dijkstra.pdf
Early Experiences on Accelerating Dijkstra’s Algorithm Using Transactional Memory

In this paper we use Dijkstra’s algorithm as a challenging, hard to parallelize paradigm to test the efficacy of several par- allelization techniques in a multicore architecture. We consider the application of Transactional Memory (TM) as a means of concurrent accesses to shared data and compare its perfor- mance with straightforward parallel versions of the algorithm based on traditional synchronization primitives. To increase the granularity of parallelism and avoid excessive synchro- nization, we combine TM with Helper Threading (HT). Our simulation results demonstrate that the straightforward par- allelization of Dijkstra’s algorithm with traditional locks and barriers has, as expected, disappointing performance. On the other hand, TM by itself is able to provide some performance improvement in several cases, while the version based on TM and HT exhibits a significant performance improvement that can reach up to a speedup of 1.46.

GPU-based SSSP

http://www.ijcaonline.org/volume10/number10/pxc3871992.pdf
Performance Improvement in Large Graph Algorithms on GPU using CUDA: An Overview, 2010

The basic operations on the graphs with millions of vertices are common in various applications. To have faster execution of such operations is very essential to reduce overall computation time. Today’s Graphics processing units (GPUs) have high computation power and low price. This device can be treated as an array of Single Instruction Multiple Data (SIMD) processors using CUDA software interface by Nvidia. Massively Multithreaded architecture of a CUDA device makes various threads to run in parallel and hence making optimum use of available computation power of GPU. In case of graph algorithms, vertices of the graphs are processed in parallel by mapping them to various threads on device. By making thousands of threads to run in parallel, computation time required for these algorithms is drastically decreased as compared to their CPU implementation.
We studied different parallel algorithms for Breadth first search, all pairs shortest path that are carried out on GPU using CUDA and make their comparative study with respect to execution time, data structure used, input data etc. In the paper, we presented overview of various parallel methods carried out on GPU using its multithreaded architecture for BFS, APSP by various authors.


http://ppl.stanford.edu/papers/ppopp070a-hong.pdf
Accelerating CUDA Graph Algorithms at Maximum Warp, 2007
Graphs are powerful data representations favored in many compu- tational domains. Modern GPUs have recently shown promising re- sults in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbal- ance and is an artifact of a discrepancy between the GPU program- ming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method signif- icantly improves the performance of applications with heavily im- balanced workloads, and enables trade-offs between workload im- balance and ALU underutilization for fine-tuning the performance.
Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single thread CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algo- rithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.


http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F6067545%2F6075092%2F06075214.pdf%3Farnumber%3D6075214&authDecision=-203
A modified parallel approach to Single Source Shortest Path Problem for massively dense graphs using CUDA, 2011
Today's Graphics Processing Units (GPUs) possess enormous computation power with highly parallel and multithreaded architecture, and the most attractive feature being their low costs. NVIDIA's CUDA provides an interface to the developers to use the GPUs for General Purpose Parallel Computing. In this paper, we present a modified algorithm of Single Source Shortest Path Problem on GPU using CUDA. First, we modify the standard BELLMAN-FORD algorithm to remove its drawbacks and make it suitable for parallel implementation, and then implement it using CUDA. For dense graphs, our Algorithm gains a speedup of 10×–12× over the previously proposed parallel algorithm. Our Algorithm also accept graphs with negative weighted edges and can detect any reachable Negative Weighted Cycle, which widens its scope to accept generalized problems.


http://www.springerlink.com/content/a254839r565r176p/
CUDA Solutions for the SSSP Problem
We present several algorithms that solve the single-source shortest-path problem using CUDA. We have run them on a database, composed of hundreds of large graphs represented by adjacency lists and adjacency matrices, achieving high speedups regarding a CPU implementation based on Fibonacci heaps. Concerning correctness, we outline why our solutions work, and show that a previous approach [10] is incorrect.


http://ldc.usb.ve/~vtheok/cursos/ci6323/pdf/lecturas/Accelerating%20Large%20Graph%20Algorithms%20on%20the%20GPU%20Using%20CUDA.pdf
Accelerating Large Graph Algorithms on the GPU Using CUDA, Pawan Harish and P.J. Narayanan,

Largegraphsinvolvingmillionsofverticesarecommoninmanyprac- tical applications and are challenging to process. Practical-time implementations using high-end computers are reported but are accessible only to a few. Graphics Processing Units (GPUs) of today have high computation power and low price. They have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA pro- gramming model. We present a few fundamental algorithms – including breadth first search, single source shortest path, and all-pairs shortest path – using CUDA on large graphs. We can compute the single source shortest path on a 10 million vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases optimal sequential algorithm is not the fastest on the GPU architec- ture. GPUs have great potential as high-performance co-processors.


----

http://sc.chat-shuffle.net/paper/uid:110006828689
CUDAによる全点対最短経路問題の高速化(アプリケーション高速化,「ハイパフォーマンスコンピューティングとアーキテクチャの評価」に関する北海道ワークショップ(HOKKE-2008))
本論文では全点対最短経路(APSP:All-Pairs Shortest Path)問題をGPU(Graphics Processing Unit)を用いて高速化した結果を述べる.提案手法は,GPUで動作するプログラムをGPU向けの開発環境CUDA(Compute Unified Device Architecture)を用いて記述する.アルゴリズムには単一始点最短経路を繰り返し求める手法(SSSP反復法)を用いる.問題全体での逐次処理を減らしてより高い速度向上を得るために,1っのSSSP問題を1つのタスクとし,それらのタスクを並列処理する.さらに,共有メモリを用いてタスク間でデータを共有し,グローバルメモリの参照を削減する.結果,既存手法よりも3.5〜18倍高速であった.また,SSSP反復法の性能がグラフの特性に依存して変動することを示す.
http://hagi-www.ics.es.osaka-u.ac.jp/research/papers/200812_t-okuyam_ispa.pdf