2013年12月27日金曜日

TurboGraph

 TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
http://dl.acm.org/citation.cfm?id=2487581

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi, has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, calledpin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at ``http://wshan.net/turbograph}" as executable files.

2013年12月2日月曜日

System Software Papers in ICWS 2012

Here is a list of papers on system softwares in ICWS 2012. A keyword "service" is widely used in this conference, so important thing is how our efforts could be abstracted out towards the combination with "Service".

  • Highly Resilient Systems for Cloud
  • Parallel Computing Framework as a Cloud Service
  • Overcoming Large Data Transfer Bottlenecks in RESTful Service Orchestrations, ICWS 2012
  • RESTful Web Service Mashup Based Coal Mine Safety Monitoring and Control Automation with Wireless Sensor Network, , ICWS 2012
  • Intelligent Database Placement in Cloud Environment, , ICWS 2012
  • Andes: A Highly Scalable Persistent Messaging System , ICWS 2012
  • Enabling Advanced Loading Strategies for Data Intensive Web Services, ICWS 2012
  • Green Web Services: Modeling and Estimating Power Consumption of Web Services, ICWS 2012
  • A Network Coordinate Based Web Service Positioning Framework for Response Time Prediction, ICWS 2012
  • Disk-Offload Middleware for Web-Services Using the Application-Caching Paradigm, ICWS 2012
  • Scaling Spatial Alarm Services on Road Networks, ICWS 2012

2013年12月1日日曜日

ICWS 2014

I have been invited as a program committee member for ICWS 2014. Over the past several years, I was involved with some projects related to service oriented computing by focusing more on system software level optimization methods such as differential parsing/deserialization method for high performance XML processing. By just reading through a series of titles in ICWS 2013, most of the researches do not really sound attractive in a practical setting in real world. They still find out a way on how to formulate composite web services in an automatic manner, which is a long standing problem and never solved.  That's not really what we should pursue in this research area.

However we can not really ignore this conference since it is treated as a top-tier conference. So what do we do in this context ? We have been pursing on big data / graph processing, stream computing, large-scale agent simulation technology, and so forth. Sometimes you can find a good research topic by looking at the boarder space between two different research areas. In that way, we can start new research topic. So what are they ? 

2013年7月9日火曜日

道路ネットワーク縮退化による高速化

金刺君の実験により、Open Street Map の冗長な道路ネットワークから、必要のない交差点を統合すると、かなり高速化できることが確認できた。現状はナイーブな方法(一本道は統合)だが、精度を加味して、動的に統合することも面白いだろう。単に統合するのではなく、仮想的な道路を作るのでも良い。また統合することにより、車両の位置(緯度、経度)が不正確になるが、これはマッピング情報を作るべき。


2013年6月6日木曜日

Facebook's TAO

TAO: Facebook’s Distributed Data Store for the Social Graph
We introduce a simple data model and API tailored for serving the social graph, and TAO, an implementation of this model. TAO is a geographically distributed data store that provides efficient and timely access to the social graph for Facebook’s demanding workload using a fixed set of queries. It is deployed at Facebook, replacing memcache for many data types that fit its model. The system runs on thousands of machines, is widely distributed, and provides access to many petabytes of data. TAO can process a billion reads and millions of writes each second
https://www.usenix.org/conference/atc13/tao-facebook%E2%80%99s-distributed-data-store-social-graph

2013年5月30日木曜日

Graph Applications in Cybersecurity Domain

Large Scale Graph Analytics and Randomized Algorithms for Applications
in Cybersecurity
http://dl.acm.org/citation.cfm?id=2459984
http://www.graphanalysis.org/SIAM-CSE13/02_Johnson.pdf

A network hacking attack in which hackers repeatedly steal password hashes and move through a computer network with the goal of reaching a computer with high level administrative privileges is known as a pass-the-hash attack. In this paper we apply graph coarsening on graphs obtained from computer network data for the purpose of (a) detecting hackers using this attack and (b) assessing the risk level of the network's current state. We repeatedly contract edges (obtaining a graph minor), which preserves the existence of paths in the graph, and take powers of the adjacency matrix to count the paths. This allows us to detect the existence of paths as well as find paths that have high risk of being exploited by adversaries.


W. Eberle and L. Holder. Applying graph-based anomaly detection approaches to the discovery of insider threats. In IEEE International Conference on Intelligence and Security Informatics (ISI), 2009.

D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. In Proc. 40th Annual ACM Symposium on Theory of Computing, 2008.


S. Jajodia, S. Noel, and B. O'Berry. Topological analysis of network attack vulnerability. In V. Kumar, J. Srivastava, and A. Lazarevic, editors, Managing Cyber Threats: Issues, Approaches and Challenges, pages 248--266. Kluwer Academic Publisher, 2005.


http://vulcan.ee.iastate.edu/~gmani/personal/papers/journals/IEEE-PS-08.pdf

Measuring Security Risk of Networks Using Attack Graphs
http://users.encs.concordia.ca/~wang/papers/ijngc10.pdf

U Kang's Research Goal Statement
http://www.cs.cmu.edu/~ukang/ukang-research.pdf

http://www.eecs.wsu.edu/~holder/pubs/EberleCATCH09.pdf

2013年5月22日水曜日

Cassovary: Twitter's Graph Processing Library

http://engineering.twitter.com/2012/03/cassovary-big-graph-processing-library.html

2013年4月21日日曜日