Friday, December 27, 2013

TurboGraph

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
http://dl.acm.org/citation.cfm?id=2487581

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at http://wshan.net/turbograph as executable files.
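To make the idea of overlapping I/O with CPU work a little more concrete, here is a minimal Python sketch of a level-synchronous BFS that starts processing each adjacency page as soon as its read finishes, while reads for the remaining pages are still in flight. This is only an illustration of the general idea, not TurboGraph's pin-and-slide model, which the abstract does not describe in detail. The page layout (`PAGES`, `VERTEX_TO_PAGE`) and the helper names `load_page` and `bfs_levels` are hypothetical, and the "disk read" is simulated with an in-memory lookup.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical on-disk layout: each "page" holds the adjacency lists
# of a few vertices. A real engine would read these pages from an SSD.
PAGES = {
    0: {0: [1, 2], 1: [3]},
    1: {2: [3, 4], 3: [5]},
    2: {4: [5], 5: []},
}
VERTEX_TO_PAGE = {v: pid for pid, page in PAGES.items() for v in page}


def load_page(page_id):
    """Stand-in for an asynchronous page read (assumption: in-memory lookup)."""
    return PAGES[page_id]


def bfs_levels(source, io_workers=2):
    """Level-synchronous BFS that handles each page as soon as its read completes."""
    level = {source: 0}
    frontier = {source}
    depth = 0
    with ThreadPoolExecutor(max_workers=io_workers) as pool:
        while frontier:
            depth += 1
            needed = {VERTEX_TO_PAGE[v] for v in frontier}
            # Issue all page reads up front so that later reads can proceed
            # while earlier pages are being processed on this thread.
            futures = [pool.submit(load_page, pid) for pid in needed]
            next_frontier = set()
            for fut in as_completed(futures):
                page = fut.result()
                for v in frontier:
                    if v not in page:
                        continue
                    for w in page[v]:
                        if w not in level:
                            level[w] = depth
                            next_frontier.add(w)
            frontier = next_frontier
    return level


if __name__ == "__main__":
    print(bfs_levels(0))
```

A thread pool is used here only because it is the simplest way to show reads and computation proceeding concurrently in Python; a real engine would rely on asynchronous SSD I/O and multiple CPU worker threads.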

Monday, December 2, 2013

System Software Papers in ICWS 2012

Here is a list of system software papers from ICWS 2012. The keyword "service" is used widely at this conference, so the important question is how our own work could be abstracted and combined with "Service".

  • Highly Resilient Systems for Cloud
  • Parallel Computing Framework as a Cloud Service
  • Overcoming Large Data Transfer Bottlenecks in RESTful Service Orchestrations, ICWS 2012
  • RESTful Web Service Mashup Based Coal Mine Safety Monitoring and Control Automation with Wireless Sensor Network, ICWS 2012
  • Intelligent Database Placement in Cloud Environment, ICWS 2012
  • Andes: A Highly Scalable Persistent Messaging System, ICWS 2012
  • Enabling Advanced Loading Strategies for Data Intensive Web Services, ICWS 2012
  • Green Web Services: Modeling and Estimating Power Consumption of Web Services, ICWS 2012
  • A Network Coordinate Based Web Service Positioning Framework for Response Time Prediction, ICWS 2012
  • Disk-Offload Middleware for Web-Services Using the Application-Caching Paradigm, ICWS 2012
  • Scaling Spatial Alarm Services on Road Networks, ICWS 2012

Sunday, December 1, 2013

ICWS 2014

I have been invited to serve as a program committee member for ICWS 2014. Over the past several years, I have been involved in several projects related to service-oriented computing, focusing mainly on system-software-level optimization methods such as differential parsing and deserialization for high-performance XML processing. Judging only from the paper titles of ICWS 2013, most of the research does not sound attractive for practical, real-world settings. Many papers are still looking for ways to compose web services automatically, which is a long-standing problem that has never been solved. That is not really what we should pursue in this research area.

However, we cannot really ignore this conference, since it is regarded as a top-tier venue. So what should we do in this context? We have been pursuing big data and graph processing, stream computing, large-scale agent simulation technology, and so forth. Sometimes you can find a good research topic by looking at the border between two different research areas. In that way, we can start a new research topic. So what are those topics?