Thursday, March 24, 2011
[MapReduce] Making Mars multi-GPU / multi-node: related papers
Multi-GPU MapReduce on GPU Clusters, IPDPS 2011
http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1051
We present GPMR, our MapReduce library that leverages the power of GPU clusters for large-scale computing. To better utilize the GPU, we modify MapReduce by combining large amounts of map and reduce items into chunks and using partial reductions and accumulation. We use persistent map and reduce tasks and stress aspects of GPMR with a set of standard MapReduce benchmarks. We run these benchmarks on a GPU cluster and achieve desirable speedup and efficiency for all benchmarks. We compare our implementation to the current-best GPU-MapReduce library (runs only on a solo GPU) and a highly-optimized multi-core MapReduce to show the power of GPMR. We demonstrate how typical MapReduce tasks are easily modified to fit into GPMR and leverage a GPU cluster. We highlight how total and relative amounts of communication affect GPMR. We conclude with an exposition on the types of MapReduce tasks well-suited to GPMR, and why some tasks need more modifications than others to work well with GPMR.
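The chunking and partial-reduction idea in the abstract is easy to illustrate outside the GPU setting. Below is a rough single-process Python sketch (not GPMR's CUDA code): map items are processed chunk by chunk, each chunk is reduced locally right away, and only the small per-chunk partials reach the final accumulation. The word-count workload and chunk_size are illustrative choices.

from collections import defaultdict

def word_count_chunked(lines, chunk_size=1000):
    """Word count with chunked map + partial reduction.

    Each chunk of input lines is mapped and immediately reduced to a small
    dict of partial counts (the "partial reduction / accumulation" idea),
    so the final step only merges per-chunk partials instead of every
    individual (word, 1) pair.
    """
    totals = defaultdict(int)
    for start in range(0, len(lines), chunk_size):
        partial = defaultdict(int)
        for line in lines[start:start + chunk_size]:    # map over one chunk
            for word in line.split():
                partial[word] += 1                       # partial reduce within the chunk
        for word, count in partial.items():              # accumulate chunk partials
            totals[word] += count
    return dict(totals)

print(word_count_chunked(["a b a", "b c"], chunk_size=1))   # {'a': 2, 'b': 2, 'c': 1}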
Designing efficient sorting algorithms for manycore GPUs, IPDPS 2009
http://portal.acm.org/citation.cfm?id=1587667
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling, PACT 2010
http://portal.acm.org/citation.cfm?id=1854337
Wednesday, February 16, 2011
Extending Mars, the GPU runtime for MapReduce, to multiple GPUs and multiple nodes
Since a GIM-V implementation can be regarded as just one application of MapReduce, it makes sense to take the more general view: the first priority should be extending Mars, the GPU runtime for the MapReduce programming model, to multiple GPUs and multiple nodes.
Integration with distributed file systems such as HDFS and Gfarm is of course also important, but that matters more for other applications. For the implementation, the following three approaches seem possible (a rough sketch of the common host-side data flow follows the list).
(1) Implement it in X10. X10 is, after all, a language designed for exactly this kind of goal, although its CUDA code generation would naturally need to be optimized.
(2) Implement it from scratch in CUDA; the distributed (multi-node) support would need to be designed.
(3) Implement it from scratch in OpenCL; the distributed (multi-node) support would need to be designed.
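Whichever option is chosen, the host-side data flow is roughly the same. The following is a minimal sketch with plain Python processes standing in for GPUs/nodes: partition the input, run map plus a local reduce on each worker, then merge the partial results on the host. num_devices and the word-count workload are illustrative, not anything from Mars.

from collections import Counter
from multiprocessing import Pool

def local_map_reduce(lines):
    """Work assigned to one GPU/node: map its partition and reduce it locally."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def distributed_word_count(lines, num_devices=4):
    """Partition the input across devices/nodes, then merge the partial results."""
    partitions = [lines[i::num_devices] for i in range(num_devices)]
    with Pool(num_devices) as pool:          # one worker process per GPU/node
        partials = pool.map(local_map_reduce, partitions)
    total = Counter()
    for partial in partials:                 # global merge of the local reductions
        total.update(partial)
    return dict(total)

if __name__ == "__main__":
    print(distributed_word_count(["a b a", "b c", "c c d"], num_devices=2))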
Tuesday, January 4, 2011
Apache Mahout, a machine learning library on top of Hadoop
Collaborative Filtering
User and Item based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest decision tree based classifier
High performance Java collections (previously Colt collections)
A vibrant community
Thursday, November 25, 2010
VLDB 2010
Dawei Jiang (National University of Singapore, Republic of Singapore), Beng Chin Ooi (National University of Singapore, Republic of Singapore), Lei Shi (National University of Singapore, Republic of Singapore), Sai Wu (National University of Singapore, Republic of Singapore)
Willis Lang (University of Wisconsin-Madison, United States of America), Jignesh Patel (University of Wisconsin-Madison, United States of America)
Bogdan Alexe (University of California Santa Cruz, United States of America), Mauricio Hernández (IBM Research, United States of America), Lucian Popa (IBM Almaden Research Center, United States of America), Wang-Chiew Tan (University of California Santa Cruz, United States of America)
Wook-Shin Han (Kyungpook National University, Republic of Korea), Jinsoo Lee (Kyungpook National University, Republic of Korea), Minh-Duc Pham (Kyungpook National University, Republic of Korea), Jeffrey Yu (The Chinese University of Hong Kong, People’s Republic of China)
Peixiang Zhao (University of Illinois at Urbana-Champaign, United States of America), Jiawei Han (University of Illinois at Urbana-Champaign, United States of America)
Dremel: Interactive Analysis of Web-Scale Datasets
Sergey Melnik (Google, United States of America), Andrey Gubarev (Google, United States of America), Jing Jing Long (Google, United States of America), Geoffrey Romer (Google, United States of America), Shiva Shivakumar (Google, United States of America), Matt Tolton (Google, United States of America), Theo Vassilakis (Google, United States of America)
HaLoop: Efficient Iterative Data Processing on Large Clusters
Yingyi Bu (University of Washington, United States of America), Bill Howe (University of Washington, United States of America), Magdalena Balazinska (University of Washington, United States of America), Michael Ernst (University of Washington, United States of America)
p.264: Graph Pattern Matching: From Intractable to Polynomial Time
Wenfei Fan (University of Edinburgh, United Kingdom), Jianzhong Li (Harbin Institute of Technology, People’s Republic of China), Shuai Ma (University of Edinburgh, United Kingdom), Nan Tang (University of Edinburgh, United Kingdom), Yinghui Wu (University of Edinburgh, United Kingdom), Yunpeng Wu (National University of Defense Technology, People’s Republic of China)
p.276: GRAIL: Scalable Reachability Index for Large Graphs
Hilmi Yildirim (Rensselaer Polytechnic Institute, United States of America), Vineet Chaoji (Yahoo! Research Labs, United States of America), Mohammed Zaki (Rensselaer Polytechnic Institute, United States of America)
p.220: High-Performance Dynamic Pattern Matching over Disordered Streams
Badrish Chandramouli (Microsoft Research, United States of America), Jonathan Goldstein (Microsoft Research, United States of America), David Maier (Portland State University, United States of America)
p.232: SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems
Irina Botan (Eidgenössische Technische Hochschule Zürich, Switzerland), Roozbeh Derakhshan (Eidgenössische Technische Hochschule Zürich, Switzerland), Nihal Dindar (Eidgenössische Technische Hochschule Zürich, Switzerland), Laura Haas (IBM Almaden Research Center, United States of America), Renée Miller (University of Toronto, Canada), Nesime Tatbul (Eidgenössische Technische Hochschule Zürich, Switzerland)
p.244: Recognizing Patterns in Streams with Imprecise Timestamps
Haopeng Zhang (University of Massachusetts Amherst, United States of America), Yanlei Diao (University of Massachusetts Amherst, United States of America), Neil Immerman (University of Massachusetts Amherst, United States of America)
Sunday, November 7, 2010
[StreamGraph] Fast Counting of Triangles in Large Real Networks: Algorithms and Laws
How can we quickly find the number of triangles in a large graph, without actually counting them? Triangles are important for real world social networks, lying at the heart of the clustering coefficient and of the transitivity ratio. However, straight-forward and even approximate counting algorithms can be slow, trying to execute or approximate the equivalent of a 3-way database join. In this paper, we provide two algorithms, the EigenTriangle for counting the total number of triangles in a graph, and the EigenTriangleLocal algorithm that gives the count of triangles that contain a desired node. Additional contributions include the following: (a) we show that both algorithms achieve excellent accuracy, with up to 1000x faster execution time, on several real graphs, and (b) we discover two new power laws (Degree-Triangle and Triangle-Participation laws) with surprising properties.
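The core identity behind EigenTriangle is that the total number of triangles equals one sixth of the sum of the cubed adjacency eigenvalues, and the few eigenvalues of largest magnitude dominate that sum. A minimal numpy sketch of this idea (dense eigendecomposition for illustration only; a real implementation would use sparse Lanczos iteration, and the 4-clique test graph is made up):

import numpy as np

def eigen_triangle(adj, k):
    """Approximate the total triangle count via the top-k adjacency eigenvalues.

    Uses the identity #triangles = (1/6) * sum_i lambda_i^3; keeping only the
    k eigenvalues of largest magnitude gives a cheap approximation.
    """
    vals = np.linalg.eigvalsh(adj)              # all eigenvalues (dense, toy-sized)
    top = vals[np.argsort(-np.abs(vals))[:k]]   # k largest in magnitude
    return np.sum(top ** 3) / 6.0

adj = np.ones((4, 4)) - np.eye(4)               # a 4-clique: exactly 4 triangles
print(eigen_triangle(adj, k=1))                 # 4.5 -- already close with one eigenvalue
print(eigen_triangle(adj, k=4))                 # 4.0 -- exact with all eigenvalues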
Graph algorithms on MapReduce
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Schatz_MLG2010.pdf
Jimmy Lin, Michael Schatz, University of Maryland
Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social networks. MapReduce provides an enabling technology for large-scale graph processing. However, there appears to be a paucity of knowledge on designing scalable graph algorithms. Existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. We present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.
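For reference, the message-passing pattern these design patterns accelerate looks roughly like the following single-process Python sketch of one PageRank iteration: a map phase emits rank mass along out-edges and a reduce phase sums the contributions per node. The damping factor, the toy graph, and keeping everything in memory are my simplifications, not the paper's code.

from collections import defaultdict

def pagerank_iteration(graph, ranks, damping=0.85):
    """One PageRank iteration in map/reduce style (single-process sketch).

    graph: dict node -> list of out-neighbors; ranks: dict node -> current rank.
    """
    # "Map": each node spreads its rank mass evenly over its out-neighbors.
    # (In a real MapReduce job the adjacency list must also be re-emitted so the
    # graph structure survives the shuffle; reducing that cost is one of the
    # issues the paper's design patterns target.)
    emitted = defaultdict(float)
    for node, neighbors in graph.items():
        share = ranks[node] / len(neighbors) if neighbors else 0.0
        for dst in neighbors:
            emitted[dst] += share

    # "Reduce": sum the incoming contributions per node and apply damping.
    n = len(graph)
    return {node: (1 - damping) / n + damping * emitted[node] for node in graph}

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {node: 1.0 / len(graph) for node in graph}
for _ in range(20):
    ranks = pagerank_iteration(graph, ranks)
print(ranks)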
Graph Processing with MapReduce
http://horicky.blogspot.com/2010/07/graph-processing-in-map-reduce.html
Twister: Iterative MapReduce
Twister http://www.iterativemapreduce.org/
Saturday, November 6, 2010
[StreamGraph] Max-Cover in Map-Reduce
http://research.google.com/pubs/pub36582.html
The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to maximize the size of the union. This classic problem occurs commonly in many settings in web search and advertising. For moderately-sized instances, a greedy algorithm gives an approximation of (1-1/e). However, the greedy algorithm requires updating scores of arbitrary elements after each step, and hence becomes intractable for large datasets. We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm.
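For context, the sequential greedy baseline mentioned in the abstract (not the paper's distributed MapReduce algorithm) is easy to state; the toy sets below are made up.

def greedy_max_cover(sets, k):
    """Sequential greedy (1 - 1/e)-approximation for Max-k-cover.

    sets: dict name -> set of elements; k: number of sets to pick.
    """
    covered, chosen = set(), []
    remaining = dict(sets)
    for _ in range(min(k, len(remaining))):
        # Pick the set with the largest marginal gain over what is already covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break                          # no remaining set adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

sets = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6, 7}, "s4": {1, 5}}
print(greedy_max_cover(sets, k=2))         # picks s3 then s1, covering 7 elements

The recomputation of marginal gains after every pick is the "updating scores of arbitrary elements" step that the abstract says becomes intractable at scale.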
Apache HAMA
Apache Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations (e.g., matrix, graph, and network computations), currently being incubated as an incubator project by the Apache Software Foundation.
Paper
http://salsahpc.indiana.edu/CloudCom2010/papers.html
DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining
PDF http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.1677&rep=rep1&type=pdf
Huge datasets are becoming prevalent; even as researchers, we now routinely have to work with datasets that are up to a few terabytes in size. Interesting real-world applications produce huge volumes of messy data. The mining process involves several steps, starting from pre-processing the raw data to estimating the final models. As data become more abundant, scalable and easy-to-use tools for distributed processing are also emerging. Among those, Map-Reduce has been widely embraced by both academia and industry.
[StreamGraph] PEGASUS (Peta Graph Mining System)
Paper URL ->
http://www.cs.cmu.edu/~ukang/papers/PegasusICDM2009.pdf
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations
In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several Giga-, Tera- or Peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on top of the HADOOP platform, the open source version of MAPREDUCE. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines, (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web Graphs, thanks to Yahoo!, with 6.7 billion edges.
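A minimal in-memory sketch of the GIM-V primitive, following the combine2 / combineAll / assign decomposition described in the paper. Instantiating it with multiply / sum / overwrite recovers ordinary matrix-vector multiplication; in PEGASUS these functions run as Hadoop map and reduce stages over edge and vector files rather than in memory, and the toy matrix here is made up.

from collections import defaultdict

def gim_v(edges, vector, combine2, combine_all, assign):
    """One GIM-V step: v'_i = assign(v_i, combineAll_i({combine2(m_ij, v_j)})).

    edges: iterable of (i, j, m_ij); vector: dict j -> v_j.
    """
    partial = defaultdict(list)
    for i, j, m_ij in edges:                         # "map": pair each entry with v_j
        partial[i].append(combine2(m_ij, vector[j]))
    return {i: assign(vector.get(i, 0.0), combine_all(xs))   # "reduce" per row i
            for i, xs in partial.items()}

edges = [(0, 1, 2.0), (1, 0, 1.0), (1, 2, 3.0), (2, 2, 1.0)]
v = {0: 1.0, 1: 2.0, 2: 3.0}
print(gim_v(edges, v,
            combine2=lambda m, x: m * x,   # multiply
            combine_all=sum,               # sum
            assign=lambda old, new: new))  # overwrite: plain matrix-vector product
# {0: 4.0, 1: 10.0, 2: 3.0}

Swapping in other (combine2, combineAll, assign) triples yields the iterated operations the abstract lists, such as PageRank and connected components.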
Friday, November 5, 2010
MapReduce Online
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs.
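A toy sketch of the online-aggregation idea (not HOP's actual implementation): map output is streamed to the reducer through a generator instead of being materialized first, so early partial results can be reported while the job is still running. snapshot_every and the word-count job are illustrative choices.

def mapper(records):
    """Stream (word, 1) pairs instead of materializing them before the reduce."""
    for line in records:
        for word in line.split():
            yield word, 1

def online_reducer(pairs, snapshot_every=3):
    """Consume map output as it arrives and report periodic early snapshots."""
    counts, seen = {}, 0
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
        seen += 1
        if seen % snapshot_every == 0:
            print(f"early result after {seen} pairs: {counts}")
    return counts

records = ["the quick brown fox", "the lazy dog", "the end"]
print("final:", online_reducer(mapper(records)))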
Thursday, October 14, 2010
MapReduce-related optimizations
http://personals.ac.upc.edu/dcarrera/papers/ICPP09.pdf
Phoenix Rebirth: Scalable MapReduce on a Large-Scale Shared-Memory System, Richard M. Yoo
http://csl.stanford.edu/~christos/publications/2009.scalable_phoenix.iiswc.pdf
Phoenix Project at Stanford University
http://mapreduce.stanford.edu/
Phoenix Project Slides
Deduce: see the previous entry