2010年9月3日金曜日

[StreamGraph] Proximity Tracking on Time-Evolving Bipartite Graphs

Hanghang Tong, Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos, Proximity Tracking on Time-Evolving Bipartite Graphs, SDM 2008, Atlanta, USA. [PDF]   Best paper award

http://www.siam.org/proceedings/datamining/2008/dm08_64_Tong.pdf


Given an author-conference network that evolves over time,
which are the conferences that a given author is most closely
related with, and how do they change over time? Large
time-evolving bipartite graphs appear in many settings, such
as social networks, co-citations, market-basket analysis, and
collaborative filtering.
Our goal is to monitor (i) the centrality of an individual
node (e.g., who are the most important authors?); and
(ii) the proximity of two nodes or sets of nodes (e.g., who
are the most important authors with respect to a particular
conference?) Moreover, we want to do this efficiently and
incrementally, and to provide “any-time” answers. We propose
pTrack and cTrack, which are based on random walk
with restart, and use powerful matrix tools. Experiments on
real data show that our methods are effective and efficient:
the mining results agree with intuition; and we achieve up to
15∼176 times speed-up, without any quality loss

[StreamGraph] グラフアルゴリズムの GPU による高速化

Accelerating large graph algorithms on the GPU using CUDA
http://cvit.iiit.ac.in/papers/Pawan07accelerating.pdf


fast GPU algorithm for graph connectivity
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5470817

Collaborative Filtering with Temporal Dynamics

http://research.yahoo.com/pub/2824 こちらも必見

Collaborative Filtering with Temporal Dynamics
Authors:
Koren, Y.
Source:
KDD 2009, ACM, Paris, France (2009)
Abstract:
Customer preferences for products are drifting over time. Product perception and popularity are constantly changing as new selection emerges. Similarly, customer inclinations are evolving, leading them to ever redefine their taste. Thus, modeling temporal dynamics should be a key when designing recommender systems or general customer preference models. However, this raises unique challenges. Within the eco-system intersecting multiple products and customers, many different characteristics are shifting simultaneously, while many of them influence each other and often those shifts are delicate and associated with a few data instances. This distinguishes the problem from concept drift explorations, where mostly a single concept is tracked. Classical time-window or instance-decay approaches cannot work, as they lose too much signal when discarding data instances. A more sensitive approach is required, which can make better distinctions between transient effects and long term patterns. The paradigm we offer is creating a model tracking the time changing behavior throughout the life span of the data. This allows us to exploit the relevant components of all data instances, while discarding only what is modeled as being irrelevant. Accordingly, we revamp two leading collaborative filtering recommendation approaches. Evaluation is made on a large movie rating dataset by Netflix. Results are encouraging and better than those previously reported on this dataset.
Notes:
Won KDD'09 Best Research Paper award

2010年9月1日水曜日

[StreamGraph] Pregel: a system for large-scale graph processing

SIGMOD 2010で発表された Google の論文。必見。

http://portal.acm.org/citation.cfm?doid=1807167.1807184

Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.

有害 Tweet リアルタイム監視

データストリーム処理が応用できる例

http://japan.internet.com/busnews/20100831/5.html

"利用者のツイートが掲載される企業のキャンペーンページなどに無関係な宣伝や公序良俗に反するツイートが投稿された際は、リアルタイムでブロックしたり、メールでアラート"
"オプションで、ハッシュタグや一般キーワードの監視、商品購入を薦めるフレーズなどの自動ピックアップ機能"