2010年12月24日金曜日
[StreamGraph] Mining Large Time-evolving Data Using Matrix and Tensor Tools, SIGMOD 20007 tutorial
http://www.cs.cmu.edu/~christos/TALKS/SIGMOD-07-tutorial/
How can we find patterns in sensor streams (eg., a sequence of temperatures, water-pollutant measurements, or machine room measurements)? How can we mine Internet traffic graph over time? Further, how can we make the process incremental? We review the state of the art in four related fields: (a) numerical analysis and linear algebra (b) multi-linear/tensor analysis (c) graph mining and (d) stream mining. We will present both theoretical results and algorithms as well as case studies on several real applications. Our emphasis is on the intuition behind each method, and on guidelines for the practitioner.
[StreamGraph] 既存のインクリメンタルグラフマイニング
2. Incremental Pattern Discovery on Streams, Graphs and Tensors, 2007
2010年12月23日木曜日
[StreamGraph] インクリメンタルな Proximity 計算
http://portal.acm.org/citation.cfm?id=1390269
In this paper we investigate two aspects of ranking problems on large graphs. First, we augment the deterministic pruning algorithm in Sarkar and Moore (2007) with sampling techniques to compute approximately correct rankings with high probability under random walk based proximity measures at query time. Second, we prove some surprising locality properties of these proximity measures by examining the short term behavior of random walks. The proposed algorithm can answer queries on the fly without caching any information about the entire graph. We present empirical results on a 600, 000 node author-word-citation graph from the Citeseer domain on a single CPU machine where the average query processing time is around 4 seconds. We present quantifiable link prediction tasks. On most of them our techniques outperform Personalized Pagerank, a well-known diffusion based proximity measure.
[StreamGraph] Incremental PageRank 関連論文
http://homepages.inf.ed.ac.uk/miles/msc-projects/abdullah.pdf
Fast Incremental and Personalized PageRank
http://www.citeulike.org/user/seeding-films/article/7795698
2010年12月19日日曜日
[StreamGraph] RWR の並列化・高速化
2010年12月16日木曜日
[StreamGraph] 本プロジェクトの方向性
あえて、上記で、「リアルタイム性」という表現を使って、データストリーム処理というキーワードを使わなかったかだが、それはGIM-Vの計算モデルを、GPUのようなアクセラレータを用いて既存のバッチ処理モデルを超並列処理してリアルタイム性を実現する手法、なども考えられるからだ。また、データストリーム処理というコンテキストで言えば、(1)インクリメンタルな差分更新による処理の効率化を実現する手法、(2)ヒューリスティックス(例:雁瀬くんの二次の隔たりの特性を生かしたグラフ分割と処理の分散)を用いた性能最適化、(3)メモリに載りきらないようなグラフストリームが到着した際に Persistent な分散ファイルシステムと協調し階層的にグラフ構造を格納、頻繁にアクセスされる部分グラフのみをオンメモリ上に確保するといった最適化手法、そしてそれらのグラフデータに対して開発者からシームレスにアクセスできるような処理基盤、と言った研究テーマが考えられるでしょう。
上記の研究テーマを遂行するにあたって、まずはとっかかりとして、バッチ処理の統一的な処理モデルとして提案されているGIM-Vおよびそのリファレンス実装であるPEGASUSを軸に、上記テーマの実現手法を考えていきましょう。例えば、このブログの前の記事に Incremental PageRank の論文を載せましたが、それをPEGASUS上(もしくは System S 上で)で実装してみて、GIM-V,プログラミングモデル、処理系の gap を分析し、上記(2)や(3)((1)もできれば)を自然に表現できるモデルを提案できると良い研究成果につながると思います。
2010年12月15日水曜日
[StreamGraph] 時間的に変化する動的グラフに関する研究
Mining Temporally Evolving Graphs, 2004
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.6282
>Web mining has been explored to a vast degree and different techniques have been proposed for a variety of applications that include Web Search, Web Classification, Web Personalization etc. Most research on Web mining has been from a ‘data-centric ’ point of view. The focus has been primarily on developing measures and applications based on data collected from content, structure and usage of Web till a particular time instance. In this project we examine another dimension of Web Mining, namely temporal dimension. Web data has been evolving over time, reflecting the ongoing trends. These changes in data in the temporal dimension reveal new kind of information. This information has not captured the attention of the Web mining research community to a large extent. In this paper, we highlight the significance of studying the evolving nature of the Web graphs. We have classified the approach to such problems at three levels of analysis: single node, sub-graphs and whole graphs. We provide a framework to approach problems of this kind and have identified interesting problems at each level. Our experiments verify the significance of such analysis and also point to future directions in this area. The approach we take is generic and can be applied to other domains, where data can be modeled as graph, such as network intrusion detection or social networks.
[StreamGraph] インクリメンタルなネットワーク分析手法
2010年12月14日火曜日
[Data Distribution] 投稿予定
Technical Papers Due: | 17 January 2011 |
PAPER DEADLINE EXTENDED: | 24 January 2011 at 12:01 PM (NOON) Eastern Time |
Author Notifications: | 28 February 2011 |
Paper Submission: | January 31, 2011 |
Paper Submission to highlights track: | March 4, 2011 |
Paper Notification: | March 21, 2011 |
[StreamCloud] 学会投稿予定
[StreamGraph] Graph 500 Update
2010年12月10日金曜日
[StreamGraph] SNAP (Stanford Network Analysis Package)
[StreamGraph] Jon Kleinberg's Survey
- Survey Paper
- J. Kleinberg. Temporal Dynamics of On-Line Information Streams. Draft chapter for the forthcoming book Data Stream Management: Processing High-Speed Data Streams (M. Garofalakis, J. Gehrke, R. Rastogi, eds.), Springer.
- Survey Paper
- Short Time Scales: Usage Data and Bursty Dynamics
- L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proc. IEEE, Vol. 77, No. 2, pp. 257-286, Feb. 1989
- J. Allan, J.G. Carbonell, G. Doddington, J. Yamron, Y. Yang, Topic Detection and Tracking Pilot Study: Final Report. Proc. DARPA Broadcast News Transcription and Understanding Workshop, Feb. 1998.
- R. Swan, J. Allan, Automatic generation of overview timelines. Proc. SIGIR Intl. Conf. on Research and Development in Information Retrieval, 2000.
- S. Havre, B. Hetzler, L. Nowell, ThemeRiver: Visualizing Theme Changes over Time. Proc. IEEE Symposium on Information Visualization, 2000.
- J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
- J. Aizen, D. Huttenlocher, J. Kleinberg, A. Novak. Traffic-Based Feedback on the Web. Proceedings of the National Academy of Sciences 101(Suppl.1):5254-5260, 2004.
- R. Kumar, J. Novak, P. Raghavan, A. Tomkins. On the bursty evolution of Blogspace. Proc. International WWW Conference, 2003.
- Y. Zhu and D. Shasha. Efficient Elastic Burst Detection in Data Streams. Proc. ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
- E. Gabrilovich, S. Dumais, E. Horvitz. NewsJunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty. Proceedings of the Thirteenth International World Wide Web Conference. May 2004.
- M. Vlachos, C. Meek, Z. Vagena, D. Gunopulos. Identifying Similarities, Periodicities and Bursts for Online Search Queries. Proc. ACM SIGMOD International Conference on Management of Data, 2004.
- A..-L. Barabasi. The origin of bursts and heavy tails in human dynamics. Nature 435, 207-211 (2005).
- Micah Dubinko, Ravi Kumar, Joseph Magnani, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins. Visualizing Tags over Time. WWW2006 Conference. See also the demo of flickr tag visualization.
- Xuerui Wang and Andrew McCallum. Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. Conference on Knowledge Discovery and Data Mining (KDD) 2006.
- Xuanhui Wang, ChengXiang Zhai, Xiao Hu, and Richard Sproat, Mining Correlated Bursty Topic Patterns from Coordinated Text Streams. Proceedings of the 2007 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07 ), pages 784-793.
- Short Time Scales: Usage Data and Bursty Dynamics
Temporal Analysis and Bursty Phenomena
- J. Leskovec, L. Backstrom, J. Kleinberg. Meme-tracking and the dynamics of the news cycle. Proc. 15th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2009.
- See also the commpanying MemeTracker site, which uses the ideas in this paper to visualize the news cycle.
- L. Backstrom, J. Kleinberg, R. Kumar. Optimizing Web traffic via the Media Scheduling Problem. Proc. 15th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2009.
- J. Kleinberg. Temporal Dynamics of On-Line Information Streams. In Data Stream Management: Processing High-Speed Data Streams, (M. Garofalakis, J. Gehrke, R. Rastogi, eds.), Springer, 2004.
- J. Aizen, D. Huttenlocher, J. Kleinberg, A. Novak. Traffic-Based Feedback on the Web. Proceedings of the National Academy of Sciences 101(Suppl.1):5254-5260, 2004. (Also available in pre-print form.)
- J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
- See also some sample results from the burst detection algorithm described in this paper.
[StreamGraph] スタンフォード大の公開データ
[StreamGraph] Facebook Data
[StreamGraph] Livedoor Data
[StreamGraph] データストリームに対する相関ルールを用いたコミュニティの時系列解析
Body sensor data processing using stream computing
Body sensor data processing using stream computing
2010年12月9日木曜日
2010年12月8日水曜日
[StreamGraph] 食べログ
[StreamGraph] Mixi Graph API
[StreamGraph] Facebook Graph API
Facebook Graph API を用いれば実データは取れるそうだ。
------
The announcement was glossed over during Facebook’s many launches at f8 yesterday, but the company has introduced ways for developers to conduct a variety of searches for content in its Graph API. This could benefit any number of search companies and other developers who are looking for lots of public data to aggregate and organize.
The Open Graph allows applications, Pages, web sites and other software services to add Facebook features, like the “Like” button, to their own sites. Taking actions on other sites results in those actions being shared with your friends on Facebook, and may allow friends on those sites to see what you’re doing, too.
Many users have their privacy settings configured to allow status updates, likes and other activity to be public. Its terminology for this data is “public objects.” The generic search format is: https://graph.facebook.com/search?q=QUERY&type=OBJECT_TYPE.
Here are the specific types of searches that can be done:
News feeds of individuals can be searched, too, but you’ll need authorization. Here’s an example:
News Feed: https://graph.facebook.com/me/home?q=facebook
This search is restricted to public data about users and their immediate friends (not, say, their friends of friends), in keeping with Facebook’s existing data-sharing policies.