Wednesday, March 30, 2011

Status Update

We are about to enter the new fiscal year. Ishii-kun's SACSIS 2011 paper is coming together, but please leave documentation so that the experiments can be reproduced. As for the IPSJ ACS transactions papers, Ganse-kun and Ueno-kun are hard at work implementing and writing. Let's keep at it.

As for the lab's power-saving measures and consolidating our servers onto virtual machines, let's work on those once the papers above have settled down.

Regarding internships, I have contacted INRIA in France, the University of Tennessee (led by Prof. Jack Dongarra, a dominant figure in the US HPC community), and the GPMR team (who implemented multi-GPU MapReduce). Let's wait for their replies. I am also considering the San Diego Supercomputer Center, IBM Watson, and others. It is better when the people involved already know each other personally, but I intend to make contact one at a time while gauging their responses. Whether it is IBM TRL or an internship abroad, please do not assume that being introduced for such positions is something to be taken for granted. Naturally, these opportunities go to people who have earned solid trust and produced results.



Thursday, March 24, 2011

[MapReduce] Making Mars multi-GPU / multi-node, and related papers

This field (especially anything GPU-related) is a race for speed. The following IPDPS 2011 paper (this May) in particular is a must-read.

Multi-GPU MapReduce on GPU Clusters, IPDPS 2011
http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1051

We present GPMR, our MapReduce library that leverages the power of GPU clusters for large-scale computing. To better utilize the GPU, we modify MapReduce by combining large amounts of map and reduce items into chunks and using partial reductions and accumulation. We use persistent map and reduce tasks and stress aspects of GPMR with a set of standard MapReduce benchmarks. We run these benchmarks on a GPU cluster and achieve desirable speedup and efficiency for all benchmarks. We compare our implementation to the current-best GPU-MapReduce library (runs only on a solo GPU) and a highly-optimized multi-core MapReduce to show the power of GPMR. We demonstrate how typical MapReduce tasks are easily modified to fit into GPMR and leverage a GPU cluster. We highlight how total and relative amounts of communication affect GPMR. We conclude with an exposition on the types of MapReduce tasks well-suited to GPMR, and why some tasks need more modifications than others to work well with GPMR.
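
To make the "chunking plus partial reduction" idea concrete, here is a minimal CUDA sketch (the kernel name, the grid-stride loop, and integer summation as a stand-in map function are my assumptions, not GPMR's actual code): each thread maps a chunk of items, and each block emits a single partial result instead of one key-value pair per item.

__global__ void map_partial_reduce(const int *items, int n, int *block_partials)
{
    // Launch as: map_partial_reduce<<<blocks, threads, threads * sizeof(int)>>>(...)
    // (threads must be a power of two for the reduction loop below).
    extern __shared__ int sdata[];
    int tid = threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    int local = 0;

    // "Map" over a chunk of items per thread (grid-stride loop).
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += stride)
        local += items[i];                    // stand-in for a real map function

    // Partial reduction within the block (the accumulation step).
    sdata[tid] = local;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) block_partials[blockIdx.x] = sdata[0];
}

The per-block partials would then be combined by a second kernel or on the host, and across GPUs and nodes in the cluster setting, which is where the communication costs discussed in the abstract come in.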

Designing efficient sorting algorithms for manycore GPUs, IPDPS 2009
http://portal.acm.org/citation.cfm?id=1587667


Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling, PACT 2010,
http://portal.acm.org/citation.cfm?id=1854337

Wednesday, March 16, 2011

Saturday, March 12, 2011

[StreamCloud] Related papers

Balancing Load in Stream Processing with the Cloud
http://lsds.doc.ic.ac.uk/research/doc/smdb11-cloud_streaming.pdf

Stream processing systems must handle stream data coming from real-time, high-throughput applications, for example in financial trading. Timely processing of streams is important and requires sufficient available resources to achieve high throughput and deliver accurate results. However, static allocation of stream processing resources in terms of machines is inefficient when input streams have significant rate variations— machines remain underutilised for long periods of average load. We present a combined stream processing system that, as the input stream rate varies, adaptively balances workload between a dedicated local stream processor and a cloud stream processor. This approach only utilises cloud machines when the local stream processor becomes overloaded. We evaluate a prototype system with financial trading data. Our results show that it can adapt effectively to workload variations, while only discarding a small percentage of input data.
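
As a reading note, here is a minimal host-side sketch of the overflow idea described above (the class name, the EWMA smoothing, and the single capacity threshold are my assumptions, not the paper's actual design): tuples are processed locally until the smoothed input rate exceeds the local processor's sustainable throughput, and only the excess fraction is forwarded to cloud machines.

class CloudOverflowRouter {
public:
    CloudOverflowRouter(double local_capacity, double alpha = 0.2)
        : capacity_(local_capacity), alpha_(alpha), rate_(0.0) {}

    // Called once per measurement interval with the observed input rate (tuples/sec).
    void observe(double tuples_per_sec) {
        rate_ = (1.0 - alpha_) * rate_ + alpha_ * tuples_per_sec;   // smooth out bursts
    }

    // Fraction of incoming tuples to forward to the cloud processor (0 = all local).
    double cloud_fraction() const {
        return (rate_ <= capacity_) ? 0.0 : (rate_ - capacity_) / rate_;
    }

private:
    double capacity_;  // sustainable local throughput (tuples/sec)
    double alpha_;     // EWMA smoothing factor
    double rate_;      // smoothed input rate
};

What this sketch leaves out (how operator state is handled when part of the stream moves to the cloud) is the hard part in a real system.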

Wednesday, March 9, 2011

[StreamCloud] Accepted at SACSIS

Ishii-kun's paper has been accepted at SACSIS 2011, a peer-reviewed domestic conference. Congratulations.

[StreamGraph] Conditional acceptance at the IPSJ Transactions on Advanced Computing Systems (ACS)

Ganse-kun's submission to the ACS transactions received a conditional accept. All of the conditions must of course be satisfied; beyond improving the clarity of the paper itself, we need to solve the problems of recomputing the graph partitioning and of migration.

[StreamGPU] Direction of the papers

After discussing with Ueno-kun, we decided to write two papers going forward.

- A paper on GPU task parallelism, evaluated with SVD and IKA-SST. For IKA-SST, we compete on task parallelism alone, without the optimizations of overlapping kernel execution with data transfers, running multiple kernels concurrently, or transferring only data deltas.

- The other paper, "GPU Optimization of the IKA-SST Anomaly Detection Algorithm", combines task parallelism with those optimizations: overlapping kernel execution with data transfers, running multiple kernels concurrently, and transferring only data deltas (see the sketch right after this list).
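
A minimal sketch of the kernel-execution/data-transfer overlap mentioned above, assuming CUDA streams, page-locked (pinned) host buffers, an input length divisible by the number of chunks, and a hypothetical kernel ika_sst_step; this is an illustration, not our actual implementation.

#include <cuda_runtime.h>

__global__ void ika_sst_step(const float *in, float *out, int n) { /* ... anomaly-score update ... */ }

void process_overlapped(const float *h_in, float *h_out, int n, int chunks)
{
    const int chunk = n / chunks;            // assumes n % chunks == 0
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Alternate chunks between two streams so that the copy of one chunk
    // overlaps with kernel execution on the other (requires pinned h_in/h_out).
    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = s[c % 2];
        int off = c * chunk;
        cudaMemcpyAsync(d_in + off, h_in + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, st);
        ika_sst_step<<<(chunk + 255) / 256, 256, 0, st>>>(d_in + off, d_out + off, chunk);
        cudaMemcpyAsync(h_out + off, d_out + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, st);
    }
    cudaDeviceSynchronize();

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFree(d_in);
    cudaFree(d_out);
}

With a single stream the copies and the kernel serialize; the two-stream version keeps the copy engine and the compute units busy at the same time, which is exactly the effect we want to measure against the task-parallel-only version.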

2011年3月8日火曜日

A list of international conferences / workshops

[2Q]

- IFIP Performance
Submission deadline: April 22, 2011
Acceptance notification: June 30, 2011

- Middleware 2011, 2011/5/23 Abstract, 5/30 Paper Submission, 8/5 Notification
The Middleware 2011 proceedings will be published in Lecture Notes in Computer Science (LNCS) both in print and in electronic form. Papers must not exceed 20 pages, including abstract, all figures, tables, and references.

DISC: International Symposium on Distributed Computing 
Registration & abstract submission: April 27th, 2011
Submission: May 1st, 2011
Acceptance notification: June 23rd, 2011


SOCC: ACM Symposium on Cloud Computing

Submission Deadline: April 30, 2011
Acceptance Notification: July 11, 2011


ICSOC
Abstract due : May 27, 2011
Full paper due : June 3, 2011
Notification : August 5, 2011

IMC: Internet Measurement Conference
May 6, 2011, 11:59PM EDT: Registration of title and 250-word abstract
May 13, 2011, 11:59PM EDT: Hard submission deadline
July 22, 2011: Notification

LADIS: ACM SIGOPS & SIGACT Workshop on Large Scale Distributed Systems & Middleware
Submissions due: June 12, 2011
Notification of acceptance: July 15, 2011

ASPLOS
Abstract deadline: Monday, July 18, 2011 (noon, EDT)
Full paper deadline: Monday, July 25, 2011 (noon, EDT)
Rebuttal period: Tuesday–Thursday, October 4–6, 2011
Notification of acceptance: Friday, October 28, 2011

IISWC (2011 IEEE International Symposium on Workload Characterization)
Abstracts Due: June 10, 2011
Paper Submission: June 17, 2011
Acceptance Notification: August 31, 2011

- ICDE, July
Abstracts due July 12th, 2011
Full papers due July 19th, 2011
Author feedback on September 8th-13th, 2011
Author notification on September 27th, 2011


- CIKM: ACM International Conference on Information and Knowledge Management http://www.cikm2011.org/
  • Abstracts due: 17th May, 2011
  • Papers due: 24th May, 2011
  • Notification of Acceptance: 19th July, 2011
[3Q]
CGO: Symposium on Code Generation and Optimization
Abstract Submission: September 13, 2011
Paper Submission: September 20, 2011
Author Rebuttal Period: October 23 to 26, 2011
Notification to Authors: November 1, 2011



EDBT
IEEE INFOCOM
IPDPS
SC

[1Q]
HotCloud:USENIX Workshop on Hot Topics in Cloud Computing
  • Paper submissions due: March 14, 2011, 11:59 p.m. EDT Deadline Extended!
  • Notification of acceptance: April 11, 2011

[Others]
KDD
ICDM
Mobicom
Mobisys
PACT
PLDI
SDM
SIGMOD/PODS
SOSP
Supercomputing
USENIX
USENIX NSDI
VLDB

SRDS: IEEE International Symposium on Reliable Distributed Systems
PerCom: IEEE International Conference on Pervasive Computing and Communications
MobiQuitous: Mobile and Ubiquitous Systems
HotMobile: Workshop on Mobile Computing Systems and Applications
MDM: International Conference on Mobile Data Management
MobiHoc: ACM International Symposium on Mobile Ad Hoc Networks and Computing
SPAA: ACM Symposium on Parallelism in Algorithms and Architectures
CF: Computing Frontiers
VEE: Virtual Execution Environment

[Fall]
- WWW, December
- ICDCS, late November
- ISPASS, November 20
- ACM SIGMETRICS, early November
MASCOTS


[1Q]
ACM SIGCOMM, January
ICS: International Conference on Supercomputing, January deadline

DEBS 2011
Abstracts for Research and Industry papers: February 28, 2011 (23:59 GMT), extended to March 7, 2011 (23:59 GMT)
Tutorials, Research and Industry papers due: March 7, 2011 (23:59 GMT), extended to March 14, 2011 (23:59 GMT)
Author notification for Tutorials, Research, Industry and Ph.D. Workshop papers: April 25, 2011

Abstract submission: 1 April 2011 (strict)
Paper and tool submission: 8 April 2011,

ICWS, mid-January deadline
Euro-Par: Parallel and Distributed Computing, January deadline

Saturday, March 5, 2011

DEBS - ACM International Conference on Distributed Event-Based Systems

DEBS 2011 http://debs2011.fzi.de/

Stream-related papers from past editions of DEBS.

DEBS 2010 http://debs10.doc.ic.ac.uk/
- Scalable, Elastic Distributed Stream Processing, David Alves
- Semantic Quality-Assurance in Distributed and Heterogeneous Stream Processing Systems
- Processing out-of-order event streams in ETALIS
- StreamNetFlux: Birth of Transparent Integrated CEP-DBs (demo)
- Flood: Elastic Streaming MapReduce (demo)
- Placement of Replicated Tasks for Distributed Stream Processing Systems (IBM Watson)
- Workload Characterization for Operator-Based Distributed Stream Processing Applications (IBM Watson)
- Evaluation of Streaming Aggregation on Parallel Hardware Architectures


DEBS 2009 http://debs09.isis.vanderbilt.edu/
- Distributed Event Stream Processing with Non-deterministic Finite Automata
- Implementing Reliable Event Streams in Large Systems via Distributed Data Flows and Recursive Delegation
- Processing Publish/Subscribe Queries over Distributed Data Streams

DEBS 2008 http://debs08.dis.uniroma1.it/
- Replica Placement for High Availability in Distributed Stream Processing Systems
- Real-Time, Load-Adaptive Processing of Continuous Queries Over Data Streams

DEBS 2007 http://www.debs.msrg.utoronto.ca/
- Mythbusters: event stream processing versus complex event processing, Invited Talk, Tim Bass
- High frequency distributed data stream event correlation to improve neonatal clinical management
- A practical approach for enabling online analysis of event streams
- Persisting and querying biometric event streams with hybrid relational-XML DBMS

Thursday, March 3, 2011

Writing a Good Paper

Here is what it takes to write a good paper, that is, to get it accepted at an international conference.

(1) Before starting the implementation, survey existing work thoroughly and make clear how your work differs from it. Once the differences are clear, you should be able to write a single storyline on slides before diving into a deep implementation. For some research, you will not know enough until you start implementing a little; in that case, build a quick prototype and iterate on step (1).

(2) During implementation, look for new technical findings and for elements that make the differences from existing work even clearer. This raises the value of the research and makes the novelty easier to argue.

(3) After the implementation is finished, carry out thorough qualitative and quantitative evaluation. Four things are mandatory: (a) a rich set of evaluated applications and scenarios (three is the standard, including microbenchmarks and a real application), (b) data showing the superiority of the proposed method, (c) profiling behind (b), showing quantitatively why it got better or worse, and (d) comparison with similar techniques. People tend to relax at this point and let (3) become poor, but it is extremely important.

Consider a piece of research finished only when the paper is finally accepted at a peer-reviewed international conference. Changing the direction of the research at stage (1) is not a bad thing; what is bad is charging ahead in the wrong direction. However, once you have reached stage (2), make sure to see the research through to completion.

Related entries

[StreamCloud] IPSJ National Convention presentation finished

Ishii-kun's presentation at the IPSJ National Convention at Tokyo Tech is done. Good work.

Below is a list of things to do next, including points that were raised during the session:
- Improve the load prediction algorithm (SDAR) and run experiments that combine it with our method; an explanation of the load prediction algorithm is mandatory. Read Matsuura-kun's paper on load balancing for StreamDS. (A rough sketch of a discounted AR predictor follows this list.)
- Enumerate similar techniques from research areas other than data stream processing and clarify how ours differs from them.
- Explain and address the gap between our timeslot concept and the one-hour billing granularity that Amazon EC2 currently uses.
- Quantitatively clarify what kind of real-time requirements the target applications have.
- Increase the variety of evaluation applications and scale up the experiments.
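
For reference, here is a minimal sketch of a sequentially discounting AR(1) predictor in the spirit of SDAR (the update rule, the discount factor, and the class name are my simplification for illustration, not the exact SDAR equations):

class DiscountedAR1 {
public:
    explicit DiscountedAR1(double r) : r_(r), mu_(0), c0_(1e-6), c1_(0), prev_(0), init_(false) {}

    // Feed one observation (e.g., input tuples/sec in the last interval) and
    // return the one-step-ahead load prediction.
    double update(double x)
    {
        if (!init_) { mu_ = x; prev_ = x; init_ = true; return x; }
        // Exponentially discount old statistics so the model tracks recent load.
        mu_ = (1.0 - r_) * mu_ + r_ * x;
        c0_ = (1.0 - r_) * c0_ + r_ * (x - mu_) * (x - mu_);
        c1_ = (1.0 - r_) * c1_ + r_ * (x - mu_) * (prev_ - mu_);
        double a = (c0_ > 0.0) ? c1_ / c0_ : 0.0;   // AR(1) coefficient
        prev_ = x;
        return mu_ + a * (x - mu_);                 // predicted load for the next interval
    }

private:
    double r_;                        // discount factor (e.g., 0.05)
    double mu_, c0_, c1_, prev_;
    bool init_;
};

The predicted load would then be mapped to a number of timeslots or instances, which is exactly where the gap with EC2's hourly billing mentioned above has to be handled.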


[StreamCloud] Amazon arrives in Tokyo

Amazon has now built a data center in Tokyo as well.
http://aws.amazon.com/jp/about-aws/whats-new/2011/03/02/announcing-asia-pacific-tokyo-region/

[StreamDS] Papers on load prediction methods

Below are related papers on load prediction and scheduling.

Trace-based evaluation of job runtime and queue wait time predictions in grids
http://portal.acm.org/citation.cfm?id=1551609.1551632

Large-scale distributed computing systems such as grids are serving a growing number of scientists. These environments bring about not only the advantages of an economy of scale, but also the challenges of resource and workload heterogeneity. A consequence of these two forms of heterogeneity is that job runtimes and queue wait times are highly variable, which generally reduces system performance and makes grids difficult to use by the common scientist. Predicting job runtimes and queue wait times have been widely studied for parallel environments. However, there is no detailed investigation on how the proposed prediction methods perform in grids, whose resource structure and workload characteristics are very different from those in parallel systems. In this paper, we assess the performance and benefit of predicting job runtimes and queue wait times in grids based on traces gathered from various research and production grid environments. First, we evaluate the performance of simple yet widely used time series prediction methods and the effect of applying them to different types of job classes (e.g., all jobs submitted by single users or to single sites). Then, we investigate the performance of two kinds of queue wait time prediction methods for grids. Last, we investigate whether prediction-based grid-level scheduling policies can have better performance than policies that do not use predictions.


Swift: Fast, Reliable, Loosely Coupled Parallel Computation
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.519

A common pattern in scientific computing involves the execution of many tasks that are coupled only in the sense that the output of one may be passed as input to one or more others—for example, as a file, or via a Web Services invocation. While such “loosely coupled” computations can involve large amounts of computation and communication, the concerns of the programmer tend to be different than in traditional high performance computing, being focused on management issues relating to the large numbers of datasets and tasks (and often, the complexities inherent in “messy” data organizations) rather than the optimization of interprocessor communication. To address these concerns, we have developed Swift, a system that combines a novel scripting language called SwiftScript with a powerful runtime system based on CoG Karajan and Falkon to allow for the concise specification, and reliable and efficient execution, of large loosely coupled computations. Swift adopts and adapts ideas first explored in the GriPhyN virtual data system, improving on that system in many regards. We describe the SwiftScript language and its use of XDTM to describe the logical structure of complex file system structures. We also present the Swift system and its use of CoG Karajan, Falkon, and Globus services to dispatch and manage the execution of many tasks in different execution environments. We summarize application experiences and detail performance experiments that quantify the cost of Swift operations.


Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
http://portal.acm.org/citation.cfm?id=1755940

As organizations start to use data-intensive cluster computing systems like Hadoop and Dryad for more applications, there is a growing need to share clusters between users. However, there is a conflict between fairness in scheduling and data locality (placing tasks on nodes that contain their input data). We illustrate this problem through our experience designing a fair scheduler for a 600-node Hadoop cluster at Facebook. To address the conflict between locality and fairness, we propose a simple algorithm called delay scheduling: when the job that should be scheduled next according to fairness cannot launch a local task, it waits for a small amount of time, letting other jobs launch tasks instead. We find that delay scheduling achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness. In addition, the simplicity of delay scheduling makes it applicable under a wide variety of scheduling policies beyond fair sharing.
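
Here is a minimal sketch of the delay scheduling rule as the abstract describes it (the data structures and the max_delay parameter are assumptions for illustration): when the job at the head of the fair-sharing order has no local task for the free node, it is skipped for up to max_delay seconds before locality is given up.

#include <string>
#include <vector>

struct Task { std::string input_node; };                // node holding the task's input data

struct Job {
    std::vector<Task> pending;
    double skipped_since = -1.0;                        // first time this job was skipped; -1 if never

    bool has_local_task(const std::string &node) const {
        for (const Task &t : pending)
            if (t.input_node == node) return true;
        return false;
    }
};

// Pick a job for a free node; jobs arrive already ordered by the fairness policy.
Job *delay_schedule(std::vector<Job> &jobs_in_fair_order,
                    const std::string &free_node, double now, double max_delay)
{
    for (Job &j : jobs_in_fair_order) {
        if (j.pending.empty()) continue;
        if (j.has_local_task(free_node)) { j.skipped_since = -1.0; return &j; }   // local launch
        if (j.skipped_since < 0.0) j.skipped_since = now;                         // start waiting
        if (now - j.skipped_since > max_delay) return &j;                         // give up locality
        // Otherwise skip this job for now and let a later job launch a task instead.
    }
    return nullptr;
}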


Generating Adaptation Policies for Multi-tier Applications in Consolidated Server Environments
http://portal.acm.org/citation.cfm?id=1444674
Creating good adaptation policies is critical to building complex autonomic systems since it is such policies that define the system configuration used in any given situation. While online approaches based on control theory and rule-based expert systems are possible solutions, each has its disadvantages. Here, a hybrid approach is described that uses modeling and optimization offline to generate suitable configurations, which are then encoded as policies that are used at runtime. The approach is demonstrated on the problem of providing dynamic management in virtualized consolidated server environments that host multiple multi-tier applications. Contributions include layered queuing models for Xen-based virtual machine environments, a novel optimization technique that uses a combination of bin packing and gradient search, and experimental results that show that automatic offline policy generation is viable and can be accurate even with modest computational effort.
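
For reference, here is a minimal sketch of the bin-packing half of the offline policy generation described above, using first-fit decreasing on a single CPU dimension (the struct names and the one-dimensional resource model are assumptions for illustration; the paper combines bin packing with gradient search over a richer model):

#include <algorithm>
#include <vector>

struct VM   { double cpu_demand; };
struct Host { double cpu_capacity; double used = 0.0; std::vector<int> vms; };

// Place each VM on the first host with enough spare capacity, largest VMs first.
// VMs that fit nowhere are simply left unplaced in this sketch.
void first_fit_decreasing(const std::vector<VM> &vms, std::vector<Host> &hosts)
{
    std::vector<int> order(vms.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return vms[a].cpu_demand > vms[b].cpu_demand; });

    for (int i : order)
        for (Host &h : hosts)
            if (h.used + vms[i].cpu_demand <= h.cpu_capacity) {
                h.used += vms[i].cpu_demand;
                h.vms.push_back(i);
                break;
            }
}

This is also the simplest starting point for the lab's own server-consolidation plan mentioned in the March 30 entry.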