Wednesday, January 11, 2012

NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce

http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p334.pdf


In the last decade, advances in data collection and storage technologies have led to an increased interest in designing and implementing large-scale parallel algorithms for machine learning and data mining (ML-DM). Existing programming paradigms for expressing large-scale parallelism such as MapReduce (MR) and the Message Passing Interface (MPI) have been the de facto choices for implementing these ML-DM algorithms. The MR programming paradigm has been of particular interest as it gracefully handles large datasets and has built-in resilience against failures. However, the existing parallel programming paradigms are too low-level and ill-suited for implementing ML-DM algorithms. To address this deficiency, we present NIMBLE, a portable infrastructure that has been specifically designed to enable the rapid implementation of parallel ML-DM algorithms. The infrastructure allows one to compose parallel ML-DM algorithms using reusable (serial and parallel) building blocks that can be efficiently executed using MR and other parallel programming models; it currently runs on top of Hadoop, which is an open-source MR implementation. We show how NIMBLE can be used to realize scalable implementations of ML-DM algorithms and present a performance evaluation.
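
To make the idea of composing an algorithm from reusable serial and parallel building blocks a little more concrete, here is a minimal sketch in plain Python. It is not NIMBLE's API and does not use Hadoop: the names `map_reduce`, `emit_items`, `sum_counts`, and `frequent_items` are all hypothetical, and the "parallel" block is simulated in memory on a single machine. It only illustrates the general pattern of chaining an MR-style step with an ordinary serial step.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """One in-memory map/shuffle/reduce pass (a stand-in for a real MR job)."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)  # Shuffle: group values by key.
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

def emit_items(transaction):
    """Reusable mapper: emit (item, 1) once per distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]

def sum_counts(key, values):
    """Reusable reducer: sum the counts collected for one key."""
    return sum(values)

def frequent_items(transactions, min_support):
    """Compose a 'parallel' counting block with a serial filtering block."""
    counts = map_reduce(transactions, emit_items, sum_counts)
    return {item: n for item, n in counts.items() if n >= min_support}

# Hypothetical toy data: four market-basket transactions.
transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
result = frequent_items(transactions, min_support=3)
print(result)
```

The mapper and reducer here are ordinary functions, so the same pieces could be reused in other compositions (for example, counting item pairs instead of single items) without touching `map_reduce` itself.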
