2011年12月3日土曜日

Spectral Clustering

http://www.csie.ntu.edu.tw/~cjlin/papers/psc08.pdf
Parallel Spectral Clustering in Distributed Systems

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional
algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and
computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two
representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix
with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors
and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through
an empirical study on a document data set of 193; 844 instances and a photo data set of 2; 121; 863, we show that
our parallel algorithm can effectively handle large problems.


Parallelization of spectral clustering algorithm on multi-core processors and GPGPU
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4625449

Spectral clustering is a widely-used algorithm in the field of information retrieval, data mining, machine learning and many others. It can help to cluster a large number of data into several categories without requiring any additional information about the dataset or the categories, so that people can find information by categories easily. In this paper, we parallelize the algorithm proposed by Andrew Y. Ng, Michael I. Jordan and Yair Weiss. We provide two versions of implementation: one is parallelized in OpenMP; the other is programmed in the NVIDIA CUDA (compute unified device architecture), which is the environment provided by NVIDIA to program on its CUDA-Enabled GPGPUs (general-purpose graphic processing unit). We can achieve about three times speedup in OpenMP and around ten times speedup using CUDA in our experiments.


On a strategy for spectral clustering with parallel computation
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science

pectral Clustering is one of the most important method based on space dimension reduction used in Pattern Recognition. This method consists in selecting dominant eigenvectors of a matrix called affinity matrix in order to define a low-dimensional data space in which data points are easy to cluster. By exploiting properties of Spectral Clustering, we propose a method where we apply independently the algorithm on particular subdomains and gather the results to determine a global partition. Additionally, with a criterion for determining the number of clusters, the domain decomposition strategy for parallel spectral clustering is robust and efficient.

0 件のコメント:

コメントを投稿