
The Research And Application Of Large-Scale Image Classification And Robust Subspace Clustering Algorithm For Big Data

Posted on: 2017-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Xue
GTID: 2308330485998796
Subject: Electronic and communication engineering
Abstract/Summary:
Big data poses serious challenges for traditional image classification and subspace clustering. Analyzing and processing data in real time requires a distributed computing framework. Of the many frameworks currently available, this thesis uses Spark to implement image classification and subspace clustering algorithms suited to big data.

First, the thesis studies in depth Spark's runtime architecture, programming model, and concurrency mechanism, as well as common RDD operations. Second, it introduces in detail Spark's machine learning library (MLlib), image classification, and subspace clustering algorithms. It then implements a distributed image classification framework and a distributed version of the Local Best-fit Flat (LBF) subspace clustering algorithm on the Spark platform.

The main contributions of this thesis are:
(1) A Spark distributed computing platform is built on the Ubuntu operating system, and the machine-learning performance of Hadoop and Spark is compared using K-Means clustering, an iterative algorithm.
(2) A distributed image classification framework suited to big data is implemented on Spark. By analyzing which parts of traditional image classification can run concurrently, the Bag-of-Words (BoW) model is implemented in parallel. The framework's performance is tested on the Ali-image dataset.
(3) The LBF subspace clustering algorithm is implemented on Spark. By analyzing which parts of LBF's combinatorial optimization problem can run concurrently, the optimal subspace selection is performed in parallel, so that subspace clustering can scale to massive data.

Experimental results show that, compared with the traditional single-PC versions, the Spark-based image classification framework and subspace clustering algorithm achieve greatly improved performance and efficiency.
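The abstract does not say how the BoW step is parallelized, so the following is only a minimal sketch of one plausible decomposition, not the thesis's implementation. It assumes the visual-word codebook has already been learned (for example by K-Means) and that images are split into partitions that can be processed independently. Plain Python stands in for Spark's map and reduce operations, and every name here (`nearest_word`, `partition_histograms`, `merge`, the toy data) is hypothetical.

```python
from collections import Counter
from functools import reduce


def nearest_word(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def partition_histograms(partition, codebook):
    """Map step: build one visual-word histogram per image in this partition."""
    out = {}
    for image_id, descriptors in partition:
        counts = Counter(nearest_word(d, codebook) for d in descriptors)
        out[image_id] = [counts.get(k, 0) for k in range(len(codebook))]
    return out


def merge(a, b):
    """Reduce step: partitions hold disjoint images, so merging is a dict union."""
    merged = dict(a)
    merged.update(b)
    return merged


# Toy data (assumed): a 3-word codebook and two partitions of 2-D descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
partitions = [
    [("img_a", [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8)])],
    [("img_b", [(4.8, 5.1), (0.0, 0.2)])],
]

# Each partition is processed independently, then the partial results are merged.
histograms = reduce(
    merge, (partition_histograms(p, codebook) for p in partitions), {}
)
print(histograms)
```

Because each image's histogram depends only on its own descriptors and the shared codebook, the map step needs no communication between partitions, which is what makes this stage a natural fit for a distributed framework.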
Keywords/Search Tags:big data, Spark, distributed computing, image classification, subspace clustering