Research And Application Of Clustering Method For Big Visual Data

Posted on:2018-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q T Zhou

Full Text:PDF

GTID:2348330518498257

Subject:Electronics and Communications Engineering

Abstract/Summary:

With the advent of the big data era, information of all kinds is growing explosively, especially image data, which is both rich and abstract. Traditional image retrieval techniques and clustering algorithms are constrained by stand-alone architectures and cannot meet real-time requirements. To perform image retrieval quickly and effectively, a distributed processing architecture is urgently needed to implement the relevant machine learning algorithms. Among the many distributed frameworks available, Spark uses a memory-based computing model that is well suited to iterative machine learning algorithms. Compared with the popular Hadoop framework, Spark can be more than 100 times faster on machine learning tasks. This paper therefore uses the Spark distributed computing framework to implement clustering algorithms related to image retrieval in a big data setting.

This paper first describes the operating principle, programming model, and machine learning library of the Spark distributed computing framework, together with the programming model, thread organization, and corresponding hardware structure of the CUDA architecture. It then discusses the bag of features model as used in the image retrieval framework, and its bottlenecks. On this basis, combining the Spark distributed computing framework with the CUDA architecture, we implement the commonly used image clustering algorithm K-means using Spark-GPU technology, and implement the subspace clustering algorithm Local Best-Fit Flat (LBF) on Spark and on the CUDA architecture respectively.
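
To illustrate why K-means maps naturally onto a distributed framework, the following is a minimal sketch of one iteration written as a per-partition "map" step followed by a global "reduce" step. It is plain Python with no Spark or GPU dependency, and it is not the thesis's implementation: the function names (`map_partition`, `reduce_partials`, `kmeans`) and the use of ordinary lists as stand-ins for data partitions are assumptions made purely for illustration.

```python
import functools

def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean)."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(centers):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = i, d
    return best

def map_partition(points, centers):
    """Hypothetical map step: partial coordinate sums and counts per center."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        k = nearest(p, centers)
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += p[j]
    return sums, counts

def reduce_partials(a, b):
    """Hypothetical reduce step: merge the partial results of two partitions."""
    sums = [[x + y for x, y in zip(sa, sb)] for sa, sb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts

def kmeans(partitions, centers, iterations=10):
    """Run a fixed number of iterations; empty clusters keep their old center."""
    for _ in range(iterations):
        partials = [map_partition(p, centers) for p in partitions]
        sums, counts = functools.reduce(reduce_partials, partials)
        centers = [
            [s / n for s in row] if n else old
            for row, n, old in zip(sums, counts, centers)
        ]
    return centers
```

Only the small per-center sums and counts cross partition boundaries in this sketch, while the distance computations stay local to each partition. That local, data-parallel part is the kind of work a GPU could plausibly take over.
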
The main contributions of this paper are as follows:
(1) The Spark distributed computing platform and the GPU computing environment are built on the Linux operating system, and the K-means clustering algorithm is implemented on both the Spark and the Spark-GPU platforms. A performance comparison of the two implementations shows that Spark-GPU technology effectively accelerates K-means clustering.
(2) After analyzing the concurrency of the subspace clustering algorithm Local Best-Fit Flat (LBF), this paper implements it on the Spark cloud platform and on the CUDA architecture. We then analyze the root causes of the inefficiency of Spark-LBF and move the most time-consuming task, selecting the K best subspaces from the C candidates, to the GPU for parallel computation, which effectively improves the algorithm's performance.
(3) The image retrieval solution based on the bag of features model is summarized. The bottlenecks of the architecture are identified during model construction, and the K-means image clustering algorithm and the LBF subspace clustering algorithm involved in the architecture are optimized to serve image retrieval.

Repeated experiments show that the K-means clustering algorithm based on Spark-GPU technology and the LBF subspace clustering algorithm under the CUDA architecture achieve higher retrieval efficiency, which has significant application and guidance value for image retrieval and classification.
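
As a rough illustration of how a bag of features model supports retrieval, the sketch below quantizes each local descriptor to its nearest visual word, builds a normalized word histogram per image, and ranks database images by cosine similarity to the query. It assumes a visual vocabulary already exists (for example, cluster centers produced by K-means); the names `quantize`, `bof_histogram`, and `rank`, and the choice of cosine similarity, are illustrative assumptions rather than details taken from the thesis.

```python
import math

def quantize(descriptor, vocabulary):
    """Index of the visual word nearest to a descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[i])),
    )

def bof_histogram(descriptors, vocabulary):
    """L2-normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm else hist

def cosine(a, b):
    """Dot product; equals cosine similarity for L2-normalized inputs."""
    return sum(x * y for x, y in zip(a, b))

def rank(query_hist, database):
    """Image names sorted from most to least similar to the query histogram."""
    return sorted(
        database,
        key=lambda name: cosine(query_hist, database[name]),
        reverse=True,
    )
```

In a pipeline of this shape, building the vocabulary is the clustering step, and it typically dominates the cost when the descriptor set is large. This is consistent with the bottleneck the abstract attributes to the architecture.
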

Keywords/Search Tags:

big data, Spark distributed computing, Spark-GPU technology, image retrieval, subspace clustering

Related items

1	The Research And Application Of Large-Scale Image Classification And Robust Subspace Clustering Algorithm For Big Data
2	Optimization And Implementation Of Clustering Algorithms Based On Spark Platform
3	Research And Realization Of Clustering Algorithm Based On Spark Platform
4	Research On Memory Data Management Technology In Spark
5	Research On Data Stream Clustering Method Based On Spark
6	A System For Distributed MD Data Analysis Based On Spark
7	Research Of The Clustering Algorithm Based On The Spark
8	Research On Apache Spark Distributed Parallel Computing Framework Optimization Technology
9	Design and Implementation Of Data Stream Clustering Algorithm StreamCKS Based On Spark Streaming
10	The Research And Implementation Of Mining Large Data Based On Spark