K-Means Clustering Algorithm Optimization And Its Application In Image Deduplication

Posted on:2017-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Yin

GTID:2348330503489874

Subject:Computer system architecture

Abstract/Summary:

With the rapid development and growing popularity of cloud storage services, multimedia data such as images and videos have gradually become the main way to record and share information. Compared with traditional text records, photos and other multimedia data need far more storage space, so compressing images effectively and reducing image storage capacity has become a new challenge. Research has observed that near-duplicate images make up a large proportion of all images on social networking and cloud storage sites such as Facebook, QQ and Baidu Cloud. Near-duplicate images are defined as images that differ from one another only by common transformations such as changes in contrast or saturation, scaling, cropping and framing. Based on these findings, this thesis proposes an image deduplication solution.

The image deduplication system consists of two parts. The first part uses a content-based image retrieval system to classify all images and group near-duplicate photos together. It preprocesses all photos, extracts feature values from every image, runs the k-means clustering algorithm on all extracted features, and then quantizes the features with a Bag-of-Words model whose visual vocabulary is the set of cluster centers. As a result, each image can be represented by a feature vector of fixed dimension. Finally, an inverted index is used to gather near-duplicate images together.
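The retrieval stage described above (cluster all descriptors with k-means, quantize each image against the cluster centers, then group images through an inverted index) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 2-D toy descriptors stand in for real local features (the abstract does not say which feature extractor is used), the function names `kmeans`, `bow_vector`, `build_inverted_index` and `near_duplicates` are hypothetical, and ranking candidates by cosine similarity of L2-normalized histograms is an assumed scoring rule.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the closest center (brute force over all k centers)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the returned centers act as the visual vocabulary."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = nearest(p, centers)
            counts[j] += 1
            for i, v in enumerate(p):
                sums[j][i] += v
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous center
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers


def bow_vector(descriptors, vocabulary):
    """Quantize an image's descriptors into an L2-normalized k-bin histogram."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm else hist


def build_inverted_index(bow):
    """Map each visual word id to the set of images whose histogram uses it."""
    index = {}
    for image_id, vec in bow.items():
        for word, weight in enumerate(vec):
            if weight > 0:
                index.setdefault(word, set()).add(image_id)
    return index


def near_duplicates(query_id, bow, index, threshold=0.0):
    """Score only images sharing a visual word with the query (cosine similarity)."""
    q = bow[query_id]
    candidates = set()
    for word, weight in enumerate(q):
        if weight > 0:
            candidates |= index.get(word, set())
    candidates.discard(query_id)
    scored = [(c, sum(x * y for x, y in zip(q, bow[c]))) for c in candidates]
    # Highest similarity first; ties broken by image id for a stable order.
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: (-s[1], s[0]))


# Toy 2-D descriptors (hypothetical data); "b" is an unmodified copy of "a", "c" is unrelated.
images = {
    "a": [[0, 0], [0.5, 0], [10, 10], [10, 10.5]],
    "b": [[0, 0], [0.5, 0], [10, 10], [10, 10.5]],
    "c": [[0, 10], [0.5, 10], [0, 10.5], [0.5, 10.5]],
}
all_descriptors = [d for descs in images.values() for d in descs]
vocabulary = kmeans(all_descriptors, k=3)
bow = {img: bow_vector(descs, vocabulary) for img, descs in images.items()}
index = build_inverted_index(bow)
matches = near_duplicates("a", bow, index)
print(matches)
```

Because every histogram has exactly k entries, images with different numbers of descriptors become directly comparable vectors, and the inverted index restricts scoring to images that share at least one visual word with the query instead of comparing the query against the whole collection.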
In the second part, because the images grouped together are highly similar, the system compresses them with techniques borrowed from video compression algorithms, greatly reducing image storage capacity. The k-means algorithm is the key technology for clustering similar images: its execution speed and the quality of its results directly affect the compression of near-duplicate images. In other words, k-means can become the performance bottleneck of the whole system. When a large number of features must be processed, both the number of data points n and the number of centers k become very large; since each iteration of traditional k-means compares every data point with every center, its efficiency drops sharply. This system therefore adopts an optimization scheme that speeds up k-means when n and k are large. Test results show that the optimized k-means algorithm performs better at large scale.
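The abstract does not describe its optimization scheme, so the sketch below substitutes a different, well-known acceleration purely as a stand-in: the center-to-center triangle-inequality test from Elkan's 2003 work on accelerating k-means, in simplified form (no per-point bounds are carried between iterations). Standard Lloyd iterations compute all n·k point-to-center distances; by the triangle inequality, if d(c_b, c_j) >= 2·d(x, c_b) then d(x, c_j) >= d(c_b, c_j) - d(x, c_b) >= d(x, c_b), so center c_j can be skipped for point x. The function names and the synthetic blob data are hypothetical.

```python
import math
import random


def assign_bruteforce(points, centers):
    """Reference assignment: compare every point with every center (n*k distances)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def assign_pruned(points, centers):
    """Assignment step with a simplified Elkan-style triangle-inequality test.

    If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be strictly closer and its distance is never computed.
    Returns (labels, number_of_point_to_center_distances_computed).
    """
    k = len(centers)
    # k*k center-to-center distances, cheap relative to n*k when n >> k.
    cc = [[math.dist(centers[a], centers[b]) for b in range(k)] for a in range(k)]
    labels, evaluated = [], 0
    for p in points:
        best, best_d = 0, math.dist(p, centers[0])
        evaluated += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned by the triangle inequality
            d = math.dist(p, centers[j])
            evaluated += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, evaluated


def kmeans_pruned(points, k, iterations=10, seed=0):
    """Lloyd's k-means whose assignment step uses assign_pruned."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    dim, total = len(points[0]), 0
    for _ in range(iterations):
        labels, evaluated = assign_pruned(points, centers)
        total += evaluated
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, labels):
            counts[j] += 1
            for i, v in enumerate(p):
                sums[j][i] += v
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous center
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers, labels, total


# Synthetic data: 20 well-separated Gaussian blobs of 50 points each.
rng = random.Random(42)
true_centers = [(10.0 * (i % 5), 10.0 * (i // 5)) for i in range(20)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in true_centers for _ in range(50)]

labels, evaluated = assign_pruned(data, true_centers)
print(f"{evaluated} of {len(data) * len(true_centers)} distances computed")
```

The pruning only removes work from the assignment step and never changes which center wins, so apart from floating-point near-ties the labels should match the brute-force assignment; the savings grow as clusters become better separated relative to their spread.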

Keywords/Search Tags:

image deduplication, near-duplicates, k-means clustering algorithm

Related items

1	Research And Implementation Of Text Clustering Algorithm Based On Memory Calculation
2	Improvent Of K-means Clustering Algorithm And Its Application
3	Research On Web News Extraction And Duplicates Elimination
4	Research On Technology Of Image Segmentation And Its Application
5	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
6	Research On Segmentation Algorithm Based On Neutrosophic C-means Clustering
7	The Application Of Improved Fuzzy C Means Clustering Algorithm In Image Segmentation
8	Study On The Application Of The Improved K-means Clustering Algorithm In Image Retrieval
9	Research Of Algorithm For Image Segmentation Based On The C-means Clustering
10	Research And Comparison Of Several Kinds Of Clustering Algorithm For Image Segmentation