
K-Means Clustering Algorithm Optimization And Its Application In Image Deduplication

Posted on: 2017-12-04  Degree: Master  Type: Thesis
Country: China  Candidate: J Yin  Full Text: PDF
GTID: 2348330503489874  Subject: Computer system architecture
Abstract/Summary:
With the rapid development and growing popularity of cloud storage services, multimedia data such as images and video have gradually become the main way of recording and sharing information. Compared with traditional text records, photos and other multimedia data need much more storage space, so how to compress images effectively and reduce the space they occupy is a new challenge. Research has observed that near-duplicate images account for a large proportion of all images on many social networking and cloud storage sites such as Facebook, QQ, and Baidu Cloud. Near-duplicate images are defined as images derived from one another by common transformations such as changes in contrast or saturation, scaling, cropping, and framing. Based on these findings, this paper proposes an image deduplication solution. The image deduplication system is divided into two parts. In the first part, a content-based image retrieval system classifies all images and groups near-duplicate photos together. The retrieval stage preprocesses all photos, extracts feature values from every image, runs the k-means clustering algorithm on all extracted features, and then quantizes the features with a Bag-of-Words model whose visual words are the cluster centers. As a result, each image can be represented by a fixed-dimension feature vector. Finally, an inverted index is used to group near-duplicate images together.
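The quantization and indexing steps described above can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: the function names, the toy two-dimensional features, and the simple shared-word vote used for ranking are all assumptions, and the cluster centers are taken as given rather than learned.

```python
from collections import Counter, defaultdict
from math import dist


def nearest_word(feature, centers):
    """Index of the cluster center (visual word) closest to the feature."""
    return min(range(len(centers)), key=lambda j: dist(feature, centers[j]))


def bow_histogram(features, centers):
    """Fixed-length vector counting how many local features map to each word."""
    counts = Counter(nearest_word(f, centers) for f in features)
    return [counts.get(j, 0) for j in range(len(centers))]


def build_inverted_index(histograms):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, hist in histograms.items():
        for word, count in enumerate(hist):
            if count > 0:
                index[word].add(image_id)
    return index


def candidates(query_id, histograms, index):
    """Rank other images by the number of visual words shared with the query."""
    votes = Counter()
    for word, count in enumerate(histograms[query_id]):
        if count > 0:
            for other in index[word]:
                if other != query_id:
                    votes[other] += 1
    return votes.most_common()


# Hypothetical toy data: three visual words and three images.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_features = {
    "a": [(0.1, 0.2), (9.8, 10.1), (0.3, -0.1)],
    "b": [(0.2, 0.1), (10.2, 9.9)],
    "c": [(0.1, 9.7), (-0.2, 10.3)],
}
histograms = {k: bow_histogram(v, centers) for k, v in image_features.items()}
index = build_inverted_index(histograms)
print(histograms["a"])                       # [2, 1, 0]
print(candidates("a", histograms, index))    # [('b', 2)]
```

Every image ends up with a vector of length k regardless of how many local features it has, and the inverted index lets a query touch only the images that share at least one visual word with it.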
In the second part, because the images grouped together are highly similar, the system compresses them with video compression algorithms, which greatly reduces the image storage footprint. The k-means algorithm is the key technique for clustering images by similarity: its execution speed and the quality of its results directly affect the compression of near-duplicate images. In other words, k-means can be the performance bottleneck of the whole system. When a large number of features must be processed, both the number of data points n and the number of cluster centers k become very large, and the efficiency of the traditional k-means algorithm becomes very low. This system adopts an optimized k-means clustering scheme to speed up the algorithm when n and k are large. Test results show that the optimized k-means algorithm performs better at large scale.
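To make the bottleneck concrete, here is a minimal sketch of the traditional (Lloyd-style) k-means loop with a counter for distance evaluations. The abstract does not say which optimization the thesis uses, so this shows only the unoptimized baseline; the function names and toy points are assumptions for illustration.

```python
from math import dist


def assign(points, centers):
    """Assign each point to its nearest center; count distance evaluations."""
    labels, distance_evals = [], 0
    for p in points:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centers):
            d = dist(p, c)
            distance_evals += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, distance_evals


def update(points, labels, centers):
    """Move each center to the mean of its points (empty clusters stay put)."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, label in zip(points, labels):
        counts[label] += 1
        for i in range(dim):
            sums[label][i] += p[i]
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
        for j in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100):
    """Run baseline k-means until the centers stop changing."""
    total_evals, labels = 0, []
    for _ in range(max_iter):
        labels, evals = assign(points, centers)
        total_evals += evals
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels, total_evals


# Hypothetical toy input: n = 4 points, k = 2 centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers, labels, total_evals = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers, labels, total_evals)
```

Each iteration performs n × k distance evaluations (here 4 × 2 = 8 per pass), which is why the assignment step dominates the running time once both n and k are large.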
Keywords/Search Tags: image deduplication, near-duplicates, k-means clustering algorithm