Content-Context Information Bottleneck For Image Clustering

Posted on: 2021-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Hou
GTID: 2428330602973817
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and technology, daily life has entered the era of intelligent information, and the spread of image acquisition devices has made image data ever easier to obtain. The sheer volume and complexity of this data have pushed people to process it with machine learning, which in turn has promoted the development of computer vision. Image clustering is one of the most important problems in this field. As image datasets and their themes have grown more diverse in recent years, how to make better use of the information in an image to improve clustering quality has become both a focus and a difficulty of current research on image clustering algorithms.

Over the past decades, many classic and effective image clustering algorithms have emerged. Most of them cluster image data using either the content information or the context information of the images. Content information, such as color, shape and other features, expresses the inherent characteristics of an image and provides a basis for clustering. Context information, such as the distance or similarity between images, expresses the correlation between images and is likewise conducive to clustering. However, most existing algorithms consider only one of the two, so information is inevitably lost during data processing, which affects the final clustering results.

To solve this problem, this thesis proposes a novel image clustering algorithm based on the Information Bottleneck method: C2IB (Content-Context Information Bottleneck). The algorithm treats data analysis as a process of data compression and retains as much of the content information and context information of the images as possible during compression. Specifically, for content information, the method first extracts the SIFT features of each image to obtain visual vocabulary vectors, and then uses the classic Bag-of-Visual-Words image model to define the relevant variables of the image. For context information, the Gaussian kernel function is first used as the similarity measure to compute the similarity between images, and the local neighbor relations between data points are then modeled to construct the similarity matrix. Because C2IB considers both the content information and the context information in the image data, it can better uncover latent patterns in the data and improve clustering accuracy. Finally, a new "extract-merge" sequential method is designed to optimize the objective function of the algorithm. Experimental results on five image datasets show that C2IB effectively fuses the content and context information of images and achieves good performance and stability.
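
The two inputs described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and it makes several assumptions: local descriptors (for example SIFT) are already extracted, a visual-word codebook is already available (typically obtained by k-means, which is not shown), and the "local neighbor relation" is approximated by keeping Gaussian-kernel weights only for each image's k nearest neighbors. The function names and the parameters `sigma` and `k` are hypothetical.

```python
import math


def bovw_histogram(descriptors, codebook):
    """Content view: assign each local descriptor to its nearest visual
    word and return the normalized word-frequency histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Nearest codeword by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda w: sum((a - b) ** 2 for a, b in zip(d, codebook[w])),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def knn_gaussian_similarity(features, sigma=1.0, k=2):
    """Context view: symmetric similarity matrix with Gaussian-kernel
    weights kept only between k-nearest neighbors (assumed neighbor model)."""
    n = len(features)
    sq_dist = [
        [sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
         for j in range(n)]
        for i in range(n)
    ]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i), key=lambda j: sq_dist[i][j]
        )[:k]
        for j in neighbors:
            weight = math.exp(-sq_dist[i][j] / (2.0 * sigma ** 2))
            sim[i][j] = weight
            sim[j][i] = weight  # keep the matrix symmetric
    return sim


# Toy data: a 2-word codebook, 3 descriptors, and 3 one-dimensional images.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
histogram = bovw_histogram(descriptors, codebook)

image_features = [[0.0], [0.1], [5.0]]
similarity = knn_gaussian_similarity(image_features, sigma=1.0, k=1)
```

In this sketch the histogram plays the role of the content variable for one image, and the sparse similarity matrix plays the role of the context information. How C2IB combines the two inside its objective function is not specified in the abstract and is not modeled here.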
Keywords/Search Tags: image clustering, information bottleneck, content information, context information