Research On Interactive Information Bottleneck Clustering Algorithm For Large-scale High-dimensional Data

Posted on:2021-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:R B Wang

Full Text:PDF

GTID:2428330602976354

Subject:Software engineering

Abstract/Summary:

With the rapid development of information and internet technology, the number of large-scale, high-dimensional data sets is growing exponentially. Because of the "data explosion" and the "curse of dimensionality", traditional clustering algorithms struggle to achieve the expected results on large-scale, high-dimensional data. Developing an effective and efficient clustering algorithm for such data, one that meets the requirements of practical applications and the characteristics of different fields, therefore has both theoretical significance and practical value. For cluster analysis of large-scale, high-dimensional data, co-clustering algorithms offer one approach: they cluster the row-wise data points and the column-wise features simultaneously, which reveals the internal relationships between the two, integrates the overall information in the data, and uses the correlation between them to improve clustering performance. Existing co-clustering algorithms consider eliminating redundancy or noise by reducing the feature dimensionality, yet they still take harmful original features into data clustering, which weakens the final clustering performance. To address these problems, and inspired by co-clustering algorithms, we propose an effective Interactive Information Bottleneck (I²B) clustering algorithm. Unlike existing co-clustering algorithms, I²B clusters the data points in the row direction using the dimension-reduced features, and uses the clustered data points for column-wise feature clustering, which is likely to yield a satisfactory final clustering result. The method has two main advantages: (1) it obtains effective discriminative features and eliminates harmful redundant or noisy features, which benefits data clustering after each iteration; (2) the clustered data points can serve as supervisory information to guide feature clustering. To our knowledge, this is the first work to address this problem in a co-clustering way. Finally, a new twin "draw-and-merge" method is designed and optimized; the time complexity of the optimized algorithm is linear in the scale and dimensionality of the data, so it can process large-scale, high-dimensional data efficiently. Experimental results show that the I²B algorithm outperforms the original IB algorithms and other traditional clustering algorithms. Compared with state-of-the-art clustering algorithms for large-scale, high-dimensional data, I²B also achieves better stability and higher clustering accuracy.
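The alternating idea in the abstract (cluster data points on compressed features, then cluster features using the clustered data points) can be illustrated with a minimal Python sketch. It is not the thesis's Interactive Information Bottleneck algorithm, and the twin "draw-and-merge" step is not reproduced because the abstract does not specify it. As a stand-in, the sketch uses a plain hard reassignment to the nearest centroid under KL divergence. All names (`alternating_ib_sketch`, `_kl_assign`, the cluster counts, `n_iter`, `seed`) are hypothetical, and the input is assumed to be a non-negative matrix with no all-zero row or column.

```python
import numpy as np


def _kl_assign(cond, weights, labels, n_clusters, eps=1e-12):
    """Reassign each row of `cond` (a conditional distribution) to the
    cluster centroid with the smallest KL divergence (assumed stand-in step)."""
    onehot = np.eye(n_clusters)[labels]                     # (n, k)
    # Weighted sum of member distributions, then normalize to a distribution.
    centroids = (onehot * weights[:, None]).T @ cond + eps  # (k, m)
    centroids /= centroids.sum(axis=1, keepdims=True)
    # argmin_j KL(cond_i || centroid_j) equals argmin_j of the cross-entropy,
    # because the entropy of cond_i does not depend on j.
    cross_entropy = -cond @ np.log(centroids).T             # (n, k)
    return np.argmin(cross_entropy, axis=1)


def alternating_ib_sketch(X, n_row_clusters, n_col_clusters, n_iter=20, seed=0):
    """Alternate between row clustering on merged features and
    feature clustering on merged rows. Illustrative only."""
    rng = np.random.default_rng(seed)
    P = X / X.sum()                      # treat X as a joint distribution p(x, y)
    n, d = P.shape
    px = np.maximum(P.sum(axis=1), 1e-12)
    py = np.maximum(P.sum(axis=0), 1e-12)
    row_labels = rng.integers(n_row_clusters, size=n)
    col_labels = rng.integers(n_col_clusters, size=d)
    for _ in range(n_iter):
        # Step 1: merge features by their current clusters, then cluster rows
        # on the resulting low-dimensional conditionals p(feature cluster | row).
        P_rows = P @ np.eye(n_col_clusters)[col_labels]     # (n, n_col_clusters)
        row_labels = _kl_assign(P_rows / px[:, None], px, row_labels, n_row_clusters)
        # Step 2: merge rows by their current clusters, then cluster features
        # on p(row cluster | feature), so row clusters guide feature clustering.
        P_cols = P.T @ np.eye(n_row_clusters)[row_labels]   # (d, n_row_clusters)
        col_labels = _kl_assign(P_cols / py[:, None], py, col_labels, n_col_clusters)
    return row_labels, col_labels
```

Each pass of this sketch multiplies the full matrix by a one-hot assignment matrix, so its cost grows with the number of rows times the number of columns. That is a property of the sketch only and says nothing about how the thesis reaches its stated linear complexity.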

Keywords/Search Tags:

Clustering, Large-scale data, High-dimensional data, Information bottleneck

Related items

1	Similarity Search On Large-scale High-dimensional Data
2	Developing efficient algorithms for data mining large scale high dimensional data
3	Research And Design Of Clustering Method Based On Large Data And High Dimensional Data
4	Fast Sparse Affinity Propagation Clustering Algorithm For Large-Scale And High-Dimensional Data
5	Learning And Indexing Structural Representations Of Large Scale High Dimensional Data
6	Research On Clustering Algorithms For Large-scale Complex Data
7	Application And Research On Clustering Algorithm In Large Scale High Dimensional Datasets
8	Efficient Query Processing Over Large-Scale Multimedia Databases
9	Research On Clustering Algorithm For Large-Scale High-Dimensional Data
10	Research And Application Of Rough Clustering Algorithm For High Dimensional Data Sets