Study On Bi-clustering Algorithms Towards High Dimensional Data

Posted on: 2019-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Jia
GTID: 2348330542491069
Subject: Computer Science and Technology

Abstract:
In recent years, the rapid development of bioinformatics and e-commerce has produced large amounts of high-dimensional data, and extracting valuable information from such data with data mining techniques is of great significance. In cluster analysis, traditional clustering methods can cluster only the rows or only the columns of a data matrix, so they can uncover only its global structure. High-dimensional data, however, contain a great deal of local information, such as patterns shared by a subset of rows across only a subset of columns. Bi-clustering algorithms were proposed to extract this local information more effectively: they cluster the rows and the columns of a data matrix simultaneously, which allows them to mine local patterns in high-dimensional data and to cope with the curse of dimensionality that traditional clustering algorithms encounter on such data. Research on bi-clustering is nevertheless still at an early stage, and existing bi-clustering algorithms have many shortcomings, so further study of these algorithms is important. Focusing on biological and e-commerce data, the two fields in which bi-clustering is most widely applied, this thesis analyzes the characteristics of the two kinds of data set and designs a bi-clustering algorithm for each field.

First, we propose an efficient Weighted Mutual Information Bi-clustering (WMIB) algorithm for high-dimensional gene expression data. Because the relationships between genes are complex and nonlinear, we introduce a weighted mutual information similarity to measure the correlation between genes. In addition, since the conditions differ in how strongly they influence a bi-cluster, we propose a new objective function; by optimizing the condition weights under this objective, the algorithm selects the condition set of each bi-cluster. Experimental results show that WMIB achieves excellent clustering results.

Second, to address the high sparsity and the cold-start problem of high-dimensional image recommendation data, we propose an Asynchronous Bi-Clustering Collaborative Filtering (ABCCF) algorithm based on the general collaborative filtering approach used in recommender systems. On the image dimension, where the cold-start problem often arises, a multi-view clustering algorithm combines image click features with image visual features to obtain accurate image clusters. On the user dimension, where clustering suffers from the curse of dimensionality, a bag-of-words model incorporates the image clustering information to reduce the dimension of the matrix and obtain more accurate user clusters. Finally, a collaborative filtering algorithm combines the user clustering and image clustering information through a similarity fusion strategy to produce high-quality image recommendations. Experimental results show that ABCCF clearly outperforms other existing methods.
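The abstract does not define the weighted mutual information similarity used by WMIB. As a minimal sketch, assuming each gene's expression values are discretized into equal-width bins and each condition contributes to the joint histogram in proportion to a non-negative weight, a weighted mutual information between two gene profiles could be computed as below. The functions `discretize` and `weighted_mutual_information` and the `n_bins` parameter are hypothetical illustrations, not the thesis's definitions.

```python
import numpy as np


def discretize(profile, n_bins):
    """Map one gene's expression values to equal-width bin indices 0..n_bins-1."""
    profile = np.asarray(profile, dtype=float)
    lo, hi = profile.min(), profile.max()
    if hi == lo:
        # A constant profile carries no information: put every value in bin 0.
        return np.zeros(profile.shape, dtype=int)
    # Interior edges only, so np.digitize returns indices in [0, n_bins - 1].
    edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
    return np.digitize(profile, edges)


def weighted_mutual_information(x, y, weights, n_bins=3):
    """Weighted mutual information (in nats) between two gene profiles.

    x and y hold the two genes' values under the same conditions. ASSUMPTION:
    each condition is one sample whose contribution to the joint histogram is
    scaled by its non-negative weight (the weights must not all be zero).
    """
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    joint = np.zeros((n_bins, n_bins))
    # Accumulate each condition's weight into its (bin of x, bin of y) cell.
    np.add.at(joint, (bx, by), np.asarray(weights, dtype=float))
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0  # empty cells contribute 0 to the sum
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])))


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(weighted_mutual_information(a, 2.0 * a, np.ones(6)))          # ~1.0986 (log 3)
# Zero weight on the third and fourth conditions drops them from the estimate.
print(weighted_mutual_information(a, 2.0 * a, [1, 1, 0, 0, 1, 1]))  # ~0.6931 (log 2)
# A nonlinear, non-monotone relation: Pearson correlation is 0, MI is not.
print(weighted_mutual_information(a, (a - 3.5) ** 2, np.ones(6)))   # > 0
```

In this sketch a weight of 0 removes a condition entirely, which is one way condition weights could express the selection of a bi-cluster's condition set. How WMIB actually optimizes those weights through its objective function is not described in the abstract.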
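The abstract gives no formulas for ABCCF. The sketch below illustrates the ideas it names under loudly stated assumptions: image cluster labels are taken as given (standing in for the output of the multi-view clustering step), each user is represented as a bag of words whose vocabulary is the set of image clusters, and the similarity fusion is assumed to be a linear blend, weighted by a hypothetical parameter `alpha`, of a user-based collaborative filtering score and the user's preference for an image's cluster. All function names are hypothetical.

```python
import numpy as np


def bag_of_words_users(clicks, image_clusters, n_clusters):
    """Represent each user by click counts per image cluster ("visual words").

    clicks: (n_users, n_images) 0/1 matrix. image_clusters: cluster id of each
    image. The result has n_clusters columns instead of n_images.
    """
    n_images = clicks.shape[1]
    one_hot = np.zeros((n_images, n_clusters))
    one_hot[np.arange(n_images), image_clusters] = 1.0
    return clicks @ one_hot


def cosine_similarity(m):
    """Row-wise cosine similarity; all-zero rows get similarity 0."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def fused_scores(clicks, image_clusters, n_clusters, alpha=0.5):
    """Score every (user, image) pair with a HYPOTHETICAL linear fusion of:

    - a user-based CF term, with neighbors found on bag-of-words vectors, and
    - an image-cluster term: the share of the user's clicks in the image's
      cluster, which also scores images that nobody has clicked (cold start).
    """
    bow = bag_of_words_users(clicks, image_clusters, n_clusters)
    user_sim = cosine_similarity(bow)
    np.fill_diagonal(user_sim, 0.0)  # a user is not its own neighbor
    weight_sum = user_sim.sum(axis=1, keepdims=True)
    user_term = (user_sim @ clicks) / np.where(weight_sum == 0, 1.0, weight_sum)
    totals = bow.sum(axis=1, keepdims=True)
    cluster_pref = bow / np.where(totals == 0, 1.0, totals)  # (n_users, n_clusters)
    image_term = cluster_pref[:, image_clusters]             # (n_users, n_images)
    return alpha * user_term + (1.0 - alpha) * image_term


# Image 4 has no clicks yet, but its visual features place it in cluster 0.
clicks = np.array([[1, 1, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [0, 0, 1, 1, 0]], dtype=float)
image_clusters = np.array([0, 0, 1, 1, 0])
scores = fused_scores(clicks, image_clusters, n_clusters=2)
print(scores[:, 4])  # users 0 and 1 (who favor cluster 0) score it 0.5, user 2 scores 0
```

The image-cluster term is what lets an unclicked image receive a nonzero score, which is the cold-start case the abstract targets. The thesis also clusters users on the bag-of-words vectors; to stay short, this sketch uses the vectors directly to find similar users instead of running a separate clustering step.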
Keywords: Bi-clustering, High Dimensional Data, Gene Expression Data, Image Recommendation, Data Analysis