
Distributed Constrained Biclustering Based On Sparse Preserving Submatrix Model

Posted on: 2021-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Zhong
Full Text: PDF
GTID: 2428330611467011
Subject: Software engineering
Abstract/Summary:
As a method of local correlation analysis, biclustering mines qualified submatrix patterns from massive data matrices in order to find interesting local correlations in the data. However, in biclustering research, if we focus only on the quality of each mined submatrix pattern, that is, on ensuring that each pattern fully meets the model conditions, it is difficult to control how much of the complete data matrix the patterns cover and how much they overlap one another. Conversely, if we require the patterns to cover the complete data matrix and to be mutually disjoint, that is, if we want to find as many different local correlations as possible, it becomes difficult to ensure that each mined pattern meets the expected requirements. Note also that a submatrix pattern that theoretically conforms to the expected model reveals a higher local correlation. There is therefore a trade-off between the quality of the submatrix patterns and their location distribution over the complete data matrix. In addition, because of its high computational complexity, existing research on biclustering mainly focuses on dense data sets of limited scale, which does not suit applications with large sparse data sets such as recommendation systems, text mining and bioinformatics.

To address these problems, this thesis first defines a new problem called constrained biclustering. To ensure the quality of each mined submatrix pattern and improve the accuracy of the local correlation, the goal of constrained biclustering is to mine submatrix patterns that satisfy the conditions of a predefined submatrix model. In addition, two constraints, a coverage constraint and an overlapping constraint, are added to control the location distribution of the mined submatrix patterns over the complete data matrix. In this way, repetitive and redundant information is reduced and the computational efficiency of the method is improved.

To apply this method to rating prediction in recommendation systems, this thesis proposes and implements an effective heuristic constrained biclustering algorithm based on a sparse order-preserving submatrix model. The algorithm mines eligible submatrix patterns by linearly scanning the entire data matrix and predicts rating scores for the missing entries in each submatrix. Experimental results show that the proposed constrained biclustering algorithm predicts ratings more accurately than the traditional user-based and item-based collaborative filtering algorithms, and also more accurately than two other recommendation algorithms that are likewise based on biclustering.

Finally, to improve the computational efficiency of the algorithm, this thesis designs a distributed computing framework based on the above constrained biclustering algorithm, referred to as the distributed constrained biclustering algorithm, to handle large and sparse data sets. Experimental results show that the distributed constrained biclustering algorithm not only improves the accuracy of rating prediction, but also has high stability and scalability.
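The coverage constraint, the overlapping constraint and the sparse order-preserving condition described above can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis. The function names, the use of the Jaccard index as the overlap measure, the non-decreasing reading of "order-preserving", and the toy rating matrix are all assumptions made for illustration; the abstract does not specify any of them.

```python
from itertools import combinations


def cells(bicluster):
    """Return the set of (row, col) cells spanned by a bicluster (rows, cols)."""
    rows, cols = bicluster
    return {(r, c) for r in rows for c in cols}


def coverage_ratio(biclusters, n_rows, n_cols):
    """Fraction of matrix cells that fall in at least one bicluster."""
    covered = set()
    for b in biclusters:
        covered |= cells(b)
    return len(covered) / (n_rows * n_cols)


def max_pairwise_overlap(biclusters):
    """Largest Jaccard overlap between the cell sets of any two biclusters.

    Assumption: Jaccard index is used here only as one plausible overlap measure.
    """
    best = 0.0
    for a, b in combinations(biclusters, 2):
        ca, cb = cells(a), cells(b)
        union = ca | cb
        if union:
            best = max(best, len(ca & cb) / len(union))
    return best


def is_sparse_order_preserving(matrix, rows, col_order):
    """Check the (assumed) sparse order-preserving condition.

    In every selected row, the observed (non-None) values must be
    non-decreasing when the columns are visited in col_order.
    Missing entries (None) are skipped rather than treated as violations.
    """
    for r in rows:
        observed = [matrix[r][c] for c in col_order if matrix[r][c] is not None]
        if any(x > y for x, y in zip(observed, observed[1:])):
            return False
    return True


# Hypothetical 3 x 4 rating matrix; None marks a missing rating.
ratings = [
    [1, 3, None, 5],
    [2, None, 4, 5],
    [5, 4, 2, None],
]

# Two hypothetical biclusters, each given as (row indices, column indices).
biclusters = [((0, 1), (0, 1, 3)), ((1, 2), (2, 3))]

print(coverage_ratio(biclusters, 3, 4))        # share of cells covered
print(max_pairwise_overlap(biclusters))        # worst-case pairwise overlap
print(is_sparse_order_preserving(ratings, (0, 1), (0, 1, 3)))
```

In a constrained setting, a candidate set of patterns would be accepted only if the coverage ratio is above a chosen threshold and the pairwise overlap is below another; both thresholds would be user-chosen parameters in this sketch.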
Keywords/Search Tags: Biclustering, Pattern Mining, Distributed Computing, Recommender Systems