Research On Metric Learning Based Clustering Method With Incomplete Data

Posted on: 2016-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: M Yan
Full Text: PDF
GTID: 2308330479990088
Subject: Computer Science
Abstract/Summary:
With the development of Internet technology, more and more data are being generated online. Problems in data collection, transmission and storage often arise along the way and lead to incomplete data. Because people care about the relationships within data and about what the data mean, data mining technology has come into wide use. Cluster analysis is one of the core technologies of data mining; however, traditional clustering methods are often ineffective on incomplete data.

Because the Mahalanobis distance metric performs poorly on non-linear data transformations and complex distributions, this thesis proposes a leaf index feature representation algorithm and a Bayesian specific tree path feature representation algorithm. It then uses the random tree structure to construct a metric learning function that handles the non-linear transformation problem. The effectiveness of the approach is proved, and experimental results demonstrate its performance.

Incomplete data often contain missing values. Because collaborative-filtering-based approaches and expectation-maximization methods cannot handle randomly missing values, this thesis uses regression analysis to process incomplete data. Typical regression analysis, however, predicts data under a specific distribution and often requires inputs of the same form, so an auto-encoder method is proposed that handles randomly missing values and recovers data without relying on a specific distribution. Experimental comparison shows that the proposed algorithm performs the incomplete-data recovery task well.

When clustering incomplete data, a single clustering method relies on specific assumptions, while the distribution of the incomplete data is often unknown, so a single clustering method performs poorly in practice. This thesis therefore uses properties of the graph Laplacian for clustering, combining the random tree metric learning method with the incomplete data processing method to carry out the clustering analysis. These methods are proved to be effective for the clustering problem, and experiments on UCI datasets verify the effectiveness of the proposed algorithm on incomplete data.
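
The abstract does not say which graph Laplacian formulation is used, so the following is only a minimal sketch of the general idea, under stated assumptions: an unnormalized Laplacian L = D - W, a Gaussian affinity standing in for the learned random tree metric, and a two-way split on the sign of the Fiedler vector. The function name `laplacian_bipartition`, the `sigma` parameter and the toy data are hypothetical and are not taken from the thesis.

```python
import numpy as np

def laplacian_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters using the Fiedler vector.

    Assumption: Gaussian affinity on Euclidean distance. A learned metric
    (such as a random tree based one) would replace this affinity.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between all rows.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Affinity matrix W with no self-loops.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian L = D - W, where D holds the row sums of W.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order for a symmetric matrix,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]
    # The sign of each Fiedler entry assigns the point to one of two clusters.
    return (fiedler > 0).astype(int), L

# Toy data: two small, well-separated groups of complete points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [2.0, 2.0], [2.1, 2.0], [2.0, 2.1]])
labels, L = laplacian_bipartition(X)
```

In a pipeline like the one described, missing entries would first be recovered and the affinity would come from the learned metric; this sketch covers neither step.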
Keywords/Search Tags:Metric Learning, Incomplete Data Processing, Cluster Method, Data Mining