
Clustering Algorithm Research Based On The Bilinear Probabilistic Principal Component Analysis

Posted on: 2019-06-24, Degree: Master, Type: Thesis
Country: China, Candidate: X J Sun, Full Text: PDF
GTID: 2428330542498991, Subject: Statistics
Abstract/Summary:
Clustering analysis groups objects according to their own characteristics. It is used in many fields, including pattern recognition, bioinformatics, and image analysis, and has become a very active research direction. With the rapid development of science and technology, data sets keep growing in scale, and high-dimensional data now appear in all walks of life. Such data can provide more information, but they also suffer from the curse of dimensionality. Traditional clustering algorithms achieve stable results on low-dimensional data, but on high-dimensional data they lose much of their meaning and the effectiveness of clustering drops. To handle high-dimensional data efficiently, the focus of clustering analysis has shifted to high-dimensional space, including preprocessing of high-dimensional data sets and the application of dimensionality reduction. Dimensionality reduction techniques not only alleviate the curse of dimensionality and reduce the complexity of the data, but also reduce noise and redundancy and extract the structure of interest, so that the data can be studied and analyzed more effectively.

To carry out dimensionality reduction within a clustering algorithm, the mixture of probabilistic principal component analysis (MPPCA) model was proposed. It combines the mixture model used in clustering with the principal component analysis used in dimension reduction, so that reduction and clustering are performed simultaneously. However, to apply MPPCA to high-dimensional matrix data (such as image data), one possible solution is to vectorize the data first, which again leads to the curse of dimensionality. To handle high-dimensional data better, this thesis builds on existing dimensionality reduction and clustering models and proposes a dimension reduction method for matrix data, which yields a clustering method that can handle high-dimensional data efficiently. The main work of the thesis is as follows:

1. Starting from the bilinear probabilistic principal component analysis (BPPCA) model, we extend it to a mixture model. The thesis thus proposes the mixture of bilinear probabilistic principal component analysis (MBPPCA) model and investigates its theoretical properties.

2. Two algorithms, ECM and AECM, are proposed to fit the MBPPCA model, and their computational complexity is analyzed: compared with the ECM algorithm, the AECM algorithm has lower computational complexity. A 2-D synthetic data set is sampled to investigate the accuracy and convergence of the two algorithms. The experiments show that both algorithms converge to the true parameter values as the sample size increases, that the MBPPCA model is more accurate than the MPPCA model, and that the ECM algorithm converges faster than the AECM algorithm.

3. To test the recognition performance of the model, experiments are carried out on two real data sets: the handwritten digit recognition data and the UMIST face data. On the handwritten digit data, the thesis compares the recognition performance of the MBPPCA and MPPCA models when the data are clustered into different numbers of categories and reduced to different dimensions. The results show that the MBPPCA model performs worse than the MPPCA model, which is contrary to expectation, and the specific reasons need further investigation. On the UMIST face data, the thesis compares three models (MBPPCA, MPPCA, and BPPCA) under different numbers of training samples, different dimensions, and different numbers of categories, taking the best recognition rate over the dimensions considered. The results show that the MBPPCA model achieves better recognition than the BPPCA and MPPCA models across the different numbers of training samples and categories, indicating that the proposed model performs better on this database.
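The point about vectorization can be made concrete by counting loading-matrix entries. The sketch below is only an illustration and is not taken from the thesis. It assumes the usual formulations, in which a PPCA component on vectorized d_c x d_r data uses one loading matrix of size (d_c * d_r) x q, while a bilinear component uses a column loading of size d_c x q_c and a row loading of size d_r x q_r. The function names and the image size are hypothetical.

```python
# Illustrative parameter count only; not the thesis's implementation.
# Assumption: per mixture component, vectorized PPCA needs one
# (d_c * d_r) x q loading matrix, while the bilinear model needs a
# d_c x q_c column loading and a d_r x q_r row loading.


def vectorized_loading_entries(d_c: int, d_r: int, q: int) -> int:
    """Entries in the loading matrix after flattening a d_c x d_r matrix."""
    return d_c * d_r * q


def bilinear_loading_entries(d_c: int, d_r: int, q_c: int, q_r: int) -> int:
    """Entries in the column and row loading matrices of a bilinear model."""
    return d_c * q_c + d_r * q_r


if __name__ == "__main__":
    # Hypothetical image size and latent dimensions, chosen for illustration.
    d_c, d_r = 112, 92
    q_c, q_r = 8, 8

    # Match the latent sizes: a q_c x q_r latent matrix has q_c * q_r entries.
    flat = vectorized_loading_entries(d_c, d_r, q_c * q_r)
    bilinear = bilinear_loading_entries(d_c, d_r, q_c, q_r)

    print("vectorized loading entries:", flat)
    print("bilinear loading entries:  ", bilinear)
    print("ratio:", flat / bilinear)
```

Under these assumptions the bilinear parameterization grows with the sum of the row and column dimensions rather than their product, which is the motivation for working on matrix data directly.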
Keywords/Search Tags: Dimension reduction, Clustering analysis, Mixture of probabilistic principal component analysis, Bilinear probabilistic principal component analysis, Image recognition