
Research On Graph-based Semi-supervised Algorithms And Its Applications

Posted on: 2014-10-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X F Cai
GTID: 1268330425476716
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, human society has entered the era of big data and is faced with the rapid growth of massive geometric data. How to obtain useful knowledge from these massive data is one of the common challenges faced by research scientists and technical experts worldwide, now and in the future. In addition, more and more data, such as digital photographs, voice data, web text, or gene expression microarrays, are high-dimensional, so dimensionality reduction (DR) has become an important tool for handling such data and avoiding the “curse of dimensionality”. Traditional DR approaches can effectively learn the intrinsic structure of linear data, but because of their linear nature they cannot reveal the low-dimensional manifold structure of high-dimensional, nonlinear data. We therefore turn to manifold learning to solve this problem. However, in many real-world applications of pattern classification and data mining, it is easy to obtain a large number of unlabeled points but only a small portion of labeled points, which is exactly the setting that semi-supervised learning addresses.
Despite their success in many practical applications, these algorithms usually suffer from limitations such as neighborhood parameter selection and sensitivity to noisy, sparse, and imbalanced data. This work studies graph construction and optimization, especially for the dimensionality reduction task. Finally, I demonstrate the effectiveness of these methods on real-world applications, including face recognition, cancer classification, and other practical applications. More concretely, the main contributions include:
(1) Proposing a novel algorithm for Semi-supervised Dimensionality Reduction based on Locally Estimated Error (LEESSDR) and applying it to the face recognition problem. It is well known that the graph plays an important role in semi-supervised learning.
However, the topological structure of the neighborhood graph constructed by these methods is unstable, because it is sensitive to the choice of the neighborhood parameter and the edge weights of the neighborhood graph are set inaccurately. Since local models are trained only with the points that are related to the particular data point, local learning approaches often outperform global ones. The good performance of local learning methods indicates that the label of a point can be well estimated from its neighbors. With this motivation, we design the LEESSDR algorithm based on LLP. The algorithm sets the edge weights of the neighborhood graph by minimizing the locally estimated error and can effectively preserve the global geometric structure of the sampled data set as well as its local one. Since LLP does not require the local input space to be linear, for a nonlinear local space LLP maps it to a feature space using kernel functions and then obtains the locally estimated error in that feature space. Experimental results on the Extended YaleB and CMU PIE face databases demonstrate that LEESSDR outperforms other semi-supervised dimensionality reduction algorithms in classification performance and robustness.
(2) Presenting a Local and Global Preserving Semi-supervised Dimensionality Reduction method based on Random Subspace (RSLGSSDR). Constructing a faithful graph is the first and most important step in graph-based semi-supervised classification; however, the topology of the neighborhood constructed by most existing approaches is unstable in the presence of noise. By combining the random subspace method with semi-supervised dimensionality reduction, RSLGSSDR first builds multiple diverse graphs in different random subspaces of the data set, then fuses these graphs into a mixture graph on which dimensionality reduction is performed. It can effectively preserve the global geometric structure of the sampled data set as well as its local one.
Experimental results on public data sets demonstrate that the proposed RSLGSSDR not only achieves better recognition performance than competing methods, but is also robust across a wide range of input parameter values.
(3) Proposing a Random Subspace-based Semi-supervised Dimensionality Reduction algorithm, denoted RSSSDR. Precise cancer classification is essential to the successful diagnosis and treatment of cancers. Although semi-supervised dimensionality reduction approaches perform very well on clean data sets, the topology of the neighborhood constructed by most existing approaches is unstable in the presence of noise. By combining the random subspace method with semi-supervised dimensionality reduction, RSSSDR first builds multiple diverse graphs in different random subspaces of the data set and fuses them to form a mixture graph on which dimensionality reduction is performed. Subsequently, the edge weights of the neighborhood graph are determined by minimizing the local reconstruction error, so that the global geometric structure of the data can be preserved without changing the local geometric structure. Experimental results on public cancer data sets demonstrate that the proposed RSSSDR algorithm achieves high classification accuracy and strong robustness.
(4) Proposing a Perceptual Relativity-based Semi-Supervised Dimensionality Reduction (RSSDR) algorithm. Semi-supervised dimensionality reduction approaches perform very well in many applications; however, when dealing with sparse, noisy, and imbalanced data, they cannot guarantee the construction of a faithful graph, which in turn degrades performance. Based on the law of relative cognition, RSSDR introduces a relative transformation, which constructs a relative space that may be more in line with human intuition. The relative transformation can improve the distinguishability of data points and diminish the impact of noise on semi-supervised dimensionality reduction.
Subsequently, the algorithm sets the edge weights of the neighborhood graph by minimizing the local reconstruction error in the relative space and can preserve the global geometric structure of the data as well as its local geometric structure. Experimental results on face, gene expression, UCI, and noisy data sets show that our approach often gives better results in classification and robustness.
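As a rough illustration of two ideas that recur in contributions (2) to (4), the sketch below builds k-nearest-neighbor graphs in several random feature subspaces, fuses them into a mixture graph, and then sets edge weights by minimizing a local reconstruction error. It is not the dissertation's implementation. The function names, the averaging rule used for fusion, and the LLE-style regularized weight solve are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix (Euclidean distance)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest points per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # symmetrize

def fused_subspace_graph(X, k=5, n_subspaces=10, subspace_dim=None, seed=0):
    """Average kNN graphs built on random feature subsets (assumed fusion rule)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = subspace_dim or max(1, d // 2)
    fused = np.zeros((n, n))
    for _ in range(n_subspaces):
        feats = rng.choice(d, size=m, replace=False)  # one random subspace
        fused += knn_adjacency(X[:, feats], k)
    return fused / n_subspaces                        # entries lie in [0, 1]

def reconstruction_weights(X, A, reg=1e-3):
    """LLE-style weights: for each point i, minimize
    ||x_i - sum_j w_ij x_j||^2 over its graph neighbors, with sum_j w_ij = 1."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.flatnonzero(A[i] > 0)
        if nbrs.size == 0:
            continue
        Z = X[nbrs] - X[i]                            # neighbors centered on x_i
        G = Z @ Z.T                                   # local Gram matrix
        G += reg * (np.trace(G) + 1e-12) * np.eye(nbrs.size)  # keep G invertible
        w = np.linalg.solve(G, np.ones(nbrs.size))
        W[i, nbrs] = w / w.sum()                      # enforce sum-to-one constraint
    return W
```

A hypothetical call would be `A = fused_subspace_graph(X)` followed by `W = reconstruction_weights(X, A)`, where `X` is an `(n_samples, n_features)` array. Averaging means that an edge appearing in only a few noisy subspaces receives a small value in the mixture graph, which is one plausible reading of why fusing graphs could stabilize the neighborhood topology.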
Keywords/Search Tags: Machine Learning, Dimensionality Reduction, Semi-supervised Learning, Bioinformatics, Face Recognition, Cancer Classification