
The Research On Regularization Technology And Its Applications In Data Mining

Posted on: 2016-07-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Jiang
Full Text: PDF
GTID: 1368330473967138
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, and especially the advance of high-throughput technology, data has become the most commonly used information carrier in industry. However, the sheer volume of data makes it hard to extract the useful information hidden in it, and makes storage and computation more difficult as well. Consequently, how to obtain useful information from data quickly and accurately has become a fundamental problem in the data mining field. Data mining with regularization technology has received much attention in recent years, because regularization can repair an ill-conditioned mathematical model through information fusion. Meanwhile, optimization theory provides solid theoretical support for solving regularized models. Regularization technology is increasingly used in bioinformatics, pattern recognition, face recognition, image clustering, and other areas. Research on regularization technology and its applications in data mining therefore has clear practical value.

This dissertation focuses on regularization technology and its applications in data mining. Four methods are proposed: a parameter-free sparse representation classifier, feature selection using a locality sensitive Laplacian score, feature selection in batch mode, and clustering based on graph regularized sparse PCA. The effectiveness of the proposed methods is supported by theoretical analysis and confirmed by extensive real-world experiments. The main contents are as follows.

1) The purpose, background, and preliminaries of the research are presented at the beginning of the dissertation. The preliminaries include the definition of mathematical symbols and reviews of classifier design, dimensionality reduction, and clustering under the regularization framework.

2) We propose a weighted meta-sample based non-parametric sparse representation classification method for the accurate identification of tumor subtypes. The method has three steps. First, weighted meta-samples are extracted from the raw data for each subclass, and the rationality of the weighting strategy is proven mathematically. Second, the sparse representation coefficients are obtained by ℓ1-regularization of an underdetermined system of linear equations, so that the data-dependent sparsity is tuned adaptively. Third, a simple characteristic function is used to perform the classification. An asymptotic time complexity analysis of the method is given. Compared with several state-of-the-art classifiers, the proposed method has lower time complexity and more flexibility.

3) We propose a supervised gene selection method called locality sensitive Laplacian score (LSLS). It incorporates discriminative information into the local geometrical structure by minimizing local within-class information and maximizing local between-class information simultaneously. Variance information is also considered in the algorithm framework. Finally, to find better gene subsets, which is significant for biomarker discovery, we present a two-stage feature selection method that combines LSLS with a wrapper method (sequential forward selection or sequential backward selection).

4) Two efficient feature ranking methods are presented. Multi-target regression and graph embedding are incorporated into a single optimization framework, and feature ranking is achieved by introducing a structured sparsity norm. Compared with existing methods, the presented methods have two advantages: (1) the selected feature subset accounts for global margin information as well as local manifold information, so both global and local information are considered; (2) features are selected in batches rather than individually, so the interactions between features are considered and the optimal feature subset can be guaranteed.

5) A novel clustering algorithm named graph regularized and non-negative PCA (SGPCA) is proposed. It is an extension of clustering methods based on non-negative matrix factorization, and it takes both the local manifold structure and sparsity constraints into account. SGPCA has two advantages: (1) the performance of traditional clustering methods such as K-means or the EM algorithm relies heavily on the assumption that the data follows a Gaussian distribution, whereas SGPCA performs very well on data with an arbitrary distribution; (2) the non-negativity and sparsity constraints enhance the discriminating ability of SGPCA. Lastly, we give the solution method together with its convergence analysis. Experiments on toy data and real image data further confirm the superiority of the method.
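The classification scheme in contribution 2) follows the general pattern of sparse representation classification: code a test sample over a dictionary with an ℓ1 penalty, then assign the class whose atoms reconstruct it best. The following is a minimal generic sketch of that pattern, not the dissertation's method. It assumes plain training samples as dictionary columns rather than weighted meta-samples, uses a penalized (lasso) form solved by iterative soft thresholding as a stand-in solver, and fixes the penalty `lam` by hand, whereas the dissertation describes a parameter-free scheme. All function and parameter names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink every entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, lam=0.01, n_iter=500):
    # Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft thresholding.
    # lam and n_iter are illustrative assumptions, not values from the source.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def src_predict(A, labels, y, lam=0.01):
    # A: (n_features, n_atoms) dictionary, one column per training sample.
    # labels: (n_atoms,) class label of each column. y: (n_features,) test sample.
    x = ista_lasso(A, y, lam)
    classes = np.unique(labels)
    # Class-wise residual: reconstruct y using only the coefficients of one class.
    residuals = [
        np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]
```

With a dictionary whose columns are grouped by class, `src_predict(A, labels, y)` returns the label of the class with the smallest reconstruction residual. The residual comparison plays the role that the abstract assigns to the "simple characteristic function"; how the dissertation defines that function is not stated here.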
Keywords/Search Tags: Convex Optimization, Regularization, Manifold Learning, Data Mining, Machine Learning