
Regularized Regression Methods For Large Scale Data Analysis And Classification

Posted on: 2018-08-27    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X L Xu    Full Text: PDF
GTID: 1317330515479583    Subject: Statistics
Abstract/Summary:
With the rapid development of internet technology, the cost of collecting large-scale, multi-modal data has fallen sharply in the big data era. Big data analysis aims to mine useful information from massive data, and so it meets both social and market demands. Dimension reduction is an effective tool in data analysis: variable selection or feature transformation is used to capture the intrinsic structure of the data, such as locality and discriminability, so it plays an important role in regression analysis, machine learning, data mining, and pattern recognition. Dimension reduction and feature extraction can therefore provide an essential and profound understanding of given data.

Over the past two decades, regression methods have attracted broad research attention owing to their rich theory and interpretability. For example, sliced inverse regression (SIR) and its variants are effective for discriminant analysis of high-dimensional data. However, two shortcomings limit SIR's further application. First, its computational complexity is very high for high-dimensional data. Second, sparsity of the projection subspaces is not well addressed, which hinders feature selection and model interpretation.

On the other hand, improving the understanding and analysis ability of an intelligent system by extracting discriminant information from auxiliary data sets is an important and attractive topic in the big data era, with wide applications in image classification, video retrieval, and economic data clustering and prediction. In the image recognition literature, the training and testing sets may differ greatly in distribution, often because of different scales, views, and resolutions, which degrades the performance of direct dimension reduction. Notably, observations of the same object at different scales or views, usually collected from different locations of the same category or topic, often provide complementary descriptions of the data. The task is then to extract the classification-favorable characteristics of each domain while improving the classification performance of the target domain. Transfer learning offers one way to handle such cross-domain knowledge transfer and performance boosting. Typically, the target domain has very few labeled data while the source domain has many, so discriminant information from the source domain is expected to improve classification in the target domain. Transfer learning methods are also called domain adaptation in the computer vision literature.

Based on these two aspects, this dissertation derives new models for high-dimensional data analysis and domain adaptation.

The second chapter proposes a new algorithm that computes the projection vectors of SIR in the spectral space, obtaining an approximate regression solution at higher speed. In particular, the adaptive lasso is used to obtain a sparse and globally optimal solution, which is important for variable selection. To achieve robust image classification under occlusion, the chapter also presents a regression-based classification model that uses the correntropy as a robust loss function. The specific contributions are as follows: (1) the discriminant regression coefficients are solved in the feature space, reducing the computational complexity of data analysis; (2) a sparsity constraint on the regression coefficients enhances the interpretability of the high-dimensional analysis model; (3) a fast and robust classification algorithm is designed to handle outliers and noisy data. Extensive experiments on benchmark data sets, including facial image sets and gene microarray data sets, show that the new methods achieve competitive classification performance.

The third chapter proposes a low-rank representation-based conditional transferable components learning method (LRCT) to address remaining limitations of domain adaptation methods, namely high computational complexity and limited classification performance. Inspired by successful applications of low-rank representation, LRCT seeks a set of feature representations, with appropriate locality and smoothness, to build a low-dimensional shared subspace. An intermediate domain I is established under a basic probability hypothesis, and domain adaptation is then transported from this middle domain I to the target domain T. The main contributions are as follows: (1) conditional distribution mismatch is studied in a causality framework, and the low-rank representation between the middle domain I and the target domain T is exploited to transfer discriminant features, so the method obtains transferable features in the embedded subspace rather than in the original space; (2) for numerical optimization, the transfer learning model is formulated as a density-ratio function and re-parameterized as a classical QP problem, which can be optimized efficiently by an alternating optimization strategy together with the augmented Lagrangian multiplier method; (3) a faster solver is proposed via multi-step fixed-point proximity iteration rules. Experimental results show that the proposed algorithm removes an inner loop from the iterative solution procedure, which is favorable for large-scale transfer learning and discriminant analysis.

The fourth chapter concludes this dissertation and discusses future work.
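The methods above build on sliced inverse regression. For context, a minimal sketch of classical SIR (Li's slicing estimator) is given below; it does not reproduce the dissertation's spectral-space or adaptive-lasso refinements, and the function name `sir_directions` and its parameters are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Estimate effective dimension-reduction directions by classical SIR.

    A minimal sketch: standardize X, average the standardized predictors
    within slices of the sorted response, then take the top eigenvectors
    of the between-slice covariance of those means.
    """
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice the response and average Z within each slice
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Top eigenvectors of M, mapped back to the original predictor scale
    _, v = np.linalg.eigh(M)          # eigenvalues in ascending order
    dirs = inv_sqrt @ v[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)
```

On a toy model y = (Xβ)³ + noise, the recovered direction aligns closely with β up to sign. The chapter's contribution can be read against this baseline: the eigendecomposition step is replaced by a regression in the spectral space, with an adaptive-lasso penalty making the recovered directions sparse.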
Keywords/Search Tags:Dimension reduction, Sparse representation, Spectral regression, Transfer learning, Fixed-point proximity iteration