
Dimensionality Reduction And Classification Of High-dimensional Data Using Cosine Metric

Posted on: 2016-09-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Liu
GTID: 1108330461977692
Subject: Computer application technology
Abstract/Summary:
In recent years, advances in digital and multimedia technology have driven the development of machine learning, in which dimensionality reduction and data classification are two central topics. Most dimensionality reduction and classification methods use Euclidean distance to evaluate the similarity between samples, and few studies consider other measures. With the development of metric learning, the evaluation of similarity between samples has become an active research topic. This dissertation uses the cosine measure to develop dimensionality reduction and classification methods, and applies them to human motion time series analysis and data stream learning. Building on existing dimensionality reduction and classification algorithms, we make the following contributions:

(1) Local tangent space alignment (LTSA) fails to learn datasets with locally high curvature. To address this problem, this dissertation describes the local curvature of the data set by introducing a new parameter and a robust local subspace, and presents a new algorithm for nonlinear problems called Locally minimal deviation space alignment (LMDSA). To compensate for the low robustness of local tangent spaces, LMDSA detects locally high curvature while computing locally minimal deviation spaces. The algorithm also reduces the probability of locally high curvature spaces through parameter control and the joint information between neighborhoods. It then applies the space alignment technique to reduce dimensionality. Considering the continuity of human motion and the locally high curvature in human motion sequences, a new segmentation method based on manifold learning is proposed for the motion segmentation problem. The method evaluates the coherence of human motion based on the local warp index of the sequence data.
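The contrast between Euclidean distance and the cosine measure that motivates this work can be illustrated with a minimal sketch. The function names and sample vectors below are illustrative assumptions and are not taken from the dissertation.

```python
import numpy as np

def euclidean_distance(x, y):
    """Straight-line distance; sensitive to the magnitude of the vectors."""
    return float(np.linalg.norm(x - y))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; invariant to positive scaling."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(x, y) / denom)

# Hypothetical sample vectors: y points the same way as x but is 10x longer,
# z has similar length to x but a different direction.
x = np.array([1.0, 2.0, 3.0])
y = 10.0 * x
z = np.array([3.0, 2.0, 1.0])

print("Euclidean x-y:", euclidean_distance(x, y))
print("Euclidean x-z:", euclidean_distance(x, z))
print("Cosine    x-y:", cosine_similarity(x, y))
print("Cosine    x-z:", cosine_similarity(x, z))
```

Under Euclidean distance, `x` is closer to `z` than to `y`, whereas the cosine measure treats `x` and `y` as maximally similar because they share a direction.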
Because the transition clips between certain adjacent motion units warp strongly, a filtering technique and a piecewise linear representation are applied to the motion sequence. Nonlinear dimensionality reduction, however, is not widely applied in practice, whereas linear methods are popular. Global linear algorithms characterize local sampling information, which makes them superior to Principal component analysis (PCA). However, these algorithms are all inefficient at extracting local data features, which leads to incomplete learning. Based on an analysis of the local subspace, this dissertation proposes a new global linear algorithm named Maximal similarity embedding (MSE). Preserving local features distinguishes this algorithm from most other methods. MSE uses the cosine metric to describe the geometric characteristics of each neighborhood and seeks to maximize the global similarity for dimensionality reduction. The proposed method is robust on sparse datasets and naturally avoids the small sample size problem.

(2) We study the Linear discriminant analysis (LDA) and Maximum margin criterion (MMC) algorithms and analyze how the degree of scatter affects subspace selection. We also give the boundaries of the scatters in LDA and MMC to illustrate the differences and similarities between their subspace selection under different circumstances. The effect of outlier classes on subspace selection is analyzed as well. Based on this analysis, we propose a new subspace selection method based on an angle measure, called Angle linear discriminant embedded (ALDE). ALDE uses the cosine of the angle to construct new within-class and between-class scatter matrices and at the same time avoids the small sample size problem. To deal with high-dimensional data, we extend ALDE to a Two-stages ALDE (TS-ALDE).
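For readers unfamiliar with the scatter matrices that LDA and MMC build on, the following is a minimal sketch of the standard Euclidean formulation. It is not the ALDE construction, whose cosine-based scatter matrices are not specified in this abstract. The function names and the synthetic data are assumptions made for illustration, and the MMC objective is taken in its common unweighted form, the trace of W^T (S_b - S_w) W.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Standard within-class (S_w) and between-class (S_b) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    return S_w, S_b

def mmc_projection(X, labels, n_components):
    """Top eigenvectors of S_b - S_w; no inverse of S_w is required."""
    S_w, S_b = scatter_matrices(X, labels)
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)  # symmetric matrix
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

# Synthetic small-sample-size case: 10 samples in 20 dimensions,
# so S_w is singular and classical LDA cannot invert it directly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (5, 20)), rng.normal(3.0, 1.0, (5, 20))])
labels = np.array([0] * 5 + [1] * 5)

S_w, S_b = scatter_matrices(X, labels)
W = mmc_projection(X, labels, n_components=1)
print("rank of S_w:", np.linalg.matrix_rank(S_w), "of", S_w.shape[0])
print("projection shape:", W.shape)
```

Because the MMC criterion only needs an eigendecomposition of a symmetric matrix, it remains well defined when there are fewer samples than dimensions, which is the small sample size setting mentioned above.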
Furthermore, because of the concept drift problem in data streams, traditional machine learning methods no longer work. Data streams also require real-time learning, and most concept drift detection methods cannot meet this demand. To solve this problem, this dissertation proposes a data stream learning framework that improves the classical LDA method using a robust subspace learning method. It can both detect concept drift in a data stream quickly and classify the data stream in real time.

(3) We analyze an effective classification algorithm, the Extreme learning machine (ELM). ELM has been widely used in pattern recognition, data mining and other applications because of its extremely fast training speed and high recognition rate. In practical applications, however, datasets are often irregularly distributed and contain outliers. These problems reduce the classification rate of ELM, mainly for two reasons: ① overfitting caused by outliers and by an unreasonable choice of activation function and kernel function; ② a small number of labeled samples, with the unlabeled data not fully used. To address the first problem, this dissertation proposes a Robust Activation Function (RAF) based on an in-depth analysis of several activation functions. RAF keeps the output of the activation function away from zero as much as possible and minimizes the impact of outliers on the algorithm. It thus improves the performance of ELM (kernel ELM); RAF can also be applied to other kernel methods and to neural network learning. To address the second problem, we propose an extended semi-supervised ELM (ESELM). Furthermore, a semi-supervised kernel ELM (SK-ELM) is proposed to cope with nonlinear data.
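The fast training speed of ELM comes from fixing random hidden-layer weights and solving for the output weights in closed form. The sketch below shows a basic ELM with an ordinary sigmoid activation. It does not implement RAF, ESELM or SK-ELM, whose definitions are not given in this abstract, and the class name, hidden-layer size and synthetic data are assumptions chosen for illustration.

```python
import numpy as np

class BasicELM:
    """Single-hidden-layer ELM: random hidden weights, least-squares output."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]  # one-hot targets
        # Hidden weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights via the Moore-Penrose pseudo-inverse of H.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Synthetic two-class data: two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = BasicELM(n_hidden=40).fit(X, y)
train_accuracy = float(np.mean(model.predict(X) == y))
print("training accuracy:", train_accuracy)
```

Since the output weights are a plain least-squares fit, a few outliers can pull the solution noticeably, which is the kind of sensitivity the abstract attributes to ELM on irregular data.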
Keywords/Search Tags:Cosine Metric, Dimensionality Reduction, Classification, Data Stream, Human Motion, Time Series