
Research On Nonlinear Incremental Feature Extraction For High Dimensional Data

Posted on: 2021-04-04; Degree: Master; Type: Thesis
Country: China; Candidate: Q Wang; Full Text: PDF
GTID: 2518306353478454; Subject: Mathematics
Abstract/Summary:
With the rapid development of information technology, explosive growth in data scale has become one of the defining characteristics of the big data era. The high dimensionality and sparseness of such data pose unprecedented challenges for data mining. Feature extraction is an effective way to process high-dimensional data: by extracting low-dimensional characteristics of the data, it maps a high-dimensional feature space to a low-dimensional one for analysis and processing. It is usually divided into linear and nonlinear feature extraction. Linear feature extraction assumes a linear structure in the data, which is a strong limitation and often gives unsatisfactory results. Nonlinear feature extraction does not rely on linear assumptions and performs well on data with nonlinear structure, which has made it one of the popular directions in data mining. Manifold learning, a nonlinear feature extraction method, exploits the property that a manifold is locally homeomorphic to Euclidean space to extract effective low-dimensional features. However, manifold learning methods still have shortcomings. On the one hand, they ignore the class label information of the data, so the extracted features are not optimal for classification or clustering, and results obtained from them often differ greatly from the actual ones. On the other hand, existing manifold learning methods are effective mainly for static data, and there is less research on dynamic data (where the amount of data keeps increasing), so handling incremental data is also a weakness of manifold learning. To address these problems, this thesis reviews some classic manifold learning methods and takes semi-supervised and incremental learning as its main research content.

Some existing semi-supervised manifold learning methods use label information only to make local adjustments at neighboring points and ignore its global role. This thesis therefore proposes a semi-supervised class preserving local linear embedding method (SSCLLE) based on local linear embedding (LLE). The method first assigns pseudo labels to the nearest neighbors of the labeled samples, which increases the number of labeled samples. Second, it locally adjusts the distances between labeled samples, reducing the distance between homogeneous (same-class) samples and enlarging the distance between heterogeneous (different-class) samples. At the same time, global constraints on homogeneous and heterogeneous sample spacing are added to the LLE optimization objective, so that in the extracted low-dimensional features homogeneous points lie close to each other and heterogeneous points are separated from each other. A series of experiments shows that the features extracted by this method preserve class structure well, and that its clustering accuracy and visualization quality are significantly higher than those of unsupervised LLE and other semi-supervised manifold learning methods.

Existing incremental learning methods also have shortcomings. Most of them simply use neighbor relations to obtain the features of new data and ignore the label information in the data. In addition, they do not update the original points whose neighborhood structure changes when new data are added. To address these problems, this thesis proposes SSILLE, an incremental method built on SSCLLE and based on block matrix factorization. The method takes label information into account when computing the new sample points and also recomputes the points whose neighbors change because of the additions. In this way, the low-dimensional coordinates of both the newly added data points and the nearby affected points are obtained. SSILLE does not need to perform feature extraction on all data again, which reduces the computational cost. Finally, the validity of the method is verified by experiments.
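The first two SSCLLE steps described in the abstract (assigning pseudo labels to the nearest neighbors of labeled samples, then locally adjusting the distances between labeled samples) can be illustrated with a minimal Python sketch. The abstract does not state the actual formulas, so everything below is an assumption made for illustration: the `UNLABELED` marker, the function names, the neighbor count `k`, and the multiplicative scaling by a hypothetical strength parameter `alpha`. It should not be read as the implementation used in the thesis.

```python
import numpy as np

UNLABELED = -1  # assumed marker for samples without a class label


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def propagate_pseudo_labels(D, y, k=1):
    """Copy each labeled sample's label to its k nearest neighbors
    that are still unlabeled (the first label to reach a point wins)."""
    y_new = y.copy()
    for i in np.flatnonzero(y != UNLABELED):
        order = np.argsort(D[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            if y_new[j] == UNLABELED:
                y_new[j] = y[i]
    return y_new


def adjust_distances(D, y, alpha=0.5):
    """Shrink same-class distances and stretch different-class distances.

    alpha in (0, 1) is a hypothetical strength parameter; pairs that
    involve an unlabeled sample are left unchanged.
    """
    labeled = y != UNLABELED
    both = labeled[:, None] & labeled[None, :]
    same = both & (y[:, None] == y[None, :])
    different = both & (y[:, None] != y[None, :])
    D_adj = D.copy()
    D_adj[same] *= 1.0 - alpha
    D_adj[different] *= 1.0 + alpha
    return D_adj


# Toy data: two well-separated groups, one labeled sample in each.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, UNLABELED, 1, UNLABELED])

D = pairwise_distances(X)
y_pseudo = propagate_pseudo_labels(D, y, k=1)
D_adj = adjust_distances(D, y_pseudo, alpha=0.5)
```

In a pipeline of this kind, the adjusted distance matrix would typically be used to choose the neighborhoods from which the LLE reconstruction weights are computed. The global spacing constraints and the incremental SSILLE update mentioned in the abstract are not covered by this sketch.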
Keywords/Search Tags:Nonlinear feature extraction, Manifold learning, Incremental, Semi-supervised, Labeled information