
Research On And Design Of Dimensionality Reduction Algorithm For The High Dimensional Data

Posted on: 2021-01-22 | Degree: Master | Type: Thesis
Country: China | Candidate: Y B Yu | Full Text: PDF
GTID: 2518306050464664 | Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, the dimensionality and scale of the data to be processed are growing quickly, and applying the raw data directly rarely gives satisfactory results. High-dimensional data often leads to the curse of dimensionality, so effective dimensionality reduction has attracted attention from many fields. Linear Discriminant Analysis (LDA) has received wide attention because of its excellent performance. In LDA, the objective function is solved using the between-class scatter matrix and the within-class scatter matrix. However, the traditional LDA algorithm still has shortcomings that largely limit its scope of application. Because of the temporal nature of time series data, traditional dimensionality reduction methods are difficult to apply to it directly. In recent decades, many time series classification models based on deep learning have been proposed and have achieved good results in a wide range of tests, but most of them concentrate on feature extraction and ignore the role of dimensionality reduction. In summary, it is of great theoretical and practical significance to study and implement effective dimensionality reduction algorithms for high-dimensional data and for time series. The topic of this thesis comes from a General Program project of the National Natural Science Foundation of China.

To address the shortcomings of existing dimensionality reduction algorithms for high-dimensional data, this thesis proposes an efficient dimensionality reduction algorithm. The main problems addressed and the innovations of the algorithm are as follows:

1) When the dimension of the training samples is too high, LDA runs into the small sample size problem: the within-class scatter matrix cannot be inverted, so the objective cannot be solved. Most LDA algorithms overcome this problem by reducing the dimension in advance, but this discards a large amount of discriminative information. This thesis proposes a two-step LDA algorithm that handles the small sample size problem without losing discriminative information.

2) Because LDA is formulated with the L2 norm and its solution depends on the mean of the data, it is very sensitive to noise and outliers, which seriously affects the dimensionality reduction result. To improve the robustness of LDA to outliers, this thesis replaces the original class mean with a weighted mean based on the median, which requires no repeated iterative optimization.

3) Most current LDA algorithms either lack the local structure information of the data or preserve only the between-class or only the within-class local geometry, so they often fail to give a stable dimensionality reduction result as the data becomes more complex. This thesis embeds both between-class and within-class local structure information, which improves the algorithm's performance on multimodal data.

To address the defects of current time series dimensionality reduction algorithms, this thesis proposes an effective time series dimensionality reduction algorithm. The main work and innovations of the algorithm are as follows:

1) To improve the performance of time series processing, a time series algorithm based on a fully convolutional network is proposed, which realizes end-to-end processing of time series.

2) An attention mechanism is used to realize the position embedding of sequences, which compensates for the weakness of the fully convolutional network in handling temporal order. By introducing dilated convolution, the time dimension and the variable dimension of a multivariate time series can be reduced simultaneously, which extracts sequence features effectively while saving considerable computing resources.

3) Simulation experiments on a large number of time series data sets show that the proposed algorithm can effectively solve the problem of time series dimensionality reduction and classification.

Although the two algorithms proposed in this thesis can effectively reduce the dimension of traditional high-dimensional data and of time series data, there is still considerable room for further research. Both algorithms require class labels for the training data, which are often hard to obtain in practice, so the author will consider using only partial class labels while maintaining algorithm performance in future research.
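
As a minimal illustration of the small sample size problem described in the abstract, the following sketch builds the within-class and between-class scatter matrices with NumPy for synthetic data that has more features than samples. The data, the sizes, and the helper name `scatter_matrices` are assumptions made for illustration and do not come from the thesis. The sketch shows only the standard LDA setting, not the proposed two-step algorithm.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return (S_w, S_b), the within-class and between-class scatter matrices.

    Hypothetical helper for illustration; X has shape (n_samples, n_features).
    """
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        # Within-class scatter: spread of each class around its own mean.
        S_w += centered.T @ centered
        # Between-class scatter: spread of class means around the overall mean.
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += X_c.shape[0] * (diff @ diff.T)
    return S_w, S_b


# Synthetic data (assumed sizes): 20 samples, 50 features, 2 classes.
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 20, 50, 2
X = rng.normal(size=(n_samples, n_features))
y = np.repeat(np.arange(n_classes), n_samples // n_classes)

S_w, S_b = scatter_matrices(X, y)
rank_w = np.linalg.matrix_rank(S_w)

# rank(S_w) is at most n_samples - n_classes, which is below n_features here,
# so S_w is singular and has no ordinary inverse.
print("rank of S_w:", rank_w, "of", n_features)
```

The rank of the within-class scatter matrix is at most the number of samples minus the number of classes. When that bound is below the feature dimension, the matrix cannot be inverted, which is the situation the two-step LDA in the thesis is meant to handle.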
Keywords/Search Tags:Dimension Reduction, High-dimensional Data, Time Series, Linear Discriminant Analysis, Fully Convolutional Network