
Feature Selection For High-dimensional Time-series Data Based On Two Classes Of Metric Functions

Posted on: 2024-03-01    Degree: Master    Type: Thesis
Country: China    Candidate: X Yang    Full Text: PDF
GTID: 2568307157472264    Subject: Statistics
Abstract/Summary:
Feature selection is an important data-preprocessing step in data mining. It reduces the redundancy and computational complexity of data, and improves model performance by filtering out noise and irrelevant features. Rough set theory is an effective method for feature selection; its main advantage is that it can extract latent information directly from data without requiring any prior knowledge. In the era of big data, time-series data appear widely in all fields of life. A time series is a sequence of data recorded in time order, characterized by large scale, high dimensionality and continuous updating. In high-dimensional time series, the variables are also correlated to some degree. How to effectively mine valuable and meaningful information from such complex and massive time-series data is an important research topic in the field of big data. For feature selection on high-dimensional time-series data, this dissertation studies the time-series information system and the time-series decision information system from the viewpoint of rough sets, information entropy and dependency functions. Two measurement functions, one based on neighborhood mutual information and one based on the multi-dimensional dynamic time warping distance with a generalized Mahalanobis distance (DTW_M), are proposed for feature selection. The time-series neighborhood rough set model and the time-series neighborhood decision rough set model are then discussed. The main results of this dissertation can be summarized as follows:

(1) For high-dimensional time-series data, a time-series information system is defined and a time-series neighborhood relation is proposed. Based on the idea of neighborhood granulation, uncertainty measures including time-series neighborhood entropy, time-series neighborhood conditional entropy and time-series neighborhood mutual information are studied. The nearest and farthest neighbor feature selection method is introduced for high-dimensional time-series data, the significance of attributes is defined, and the scale of feature selection is controlled by introducing the cumulative significance contribution rate. An initial feature set with strong classification ability is then obtained by applying a threshold. Furthermore, attribute redundancy is defined by time-series neighborhood mutual information, and the attribute with the lowest importance and the greatest dependence in the initial feature set is removed to obtain the final feature subset. Experiments on UCR data sets verify the effectiveness and superiority of the proposed algorithm for feature selection on high-dimensional time-series data.

(2) For high-dimensional time-series data of unequal length, the generalized Mahalanobis distance is incorporated into dynamic time warping to give the DTW_M distance, which is used to measure the similarity between time series. A time-series decision information system is defined for high-dimensional time-series data with decision attributes, and the time-series neighborhood relation and the time-series neighborhood rough set are then proposed. By defining the internal and external significance of attributes, a feature selection method based on the DTW_M metric is presented for high-dimensional time-series data. Experiments on UCR data sets verify the effectiveness and superiority of the proposed algorithm for feature selection on high-dimensional temporal data.
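
As an illustration of the DTW_M idea in result (2), the following is a minimal sketch of dynamic time warping between two multivariate series of possibly unequal length, using a Mahalanobis-type local cost. It is not the dissertation's implementation: the function name `dtw_mahalanobis`, the parameter `M` (assumed to be a symmetric positive semidefinite matrix supplied by the caller) and the plain, unconstrained warping recursion are all assumptions made for this example.

```python
import numpy as np


def dtw_mahalanobis(x, y, M=None):
    """DTW distance with a Mahalanobis-type local cost (illustrative sketch).

    x: array-like of shape (n, d); y: array-like of shape (m, d).
    n and m may differ, so series of unequal length are allowed.
    M: hypothetical (d, d) symmetric positive semidefinite matrix.
       If None, the identity is used and the local cost is Euclidean.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat a 1-D input as a univariate series of shape (length, 1).
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, d = x.shape
    m = y.shape[0]
    if M is None:
        M = np.eye(d)

    # Local cost: sqrt((x_i - y_j)^T M (x_i - y_j)) for every pair (i, j).
    diff = x[:, None, :] - y[None, :, :]            # shape (n, m, d)
    sq = np.einsum("ijk,kl,ijl->ij", diff, M, diff)  # shape (n, m)
    cost = np.sqrt(np.clip(sq, 0.0, None))

    # Standard DTW recursion over the accumulated cost matrix.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # step in x only
                acc[i, j - 1],      # step in y only
                acc[i - 1, j - 1],  # step in both
            )
    return acc[n, m]
```

Under these assumptions, two samples could be treated as neighbors when `dtw_mahalanobis(a, b, M)` does not exceed a chosen radius. How `M` and the radius are actually determined in the dissertation is not stated in the abstract.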
Keywords/Search Tags:Neighborhood rough set, Feature selection, High-dimensional time-series data, Time-series neighborhood mutual information, DTW_M metric