Research On Time Series Data Mining And Its Application

Posted on: 2016-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: B F Zheng
GTID: 2308330461452662
Subject: Control Engineering
Abstract/Summary:
In recent years, with the rapid development of the Internet and information technology, time series data has been generated in large quantities, and mining it is regarded as one of the ten most challenging problems in data mining. It is therefore important to make use of time series data and discover useful knowledge in it.

Time series data is a sequence of data points consisting of successive measurements made over a time interval. It arises in many areas, such as the telecommunications industry, the stock market, network intrusion detection, biomedicine, and e-commerce. Time series data has some specific characteristics: it is large in volume, high-dimensional, updated over time, and usually continuous. As a result, traditional data mining algorithms rarely give good results when applied directly to time series data. To address these problems, we study time series data, propose the MBFS (Manifold-learning Based Feature Selection) algorithm and the DWSVM (Double Weighted Support Vector Machine) algorithm, and apply them to the task of predicting driving fatigue. The main work of this thesis is as follows.

Firstly, to deal with the complicated space and high dimensionality of time series data, we propose the MBFS algorithm. This algorithm combines the advantages of metric learning, manifold learning, and sparse coefficient vector learning. It estimates the contribution of each feature in the sample data and selects the features with high contribution. The ITML (Information-Theoretic Metric Learning) method maps the data into a new Euclidean distance space. The manifold learning method finds a low-dimensional manifold in the high-dimensional space, which helps to reveal the inherent structure of the data and reduces the number of dimensions.
Compared with traditional feature selection algorithms, experiments show that this feature selection method greatly reduces the difficulty of classification and improves prediction accuracy.

Secondly, to overcome the difficulty of classifying unbalanced data sets, we propose the DWSVM (Double Weighted Support Vector Machine) model, which weights both the samples and the sample features. The sample weighting is based on the contribution of the classified samples: we assign different weights to the class with few samples and the class with many samples. The feature weighting uses the MBFS algorithm to calculate the weight of each feature, and these weights are used to reconstruct the kernel function. The experimental results show that the double weighted support vector machine performs much better on unbalanced data sets.

Lastly, this thesis applies these methods to the task of predicting driving fatigue. The main tasks of this project include building the experiment platform, collecting data, data pre-processing, data segmentation, feature representation, feature selection, model building, and model validation. The experimental results show that this data mining system achieves relatively high accuracy in predicting fatigue driving and meets the needs of practical applications.
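
The double weighting described for DWSVM can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything here is an assumption made for illustration: sample weights are taken as inversely proportional to class size, the kernel is assumed to be an RBF kernel with a non-negative per-feature weight inside the squared distance, and the function names and the example feature weights are hypothetical (in the thesis the feature weights come from MBFS).

```python
import math
from collections import Counter


def class_balanced_weights(labels):
    """Assumed scheme: weight each sample inversely to its class size."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def feature_weighted_rbf(x, z, feature_weights, gamma=1.0):
    """RBF kernel in which feature k contributes w_k * (x_k - z_k)^2.

    Weights are assumed non-negative so the kernel stays valid.
    """
    sq_dist = sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X, feature_weights, gamma=1.0):
    """Kernel (Gram) matrix over all pairs of rows in X."""
    return [[feature_weighted_rbf(x, z, feature_weights, gamma) for z in X]
            for x in X]


# Toy data: four majority-class samples (0) and one minority-class sample (1).
X = [[0.0, 1.0], [0.1, 5.0], [0.2, -3.0], [0.1, 2.0], [1.0, 0.0]]
y = [0, 0, 0, 0, 1]

# Hypothetical feature weights: the second feature is treated as uninformative.
feature_weights = [1.0, 0.0]

sample_weights = class_balanced_weights(y)
K = gram_matrix(X, feature_weights)
```

In this toy setup the minority sample receives a larger weight than each majority sample, and the second feature has no influence on the kernel values. A Gram matrix and per-sample weights of this kind could then be passed to any SVM implementation that accepts a precomputed kernel and sample weights, for example scikit-learn's `SVC(kernel="precomputed")` with the `sample_weight` argument of `fit`.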
Keywords/Search Tags:time series, feature selection, manifold learning, metric learning, double weighted support vector machine, fatigue driving