
Time Series Data Mining Based On Large Margin Theory

Posted on: 2013-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Yu
Full Text: PDF
GTID: 1268330392967567
Subject: Control Science and Engineering

Abstract/Summary:
Time series are widely used in many fields, and sequence data analysis and mining have become active research topics of continuing interest. High dimensionality and features such as the temporal dependence among observations make it difficult to use information effectively in knowledge discovery from sequence data, so many traditional machine learning algorithms cannot readily obtain satisfactory results. Targeting these particularities of time series data, this dissertation applies the large margin theory from machine learning to time series data mining. The main contributions are as follows.

A sequential similarity measure is designed based on large margin theory. As a core problem in machine learning, similarity measurement directly determines the effectiveness of time series data mining algorithms. To handle the phase-shift phenomena commonly present in sequential samples, a dynamic time warping (DTW) similarity measure is designed based on large margin theory. Compared with the Euclidean or standard DTW distance, the matching strategy for sequence distortion is improved. To address the distance-instability phenomenon of high-dimensional data, the effectiveness of the distance measure is further optimized through norm learning.

A supervised feature extraction / data re-expression algorithm is designed based on the characteristics of sequence fragments. One of the difficulties in time series data mining is that the effective discriminative information is often hidden in local sequence fragments rather than the entire sequence; this phenomenon is common in sequence problems such as trajectories extracted from image edges. By contrasting the useful information of various fragments, several fragments with the largest discriminative capacity are selected to represent the entire sequence.
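To illustrate the baseline that the large-margin variant improves upon, a minimal dynamic time warping distance can be sketched as follows. This is the classic textbook DTW, not the dissertation's method; all function and variable names are my own.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]

# A phase-shifted copy of a sequence stays close under DTW
# even when the point-wise Euclidean distance is large.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(x, y))  # → 0.0
```

The warping path absorbs the one-step phase shift between `x` and `y`, which is exactly the kind of distortion the abstract describes.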
Compared with traditional methods, this fragment-based feature extraction / data re-expression method is especially suited to trajectories obtained from edges or other curve-derived sequence data, and it can improve classification accuracy, efficiency, and interpretability. The method is also compared with the well-known shapelet algorithm, and the classification performance of the model is verified experimentally.

A sequence coarse-graining algorithm is proposed based on large margin theory. The changing relationship between useful and useless information is studied during the transformation of sequence data from numeric values to symbols: although some useful information is lost in the transformation, useless information is also reduced significantly. A supervised discretization method for sequence data is proposed to improve classification accuracy and efficiency, which is also verified experimentally.

A sequential classification model is designed based on critical cases. In constructing the critical sample set, the utility of each sample is evaluated using large margin theory: the weights of samples that produce the largest hypothesis margin are increased, while the weights of outliers and redundant samples are decreased. This improves the generalization ability of the classification model, and reducing redundant training samples also improves its computational efficiency. The validity of the method is confirmed experimentally.
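The hypothesis margin mentioned above can be sketched in its standard nearest-hit/nearest-miss form: the distance from a sample to its closest differently-labeled neighbor minus the distance to its closest same-labeled neighbor. This is a generic illustration of the concept, not the dissertation's weighting scheme; all names are my own.

```python
def hypothesis_margin(x, label, samples):
    """Hypothesis margin of sample x: distance to the nearest miss
    (closest sample of a different class) minus distance to the
    nearest hit (closest *other* sample of the same class)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    hits = [dist(x, s) for s, y in samples if y == label and s != x]
    misses = [dist(x, s) for s, y in samples if y != label]
    return min(misses) - min(hits)

data = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"),
        ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
# A well-separated sample has a large positive margin; outliers
# and borderline samples have a small or negative one, which is
# the intuition behind up- or down-weighting them.
print(hypothesis_margin((0.0, 0.0), "a", data))
```

Samples whose margin is large are the "critical cases" that most constrain the decision boundary, while negative-margin points behave like outliers.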
Keywords/Search Tags: Time series data mining, large margin, dynamic time warping, feature segment, prototype selection