Instance Selection Strategy For Time Series Classification

Posted on: 2015-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: T T Zhai
GTID: 2308330461473484
Subject: Computer software and theory
Abstract/Summary:
A time series is a complex data object ordered in time. Time series are ubiquitous in real life: examples include a country's annual economic growth rate and the amount of information handled by a network switch per hour. Much multimedia data can also be transformed into time series for processing. With the massive recent growth of time series data, time series data mining has become a research focus and, owing to the unique structure of time series data, is regarded as one of the ten most challenging directions in data mining research.

Time series classification, which has been applied in many areas, is an important task in time series data mining. In a typical classification task, data miners routinely encounter datasets that are gigabytes in size and contain a large amount of noise. It is therefore necessary to preprocess the data before classification: to eliminate redundant and noisy instances and to reduce the original dataset to a moderate size so that subsequent mining can proceed smoothly. Instance selection is one such technique. By reducing the data size, it reduces both the space required to store instances and the time required to process them, while trying to keep the most representative instances so that the classification results remain credible and accurate. Exploiting hubness, an inherent property of time series data, this thesis studies instance selection strategies for time series classification. The specific work is as follows.

First, we studied greedy instance selection algorithms based on hubness. To address two problems in the state-of-the-art instance selection algorithm INSIGHT, namely the imbalanced class distribution and the redundancy of the selected instances, we proposed two improved algorithms. The first selects instances both by class and by proportion, so that the selected instances are distributed evenly across the classes of the training set. The second performs two rounds of selection and discards instances that make no contribution to accuracy. Experimental results on 34 time series datasets showed the effectiveness of the proposed improved algorithms.

Second, we studied heuristic instance selection algorithms based on hubness. The greedy algorithms of the previous chapter have difficulty reaching the optimal solution of the instance selection problem and cannot adaptively determine the optimal number of selected instances from the characteristics of each dataset. To alleviate both problems, we proposed a new immune binary particle swarm optimization algorithm (IBPSO), whose objective is to find the smallest combination of instances from the original training set that achieves maximal classification accuracy. IBPSO integrates a novel immune mechanism, comprising vaccination and immune selection, into the basic binary particle swarm optimization algorithm (BPSO) proposed by Kennedy and Eberhart. Vaccination uses the hubness scores of the time series and the particles' inertia as heuristic information to direct the search. Immune selection always discards the particle with the worst fitness in the current swarm to prevent the swarm from degrading. We demonstrated the effectiveness of the proposed immune mechanism on the 34 small and medium datasets and its scalability on 10 larger ones.

Finally, we proposed a two-stage ensemble method based on the algorithms of the previous two chapters. It combines the merits of the greedy and heuristic instance selection algorithms so that an effective combination of instances can be selected in a reasonable time.
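
The first improved greedy algorithm in the abstract (ranking instances by hubness and selecting them per class, in proportion to class size) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes the hubness score is the good 1-occurrence count, that is, how often an instance is the nearest neighbour of another instance with the same label. It also substitutes plain Euclidean distance for whatever time series distance the thesis uses. The names `good_one_occurrence`, `select_by_class`, and `fraction` are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Distance between two equal-length series.

    Assumption: a stand-in for the time series distance used in the thesis.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def good_one_occurrence(series, labels, dist=euclidean):
    """For each instance, count how often it is the 1-NN of a same-class instance."""
    n = len(series)
    scores = [0] * n
    for i in range(n):
        # Nearest neighbour of instance i among all other instances.
        nn = min((j for j in range(n) if j != i),
                 key=lambda j: dist(series[i], series[j]))
        if labels[nn] == labels[i]:
            scores[nn] += 1
    return scores


def select_by_class(series, labels, fraction, dist=euclidean):
    """Keep the top-scoring instances of each class, in proportion to class size."""
    scores = good_one_occurrence(series, labels, dist)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    selected = []
    for members in by_class.values():
        # Hypothetical quota rule: at least one instance per class.
        quota = max(1, round(fraction * len(members)))
        # Highest score first; ties broken by original index.
        ranked = sorted(members, key=lambda i: (-scores[i], i))
        selected.extend(ranked[:quota])
    return sorted(selected)
```

Calling `select_by_class(series, labels, 0.3)` returns the indices of the retained training instances. Because the quota is computed per class, a small class still keeps at least one representative, which is the kind of class balance the abstract describes.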
Keywords/Search Tags: time series classification, instance selection, data reduction, binary particle swarm optimization, immune algorithm