Sequential Pattern Mining in Time Series Data

Posted on: 2016-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: X Dong
GTID: 2308330479976612
Subject: Computer Science and Technology
Abstract/Summary:
Sequential pattern mining, an important branch of time series data mining, offers an effective way to analyze the objective laws and knowledge contained in satellite telemetry data. The useful hidden information mined from such data supports the safety management and healthy operation of on-orbit satellites. The analysis object is the power data, more than 2,400,000 lines, from the power supply and distribution system of a certain satellite. After outlier elimination, parameter selection and cycle analysis, five representative telemetry parameters are extracted for feature representation, motif pattern mining and closed pattern mining. The main contributions of this thesis are as follows:

(1) Existing piecewise linear representation methods suffer from low compression efficiency and from overly fine-grained sharp points in sub-sequences. To address these problems, a feature representation method based on key points (FR_KP) is proposed. Each point of the sequence is scanned in order and is judged to be a key point or not by computing the holding time of the extreme value, the variation amplitude, and the difference between the slopes before and after the turning point. The experimental results show that the method describes the change trend of the original sequence accurately, with high compression and no distortion.

(2) Existing motif mining methods find it difficult to balance quality against efficiency and easily overlook motifs that have few matches. To address these problems, a motif mining method based on global averaging sequences with penalty (PGAS_Motifs) is presented. The K-Means clustering algorithm is applied to all sub-sequences, and the cluster centers are output as the motifs. The clustering process raises two key problems: selecting the distance metric and calculating the cluster centers. For these, a dynamic time warping distance with time penalty (PDTW) and a global averaging sequence calculation method based on PDTW are put forward respectively. PDTW introduces a penalty factor when calculating the shortest warping path, which solves the distance distortion caused by erroneous matches. The global averaging method treats all sequences in the same cluster as a whole, which avoids the error propagation of iterative pairwise averaging and solves the high complexity caused by overfitting. The experimental results indicate that this method successfully extracts motif sequences that reflect the working state of the satellite from its telemetry data, that the PDTW metric is more effective than DTW, and that the global averaging method fits much better than NLAAF.

(3) Traditional frequent pattern mining produces a redundant result set in which effective patterns are hard to find, and the existing closed pattern mining methods based on pattern growth are inefficient. To address these problems, a closed pattern mining method that combines the vertical data format with a heuristic pruning strategy (CloPMVP) is put forward. The vertical data format of the SPADE algorithm is adopted for mining closed patterns, so that calculating the support of a sequence requires only a simple intersection operation, which improves computational efficiency. During sequence extension, backward sub-pattern and backward super-pattern checks, inspired by the CloSpan algorithm, are used for effective pruning to reduce the search space. The experiments show that when the average sequence length in the data set is long or the minimum support is small, CloPMVP is clearly more efficient than CloSpan. Compared with SPADE, the closed sequence set is more compact, and useful information is more likely to be found.
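The key-point scan described in (1) can be illustrated with a minimal sketch. The abstract does not give the exact decision rules of FR_KP, so everything here is an assumption: the function name `key_points`, the thresholds `min_amplitude` and `min_slope_diff`, and the rule that an extremum is kept when its amplitude relative to the last key point is large enough, while a non-extremum turning point is kept when the slope changes enough. The extreme holding time criterion is omitted for brevity.

```python
def key_points(series, min_amplitude=1.0, min_slope_diff=1.0):
    """Return indices of assumed key points of a 1-D sequence.

    Hypothetical rules (not taken from the thesis):
      * a local extremum is kept if it differs from the last kept
        point by at least `min_amplitude`;
      * any other point is kept if the slope changes by at least
        `min_slope_diff` when passing through it.
    The first and last points are always kept.
    """
    n = len(series)
    if n < 3:
        return list(range(n))

    keep = [0]
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]    # slope before the point
        right = series[i + 1] - series[i]   # slope after the point
        if left * right < 0:                # strict local extremum
            amplitude = abs(series[i] - series[keep[-1]])
            is_key = amplitude >= min_amplitude
        else:                               # candidate turning point
            is_key = abs(right - left) >= min_slope_diff
        if is_key:
            keep.append(i)
    keep.append(n - 1)
    return keep


if __name__ == "__main__":
    signal = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0]
    idx = key_points(signal)
    print(idx, [signal[i] for i in idx])
```

Connecting the kept points with straight lines gives a compressed piecewise linear view of the sequence; the thresholds trade compression against fidelity.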
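The idea behind PDTW in (2), a penalty factor applied while computing the shortest warping path, can be sketched as follows. The abstract does not state the exact form of the penalty, so this sketch assumes one plausible variant: a constant `penalty` added to every non-diagonal step, that is, every step that stretches one sequence in time. The name `pdtw_distance` and the parameter `penalty` are hypothetical.

```python
def pdtw_distance(a, b, penalty=0.5):
    """DTW distance with an assumed additive penalty on warping steps.

    With penalty=0 this reduces to classic DTW using the absolute
    difference as the local cost. The penalty form is an illustrative
    assumption, not the definition used in the thesis.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],          # diagonal step: no warping
                cost[i - 1][j] + penalty,    # repeat b[j-1]: warping step
                cost[i][j - 1] + penalty,    # repeat a[i-1]: warping step
            )
    return cost[n][m]


if __name__ == "__main__":
    x = [1, 2, 3]
    y = [1, 2, 2, 3]
    print(pdtw_distance(x, y, penalty=0.0))  # classic DTW
    print(pdtw_distance(x, y, penalty=0.5))  # warping now has a cost
```

Under this assumption, alignments that rely on heavy time stretching accumulate extra cost, so two sub-sequences that only match after large warping are no longer reported as close.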
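The support computation in the vertical data format mentioned in (3) can be sketched in the style of SPADE id-lists. This is a simplified illustration and not the CloPMVP implementation: sequences are assumed to contain single items per position, and the helper names `build_vertical`, `temporal_join` and `support` are hypothetical.

```python
from collections import defaultdict


def build_vertical(db):
    """Convert a sequence database to vertical format.

    Returns {item: {sequence_id: [positions where the item occurs]}}.
    """
    vertical = defaultdict(dict)
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            vertical[item].setdefault(sid, []).append(pos)
    return vertical


def temporal_join(prefix_ids, item_ids):
    """Id-list of the pattern 'prefix, later followed by item'.

    Only sequence ids present in both id-lists are intersected; an
    occurrence of the item counts if it lies after the earliest end
    position of the prefix in that sequence.
    """
    joined = {}
    for sid in prefix_ids.keys() & item_ids.keys():
        first_end = prefix_ids[sid][0]
        later = [p for p in item_ids[sid] if p > first_end]
        if later:
            joined[sid] = later
    return joined


def support(id_list):
    """Support = number of sequences containing the pattern."""
    return len(id_list)


if __name__ == "__main__":
    db = [list("abcb"), list("acb"), list("bca"), list("ab")]
    v = build_vertical(db)
    ab = temporal_join(v["a"], v["b"])
    abc = temporal_join(ab, v["c"])
    print(support(ab), support(abc))
```

Because each extension reuses the id-list of its prefix, the database is scanned only once, when the vertical format is built; every later support count is a join over id-lists.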
Keywords/Search Tags: Telemetry data, Feature representation, Time penalty, Global averaging sequence, Motif pattern mining, Vertical data format, Closed pattern mining