Research On Transaction Sequence Data Mining

Posted on:2012-08-26

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C L Tang

Full Text:PDF

GTID:1118330371965443

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Transaction Sequence Data describes the way prices of commodities or securities change with time in various transactions. The analysis of such data could provide the quantitative basis for merchants or investors in making marketing strategies or choosing investing methods, thus making the mining technology for Transaction Sequence Data a topic of great interest to both research and application.The mining of Transaction Sequence Data aims to identify the price changing patterns of commodities or securities and its major tasks include classifying, clustering, association analysis, anomaly detection and so on. It can also conduct a variety of expanded data analysis and mining, such as association rules allowing for time constraints, pattern analysis for data with missing values and so on.At present, extensive research on Transaction Sequence Data has adopted other sequence data mining and analyzing approaches, such as the one which treats the discrete chronological order as continuous and combines time-series structured or non-structured patterns with various complex algorithms or the one which neglects the numeric sequence elements and constructs characteristics into sequence of events for the mining of frequent patterns. There is also one approach which denotes numeric sequent elements in characters and search through character sequence patterns. Such research approaches are confronted with the following two types of problems. On one hand, they fail to give equal weight to the discrete chronological order and numeric value elements, two intrinsic features of Transaction Sequence Data. On the other hand, they do not utilize the available domain knowledge in the areas of economy and finance. To effectively locate various frequent patterns that fit the domain significance with the innate features of Transaction Sequence factored in will make the results of data analysis and mining more valid.Departing from basic Transaction Sequence Patterns, this paper defines five Transaction Sequence Primitive Patterns, i.e. Flat Pattern, Top Pattern, Bottom Pattern, Growth Pattern and Decline, and their correlations, i.e., Transaction Sequence Composition Patterns, with emphasis on three issues, namely, the mining of Transaction Sequence Patterns, the inquiry and prediction of Transaction Sequence Patterns and clustering based on Transaction Sequence Patterns. Below are the major findings of this paper:(1) In response to the mining of Transaction Sequence Patterns, this paper proposes an algorithm for the mining of Frequent Transaction Sequence Composition Patterns based on the algorithms for primitive pattern quick search and the mining of their TOP K frequent itemsets.Frequent Transaction Sequence Composition Patterns are composed of multiple Transaction Primitive Pattern Frequent Sets which satisfy certain time constraints and a cycling relationship. For this mining task, as the space of candidate primitive patterns grows exponentially, the efficiency issue poses as a bottleneck.Five Transaction Sequence Primitive Patterns are firstly defined according to the domain knowledge. A universal similarity measure for sequence patterns employing a stretchable distance function is put forward as well as its trend merging and symmetric calculation. Thus, Transaction Sequence Patterns (Flat Pattern excluded) with "stretchable" similarities are converted into similarity undirected graphs for spectral clustering. Then, based on the substitution of result cluster approximation for Maximal Cliques, time constraints are introduced to replace Flat Patterns so as to locate Frequent Composition Patterns composed of multiple Transaction Sequence Primitive Pattern Frequent Sets. Several similarity calculations are compared in real stock transaction sequence sets to obtain the calculation accuracy and the Frequent Composition Patterns achieved have good application interpretations.(2) In response to the inquiry of Transaction Sequence Patterns, this paper proposes two effective similarity inquiry algorithms.In practical applications, Transaction Sequence Patterns embody an important type of similarity, called "stretchable" similarity, which causes Transaction Sequence Patterns to lengthen or shorten "strecthably" in the time dimension but maintain the overall trend in the value dimension. To define appropriate similarity measures to capture such similarities is an important issue demanding solutionsRegarding the slight variations between sequences, firstly "merging" is conducted on monotonous intervals of sequences to be inquired, then candidates of sequence patterns are generated according the ratios of length to amplitude of each interval, and lastly calculation is done employing the stretchable distance function as the similarity measure with final results returned. For the changes in price intervals of Transaction Sequence, firstly all sequences are normalized, then calculation is done based on the improved definition of stretchable distance function and inquiry results are obtained. As shown by the experiment results, both "trend merging" and "price merging", the two similarity inquiry algorithms, can find all the pattern results whose overall shapes are the "enlargement" or "reduction" of the given sequence patterns.(3) In response to the prediction of Transaction Sequence Patterns, this paper proposes a sequence pattern trend prediction algorithm with high accuracy.Prediction, based on given Transaction Sequence Datasets, estimates the numerical attributes of successive intervals of given sequences to be inquired. Due to the complexity of data variation, conducting trend prediction in Transaction Sequence is more meaningful than accurate prediction and thus to increase the accuracy of trend prediction for given sequences is key to prediction.Based on the similarity inquiry of "price merging", utilizing the methods of Parzen window density and KNN estimation, this paper proves that to conduct weighted average on the patterns with a length of succeeding the TOP K results of the candidate sets of inquiry results can approximately replace all the inquiry results and subsequently produce the prediction results in a comprehensive way. The results of experimenting with real stock transaction sequence sets show that trend prediction has a high accuracy level.(4) In response to Transaction Sequence clustering, this paper proposes a clustering algorithm which factors in the goal function with time constraints.The choice of objects is crucial to the clustering of Transaction Sequence. Within a certain time span, an overall growing or declining trend can more faithfully reflect the price patterns of commodities or securities. Therefore, to extract, from the original Transaction Sequence, such growth or decline patterns with local reflection for characteristic creation and clustering is more valuable than the direct use of the original Transaction Sequence.Firstly, the inner structure of Transaction Sequence Sets is studied from the perspectives of commodity or security prices and their trends, and then a growth or decline pattern reflecting price trends is defined along with two progressive similarity measures, i.e. shifted window combined distance function and angle vector distance function. On this basis, a goal function with time constraints taken into account is designed for research which firstly studies partitioned clustering and then hierarchical clustering. The experiment results show that, under time constraints, such a characteristic extraction method for Growth or Decline Pattern and the two distance functions between patterns can effectively produce clustering results, which can be satisfactorily interpreted.

Keywords/Search Tags:

Transaction Sequence, Sequence Pattern, Similarity Metric, Similarity Query, Clustering, Prediction

PDF Full Text Request

Related items

1	Research Of Sequence Clustering Algorithm Based On Weighted Similarity
2	Research On Similarity Query Over Sequence Data
3	Sequence Similarity Based On Co-occurrence Word Frequence
4	Pedestrian Behavior Pattern Recognition And Analysis Of Indoor Location Data
5	Sequence Recommendation Methods Based On Temporal Similarity Search
6	Research On Semantic Similarity Metric Based On WordNet And Its Application In Query Suggestion
7	Based On The Behavior Properties And Dependencies Of Bpel Similarity Measure Research
8	Research On Algorithm For Similarity Search Of Biological Sequence Database
9	Functional similarity metric for UML models
10	The design and implementation of an extended database system to support biological sequence similarity analysis