Interesting Pattern Mining Algorithms in Databases

Posted on:2008-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:X F You

GTID:2208360215483341

Subject:Computer application technology

Abstract/Summary:

With the rapid advance of database technologies and the widespread use of database management systems (DBMS), the ability to produce and collect data has improved dramatically, resulting in huge volumes of data. Many important patterns and much interesting knowledge are hidden in these data. People therefore hope to perform high-level analysis on the collected data to find useful knowledge, so that the data can be better utilized and decision-makers can be given strong support. Traditional statistical methods cannot meet these demands, and this has led to the emergence of data mining (DM) techniques.
Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of extracting latent, valid, novel, useful, and ultimately understandable and applicable knowledge from vast, incomplete, and noisy data. It is an interdisciplinary subject involving database systems, computation theory, artificial intelligence, statistics, and cognitive science, and it supports association analysis, classification, clustering, prediction, outlier detection, and evolution analysis. Although data mining has a short history, its appeal and broad application prospects have attracted great enthusiasm and wide attention from researchers and industrial experts all over the world.
Extracting interesting patterns from databases is an important research area in data mining. An interesting pattern is an itemset or a rule that interests the user and that is latent, valid, novel, useful, and ultimately understandable and applicable. The research results can be widely used for decision-making and analysis in application areas such as finance, stock trading, customer management, retailing, manufacturing, geological prospecting, and electronic commerce.
This research has both theoretical depth and broad practical value.
In 1993, Agrawal et al. first proposed the problem of mining association rules (frequent patterns) from customer transaction databases. Since then, many researchers have studied the problem of mining interesting patterns extensively. They have not only optimized the existing algorithms but also proposed many new kinds of interesting patterns, such as exceptional patterns, unexpected patterns, and change patterns. We design a new algorithm for identifying against-expectation patterns. The algorithm comprises (1) generation of a set of interesting items (i.e., items with a large relative contrast) and (2) identification of interactions among these interesting items, based on nearest-neighbor graph and correlation analysis techniques.
The interesting patterns mined from sequence databases include complete periodic patterns, segment-wise periodic patterns, partial periodic patterns, and surprising periodic patterns. We design a new algorithm for identifying rarely periodic patterns (RPP). RPP differ from the various periodic patterns and weak periodic patterns that are evaluated by support. There is a strong relative contrast between every two events that belong to one RPP, such as A→B: B is likely to occur n times after A has occurred m times. Although the support of an RPP may be low, RPP have broad application areas, such as computational biology, earthquake forecasting, and stock analysis. Discovering RPP is hard when no period length is known in advance and noisy items are generated randomly, because an RPP may be hidden in either frequent or rare itemsets. We introduce a new transformation wavelet tree (TWT) algorithm to preprocess the database and then mine the RPP from the preprocessed result.
Algorithms for mining interesting patterns from data streams include point monitoring and interval monitoring.
We design a compact data structure, TTI, which consists of three nested tiers of time intervals, for monitoring data entering the sliding window so that the current abnormality can be identified at any time. In our approach the threshold for identifying abnormal items is generated dynamically by the KIC algorithm (kernel estimation and confidence interval clustering), whereas existing algorithms use predefined (static) thresholds. This leads to more accurate output. Based on the threshold, we design an algorithm, SWMA, to reduce time and space complexity.
Many interesting patterns have been mined from multi-databases, such as high-voting patterns and many kinds of exceptional patterns. We propose a new method named KEMGP (kernel estimation for mining global patterns) to tackle this problem; it uses kernel estimation to synthesize local patterns into global patterns.
The main contributions of this work are as follows.
· We expound the definition, functions, and extraction framework of KDD, and the categories, data sources, extraction algorithms, and significance of interesting patterns.
· Four kinds of interesting patterns and their interestingness measures are defined.
· An effective new extraction algorithm is proposed for each kind of interesting pattern.
· We test our algorithms on synthetic and real databases, and the experimental results show that our methods are effective and promising.
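The contrast drawn above between a dynamically generated threshold and a predefined static one can be illustrated with a small sketch. This is not the KIC, TTI, or SWMA algorithm, whose details the abstract does not give. It only assumes that the values in the current sliding window are smoothed with a Gaussian kernel estimate and that the threshold is taken as an upper quantile of that estimate. The function names, the normal-reference bandwidth rule, the window size, and the confidence level are all hypothetical choices made for illustration.

```python
import math
from collections import deque


def kde_cdf(x, samples, bandwidth):
    """Gaussian-kernel estimate of P(X <= x) from the samples."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - s) / scale)) for s in samples) / len(samples)


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth (assumed choice); 1.0 if the data are constant."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / max(n - 1, 1)
    std = math.sqrt(var)
    return 1.06 * std * n ** (-0.2) if std > 0 else 1.0


def dynamic_threshold(samples, confidence=0.99):
    """Bisection for the value t where the estimated P(X <= t) reaches confidence."""
    h = rule_of_thumb_bandwidth(samples)
    lo, hi = min(samples) - 10 * h, max(samples) + 10 * h
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kde_cdf(mid, samples, h) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


def detect_bursts(stream, window_size=20, confidence=0.99):
    """Yield (index, value, threshold) for values above the window's current threshold."""
    window = deque(maxlen=window_size)
    for i, value in enumerate(stream):
        if len(window) == window_size:
            t = dynamic_threshold(list(window), confidence)
            if value > t:
                yield i, value, t
        window.append(value)  # the oldest value is dropped automatically


if __name__ == "__main__":
    stream = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11] * 2 + [40, 10, 11]
    for index, value, threshold in detect_bursts(stream):
        print(f"index {index}: value {value} exceeds threshold {threshold:.2f}")
```

Because the threshold is recomputed from the window contents, it rises and falls with the recent data, which a fixed cutoff cannot do. A real system would also need the interval bookkeeping that the abstract attributes to TTI, which this sketch omits.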

Keywords/Search Tags:

against-expectation pattern, segment-wise periodic pattern, abnormal burst pattern, global patterns, data stream, multi-database mining

Related items

1	Research And Implementation On Burst Patterns Mining In Database
2	Research On Mining Periodic Frequent Patterns Common To Multiple Sequences
3	Mining Local Periodic Patterns In A Discrete Sequence
4	Mining Sequential Patterns With Periodic General Gap Constraints
5	The Study On Frequent Patterns Mining And Data Predicting Over Data Streams
6	Mining Approximate Frequent Patterns With Wildcards And Gap Lengths
7	Partial Periodic Pattern Mining Method For Time Series Of Multi-source Data
8	Research On Mining Sequential Patterns With Periodic Wildcard Gaps
9	Pattern Mining Algorithms Over Data Streams
10	Research On Closed Pattern Based Data Mining Technologies