Study Of Sequential Mining In Database

Posted on: 2002-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2168360032457008
Subject: Computer applications
Abstract/Summary:
Faced with a soaring amount of information, people are overwhelmed by the "data bomb" while still fearing a "shortage of knowledge". Knowledge discovery in databases (KDD) arose from this need and has become one of the strongest tools for resolving this paradox. Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Algorithms are the key part of KDD, so research into efficient algorithms is essential. On the one hand, data mining is used to process large databases, so algorithm efficiency matters most; on the other hand, the computers currently in use are not adequate for processing large databases. Consequently, existing algorithms should be modified to meet these needs. For these reasons, this paper takes data mining algorithms as its research subject.

This paper studies sequence mining algorithms in depth. Mohammed J. Zaki proposed the SPADE algorithm. Compared with Apriori, SPADE is less time-consuming: because it stores the database vertically, it scans the database far fewer times and is therefore more efficient. Despite these advantages, SPADE has two disadvantages: (1) it produces a large number of candidate sets during processing; (2) the frequent sequences it produces are limited to particular items.

To remedy these deficiencies, this paper proposes the Middle Matching algorithm. Its main idea is to generate candidate sets by matching two sequences at their middle positions, which reduces the number of candidate sets. Processing sequences this way also avoids the second disadvantage, which widens the domain in which the algorithm can be applied. This paper compares the candidate sets produced by the two algorithms.
Experiments show that the Middle Matching algorithm is more efficient than SPADE. The algorithm studied in this paper can be used for sequence mining or for finding time-related frequent sequences. In the last part of this paper, an example of time-series-based stock mining is used to assess the Middle Matching algorithm.
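
The vertical storage mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual id-list layout associated with SPADE, in which each item maps to a list of (sequence id, event id) pairs, so that support can be counted from the lists without rescanning the original database. The toy database and the names `to_vertical`, `support`, and `temporal_join` are hypothetical and chosen only for illustration; the sketch does not reproduce the thesis's Middle Matching join, whose details the abstract does not give.

```python
from collections import defaultdict


def to_vertical(db):
    """Convert a horizontal database {sid: [(eid, items), ...]}
    into vertical id-lists {item: [(sid, eid), ...]}."""
    idlists = defaultdict(list)
    for sid, events in db.items():
        for eid, items in events:
            for item in items:
                idlists[item].append((sid, eid))
    return idlists


def support(idlist):
    """Support = number of distinct sequences that appear in the id-list."""
    return len({sid for sid, _ in idlist})


def temporal_join(idlist_a, idlist_b):
    """Id-list of the 2-sequence 'a then b': keep each occurrence of b
    that comes after the earliest occurrence of a in the same sequence."""
    first_a = {}
    for sid, eid in idlist_a:
        first_a[sid] = min(eid, first_a.get(sid, eid))
    return [(sid, eid) for sid, eid in idlist_b
            if sid in first_a and eid > first_a[sid]]


# Hypothetical toy database: sequence id -> list of (event id, itemset).
db = {
    1: [(10, {"A", "B"}), (20, {"C"}), (30, {"A"})],
    2: [(10, {"C"}), (20, {"A"})],
    3: [(10, {"A"}), (20, {"C"})],
}

idlists = to_vertical(db)          # the only pass over the horizontal data
a_then_c = temporal_join(idlists["A"], idlists["C"])
print(support(idlists["A"]), support(a_then_c))
```

In this sketch the horizontal database is read once to build the id-lists; every later support count is computed by joining id-lists, which is why a vertical layout can reduce the number of database scans.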
Keywords/Search Tags: Data Mining, Support, Middle Matching Algorithm, Candidate Set, SPADE