The Incremental Mining Algorithms Of Sequential Patterns

Posted on: 2005-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Jin
GTID: 2168360122491533
Subject: Computer applications
Abstract/Summary:
With the rapid development of information technology, large amounts of data have accumulated in many fields, and mining useful information and knowledge from this data has become increasingly urgent. Knowledge Discovery in Databases (KDD) emerged in response. The steps of KDD include data preparation, data mining, and the interpretation and evaluation of results. Of these steps, Data Mining (DM) is the most important, and the mining of sequential patterns is one of the most active topics in DM.
The purpose of sequential pattern mining is to find the frequent sequences in a transaction database and to use these patterns to support decision-makers. The mining algorithms are divided into two types: conventional and incremental. Conventional algorithms such as AprioriAll assume that the database is static, so even a small change in the database requires the algorithm to be run again in full to obtain the updated frequent sequences. In practice, the content of a database changes continuously, and mining has to be performed repeatedly. Rerunning the algorithm from scratch each time is inefficient and time-consuming. Incremental algorithms instead reuse the results of the previous mining run and scan the small added database rather than the whole updated database, which improves efficiency and reduces the time needed to mine the maximal frequent sequences. Of the incremental algorithms, IUS is currently the most advanced.
However, IUS still has some omissions. First, many large databases not only add data but also delete data from time to time, and IUS does not handle deletion. Second, the existing algorithms all address the "database updating" case and do not consider "parameter changing", in which users change the minimum support, often repeatedly. In addition, IUS contains some inaccuracies.
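The difference between rerunning from scratch and mining incrementally can be illustrated with a minimal Python sketch. It is not IUS or any algorithm from the thesis: the helper names (`seq`, `contains`, `support_count`, `update_counts`) are hypothetical, and it assumes the simplest possible update, in which the increment consists only of new customer sequences. Under that assumption, the stored support count of a known sequence can be refreshed by scanning the increment alone.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Sequence = Tuple[Itemset, ...]


def seq(*itemsets) -> Sequence:
    """Build a sequence (ordered tuple of itemsets) from lists of items."""
    return tuple(frozenset(s) for s in itemsets)


def contains(customer_seq: Sequence, pattern: Sequence) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct itemset
    of `customer_seq`, with the order of itemsets preserved."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest itemset that covers this one.
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def support_count(db: List[Sequence], pattern: Sequence) -> int:
    """Number of customer sequences in `db` that contain `pattern`."""
    return sum(1 for s in db if contains(s, pattern))


def update_counts(old_counts: Dict[Sequence, int],
                  increment: List[Sequence]) -> Dict[Sequence, int]:
    """Refresh stored counts by scanning only the added sequences
    (assumption: the increment holds new customers only)."""
    return {p: c + support_count(increment, p) for p, c in old_counts.items()}


# Original database and the counts kept from the previous mining run.
db = [seq(["a"], ["b"]), seq(["a", "b"], ["c"]), seq(["b"], ["a"])]
a_then_b = seq(["a"], ["b"])
just_a = seq(["a"])
old_counts = {p: support_count(db, p) for p in (a_then_b, just_a)}

# A small increment arrives; only it is scanned.
increment = [seq(["a"], ["c"], ["b"])]
new_counts = update_counts(old_counts, increment)
```

Sequences that were not counted in the previous run still need their own treatment, which is where the bookkeeping of a real incremental algorithm comes in; this sketch only shows why reusing stored counts avoids rescanning the original database.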
This thesis first corrects the inaccuracies in IUS. Then, following the design principles of incremental algorithms for sequential patterns, it proposes an algorithm called USP to solve the first problem. USP handles both added and deleted data: it takes the frequent and negative border sequences from the previous mining results, together with the new frequent and negative border sequences of the updated database, and treats them as the candidate sequences for the next iteration. A second algorithm, CMS, is proposed to solve the second problem; it uses the previous results to accelerate the current mining process. Finally, the thesis analyzes the improved algorithms and points out their advantages.
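As a rough illustration of the "parameter changing" case, the sketch below shows how stored support counts can be reclassified when the minimum support changes, without rescanning the database. It is not the CMS algorithm itself, whose details the abstract does not give; the function `reclassify`, the string labels for sequences, and the sample counts are all hypothetical.

```python
from typing import Dict, Set, Tuple


def reclassify(counts: Dict[str, int], db_size: int,
               min_support: float) -> Tuple[Set[str], Set[str]]:
    """Split sequences with known support counts into frequent and
    infrequent sets for a given minimum support (a fraction of db_size)."""
    threshold = min_support * db_size
    frequent = {p for p, c in counts.items() if c >= threshold}
    infrequent = set(counts) - frequent
    return frequent, infrequent


# Hypothetical counts kept from a previous run over 4 customer sequences,
# covering both frequent sequences and counted-but-infrequent candidates.
stored_counts = {"<(a)>": 4, "<(b)>": 3, "<(a)(b)>": 2, "<(b)(a)>": 1}
db_size = 4

at_50, _ = reclassify(stored_counts, db_size, 0.50)   # original threshold
at_75, _ = reclassify(stored_counts, db_size, 0.75)   # user raises it
at_25, _ = reclassify(stored_counts, db_size, 0.25)   # user lowers it
```

When the threshold is raised, the new frequent set is a subset of the old one and follows from the stored counts alone. When it is lowered, previously infrequent candidates may become frequent and may in turn produce new candidates whose counts are not stored, so some scanning is still needed in that case.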
Keywords/Search Tags:KDD, Data Mining, sequential pattern, incremental mining, IUS