Font Size: a A A

Research On Top K Frequent Closed Pattern Mining In Data Streams

Posted on:2010-05-08Degree:MasterType:Thesis
Country:ChinaCandidate:R X HanFull Text:PDF
GTID:2178360275453256Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The emergence of the data stream model poses tremendous challenges to traditional skills in data mining. Because data stream is arriving continuously, to manage and mine these potentially unlimited and dynamic data stream is difficult for existing data processing techniques, people must make a study of mining skills suitalble for the environment of data stream. Recently the management and mining techniques for data stream has caused widely attention from researchers at home and abroad and has become a hot spot in the field of study. It is much valuable to study above-mentioned techniques not only in theory but also in a great many application areas where it has cheerful prospect, such as sensor network, weather minitoring and analysis, network minitory and security, web log analysis and so on. In this thesis, we explore serval key issues over data stream mining. Meanwhile, we primarily research on the problem of mining top k frequent closed patterns. In summary, this thesis mainly involves following aspects:(1) Compared to fixed sized dataset used in traditional data mining, we analyze the charactics of data streams, and then introduce several existing data stream models and processing techniques. The characteristics of data stream itself have put forward many claims to mining algorithms of data stream.(2) We analyze and summarize a few frequent pattern mining algorithms used in the fixed datasets and in data streams, and then acquainte ourselves with following aspects involved in mining procedure, such as the storage structures and methods of history information, how to maintain and update data structure when new data is coming, the strategies of pruning branches, the output of result set and so on.(3) Frequent closed patterns contain complete information about all frequent patterns, that is, all frequent patterns and their respective supports could be computed through frequent closed patterns, and the number is commonly orders of magnitude smaller than that of frequent patterns. Then frequent closed pattern tends to be easier to understand and be more suitable to applications in real life. To mine top k frequent closed pattern with their length no less than a given parameter min_l in data stream environment is suggested and studied in this thesis. Then we propose an algorithm based on sliding window technique to mine frequent information in the data which user is interested in at a time immediately before the present. The expected top k most frequent closed itemsets are shown to user. In consideration of some long pattern mining, a long pattern may contain some subsets which are likely to become closed itemsets when they have different supports and the supports of these subsets are greater than their superset(s). Under this circumstance, sub patterns are prone to be displayed. To prevent output patterns too shorter, we offer a parameter min_l to restrict the minimum length, and the lengths of all output patterns must be greater than or equal to the min_l value. This algorithm have good flexibility and extensibility, according to the need of users, the balance between efficiency and expected final result can be obtained through adjusting either the k value or the min_l value or both.
Keywords/Search Tags:frequent pattern, closed pattern, data mining, data streams
PDF Full Text Request
Related items