
Frequent Pattern Mining Algorithm Research For Data Stream

Posted on: 2009-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: H L Liu
GTID: 2178360308979402
Subject: Computer software and theory
Abstract/Summary:
With the development of the information era, data mining has become an important research direction. Data mining technology has matured over more than ten years of development. In recent years, however, a new form of data has appeared that is widespread in the real world: Web server log files, stock trading, sensor networks, and weather and environmental monitoring all generate vast data streams. Mining data streams is challenging. Traditional data mining techniques can only handle static data and are ill-suited to such large, fast data streams.

Frequent pattern mining is an important task in data mining. In recent years, research on frequent pattern mining over data streams has produced many valuable results. These approaches still require a large amount of memory, however, and their mining efficiency is not high enough; in particular, they cannot incorporate newly arriving data efficiently. To address these problems, this thesis proposes a new frequent pattern mining scheme for data streams, NCH-DSFM (New Compact Hash Tree Based Frequent Pattern Mining on Data Stream), comprising a data filtering and coding method, a new synopsis structure, and a mining algorithm. The thesis also implements a frequent pattern mining system for data streams.

Firstly, a data filtering and coding method based on a hash structure is proposed. A sliding window stores the data stream, which bounds the amount of data processed at a time. A hash table is used to filter and code the data in the basic windows, which reduces the amount of data to be mined, simplifies the data types, and yields a canonical order over all items. This step supports the construction and updating of the prefix tree.

Secondly, a new synopsis structure, NCH-Tree, is proposed. NCH-Tree borrows an idea from the B+ tree and stores the time information of all transactions in a list, which makes updating efficient. A new mining algorithm is designed on the basis of this structure, and it meets the requirements of data stream mining.

Thirdly, a data stream frequent pattern mining algorithm is further proposed, which meets the requirements of data stream mining well.

Lastly, a frequent pattern mining system for data streams is designed and implemented. In this system, another synopsis structure stores the frequent pattern set. Using this structure, the mining algorithm can interact with the user and return different results according to different requirements.

Experimental results show that the proposed scheme incorporates newly arriving data more efficiently while maintaining high mining accuracy. The mining efficiency of the algorithm is also higher than that of related algorithms.
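The abstract does not give the details of the filtering and coding step, so the following is only a minimal sketch of the general idea, not the thesis's NCH-DSFM implementation. It assumes that a sliding window holds a fixed number of basic windows, that a hash table counts items in the current window, that items below an absolute support threshold are filtered out, and that the canonical order is descending frequency with integer codes assigned in that order. The class name `SlidingWindowEncoder`, its parameters, and the ordering rule are all hypothetical.

```python
from collections import Counter, deque


class SlidingWindowEncoder:
    """Illustrative sliding window with hash-based item filtering and coding."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size  # number of basic windows kept (assumed)
        self.min_support = min_support  # absolute count threshold (assumed)
        self.windows = deque()          # each entry is one basic window
        self.counts = Counter()         # hash table: item -> count in the window

    def add_basic_window(self, transactions):
        """Append one basic window; evict the oldest when the window is full."""
        transactions = [set(t) for t in transactions]
        self.windows.append(transactions)
        for t in transactions:
            self.counts.update(t)
        if len(self.windows) > self.window_size:
            for t in self.windows.popleft():
                self.counts.subtract(t)
            self.counts = +self.counts  # unary plus drops items whose count fell to zero

    def item_codes(self):
        """Map each frequent item to an integer code in canonical order."""
        frequent = [i for i, c in self.counts.items() if c >= self.min_support]
        # Assumed canonical order: descending count, ties broken by item name.
        frequent.sort(key=lambda i: (-self.counts[i], str(i)))
        return {item: code for code, item in enumerate(frequent)}

    def encoded_transactions(self):
        """Return the window's transactions as sorted code lists, infrequent items removed."""
        codes = self.item_codes()
        encoded = []
        for window in self.windows:
            for t in window:
                row = sorted(codes[i] for i in t if i in codes)
                if row:  # skip transactions with no frequent item left
                    encoded.append(row)
        return encoded
```

Because every encoded transaction lists its codes in the same canonical order, transactions that share frequent items also share a prefix, which is what lets a prefix tree store them compactly. Under these assumptions, eviction only touches the counts of items in the expired basic window.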
Keywords/Search Tags: Data stream mining, Frequent pattern, Sliding window, Hash table, B+ tree