
Research And Implementation Of Frequent Itemset Mining Over Data Stream

Posted on: 2017-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2348330518495805
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of communication and Internet technology in recent years, data streams have attracted growing attention because they carry valuable and timely information. Data mining is the technology for extracting this information. Association rule mining is an important branch of data mining and has drawn increasing attention because it can uncover valuable, hidden association patterns in large volumes of data. Frequent itemset mining is the core problem of association rule mining and the focus of research on association rules.

Many classic frequent itemset mining algorithms have been proposed and can process massive data sets, but they do not adapt well to the data stream environment, and most of them use complex data structures. They are difficult to parallelize and cannot easily be adapted to a variety of application scenarios. To address these problems, this thesis studies association rule mining over data streams and designs MFIPS, a matrix-based parallel data stream mining algorithm. Building on the sliding window model, the algorithm introduces a data block method and compresses the original data in the window into a 0-1 matrix, which greatly improves space efficiency. Because the support of each itemset is computed with matrix-vector operations, the mining process parallelizes and scales well. Pruning strategies are applied during mining, which further improves the efficiency of the algorithm. Experiments show that the algorithm is more efficient in both space and time, and that its structure is better suited to distributed parallel implementation.

In addition, this thesis designs and implements a data-stream-oriented frequent itemset mining system based on the MFIPS algorithm. The system architecture is built on Storm, a distributed real-time computing framework. The system comprises a transaction acquisition module, a data preprocessing module, a data block generation module, and the frequent itemset mining module of the MFIPS algorithm, all of which are designed and implemented. Optimizing the core mining process with the Intel AVX instruction set also significantly improves efficiency compared with the traditional method. This work shows that the MFIPS algorithm performs well on data streams and has practical value.
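The abstract does not give the details of MFIPS, so the following is only an illustrative sketch of the general idea it describes, not the thesis's algorithm. It assumes that each data block is stored as a 0-1 matrix with one row per item, that a Python integer can stand in for a bit-row, and that ordinary Apriori-style pruning is an acceptable stand-in for the thesis's pruning strategies. The names `block_to_rows`, `support`, and `frequent_itemsets` are hypothetical.

```python
from collections import deque
from itertools import combinations


def block_to_rows(transactions, items):
    """Encode one data block as a 0-1 matrix: one bit-row per item,
    where bit j is set when transaction j of the block contains the item."""
    rows = {item: 0 for item in items}
    for j, transaction in enumerate(transactions):
        for item in transaction:
            if item in rows:
                rows[item] |= 1 << j
    return rows


def support(itemset, window):
    """Count the columns in which every row of the itemset is 1,
    block by block, as the popcount of the AND of those rows."""
    total = 0
    for rows, width in window:
        mask = (1 << width) - 1  # all transactions of this block
        for item in itemset:
            mask &= rows[item]
        total += bin(mask).count("1")
    return total


def frequent_itemsets(window, items, min_count):
    """Level-wise search with Apriori-style pruning (an assumed strategy)."""
    result = {}
    level = [frozenset([i]) for i in sorted(items)]
    while level:
        survivors = []
        for candidate in level:
            count = support(candidate, window)
            if count >= min_count:
                result[candidate] = count
                survivors.append(candidate)
        # Join survivors that differ in one item, then keep a candidate only
        # if all of its subsets that are one item smaller are frequent.
        size = len(level[0]) + 1
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        level = [c for c in joined
                 if all(frozenset(s) in result for s in combinations(c, size - 1))]
    return result


# Usage: a sliding window that keeps only the 2 most recent blocks.
ITEMS = {"a", "b", "c"}
window = deque(maxlen=2)
stream = (
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}, {"b"}],
    [{"a", "b"}, {"a", "b"}],
)
for block in stream:
    window.append((block_to_rows(block, ITEMS), len(block)))
result = frequent_itemsets(window, ITEMS, min_count=3)
```

When the third block arrives, the oldest block drops out of the window, so only the last four transactions are counted. Because each block's rows are independent, the per-block AND and popcount could in principle be distributed across workers or vectorized, which is the property the abstract attributes to the matrix representation.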
Keywords/Search Tags:data stream, frequent itemset mining, parallelization, Storm, AVX