
Research On Web Pattern Mining Method Based On PrefixSpan Algorithm

Posted on: 2017-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: H B Ji
GTID: 2348330512451234
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the era of big data has quietly arrived. Data volumes have grown explosively compared with the past, and data structures have become increasingly complex and changeable. Under these circumstances, the high time and space complexity of traditional data analysis methods makes it difficult to extract rules from large-scale data. This thesis improves the existing sequential pattern mining algorithm PrefixSpan and applies the improvements to Web user behavior pattern mining and Web sequential pattern mining. The main contents are as follows:

(1) A comparative analysis of existing sequential pattern mining algorithms. The traditional PrefixSpan algorithm still spends a large amount of time and space building and scanning projected databases, which seriously affects its efficiency. In addition, for the sequential patterns of Web users, the thesis analyzes the regularities and preferences reflected in their actual browsing and access behavior.

(2) The IPWRPIS algorithm is proposed. Compared with the conventional PrefixSpan algorithm, its main ideas are to extend prefixes by sequences instead of by itemsets, to abandon mining any projected database that contains fewer sequences than the threshold min_support, and to recurse directly on local frequent items. The algorithm is applied to Web user behavior pattern mining. Experimental results show that it is more efficient than PrefixSpan and that it can effectively extract users' page-access behavior patterns, supporting the analysis of the rules contained in the log records.

(3) The IPPSIFO algorithm is presented. It aims to reduce the size of the generated projected databases and to cut down scanning time. First, it adds a screening (pruning) step that directly discards infrequent items while the projected database sequences are being generated. Second, it applies projected-sequence layering in the specific projection processing to optimize execution. Applied to Web sequential pattern mining, it significantly improves the efficiency of the algorithm.

In summary, this thesis improves the PrefixSpan algorithm and applies the improvements to Web user behavior pattern mining and Web sequential pattern mining, respectively. Comparative experiments show that the improved algorithms outperform the classic PrefixSpan algorithm. However, sequential pattern mining still faces many new challenges, such as how to apply this work in more practical applications, which require further exploration.
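Mining Web logs for user behavior patterns first requires turning raw log records into a sequence database. The abstract does not describe the thesis's preprocessing, so the following is only a hypothetical sketch: it assumes each record has already been parsed into a (user_id, timestamp, page) tuple and that a new session starts after 30 minutes of inactivity, a common but arbitrary convention. The function name build_sessions is illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (user_id, timestamp, page). Hypothetical, not from the thesis.
LogRecord = Tuple[str, datetime, str]


def build_sessions(records: Iterable[LogRecord],
                   timeout: timedelta = timedelta(minutes=30)) -> List[List[str]]:
    """Group records per user, order them by time, and split on idle gaps > timeout.

    Each returned session is an ordered list of pages, i.e. one sequence
    in the database handed to a sequential pattern miner.
    """
    by_user: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sessions: List[List[str]] = []
    for user in sorted(by_user):
        current: List[str] = []
        last_ts = None
        for ts, page in sorted(by_user[user]):  # chronological order per user
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)  # idle too long: close the session
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions
```

For example, a user who views home and list five minutes apart and returns two hours later for cart yields two sessions, ["home", "list"] and ["cart"].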
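The projected-database mining behind items (1) to (3) can be illustrated with a minimal sketch. This is not the thesis's IPWRPIS or IPPSIFO implementation: it is a simplified PrefixSpan over sequences of single items (for example, page names rather than itemsets), assuming min_support is an absolute count of sequences. With prune=True it adds two pruning steps modeled loosely on the ideas in the abstract: items that are infrequent in the current projected database are dropped while each child projection is built, and recursion stops when a projected database holds fewer than min_support non-empty sequences. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Pattern = Tuple[str, ...]


def frequent_items(db: List[List[str]], min_support: int) -> Dict[str, int]:
    """Support of an item = number of sequences containing it (counted once per sequence)."""
    counts: Counter = Counter()
    for seq in db:
        counts.update(set(seq))
    return {item: n for item, n in counts.items() if n >= min_support}


def project(db: List[List[str]], item: str,
            keep: Optional[Set[str]] = None) -> List[List[str]]:
    """Projected database: the suffix after the first occurrence of `item` in each sequence.

    If `keep` is given, items outside it are discarded while each suffix is built.
    Empty suffixes are omitted because they cannot support any extension.
    """
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if keep is not None:
                suffix = [x for x in suffix if x in keep]
            if suffix:
                projected.append(suffix)
    return projected


def prefixspan(db: List[List[str]], min_support: int,
               prune: bool = True) -> Dict[Pattern, int]:
    """Return every frequent sequential pattern with its support (simplified PrefixSpan)."""
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[List[str]]) -> None:
        freq = frequent_items(projected, min_support)
        keep = set(freq) if prune else None
        for item, support in sorted(freq.items()):
            pattern = prefix + (item,)
            results[pattern] = support
            child = project(projected, item, keep)
            # Fewer than min_support sequences left: no extension of `pattern` can be frequent.
            if prune and len(child) < min_support:
                continue
            mine(pattern, child)

    mine((), [list(seq) for seq in db])
    return results


if __name__ == "__main__":
    sessions = [
        ["home", "list", "item", "cart"],
        ["home", "item", "cart"],
        ["home", "list", "item"],
        ["list", "cart"],
    ]
    for pattern, support in sorted(prefixspan(sessions, min_support=2).items()):
        print(" -> ".join(pattern), support)
```

Both pruning steps preserve the result: an item that is infrequent in a projected database is also infrequent in every projection derived from it, and a database with fewer than min_support sequences cannot support any longer pattern. The pruned and unpruned runs therefore return the same patterns; pruning only shrinks the databases that are built and scanned. On the four made-up sessions above (illustration data, not results from the thesis), home -> item -> cart is reported with support 2.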
Keywords/Search Tags: Data mining, PrefixSpan algorithm, Sequential pattern mining, Web log mining