Research On Technologies And Algorithms Of Web Logs Mining

Posted on: 2010-08-29   Degree: Master   Type: Thesis
Country: China   Candidate: X Y Guo
GTID: 2178360272979363   Subject: Computer system architecture
Abstract/Summary:
With the rapid development and spread of Internet technologies, the Web continues to grow at an astounding rate, both in the sheer volume of traffic and in the size and complexity of website design. The Web brings people rich information and great convenience, but it also places high demands on the design and functionality of websites. It is therefore important to learn about users' interests and analyze their browsing patterns, so as to rationalize website structure and uncover potential commercial value. One solution to these problems is to apply data mining techniques to web logs: building on the principles and ideas of data mining, traditional mining methods are extended and improved to suit the new characteristics of web log data. Web log mining has become a new and important research field worldwide, and research on it has considerable practical significance.

This thesis systematically introduces the entire process of web data mining and web log mining. First, for the pre-processing of web log data, a new Maximal Forward Reference (MFR) transaction partition method is proposed. The method effectively prevents uninteresting navigation pages from confusing the mining results. Second, based on an in-depth study of frequent pattern mining algorithms and the FP-tree structure, a new FP-tree construction algorithm for frequent pattern mining, IFP-tree, is proposed. By using a dynamic node insertion technique, the IFP-tree construction algorithm reduces the breadth of the FP-tree and thus its main memory footprint. Furthermore, the efficiency of frequent pattern mining is improved by exploiting the similarity of prefixes in the IFP-tree. Third, an improved maximal frequent pattern mining algorithm, IFPmax, is proposed based on the IFP-tree. Before subset checking, the new algorithm pre-judges each node using its level and a flag that records whether the node already lies on the path of a maximal frequent pattern. This reduces the number of nodes that must be visited during subset checking and improves on the efficiency of the FPmax mining algorithm. Finally, experiments illustrate the performance of the improved algorithms. The results show that the efficiency gain is more pronounced for larger databases or lower minimum support thresholds.
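Web log pre-processing usually cleans the raw log and splits each user's requests into sessions before any transactions are formed. The sketch below illustrates one common way to do this. It parses Common Log Format lines, drops requests for embedded resources such as images and stylesheets, identifies users by IP address, and starts a new session after 30 minutes of inactivity. The log format, the suffix filter, IP-based user identification, the 30-minute timeout and all function names are assumptions chosen for illustration; none of them are taken from the thesis.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
# (assumed for illustration; real servers may log in other formats)
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

# Hypothetical cleaning rule: embedded resources are not page views
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic, assumed here


def parse_line(line):
    """Return (ip, timestamp, url) for a successful GET of a page, else None."""
    m = CLF.match(line)
    if not m:
        return None
    ip, date, method, url, status = m.groups()
    if method != "GET" or not status.startswith("2"):
        return None
    if url.lower().endswith(IGNORED_SUFFIXES):
        return None
    ts = datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z")
    return ip, ts, url


def sessionize(lines):
    """Group cleaned page views per IP into sessions (lists of URLs)."""
    by_user = {}
    for line in lines:
        rec = parse_line(line)
        if rec:
            ip, ts, url = rec
            by_user.setdefault(ip, []).append((ts, url))
    sessions = []
    for views in by_user.values():
        views.sort()  # chronological order for this user
        current = [views[0][1]]
        for (prev_ts, _), (ts, url) in zip(views, views[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:  # long gap: close the session
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


log = [
    '1.2.3.4 - - [10/Oct/2009:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326',
    '1.2.3.4 - - [10/Oct/2009:13:55:40 +0800] "GET /logo.gif HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Oct/2009:13:56:10 +0800] "GET /a.html HTTP/1.1" 200 100',
    '1.2.3.4 - - [10/Oct/2009:15:00:00 +0800] "GET /b.html HTTP/1.1" 200 100',
]
print(sessionize(log))  # [['/index.html', '/a.html'], ['/b.html']]
```

The image request is removed during cleaning, and the gap of more than an hour before `/b.html` starts a second session.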
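The MFR transaction partition mentioned in the abstract builds on the classic Maximal Forward Reference idea. A session is cut into the longest forward paths a user follows, and backward moves (returning to a page already on the current path) end a path. The sketch below implements only that classic rule on one session given as a list of page identifiers. The thesis's refinement for suppressing uninteresting navigation pages is not described in the abstract and is not reproduced; the function name is hypothetical.

```python
def maximal_forward_references(session):
    """Split one session (a list of pages) into maximal forward references.

    Pages are appended to the current forward path. When a page already on
    the path is requested again (a backward reference), the path is emitted
    if the user had been moving forward, then truncated back to that page.
    """
    transactions = []
    path = []
    moving_forward = True
    for page in session:
        if page in path:
            # Backward reference: emit only if it ends a forward move
            if moving_forward:
                transactions.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:  # the final forward path is also maximal
        transactions.append(list(path))
    return transactions


print(maximal_forward_references(list("ABCDCBEGHGWAOUOV")))
```

For the traversal A B C D C B E G H G W A O U O V this yields the transactions ABCD, ABEGH, ABEGW, AOU and AOV. The backward steps themselves (D to C to B) are not recorded as patterns.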
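IFP-tree is presented as an improvement on the standard FP-tree. In that structure, each transaction's frequent items are sorted by descending support and inserted as a path, so transactions with a common prefix share nodes. Frequent patterns are then mined recursively from conditional pattern bases (FP-growth). The sketch below is that plain FP-tree/FP-growth baseline, assuming an absolute minimum support count. It does not implement the IFP-tree's dynamic node insertion, which the abstract does not specify. `build_fp_tree`, `fp_growth` and `Node` are illustrative names.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (root, header table mapping item -> nodes, item supports).
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Frequent items only, by descending support (ties broken by name)
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:  # no shared prefix here: start a new branch
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset itemset, support) for every frequent itemset."""
    _, header, frequent = build_fp_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`
        base = []
        for node in header[item]:
            prefix = []
            parent = node.parent
            while parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                base.append((prefix, node.count))
        yield from fp_growth(base, min_support, itemset)


db = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
patterns = dict(fp_growth([(t, 1) for t in db], min_support=3))
print(len(patterns))  # 18
```

Counting each shared prefix once is what keeps the tree compact. The thesis's reported memory savings come from further reducing the tree's breadth.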
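FPmax, which IFPmax improves, reports only maximal frequent itemsets: frequent itemsets that have no frequent proper superset. Much of its cost lies in subset checking, which tests whether a new candidate is already contained in a maximal itemset found earlier. FPmax performs this check against a prefix tree of the itemsets found so far (the MFI-tree). The sketch below shows the same check naively against a plain list, given a dictionary of frequent itemsets such as the output of an FP-growth run. IFPmax's level-and-flag pre-judgment is not reproduced, and the function name and hard-coded supports (a small illustrative database at minimum support 3) are assumptions.

```python
def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset.

    `frequent` maps frozenset -> support. Candidates are visited from largest
    to smallest, so every maximal superset of a candidate has already been
    found when the candidate's subset check runs.
    """
    found = []  # maximal itemsets discovered so far
    for itemset in sorted(frequent, key=len, reverse=True):
        # Subset check: is the candidate covered by an already-found one?
        if not any(itemset < mfi for mfi in found):
            found.append(itemset)
    return found


example = {frozenset(s): 3 for s in [
    "a", "b", "m", "p", "fc", "fa", "fm", "ca", "cm", "cp", "am",
    "fca", "fcm", "fam", "cam", "fcam"]}
example.update({frozenset("f"): 4, frozenset("c"): 4})
print(maximal_itemsets(example))
```

Of the 18 frequent itemsets only {f, c, a, m}, {c, p} and {b} survive. Because the naive check scans every found itemset for each candidate, the cost of this step grows quickly, which is the cost FPmax's MFI-tree and the thesis's node pre-judgment aim to reduce.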
Keywords/Search Tags: web log mining, data pre-processing, frequent patterns, FP-tree, maximal frequent patterns