Research On Technologies And Algorithms Of Web Logs Mining

Posted on:2010-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Guo

Full Text:PDF

GTID:2178360272979363

Subject:Computer system architecture

Abstract/Summary:

With the rapid development and growing popularity of Internet technology, the Web continues to grow at an astounding rate, both in the sheer volume of traffic and in the size and complexity of website design. The Web brings people rich information and great convenience, and at the same time it places high demands on the design and function of websites. It is therefore important to learn users' interests and analyze their browsing patterns, so as to rationalize website structure and uncover potential commercial value. One solution is to apply traditional data mining techniques to web logs: building on the principles and ideas of data mining, the traditional mining methods are extended and improved to suit the particular characteristics of web logs. Web log mining has become a new and important research field worldwide, and research on it has considerable practical significance.

This thesis systematically introduces the entire process of web data mining and web log mining. First, for the preprocessing of web logs, a new Maximal Forward Reference transaction partition method is proposed. The method effectively keeps uninteresting navigation pages from confusing the mining results. Second, based on an in-depth study of frequent pattern mining algorithms and the FP-tree structure, a new construction algorithm for frequent pattern mining, IFP-tree, is proposed. Using a dynamic node insertion technique, the IFP-tree construction algorithm reduces the breadth of the FP-tree and thereby the main memory it occupies. In addition, the similarity of prefixes in the IFP-tree improves the efficiency of the frequent pattern mining algorithm. Third, an improved maximal frequent pattern mining algorithm, IFPmax, is proposed based on the IFP-tree. Before subset checking, the new algorithm uses a node's level and flag to pre-judge whether the node already lies on the path of a maximal frequent pattern. This reduces the number of nodes that must be visited during subset checking and improves the efficiency of the FPmax mining algorithm. Finally, experiments illustrate the performance of the improved algorithms. The results show that the efficiency gain becomes more pronounced as the database grows larger or the minimum support becomes lower.
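
For readers unfamiliar with the Maximal Forward Reference idea mentioned in the abstract, the following is a minimal sketch of the classical form of the technique, not the new partition method proposed in the thesis, whose details the abstract does not give. It assumes a user session is already available as an ordered list of page identifiers; the function name `maximal_forward_references` and the single-letter page names are hypothetical and chosen only for illustration. A revisit of a page already on the current path is treated as a backward reference, and each forward path that ends in a backward reference is emitted as one transaction.

```python
from typing import Hashable, List, Sequence


def maximal_forward_references(session: Sequence[Hashable]) -> List[list]:
    """Split one session into maximal forward reference transactions.

    Classical formulation (an assumption here, not the thesis's variant):
    a page already on the current path counts as a backward reference.
    """
    path: list = []            # current forward path from the session start
    transactions: List[list] = []
    extending = False          # True while the last move was a forward move

    for page in session:
        if page in path:
            # Backward reference: the path built so far is maximal,
            # but emit it only once per run of backward moves.
            if extending:
                transactions.append(list(path))
            # Backtrack: drop everything after the revisited page.
            del path[path.index(page) + 1:]
            extending = False
        else:
            # Forward reference: extend the current path.
            path.append(page)
            extending = True

    # A session that ends on a forward move leaves one more maximal path.
    if extending and path:
        transactions.append(list(path))
    return transactions


if __name__ == "__main__":
    # Hypothetical click sequence; each letter stands for one page.
    clicks = list("ABCDCBEGHGWAOUOV")
    for transaction in maximal_forward_references(clicks):
        print("".join(transaction))
    # prints ABCD, ABEGH, ABEGW, AOU, AOV (one per line)
```

Pages visited only while backing up, such as the second visits to C and B in the sample sequence, do not start transactions of their own, which is how this style of partitioning keeps pure navigation steps out of the mined transactions.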

Keywords/Search Tags:

web logs mining, data pre-process, frequent patterns, FP-tree, maximal frequent patterns

Related items

1	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
2	Research On Mining And Querying Frequent Patterns Based On Simplified Frequent Pattern Tree
3	The Techniques Research On Frequent Pattern Mining
4	Research On Mining Frequent Pattern Based On The Optimized FP-Tree In Data Streams
5	Research On Algorithms For Mining Web Logs
6	Research On Data Mining Technology For Very Large Databases
7	The Techniques Research On Frequent Pattern Mining
8	Research On Mining Algorithms Of Maximal Frequent Item Sets
9	Key Techniques Of Map Database Frequent Pattern Mining
10	The Research On The Algorithms Of Mining Distributed Maximal Frequent Patterns