The Research Of Web Log Mining Based On Intersection Relation

Posted on: 2007-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: W Guo
Full Text: PDF
GTID: 2178360182486470
Subject: Computer application technology
Abstract/Summary:
Web mining applies data mining techniques to the Web environment in order to discover and extract potential, effective, novel and interesting patterns and knowledge from Web documents and Web activity. According to the object of study, Web mining is divided into Web content mining, Web structure mining and Web log mining. By mining Web logs, an administrator can discover users' browsing patterns, understand users' purposes and behavior, improve the performance and design of the Web server, provide personalized services, and discover potential customer groups. This thesis uses the intersection relation to solve the problem of mining users' frequent access patterns. In addition, considering the difference between content pages and navigation pages, the thesis proposes target frequent access patterns. The main work of the thesis is as follows:
(1) The thesis describes in detail the definition, classification, characteristics, application fields and research directions of Web mining, as well as the concepts, research objects, applications and techniques of Web log mining. It introduces the three phases of Web log mining: data preprocessing, pattern discovery and pattern analysis. It also describes a classical method, the MFR algorithm, which identifies transactions in the data preparation phase.
(2) The thesis introduces an Apriori-like algorithm and then proposes the GITC algorithm. Theoretical analysis and experimental tests show that the proposed algorithm can discover all types of user frequent access patterns, and that it performs very well when the support threshold is low.
(3) Considering the difference between content pages and navigation pages in Web log mining, the thesis proposes the concept of target frequent access patterns, an algorithm for mining them called the MTFAP algorithm, and a way of using them. Target frequent access patterns can be used to predict a user's target.
(4) A prototype Web log mining system is designed. The system comprises four functional modules, which implement the preprocessing of the original log data and three algorithms: the Apriori-like algorithm, the GITC algorithm and the MTFAP algorithm. Finally, the performance of these algorithms is analyzed and validated on real log data.
Keywords/Search Tags:Data Mining, Web Log Mining, Intersection Relation, Target Frequent Access Patterns
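
As a rough illustration of what an Apriori-like, level-wise search for frequent access patterns might look like, here is a minimal Python sketch. It is not the thesis's Apriori-like algorithm, nor GITC or MTFAP, none of which are specified in the abstract. It assumes that a pattern is a contiguous path of pages within one user session and that support is the number of sessions containing that path; the function name `frequent_access_paths` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def frequent_access_paths(sessions, min_support):
    """Level-wise (Apriori-like) mining of contiguous page paths.

    Assumption: support of a path = number of sessions that contain it
    at least once as a contiguous run of page views.
    """

    def count(candidates, k):
        # Count, per session, each distinct length-k path once.
        counts = defaultdict(int)
        for session in sessions:
            seen = {tuple(session[i:i + k]) for i in range(len(session) - k + 1)}
            for path in seen:
                if candidates is None or path in candidates:
                    counts[path] += 1
        return {p: c for p, c in counts.items() if c >= min_support}

    frequent = {}
    k = 1
    level = count(None, k)  # frequent single pages
    while level:
        frequent.update(level)
        k += 1
        prev = set(level)
        # Join step: extend p with the last page of q when p's suffix equals
        # q's prefix, so both (k-1)-length sub-paths are already frequent.
        candidates = {p + (q[-1],) for p in prev for q in prev if p[1:] == q[:-1]}
        level = count(candidates, k)
    return frequent


# Hypothetical sessions: each list is one user's ordered page views.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
    ["A", "B", "C", "D"],
]
for path, support in sorted(frequent_access_paths(sessions, 2).items()):
    print(" -> ".join(path), support)
```

With a lower `min_support`, more candidates survive each level, which is the regime where the abstract reports good performance for the proposed algorithm; this sketch makes no attempt to reproduce that behavior.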