
A Fast Web Log Mining Algorithm for User Frequent Paths

Posted on: 2006-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Du
Full Text: PDF
GTID: 2208360152491886
Subject: Computer application technology
Abstract/Summary:
With the development of computer technology and the spread of the Internet, the WWW data stored on servers is growing rapidly. Web mining applies data mining techniques to large-scale web data to reveal hidden patterns in user browsing behavior, and it has many applications. The web log records the visits of all users, in particular the paths they follow through the site. Analyzing this information helps website designers understand users' preferences and habits, and the results can be used to optimize the structure of the website and reorganize its pages.

The paper first explains the basic concepts of data mining and web mining, then introduces the architecture for mining frequent paths together with the background knowledge and related definitions. Building on the Apriori algorithm and a graph-based storage structure, it proposes a fast algorithm for mining user frequent paths in three steps. First, a session matrix is used to filter out of the web access logs the frequent 1-itemsets that satisfy a given threshold, which avoids generating a large number of intermediate itemsets. Second, related pages are obtained by quickly clustering pages within groups of similar users. Finally, the related pages are combined through a trace matrix to generate the frequent paths. Experimental results show that the algorithm is both accurate and fast. The work contributes to research on web mining technology and can serve as a useful reference for building a practical web mining system.
Keywords/Search Tags: session matrix, trace matrix, related pages, user frequent paths, fast mining algorithm
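
The abstract does not define the session matrix or the trace matrix, so the following is only a minimal sketch under stated assumptions, not the thesis's actual algorithm. It assumes the session matrix is a binary session-by-page table, that the threshold is a minimum support (the fraction of sessions visiting a page), and that the trace matrix can be approximated by counts of direct page-to-page transitions. The clustering step is omitted. All function names and the sample sessions are hypothetical.

```python
from collections import Counter


def build_session_matrix(sessions):
    """Assumed layout: matrix[i][j] is 1 if session i visited pages[j], else 0."""
    pages = sorted({page for session in sessions for page in session})
    column = {page: j for j, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in sessions]
    for i, session in enumerate(sessions):
        for page in session:
            matrix[i][column[page]] = 1
    return pages, matrix


def frequent_pages(pages, matrix, min_support):
    """Frequent 1-itemsets: pages whose column sum / session count >= min_support."""
    n = len(matrix)
    if n == 0:
        return {}
    result = {}
    for j, page in enumerate(pages):
        support = sum(row[j] for row in matrix) / n
        if support >= min_support:
            result[page] = support
    return result


def transition_counts(sessions, keep):
    """Assumed stand-in for a trace matrix: counts of direct transitions
    between consecutive pages, restricted to the frequent pages in `keep`."""
    counts = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src in keep and dst in keep:
                counts[(src, dst)] += 1
    return counts


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C"],
    ["B", "C"],
]

pages, matrix = build_session_matrix(sessions)
freq = frequent_pages(pages, matrix, min_support=0.5)
counts = transition_counts(sessions, keep=set(freq))
```

Filtering on the column sums of the matrix removes infrequent pages in a single pass over the sessions, before any longer candidate paths are formed, which is the kind of saving the abstract attributes to the session matrix step.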