Users Frequent Path Fast Web Log Mining Algorithm

Posted on:2006-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Du

Full Text:PDF

GTID:2208360152491886

Subject:Computer application technology

Abstract/Summary:

With the development of computer technology and the spread of the Internet, the WWW data stored on servers is growing rapidly. Web mining applies data mining techniques to large-scale web data to reveal hidden patterns in user browsing behavior, and it has many applications. The web log records the visit information of all users, in particular their access paths. Analyzing this information helps website designers understand users' preferences and habits, and the results can be used to optimize the structure of the website and reorganize its pages.

The paper first explains the basic concepts of data mining and web mining, then introduces the architecture for mining frequent paths together with the necessary background and definitions. Building on the Apriori algorithm and a graph-based storage structure, it proposes a fast algorithm for mining user frequent paths. First, the frequent 1-itemsets that meet a given threshold are filtered out of the web access logs using a session matrix, which avoids generating a large number of intermediate itemsets. Next, related pages are obtained by quickly clustering pages within groups of similar users. Finally, the related pages are combined using a trace matrix to generate frequent paths. Experimental results show that the algorithm is accurate and fast. This work contributes to research on web mining technology and offers a useful reference for building a practical web mining system.
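The abstract does not spell out how the session matrix and trace matrix are built, so the following Python sketch is only one plausible reading, not the thesis's actual algorithm. It assumes a session matrix with one row per session and one column per page (1 if the page was visited), and a trace matrix that counts, per session, direct transitions between frequent pages. The function names, the sample sessions, and the threshold value are all hypothetical, and the page-clustering step is omitted.

```python
from collections import Counter


def build_session_matrix(sessions, pages):
    """Binary matrix: rows are sessions, columns are pages (assumed layout)."""
    index = {page: j for j, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in sessions]
    for i, path in enumerate(sessions):
        for page in path:
            matrix[i][index[page]] = 1
    return matrix


def frequent_pages(sessions, pages, min_support):
    """Frequent 1-itemsets: pages visited in at least min_support sessions."""
    matrix = build_session_matrix(sessions, pages)
    # Column sums give the number of sessions that contain each page.
    counts = [sum(column) for column in zip(*matrix)]
    return {page for page, count in zip(pages, counts) if count >= min_support}


def frequent_transitions(sessions, keep, min_support):
    """Sparse trace matrix: (a, b) -> number of sessions where b directly
    follows a and both pages are frequent; infrequent pairs are dropped."""
    trace = Counter()
    for path in sessions:
        pairs = {(a, b) for a, b in zip(path, path[1:]) if a in keep and b in keep}
        trace.update(pairs)  # each session counts a transition at most once
    return {pair: count for pair, count in trace.items() if count >= min_support}


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "C"]]
pages = ["A", "B", "C", "D"]

keep = frequent_pages(sessions, pages, min_support=2)
paths = frequent_transitions(sessions, keep, min_support=2)
print(sorted(keep))
print(sorted(paths.items()))
```

Filtering single pages before counting transitions is what keeps the candidate set small in this sketch: page D never enters the trace matrix, so no pair involving it is ever generated. Longer frequent paths could be grown by chaining frequent transitions, but how the thesis does this is not stated in the abstract.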

Keywords/Search Tags:

session matrix, trace matrix, relative pages, user frequent paths, fast mining algorithm

Related items

1	Anonymous User Navigation Path Mining Research And Implementation
2	Research On Correlative Algorithms Of Association Rule Mining
3	Fast Algorithms For Trace-Ratio And Nonnegative Matrix Factorization Problems On Dimensionality Reduction Of High Dimensional And Large Sample Data
4	Weighted Frequent Itemsets Mining Algorithm Based On Matrix Compression And Time Decay
5	Distance Education System Based On Web Log Mining
6	Based On The Matrix Of Weighted Association Rules Mining Algorithm
7	Research On Fast Matching Algorithm Based On Image Features And Gray Values
8	Research On Frequent Itemsets Mining Algorithm Based On Matrix
9	The Research Of Web Usage Mining Algorithm Based On Web Log
10	Research And Implement Of A Matrix Based Paralleled Frequent Itemset Mining Algorithm