Research On User Interest Clustering Based On Web Log

Posted on: 2009-05-03; Degree: Master; Type: Thesis
Country: China; Candidate: F Chen; Full Text: PDF
GTID: 2178360245471699; Subject: Computer software and theory
Abstract/Summary:
With the spread of the Internet, the gap between the rapid growth of information and people's limited attention keeps widening, and web log mining is an effective means of addressing it. Users' browsing behavior and characteristics are hidden in the web log; analyzing the log with clustering techniques yields users' interest patterns, which helps to improve a website's structure, to provide personalized recommendations, and to promote e-commerce. Traditional clustering techniques do not take users' interest adequately into account, so their clustering results are not ideal. This dissertation mines users' access interest patterns based on the concept of path-interest and clusters users' access paths with an improved clustering algorithm; experiments indicate that the clustering effect is good. The main contents of the dissertation are as follows:

Firstly, the dissertation describes and analyzes data preprocessing techniques in web log mining and proposes the SFT algorithm, which converts users' access sequences into transactions, improving the speed of data preprocessing while preserving its precision.

Secondly, to address the shortcomings of existing representations of users' access interest, an interest pattern mining algorithm named IPS is proposed. It measures users' access interest by three indexes: the visit interest, the access time interest, and the support. The IPS algorithm is compared with the MFS algorithm in terms of accuracy and execution time, and the experiments indicate that the IPS algorithm is superior to the MFS algorithm.

Finally, because current clustering algorithms ignore the order in which users visit pages, a user interest clustering algorithm named UIC is proposed. It takes page order fully into account, defines path-similarity, and builds a similarity matrix of users' browsing paths, from which the clustering result sets are obtained. These results can support personalized services, e-commerce, and similar applications.
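The abstract does not give the thesis's definition of path-similarity or the details of UIC, so the following is only a minimal sketch of the general idea: an order-aware similarity between browsing paths, a similarity matrix, and clusters read off that matrix. Everything in it is an assumption made for illustration: the similarity is taken to be the longest common subsequence length divided by the longer path's length, clusters are taken to be connected components above a hypothetical `threshold`, and all function names are invented here.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def path_similarity(a, b):
    """Order-aware similarity in [0, 1] (an assumed stand-in, not the thesis's definition)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def similarity_matrix(paths):
    """Symmetric matrix of pairwise path similarities, with 1.0 on the diagonal."""
    n = len(paths)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = path_similarity(paths[i], paths[j])
    return matrix


def cluster(paths, threshold=0.6):
    """Label paths by connected components of the graph where similarity >= threshold."""
    sim = similarity_matrix(paths)
    n = len(paths)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


paths = [["A", "B", "C"], ["A", "B", "D"], ["C", "B", "A"], ["X", "Y"]]
print(cluster(paths))
```

In this sketch the paths A, B, C and C, B, A contain the same pages but fall into different clusters because the visit order differs, which is the property the abstract says set-based clustering lacks.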
Keywords/Search Tags: Web Log Mining, Data Preprocessing, Access Paths, Interest Clustering