The Research And Realization Of Prototype System Based On Web Log Mining

Posted on: 2012-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: H D Ren
GTID: 2178330335953194
Subject: Computer application technology
Abstract/Summary:
In an era of information explosion on the Internet, users usually acquire information through search engines. However, existing information retrieval systems ignore users' knowledge background and interests: they return the same results for the same query regardless of who submits it, leaving users lost among information resources. This has led to a new research direction in the information retrieval field: personalized information retrieval.

The precondition for personalized retrieval is to identify users accurately and to build a reasonable model of their knowledge and interest background. Web logs contain a large number of user records. A user's knowledge and interest background can be established by mining this information to identify individual users and by analyzing their browsing behavior to enrich their profiles. By drawing on this background, a personalized retrieval system can return different results for the same query submitted by different users, which improves recall, precision, and user satisfaction.

This thesis focuses on establishing users' knowledge and interest background by means of Web log mining and on implementing a personalized retrieval prototype system. The main contents are as follows.

The thesis first discusses data cleaning in the Web log preprocessing stage and introduces the main steps of data preprocessing. Because the term-frequency-based TF/IDF algorithm ignores the correlation between a user's knowledge and interests and the documents, the thesis proposes the Page Correlation Weight, which is derived from an analysis of users' browsing behavior and of the implicit feedback information in the Web log.
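
The data cleaning step of Web log preprocessing mentioned above can be illustrated with a minimal sketch. The abstract does not specify the log format or the cleaning rules, so everything here is an assumption: the input is taken to be in Common Log Format, and the hypothetical `clean_log` function keeps only successful GET requests for pages, dropping malformed lines, error responses, and static resources such as images and stylesheets.

```python
import re

# Assumed input: Common Log Format lines, e.g.
# host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed rule (not from the thesis): static resources carry no interest signal.
STATIC_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def clean_log(lines):
    """Return parsed records for successful GET page requests only."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        if match.group("method") != "GET":
            continue  # keep only page retrievals
        if not match.group("status").startswith("2"):
            continue  # drop redirects and errors
        # Ignore the query string when checking the file suffix.
        path_only = match.group("path").split("?", 1)[0].lower()
        if path_only.endswith(STATIC_SUFFIXES):
            continue  # drop embedded static resources
        kept.append({
            "host": match.group("host"),
            "time": match.group("time"),
            "path": match.group("path"),
        })
    return kept


sample = [
    '10.0.0.1 - - [21/Jul/2012:10:00:00 +0800] "GET /docs/index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [21/Jul/2012:10:00:01 +0800] "GET /img/logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [21/Jul/2012:10:00:02 +0800] "GET /missing.html HTTP/1.1" 404 0',
    'not a log line',
]
cleaned = clean_log(sample)
```

In this sample only the first request survives cleaning. User identification and session splitting, which normally follow cleaning, are outside the scope of this sketch.
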
Because the TF calculation also ignores the importance of a term's position in the page, the thesis introduces Eiv, an importance factor for the term. Combining the Page Correlation Weight, the term importance factor, and the term-frequency-based TF/IDF algorithm, the thesis presents the Partial Weighted TF/IDF Algorithm. Furthermore, the thesis establishes the users' knowledge and interest background, uses the Rocchio feedback algorithm to update and analyze it in real time, and implements a personalized retrieval prototype system, Easy Searcher.

Finally, the thesis is summarized and the prospects for further development of personalized retrieval are discussed.
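
As an illustration of the Rocchio feedback update mentioned above, the following sketch applies the standard textbook Rocchio formula to a user profile stored as a sparse term-weight dictionary. It is not the thesis's implementation: the function name `rocchio_update`, the dictionary representation, the default coefficients (alpha = 1.0, beta = 0.75, gamma = 0.15), and the choice to discard non-positive weights are all assumptions, and the term weights are simply taken as given rather than computed with the Partial Weighted TF/IDF Algorithm.

```python
from collections import defaultdict


def rocchio_update(profile, relevant_docs, non_relevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Standard Rocchio update on sparse term-weight vectors.

    new = alpha * profile
          + beta  * mean(relevant_docs)
          - gamma * mean(non_relevant_docs)
    """
    updated = defaultdict(float)
    for term, weight in profile.items():
        updated[term] += alpha * weight
    # Move the profile toward the centroid of pages the user found relevant.
    for doc in relevant_docs:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant_docs)
    # Move it away from the centroid of pages the user found irrelevant.
    for doc in non_relevant_docs:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant_docs)
    # Assumed convention: drop terms whose weight is no longer positive.
    return {term: weight for term, weight in updated.items() if weight > 0}


# Hypothetical example data, not taken from the thesis.
profile = {"python": 1.0}
relevant = [{"python": 0.5, "mining": 1.0}]
non_relevant = [{"sports": 1.0}]
new_profile = rocchio_update(profile, relevant, non_relevant)
```

With these inputs the weight of "python" rises to 1.375, "mining" enters the profile at 0.75, and "sports" is dropped because its weight would be negative. In a log-based setting, relevance judgments would have to be inferred from implicit feedback such as browsing behavior; how the thesis does this is not stated in the abstract.
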
Keywords/Search Tags: Web Mining, Personalized, Web Log, TF/IDF, Data Preprocessing, Users' Knowledge and Interest Background