
Information Retrieval Techniques Based on Log Analysis and Their Implementation

Posted on: 2010-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: H R Chen
GTID: 2208360275483586
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, search engines play an important role in daily life by solving information retrieval and filtering problems. However, when faced with masses of documents, a search engine cannot always return the most relevant results, which lowers efficiency for users. We first analyze how query strings are used and explain the significance of query expansion. We then model the log mining process and apply the results of log mining to query expansion. The query expansion method proposed in this thesis aims to improve the quality of the results returned by a search engine, especially when query strings are short and ambiguous. The main content is divided into three parts:
1. Log analysis model based on the search process. Given the time span of an HTTP session, a single session may contain several search topics. We therefore split each HTTP session according to the similarity between its query strings. Experiments on real search engine query logs demonstrate the improvement offered by this log analysis model.
2. Research on query expansion methods. We analyze and compare the commonly used query expansion methods. We assume that the query string represents the user's intention, and that the terms indexed by the search engine represent the search engine's understanding of the documents. We therefore associate these two kinds of terms based on their frequency in the corresponding documents. We also analyze the weights of the different words used to expand the initial query string. A comparison of several methods demonstrates the performance improvement of this approach.
3. Design and implementation of a prototype system. We build the prototype on the open-source project Nutch by adding a dictionary of query expansion words and improving the word analyzer. The dictionary is used by the expansion module, and the analyzer is used for Chinese word segmentation. Finally, we compare our system with Nutch in terms of Chinese word segmentation performance and of precision measured on log items ordered by frequency.
In summary, based on log mining, we gather statistics on users' implicit feedback, associate the terms submitted by users with the terms of the documents indexed by the search engine, and apply these associations to query expansion in order to improve search performance. The experiments demonstrate the performance improvement of our system.
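The session-splitting idea in part 1 can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cut-off the thesis uses, so the Jaccard similarity over whitespace-separated terms, the `threshold` value of 0.2, and the names `jaccard` and `split_session` are all assumptions made for illustration only. Chinese queries would first need word segmentation, which is omitted here.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the whitespace-separated term sets of two queries."""
    terms_a, terms_b = set(a.lower().split()), set(b.lower().split())
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def split_session(queries: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Split a time-ordered list of queries from one session into topic segments.

    A new segment starts whenever a query's similarity to the previous
    query falls below `threshold` (a hypothetical cut-off, not a value
    taken from the thesis).
    """
    segments: List[List[str]] = []
    for query in queries:
        if segments and jaccard(segments[-1][-1], query) >= threshold:
            segments[-1].append(query)  # same topic as the previous query
        else:
            segments.append([query])    # topic change: open a new segment
    return segments


# Hypothetical queries from a single HTTP session, in time order.
session = [
    "nutch install",
    "nutch install linux",
    "cheap flights beijing",
    "flights beijing shanghai",
]
topics = split_session(session)
```

Here `topics` holds two segments, one for the Nutch queries and one for the flight queries. Comparing each query only with its immediate predecessor keeps the sketch simple; comparing against the whole current segment would be another reasonable choice.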
Keywords/Search Tags: Log Mining, User Behavior, Session Splitting, Query Expansion, Nutch