
Query Expansion Research And Implementation Based On Clustering Documents’ Position

Posted on: 2012-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: H C Yu
Full Text: PDF
GTID: 2298330467471713
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Web technologies, search engines have become the main way to access information on the web. Users, however, care less about the number of results than about how well the returned pages match their needs. In addition, user queries tend to be short, so search engines return many irrelevant results. Query expansion is one of the core technologies for addressing the problems of information overload, information isotropy, and word mismatch between different expressions of the same concept. Our research on query expansion based on clustering documents' position therefore has both theoretical significance and practical value.

In this thesis, we first introduce query expansion and its background, including the concept of information retrieval, performance evaluation standards, and information retrieval models. Pseudo relevance feedback has a major weakness: it depends heavily on the documents returned by the initial retrieval. To address this, we propose an improved feature extraction algorithm and an improved KNN algorithm. We extract feature terms with a document-frequency-based extraction method that takes distance into account, and we compute feature weights with an improved TF-IDF-Dis algorithm. The goal is to transform the feedback documents into vectors that are relevant to the query. In this way, we filter out noisy documents and identify the dominant ones.

We then present the hypothesis that words closer to the query words are more likely to be related to the query topic.
We analyze the positional relation between query words and the words in feedback documents, and factorize the probability formula for selecting expansion terms into a form that depends on word positions within the documents. We construct a Gaussian kernel function as the word distance function, assign greater weight to terms that are closer to query words, and select the terms with the highest weights as query expansions. The experimental results show that query expansion based on clustering documents' position achieves better results and improves the retrieval system's average accuracy.
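
The position-based weighting described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the `sigma` bandwidth parameter, the whitespace-style token list, and the choice to sum kernel values over all query word occurrences are assumptions made only for illustration. It shows how a Gaussian kernel over token distance gives larger weights to terms that occur near query words.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, sigma):
    """Gaussian kernel over token distance; equals 1.0 at distance 0."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def score_expansion_terms(tokens, query_terms, sigma=2.0):
    """Score each non-query term by its proximity to query word occurrences.

    Assumption: a term's score is the sum of kernel values over every
    (term occurrence, query word occurrence) pair in one feedback document.
    """
    query_terms = set(query_terms)
    query_positions = [i for i, t in enumerate(tokens) if t in query_terms]
    scores = defaultdict(float)
    for i, term in enumerate(tokens):
        if term in query_terms:
            continue  # query words are not candidates for expansion
        for q in query_positions:
            scores[term] += gaussian_kernel(i - q, sigma)
    return dict(scores)


def top_expansion_terms(tokens, query_terms, k=2, sigma=2.0):
    """Return the k highest-scoring candidate terms (ties broken alphabetically)."""
    scores = score_expansion_terms(tokens, query_terms, sigma)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, with the hypothetical token list `["query", "expansion", "improves", "retrieval", "of", "documents"]` and the query word `"query"`, the term `"expansion"` (distance 1) receives a larger weight than `"documents"` (distance 5). A smaller `sigma` concentrates the weight more tightly around the query words.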
Keywords/Search Tags: pseudo relevance feedback, feature extraction, K-nearest neighbor algorithm, positional relevance feedback model, distance function