Query Expansion Research And Implementation Based On Clustering Documents' Position

Posted on:2012-05-10

Degree:Master

Type:Thesis

Country:China

Candidate:H C Yu

Full Text:PDF

GTID:2298330467471713

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of Web technologies, search engines have become the main way to access web information. However, users care less about the number of results than about how relevant the returned pages are to their needs. Moreover, because user queries tend to be short, search engines return many irrelevant results. Query expansion is one of the core technologies for addressing the problems of information overload, information isotropy, and word mismatch between different expressions. Our research on query expansion based on clustering documents' position therefore has both theoretical significance and practical value.

In this thesis, we introduce query expansion and its background, including the concept of information retrieval, performance evaluation standards, and information retrieval models. Pseudo relevance feedback has a major weakness: it depends heavily on the documents returned by the first retrieval. We therefore first propose an improved feature extraction algorithm and an improved KNN algorithm. We extract feature terms with a distance-based document-frequency feature extraction method and calculate feature weights with an improved TF-IDF-Dis algorithm. We transform the feedback documents into vectors that are relevant to the query. In this way, we filter out noisy documents and identify the dominant documents.

After that, we present the hypothesis that words closer to query words are more likely to be related to the query topic. We analyze the positional relation between query terms and words in the feedback documents, factorize the probability formula for selecting expansion terms into one that depends on positions in the documents, construct a Gaussian kernel function as the word distance function, assign larger weights to terms that are closer to query words, and select the highest-weighted words as expansion terms. The experimental results show that our query expansion method based on clustering documents' position achieves better results and improves the average accuracy of the retrieval system.
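
The positional weighting idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual formula, which the abstract does not give. It assumes whitespace-style token lists, a Gaussian kernel over the token distance to each query-term occurrence, and a simple sum of kernel values as the term score. The function names and the `sigma` and `k` parameters are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, sigma):
    """Gaussian kernel over a token distance; equals 1.0 at distance 0."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def score_terms(doc_tokens, query_terms, sigma=2.0):
    """Score each non-query token by its proximity to query-term occurrences.

    Assumption: a token's score is the sum of kernel values over all
    (token position, query position) pairs in one feedback document.
    """
    query_terms = set(query_terms)
    query_positions = [i for i, t in enumerate(doc_tokens) if t in query_terms]
    scores = defaultdict(float)
    for i, token in enumerate(doc_tokens):
        if token in query_terms:
            continue  # query words are not candidates for expansion
        for q in query_positions:
            scores[token] += gaussian_kernel(i - q, sigma)
    return dict(scores)


def top_expansion_terms(feedback_docs, query_terms, k=3, sigma=2.0):
    """Sum scores over the feedback documents and return the k best terms."""
    totals = defaultdict(float)
    for doc_tokens in feedback_docs:
        for term, score in score_terms(doc_tokens, query_terms, sigma).items():
            totals[term] += score
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [
        ["gaussian", "kernel", "weights", "terms", "near", "the", "query"],
        ["query", "expansion", "adds", "related", "terms"],
    ]
    print(top_expansion_terms(docs, ["query"], k=3))
```

A smaller `sigma` concentrates weight on words immediately next to the query terms, while a larger value lets more distant words contribute. In this sketch, words from documents that contain no query term receive no score at all.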

Keywords/Search Tags:

pseudo relevance feedback, feature extraction, K-nearest neighbor algorithm, positional relevance feedback model, distance function

Related items

1	Cross Language Information Retrieval Based On Topical Pseudo Relevance Feedback
2	The Research On Image Retrieval Technology Based Relevance Feedback
3	Research And Implementation Of Feature Extraction And Relevant Feedback Of 3D Model
4	Research On Content-based Image Retrieval And Relevance Feedback Technology
5	Research On Pre-trained BERT Based Pseudo-relevance Feedback Method
6	Research On Pseudo Relevance Feedback Based On Document Similarity
7	The Study And Implementation For The Research On Feedback Technique Of Content-based Medical Image Retrieval
8	3D Model Retrieval System Based On Relevance Feedback And Clustering Analysis Technology: Research And Implement
9	Studies On Affinity Propagation Based Pseudo-Relevance Feedback And Document Expansion For Spoken Document Retrieval
10	Points Of Interest Based On A Digital Image Retrieval And Relevance Feedback