
Research on XML Information Retrieval Based on Pseudo-Relevance Feedback

Posted on: 2016-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Shi
Full Text: PDF
GTID: 2348330488982001
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the network is filled with a wide variety of data, and as the volume and complexity of information grow, people place higher demands on its quality. How to retrieve high-quality relevant documents effectively is an important topic in information retrieval. However, users find it difficult to describe their search intent accurately, and query expressions are often too short. To address this, some scholars have proposed query expansion, which extends the query expression to better capture the user's intent and thereby improves the retrieval performance of the information retrieval system. Among feedback-based query expansion methods, only pseudo-relevance feedback does not require user participation, so it is widely applicable. This thesis focuses on two problems in pseudo-relevance feedback: determining the relevant documents and selecting the query expansion terms. The main research includes:

(1) We study the clustering of XML search results based on a pseudo-relevance feedback model. Clustering XML search results is an effective way to improve retrieval performance, and the key factor affecting clustering quality is how to measure the distance between XML documents. When the TF-IDF term weighting algorithm is applied to clustering search results, its linear use of term frequency is unreasonable, and it cannot emphasize the key terms that contribute most to the content of a text. To address this, a new weighting scheme based on a frequency factor and a length factor is proposed. LSI is then applied to find a new low-dimensional semantic space in which the semantic relationships between features are strengthened and the noisy features of the original space are eliminated, improving both speed and precision. Experimental results on the IEEE unclassified corpus show that, compared with similar similarity calculation methods and clustering methods, the proposed method improves speed and effectiveness.

(2) We study pseudo-relevant document selection and XML query expansion. Building on the clustered XML search results, we propose a two-stage ranking model based on the cluster labels and the documents within the clusters. From this ranking model we obtain N pseudo-relevant documents for query expansion. We then select suitable terms from the pseudo-relevant documents as query expansion terms and resubmit them to the information retrieval system together with the initial query. A series of experiments shows that the proposed approach performs better than retrieval without query expansion and than traditional pseudo-relevance feedback query expansion.
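
The general pseudo-relevance feedback loop described above can be illustrated with a minimal Python sketch. It is a generic baseline under stated assumptions, not the method of the thesis: it uses plain TF-IDF instead of the proposed frequency-factor and length-factor weighting, treats documents as flat text instead of XML, and takes the top-ranked documents directly instead of applying the cluster-based two-stage ranking. The function names (`tokenize`, `idf_table`, `score`, `expand_query`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this toy example.
    return text.lower().split()


def idf_table(tokenized_docs):
    # Smoothed inverse document frequency: log(1 + N / df).
    n = len(tokenized_docs)
    df = Counter(t for toks in tokenized_docs for t in set(toks))
    return {t: math.log(1 + n / df[t]) for t in df}


def score(query_terms, toks, idf):
    # Sum of TF-IDF weights of the query terms in one document.
    if not toks:
        return 0.0
    tf = Counter(toks)
    return sum(tf[t] / len(toks) * idf.get(t, 0.0) for t in query_terms)


def expand_query(query, docs, n_feedback=2, n_terms=2):
    tokenized = [tokenize(d) for d in docs]
    idf = idf_table(tokenized)
    q_terms = tokenize(query)

    # Step 1: initial retrieval, ranking all documents against the query.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: score(q_terms, tokenized[i], idf),
        reverse=True,
    )

    # Step 2: assume the top n_feedback documents are relevant
    # (no user judgment is involved, hence "pseudo" relevance).
    feedback = ranked[:n_feedback]

    # Step 3: weight candidate terms by their summed TF-IDF in those documents.
    weights = Counter()
    for i in feedback:
        toks = tokenized[i]
        if not toks:
            continue
        for term, count in Counter(toks).items():
            if term not in q_terms:
                weights[term] += count / len(toks) * idf[term]

    # Step 4: append the best candidates to the original query terms.
    expansion = [term for term, _ in weights.most_common(n_terms)]
    return q_terms + expansion


if __name__ == "__main__":
    toy_docs = [
        "xml retrieval xml index",
        "xml retrieval ranking",
        "cooking pasta recipes",
        "pasta sauce recipes",
    ]
    print(expand_query("xml retrieval", toy_docs))
```

The expanded term list would then be resubmitted to the retrieval system in place of the initial query. In the approach described in the abstract, Step 2 would instead draw the N pseudo-relevant documents from the two-stage ranking over cluster labels and cluster members.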
Keywords/Search Tags: Information Retrieval, Latent semantic indexing (LSI), Clustering search results, Weighting algorithms, Clustering algorithms, Query expansion