Research On Query Expansion Algorithm In Information Retrieval

Posted on: 2009-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: D G Li
Full Text: PDF
GTID: 2178360242497672
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of the Internet, the amount of information on the Web is growing exponentially, and how to find the needed information in this huge volume has become a very important question. Because the user's query words often do not match the documents' keywords, the results of traditional retrieval are unsatisfactory. Research on query expansion, which expands the query to resolve this "word mismatch" problem, is therefore worthwhile in both theory and practice. The major content of the thesis is as follows:

(1) Introduces the research background, including the definition of information retrieval, performance criteria, and retrieval models, and summarizes the relevant knowledge of query expansion.

(2) Points out a shortcoming of present query expansion algorithms based on association rules: they neglect that a keyword carries different weights in different documents of the document database. The thesis puts forward an all-weighted association rules mining algorithm for query expansion (AWAR for short). AWAR fully considers the different weights of a keyword in different documents of the document database and uses BM25 to weight each keyword. AWAR introduces all-weighted items and uses four pruning strategies. The experiment shows that it can effectively improve mining efficiency. Finally, the thesis puts forward a query expansion algorithm based on AWAR (AWARQE for short). AWARQE first applies AWAR to the first N documents to obtain candidate expansion words, and then expands the query with the K candidate words that have the highest all-weighted confidence. The experiment shows that AWARQE can effectively improve the performance of information retrieval.

(3) Points out the problem of "query drift" in ARFQE (a query expansion algorithm based on automatic relevance feedback), and then puts forward a query expansion algorithm based on K-means (KQE for short). KQE uses K-means to re-rank the results of the initial query so as to increase the proportion of relevant documents among the first N documents. The experiment shows that KQE can effectively restrain "query drift".

(4) Puts forward a query expansion algorithm based on association rules and a clustering algorithm (ACQE for short). It first re-ranks the results of the initial query, and then applies AWAR to the first N documents of the re-ranked results to obtain the expansion words for query expansion.

(5) Conducts an experiment on CIRB030 and analyzes and compares the four algorithms: ARFQE, AWARQE, KQE and ACQE. The results show that each of the three proposed algorithms outperforms ARFQE in both precision and average precision, and can effectively improve retrieval performance.
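
The general pattern behind AWARQE as described in the abstract (weight keywords with BM25, look only at the first N documents, keep the K best expansion words) can be illustrated with a simplified Python sketch. It is not AWAR: the abstract does not specify the association rule mining, the all-weighted confidence, or the four pruning strategies, so the sketch swaps in a plain sum of BM25 weights over the feedback documents. The function names, the parameters k1 and b, the particular BM25 variant, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    Assumption: document frequencies come from `docs` itself; a real
    system would take them from the whole document database.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization shared by every term of this document.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weights.append({
            t: math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weights


def expand_query(query, ranked_docs, n=3, k=2):
    """Append the k highest-scoring new terms from the first n documents.

    Stand-in scoring (not all-weighted confidence): each candidate term
    is scored by the sum of its BM25 weights in the first n documents.
    """
    scores = Counter()
    for doc_weights in bm25_weights(ranked_docs)[:n]:
        for term, value in doc_weights.items():
            if term not in query:
                scores[term] += value
    # Highest score first; alphabetical order breaks ties deterministically.
    best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return list(query) + [term for term, _ in best]


# Hypothetical initial ranking for the query ["query"], best match first.
ranked_docs = [
    ["query", "expansion", "retrieval", "feedback"],
    ["query", "expansion", "association", "rules"],
    ["retrieval", "model", "expansion", "feedback"],
    ["football", "match", "score"],
]
expanded = expand_query(["query"], ranked_docs, n=3, k=2)
print(expanded)
```

In this toy ranking the word "expansion" occurs in all three feedback documents but is not selected, because its inverse document frequency is low. This only illustrates why per-document keyword weights matter. It makes no claim about how AWAR itself scores candidate words.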
Keywords/Search Tags:Query Expansion, AWAR, AWARQE, KQE, ACQE, Information Retrieval