
Co-occurrence Distance And Query Expansion Based Mongolian Information Retrieval System

Posted on: 2012-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Xin
Full Text: PDF
GTID: 2178330335972274
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, Web information is growing exponentially, and retrieving relevant information from this vast volume has become an important problem. Because the keywords in a user's query often do not match the words used in the documents, traditional information retrieval results are unsatisfactory and fail to meet the user's needs. Research on query expansion, which addresses this "query word mismatch" problem by expanding the user's initial query, therefore has both theoretical significance and practical value. The main work of this thesis is as follows:
(1) We introduce the research background, including the concepts of information retrieval, retrieval models, and performance evaluation standards, and summarize the related knowledge of query expansion.
(2) To build a Mongolian information retrieval platform with high recall and precision, we analyze the word-formation and grammatical characteristics of Mongolian and design a processing scheme for Mongolian index terms, which covers term identification and the determination of stemming rules. Experimental results show that the Mongolian stop-word table effectively reduces the size of the index while improving retrieval precision, and that the stemming rules significantly reduce the number of terms and effectively improve retrieval recall.
(3) We propose a word correlation calculation method and a candidate-word distance relation model, in which the distance between the query words and a candidate word serves as one factor in judging how related they are. We then propose a new query expansion algorithm that combines the distance model with the word correlation calculation method. The expansion words selected by this algorithm are related to the query as a whole and can represent its topic. Experimental results show that the algorithm effectively suppresses "query drift".
(4) Using the results of the information retrieval model as the baseline system, we run experiments on a Mongolian corpus and analyze and compare the word relevance algorithm and the word distance relation model based on the word correlation calculation method. Experimental results show that the three algorithms outperform the baseline system in both precision and average accuracy, and can improve information retrieval performance.
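The idea in point (3), choosing expansion words that are close to every query word and not just to one of them, can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything here is an assumption made for illustration: the 1/distance weighting, the `log1p` combination across query words, the window size, and the function names `distance_weighted_cooccurrence` and `expand_query`.

```python
from collections import defaultdict
import math


def distance_weighted_cooccurrence(docs, window=5):
    """Sum 1/distance over every co-occurrence of two different words
    that appear within `window` positions of each other.
    (Assumed weighting; the thesis's actual distance model is not given.)"""
    scores = defaultdict(float)
    for tokens in docs:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                v = tokens[j]
                if v == w:
                    continue
                weight = 1.0 / (j - i)  # closer pairs contribute more
                scores[(w, v)] += weight
                scores[(v, w)] += weight
    return scores


def expand_query(query_terms, docs, k=2, window=5):
    """Return up to k candidate words ranked by their relation to the
    whole query (hypothetical scoring, for illustration only)."""
    scores = distance_weighted_cooccurrence(docs, window)
    vocab = {t for tokens in docs for t in tokens}
    candidate_scores = {}
    for cand in vocab - set(query_terms):
        # log1p dampens a strong link to a single query word, so words
        # related to all query words are favoured.
        candidate_scores[cand] = sum(
            math.log1p(scores.get((q, cand), 0.0)) for q in query_terms
        )
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(candidate_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, s in ranked[:k] if s > 0]


# Toy, already-tokenized corpus (English stand-ins for Mongolian terms).
docs = [
    ["mongolian", "text", "retrieval", "index"],
    ["text", "retrieval", "index", "ranking"],
    ["horse", "grassland"],
]
print(expand_query(["text", "retrieval"], docs, k=2))  # ['index', 'mongolian']
```

Words that never co-occur with the query, such as "horse" in the toy corpus, score zero and are never selected, which is the behaviour that limits query drift in this simplified setting.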
Keywords/Search Tags:Query expansion, Distance model, Co-occurrence distance, Information retrieval, Word correlation calculation method