
Research On The Extension Of Cross-language Information Retrieval Query Based On Topic Model

Posted on: 2018-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Gao
GTID: 2358330515982170
Subject: Computer Science and Technology
Abstract/Summary:
The growing internationalization of the Internet has made online content increasingly multilingual. Beyond documents written in their mother tongue, users increasingly want access to information in other languages. Cross-language information retrieval (CLIR) helps users find pages in multiple languages, and even language-independent resources such as images, which greatly enriches search results and meets users' diverse needs; it has therefore become a major research topic. Within CLIR, query expansion adds semantically related content to queries, which are typically short, and surfaces useful information from the mass of available data. When translation quality is limited, this reduces the user's search burden and improves satisfaction with multilingual results.

China is a large family of 56 ethnic groups. With the spread of education in minority areas, the rising educational level of users, and encouragement from the government, research on information retrieval for minority languages has gained momentum. Compared with Chinese or other languages, however, the web content available in minority languages is far from sufficient. When users search minority-language websites, they also hope to retrieve relevant Chinese and even foreign-language websites to obtain more comprehensive knowledge. Therefore, as interest in cross-language information retrieval grows and the Internet becomes increasingly widespread in minority areas, the most urgent problem is to keep improving minority-language retrieval technology, lowering the barriers to acquiring information across minority languages and Chinese and ultimately eliminating language barriers.

This thesis approaches the problem from the perspective of topics: it mines semantic information at the granularity of topics and selects topic-related terms to expand queries. This overcomes a drawback of traditional query expansion, which relies solely on the initial ranking of retrieved documents and selects expansion words with weighting schemes such as TF-IDF. The main contributions are as follows.

First, the thesis proposes a query expansion method that uses LDA to model source-language or target-language documents independently. At different stages of query translation, the topic model is applied separately to the source-language or the target-language documents, and topic words are extracted at the topic level as expansion terms. Several dimensions are investigated, including term selection strategies, the expansion stage, and expansion modes. Experimental results show that the method clearly strengthens the semantic information of the query terms in the source or target corpus and weakens the impact of irrelevant information.

Second, the thesis proposes a query expansion method that uses LDA to model the source-language and target-language corpora jointly. Unlike the independent topic models, this method assumes that the bilingual corpora are comparable to some extent and may therefore share some semantic information. The unified topic model is not applied in separate stages; instead, it extracts the "shared" topic information for query expansion. Experiments show that query expansion based on the unified topic model effectively improves performance.
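The first method (an independent LDA model per language collection, with topic words used as expansion terms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: LDA is fitted with a plain collapsed Gibbs sampler, and the topic-selection strategy (pick the topic under which the query words are most likely) plus the helper names `train_lda` and `expand_query` are hypothetical choices for this sketch. Running it on the source-language collection corresponds to expanding before translation; running it on the target-language collection corresponds to expanding after translation. The translation step itself is not shown.

```python
import math
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (assumed training procedure).

    docs is a list of token lists from ONE language collection (source or
    target). Returns (vocab, phi), where phi[k][v] is the smoothed
    probability of vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens per (document, topic)
    n_kw = [[0] * V for _ in range(K)]  # tokens per (topic, word)
    n_k = [0] * K                       # tokens per topic

    def update(d, wi, k, delta):
        n_dk[d][k] += delta
        n_kw[k][wi] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, idx[w], k, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = idx[w]
                update(d, wi, z[d][i], -1)  # take the token out of the counts
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wi] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                update(d, wi, z[d][i], 1)

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return vocab, phi


def expand_query(query, vocab, phi, num_terms=3):
    """Append the top words of the topic that best explains the query.

    Choosing the topic with the highest query log-likelihood is an assumed
    selection strategy, not necessarily the one used in the thesis.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    known = [idx[w] for w in query if w in idx]
    if not known:
        return list(query)  # no query word occurs in the collection
    scores = [sum(math.log(row[i]) for i in known) for row in phi]
    best = max(range(len(phi)), key=scores.__getitem__)
    ranked = sorted(range(len(vocab)), key=lambda v: phi[best][v], reverse=True)
    extra = [vocab[v] for v in ranked if vocab[v] not in query][:num_terms]
    return list(query) + extra


if __name__ == "__main__":
    # Hypothetical toy collection standing in for one language side.
    docs = [
        "river water fish boat".split(),
        "fish water river".split(),
        "bank money loan credit".split(),
        "money loan bank".split(),
    ]
    vocab, phi = train_lda(docs, num_topics=2, iters=100)
    print(expand_query(["fish"], vocab, phi, num_terms=2))
```

On a toy corpus this small the sampled topics can vary with the seed; a real collection, more iterations, and a tuned number of topics would be needed before the expansion terms become meaningful.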
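The second method models the comparable bilingual corpora jointly so that both languages share topics. The abstract does not specify the model, so the sketch below assumes a polylingual-topic-model-style sampler: each comparable document pair shares one topic distribution, while each language keeps its own topic-word distribution. Taking expansion terms from the target-language side of the shared topic that best matches a source-language query is also an assumption, as are the helper names `train_polylingual_lda` and `expand_cross_lingual` and the stand-in language codes. The model is fitted once over both sides, not per translation stage.

```python
import math
import random


def train_polylingual_lda(pairs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Joint LDA over comparable document pairs (assumed polylingual style).

    pairs is a list of dicts mapping a language code to that side's tokens.
    Both sides of a pair share one topic distribution; each language keeps
    its own topic-word counts. Returns (vocab, phi), keyed by language.
    """
    rng = random.Random(seed)
    K = num_topics
    langs = sorted({lang for pair in pairs for lang in pair})
    vocab = {lang: sorted({w for pair in pairs for w in pair.get(lang, [])})
             for lang in langs}
    idx = {lang: {w: i for i, w in enumerate(vocab[lang])} for lang in langs}
    n_dk = [[0] * K for _ in pairs]  # shared by all languages of a pair
    n_kw = {lang: [[0] * len(vocab[lang]) for _ in range(K)] for lang in langs}
    n_k = {lang: [0] * K for lang in langs}

    def update(d, lang, wi, k, delta):
        n_dk[d][k] += delta
        n_kw[lang][k][wi] += delta
        n_k[lang][k] += delta

    z = []  # z[d][lang][i] is the topic of the i-th token on that side
    for d, pair in enumerate(pairs):
        z.append({lang: [rng.randrange(K) for _ in toks] for lang, toks in pair.items()})
        for lang, toks in pair.items():
            for w, k in zip(toks, z[d][lang]):
                update(d, lang, idx[lang][w], k, 1)

    for _ in range(iters):
        for d, pair in enumerate(pairs):
            for lang, toks in pair.items():
                V = len(vocab[lang])
                for i, w in enumerate(toks):
                    wi = idx[lang][w]
                    update(d, lang, wi, z[d][lang][i], -1)
                    # Shared document-topic counts, language-specific word counts.
                    weights = [(n_dk[d][t] + alpha) * (n_kw[lang][t][wi] + beta)
                               / (n_k[lang][t] + V * beta) for t in range(K)]
                    z[d][lang][i] = rng.choices(range(K), weights=weights)[0]
                    update(d, lang, wi, z[d][lang][i], 1)

    phi = {lang: [[(n_kw[lang][k][v] + beta) / (n_k[lang][k] + len(vocab[lang]) * beta)
                   for v in range(len(vocab[lang]))] for k in range(K)]
           for lang in langs}
    return vocab, phi


def expand_cross_lingual(query, src, tgt, vocab, phi, num_terms=3):
    """Pick the shared topic that best explains a source-language query and
    return that topic's top target-language words as expansion terms."""
    idx = {w: i for i, w in enumerate(vocab[src])}
    known = [idx[w] for w in query if w in idx]
    if not known:
        return []
    K = len(phi[src])
    scores = [sum(math.log(phi[src][k][i]) for i in known) for k in range(K)]
    best = max(range(K), key=scores.__getitem__)
    ranked = sorted(range(len(vocab[tgt])), key=lambda v: phi[tgt][best][v], reverse=True)
    return [vocab[tgt][v] for v in ranked[:num_terms]]


if __name__ == "__main__":
    # Hypothetical comparable pairs; "en"/"fr" stand in for the real language pair.
    pairs = [
        {"en": "river water fish".split(), "fr": "fleuve eau poisson".split()},
        {"en": "fish boat water".split(), "fr": "poisson bateau eau".split()},
        {"en": "bank money loan".split(), "fr": "banque argent emprunt".split()},
        {"en": "money loan bank".split(), "fr": "argent emprunt banque".split()},
    ]
    vocab, phi = train_polylingual_lda(pairs, num_topics=2, iters=100)
    print(expand_cross_lingual(["fish"], "en", "fr", vocab, phi, num_terms=3))
```

Sharing the document-topic counts across both sides of a pair is what ties the two vocabularies to the same topics; a simpler alternative would be to concatenate each pair into one document with language-tagged tokens and fit ordinary LDA.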
Keywords/Search Tags: Query expansion, Topic model, LDA, Minority language