
Research On Cross-Language Information Retrieval For Biomedicine

Posted on: 2011-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Ning
Full Text: PDF
GTID: 2178330332460766
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, people depend heavily on it to obtain information. Because the world's languages are diverse and users differ in the languages they read, it is difficult for users to access information on the Internet that is written in another language, so research on cross-language information retrieval is valuable. Latent Semantic Indexing (LSI) can map synonymous words from different languages to nearby points in a semantic space, so an LSI model can handle the synonymy and polysemy problems caused by word ambiguity. However, LSI requires reducing the dimension of the original word-text space, and committing to any single dimension reduction factor k carries risk. This paper describes a bilingual biomedical space model in which both Chinese and English abstracts are represented using an improved LSI that combines the SVD and NMF matrix factorization methods. The improved LSI-based method is combined with location information and anchor information, which classify the programs and sentences, to improve the similarity function in the semantic space and to combine the results of the different matrix factorizations. A set of k-dimensional models is built, with the help of which bilingual cross-language indexing is achieved. In the experiment, this approach gives better results.
Traditional text search engines usually compute text similarity to estimate the relationship between texts, but the correlation between texts depends more on the intrinsic characteristics of the documents. This paper therefore uses a document retrieval model based on the LDA (Latent Dirichlet Allocation) distribution model. The model represents queries and documents at the topic level and takes the correlation between texts into account to improve retrieval accuracy. To compensate for the impact of noise, the paper applies the idea of model averaging: it constructs several latent semantic text models and LDA-based text correlation models and combines the search results of the different models. Experimental results show that the LSI model plays an important role in smoothing the LDA model and that the recall rate is improved.
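The idea of building a set of k-dimensional LSI models and averaging their results, instead of committing to one k, can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses plain SVD only (no NMF, no location or anchor information, no LDA), and it assumes the semantic space is trained on merged Chinese-English abstract pairs. The vocabulary, the toy counts, and the function names (`lsi_models`, `fold_in`, `averaged_scores`) are all hypothetical.

```python
import numpy as np

def lsi_models(term_doc, ks):
    """Build one truncated LSI model (U_k, s_k) per rank k from a single SVD."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return [(U[:, :k], s[:k]) for k in ks]

def fold_in(vec, U_k, s_k):
    """Project a term-space vector into the k-dimensional latent space."""
    return (vec @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def averaged_scores(term_doc, query, candidates, ks):
    """Average query-candidate cosine similarity over several ranks k."""
    models = lsi_models(term_doc, ks)
    scores = np.zeros(len(candidates))
    for U_k, s_k in models:
        q = fold_in(query, U_k, s_k)
        for j, cand in enumerate(candidates):
            scores[j] += cosine(q, fold_in(cand, U_k, s_k))
    return scores / len(models)

# Hypothetical bilingual vocabulary (rows): gene, 基因, protein, 蛋白质.
# Columns are toy training documents, each a merged Chinese-English abstract pair.
train = np.array([
    [2.0, 0.0, 1.0, 0.0],  # gene
    [1.0, 0.0, 1.0, 0.0],  # 基因
    [0.0, 1.0, 0.0, 1.0],  # protein
    [0.0, 1.0, 0.0, 3.0],  # 蛋白质
])

query = np.array([0.0, 1.0, 0.0, 0.0])          # Chinese query: 基因
candidates = [
    np.array([1.0, 0.0, 0.0, 0.0]),             # English-only abstract about genes
    np.array([0.0, 0.0, 1.0, 0.0]),             # English-only abstract about proteins
]

scores = averaged_scores(train, query, candidates, ks=[2, 3])
print(scores)
```

In this toy setup the Chinese query shares no surface term with either English candidate, yet the gene abstract should score well above the protein one, because the merged training documents place "gene" and "基因" close together in the latent space. Every k in `ks` should stay at or below the rank of the training matrix so that no singular value used in `fold_in` is zero.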
Keywords/Search Tags:Improved Latent Semantic Indexing, Semantic Space, Bilingual Corpora, Cross-Language Indexing, LDA model