Research On Cross-Language Information Retrieval For Biomedicine

Posted on:2011-03-19

Degree:Master

Type:Thesis

Country:China

Candidate:J Ning

Full Text:PDF

GTID:2178330332460766

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, people depend heavily on it to obtain information resources. Because of the diversity of the world's languages and the language differences between users, it is very difficult for users to access information on the Internet that is written in another language. Research on cross-language information retrieval is therefore of great benefit. Because Latent Semantic Indexing (LSI) maps synonymous words in different languages to nearby points in the semantic space, the LSI model can handle the synonymy and polysemy problems caused by word ambiguity. However, LSI requires reducing the dimension of the original word-text space, so choosing any single dimension-reduction factor k carries risk. This thesis describes a bilingual biomedical space model in which both Chinese and English abstracts are represented using an improved LSI that combines the SVD and NMF matrix factorization methods. The improved LSI-based method, combined with location information and anchor information which classify the programs and sentences, is used to improve the similarity function in the semantic space and to combine the results of the different matrix factorizations. A set of models with different dimensions k is built, with which bilingual cross-language indexing is achieved. The experiments give better results.

Traditional text search engines usually compute text similarity to measure the relationship between texts, but the correlation between texts depends more on the intrinsic characteristics of the documents. This thesis uses a document retrieval model based on the LDA distribution model. The model represents queries and documents at the topic level and takes the correlation between texts into account to improve retrieval accuracy. To compensate for the impact of noise, the thesis applies the idea of model averaging: it constructs several latent semantic text models and LDA-based text correlation models and combines the search results of the different models. Experimental results show that the Latent Semantic Indexing model plays an important role in smoothing the LDA model, and that the recall rate is improved.
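The idea of building a set of k-dimension models rather than committing to a single k can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses plain truncated SVD only (no NMF, no location or anchor information), and the toy corpus, the pre-tokenized Chinese words, and the names `term_doc_matrix` and `averaged_lsi_scores` are all assumptions made for illustration. Each training document is an English text merged with its Chinese counterpart, so terms from both languages share one semantic space. A Chinese query is then compared with English-only candidates, and the cosine scores are averaged over several values of k.

```python
import numpy as np

# Toy bilingual training corpus (hypothetical): each document is an English
# text concatenated with its pre-tokenized Chinese translation.
TRAIN_DOCS = [
    "gene mutation cancer 基因 突变 癌症",
    "heart disease blood 心脏 疾病 血液",
    "cancer tumor therapy 癌症 肿瘤 治疗",
    "blood pressure heart 血液 血压 心脏",
]
CANDIDATES = ["tumor gene", "heart pressure"]  # English-only documents
QUERY = "肿瘤 基因"                             # Chinese query


def term_doc_matrix(docs, index):
    """Count matrix with one row per known term and one column per document."""
    mat = np.zeros((len(index), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in index:  # terms unseen in training are ignored
                mat[index[tok], j] += 1.0
    return mat


def averaged_lsi_scores(train_docs, candidates, query, ks):
    """Average query-candidate cosine similarity over several LSI ranks k."""
    vocab = sorted({tok for doc in train_docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}

    # Left singular vectors of the bilingual term-document matrix span
    # the shared semantic space.
    u, _, _ = np.linalg.svd(term_doc_matrix(train_docs, index),
                            full_matrices=False)

    cand = term_doc_matrix(candidates, index)
    q = term_doc_matrix([query], index)[:, 0]

    total = np.zeros(len(candidates))
    for k in ks:
        u_k = u[:, :k]
        cand_k = cand.T @ u_k  # fold candidates into the rank-k space
        q_k = q @ u_k          # fold the query into the same space
        denom = np.linalg.norm(cand_k, axis=1) * np.linalg.norm(q_k) + 1e-12
        total += (cand_k @ q_k) / denom
    return total / len(ks)


scores = averaged_lsi_scores(TRAIN_DOCS, CANDIDATES, QUERY, ks=(2, 3, 4))
print(dict(zip(CANDIDATES, scores)))
```

On this toy corpus the Chinese query ranks the English candidate about tumors and genes above the one about the heart, even though the query and the candidates share no surface terms. Averaging over several k is the simplest possible combination rule; the thesis's weighting of SVD, NMF, and LDA results may well differ.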

Keywords/Search Tags:

Improved Latent Semantic Indexing, Semantic Space, Bilingual Corpora, Cross-Language Indexing, LDA model

Related items

1	Research And Improvement Of Latent Semantic Indexing Classification Model
2	The Research Of Optimization Technology In Latent Semantic Indexing Based On Pseudo Text
3	Research On Text Classification Based On Ontology And Latent Semantic Indexing Algorithm
4	Web Text Mining Based On Latent Semantic Indexing
5	A Latent Semantic Indexing Differences Model And Its Application
6	The Application Of Latent Semantic Indexing To Retrieve The Plane Failure Case
7	Design And Implementation Of Multilingual Information Retrieval System Based On Latent Semantic Analysis
8	Research Of Chinese-Text Retrieval Based On Latent Semantic Indexing
9	Research On Document Clustering Technology Based On Latent Semantic Indexing
10	Objectionable Information Filtering System Based On ATN Algorithm And Latent Semantic Indexing