Research On Cross-language Document Sorting Learning Method Based On Bilingual Document Similarity

Posted on: 2018-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
Full Text: PDF
GTID: 2358330518460441
Subject: Computer software and theory
Abstract/Summary:
Cross-language information retrieval is a focus of current research, and it plays an important role in cross-language document analysis and cross-language news acquisition. Current work on cross-language information retrieval mainly relies on query translation and document translation. Both depend heavily on statistical machine translation, for which training corpora are hard to obtain and translation precision is usually low. Meanwhile, research on information retrieval based on learning to rank concentrates on ranking monolingual documents, and learning to rank cross-language documents has received little attention. This thesis proposes a cross-language document learning to rank model based on bilingual document similarity. The ranking function is trained by a machine learning method, and cross-language documents are ranked using bilingual document similarity features. Two problems are solved in constructing the model:

1. A method for calculating the similarity between bilingual documents. Documents in different languages are difficult to express in a unified feature space, so a similarity measure based on bilingual word embeddings is proposed. Keywords are extracted from each bilingual document and mapped into the same semantic space, and the distance between these keywords is used to express the similarity between the bilingual documents. The experimental results show that the method can be used to calculate the similarity between bilingual documents.

2. A cross-language document learning to rank model based on bilingual document similarity. Point-wise and pair-wise learning to rank loss functions cannot accurately express the loss over the order of a ranked list. In this thesis, the cross-entropy between probability distributions is used as the loss function, and a function based on an artificial neural network is used as the ranking function, to construct the learning to rank model. A method of concatenating the bilingual document similarity features is proposed for ranking cross-language documents, and the similarity of bilingual documents serves as the basis for ranking and scoring the target-language documents.

Experimental results show that the proposed cross-language document learning to rank model performs well on an English-Chinese corpus and an English-Vietnamese corpus.
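The two components described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify how keyword distances are aggregated or the exact form of the cross-entropy loss. The sketch assumes an average of best-match cosine similarities between keyword vectors, and a ListNet-style top-one cross-entropy between softmax distributions over relevance labels and predicted scores. The function names and the toy two-dimensional embeddings are hypothetical, and the neural network ranking function is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keyword_similarity(src_keywords, tgt_keywords, embeddings):
    """Assumed aggregation: for each source keyword, take its best cosine
    match among the target keywords, then average those best matches."""
    src_vecs = [embeddings[w] for w in src_keywords if w in embeddings]
    tgt_vecs = [embeddings[w] for w in tgt_keywords if w in embeddings]
    if not src_vecs or not tgt_vecs:
        return 0.0
    best = [max(cosine(s, t) for t in tgt_vecs) for s in src_vecs]
    return sum(best) / len(best)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_cross_entropy(relevance, predicted):
    """Cross-entropy between the top-one distributions induced by the
    relevance labels and by the predicted scores (ListNet-style assumption)."""
    p_true = softmax(relevance)
    p_pred = softmax(predicted)
    return -sum(p * math.log(q) for p, q in zip(p_true, p_pred))


# Hypothetical shared bilingual embedding space (toy values, not real data).
toy_embeddings = {
    "bank": [1.0, 0.0],
    "ngân_hàng": [0.9, 0.1],
    "river": [0.0, 1.0],
}

sim_related = keyword_similarity(["bank"], ["ngân_hàng"], toy_embeddings)
sim_unrelated = keyword_similarity(["bank"], ["river"], toy_embeddings)

# Scores that agree with the relevance order should incur a lower loss.
loss_good = listwise_cross_entropy([2.0, 1.0, 0.0], [3.0, 1.0, -1.0])
loss_bad = listwise_cross_entropy([2.0, 1.0, 0.0], [-1.0, 1.0, 3.0])
```

In this sketch the similarity value would be one feature passed to the ranking function, and the list-wise loss compares a whole ranked list at once rather than single documents or document pairs.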
Keywords/Search Tags: information retrieval, bilingual document similarity, cross-language document ranking, word embedding, list-wise learning to rank