Research On Cross-language Document Sorting Learning Method Based On Bilingual Document Similarity

Posted on:2018-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:J Huang

GTID:2358330518460441

Subject:Computer software and theory

Abstract/Summary:

Cross-language information retrieval (CLIR) is an active research topic and plays an important role in cross-language document analysis and cross-language news acquisition. Current CLIR research focuses mainly on query translation and document translation. Both approaches depend heavily on statistical machine translation, whose training corpora are difficult to obtain and whose translation precision is usually low. Meanwhile, research on learning to rank for information retrieval has concentrated on ranking monolingual documents, and learning to rank cross-language documents has received little attention. This thesis proposes a cross-language document learning-to-rank model based on bilingual document similarity: the ranking function is trained with a machine learning method, and cross-language documents are ranked using bilingual document similarity features. Constructing this model required solving the following two problems.

1. Calculating the similarity between bilingual documents. Because documents in different languages are difficult to represent in a unified feature space, a similarity measure based on bilingual word embeddings is proposed. Keywords are extracted from each document of a bilingual pair, the keywords are mapped into the same semantic space, and the distance between these keywords is used to express the similarity between the two documents. Experimental results show that the method can be used to calculate the similarity between bilingual documents.

2. Constructing a cross-language document learning-to-rank model based on bilingual document similarity. Point-wise and pair-wise learning-to-rank loss functions cannot accurately express the loss in ranking order. This thesis therefore uses the cross-entropy between probability distributions as the loss function and an artificial neural network as the ranking function. A method that concatenates bilingual document similarity features is proposed for ranking cross-language documents, and the similarity of bilingual documents serves as the basis for scoring and ranking target-language documents. Experimental results show that the proposed model achieves good performance on English-Chinese and English-Vietnamese corpora.
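
The abstract does not state how the distances between keywords are aggregated into a single document-level score. As a minimal sketch, assuming the keywords have already been extracted and a pre-aligned bilingual embedding table is available, the Python code below averages each keyword's best cosine match in the other document and symmetrizes the result. The embedding table, its toy 3-dimensional vectors, the best-match aggregation, and all function names are hypothetical illustrations, not the thesis's implementation.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL toy bilingual embedding table: English and Chinese words are
# assumed to have already been mapped into ONE shared semantic space.
# Real bilingual word embeddings would have hundreds of dimensions.
SHARED_SPACE: Dict[str, List[float]] = {
    "economy": [0.9, 0.1, 0.0],
    "经济": [0.85, 0.15, 0.05],
    "market": [0.7, 0.3, 0.1],
    "市场": [0.72, 0.28, 0.08],
    "football": [0.0, 0.2, 0.95],
    "足球": [0.05, 0.18, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the shared space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_avg(src: List[str], tgt: List[str]) -> float:
    """For each source keyword, take its closest target keyword; average the scores."""
    scores = [max(cosine(SHARED_SPACE[s], SHARED_SPACE[t]) for t in tgt) for s in src]
    return sum(scores) / len(scores)


def bilingual_doc_similarity(src_keywords: List[str], tgt_keywords: List[str]) -> float:
    """Symmetric keyword-alignment similarity between two documents.

    Keywords missing from the embedding table are skipped (an assumption).
    """
    src = [w for w in src_keywords if w in SHARED_SPACE]
    tgt = [w for w in tgt_keywords if w in SHARED_SPACE]
    if not src or not tgt:
        return 0.0
    return 0.5 * (best_match_avg(src, tgt) + best_match_avg(tgt, src))


if __name__ == "__main__":
    english_doc = ["economy", "market"]
    chinese_econ = ["经济", "市场"]
    chinese_sport = ["足球"]
    print(round(bilingual_doc_similarity(english_doc, chinese_econ), 3))
    print(round(bilingual_doc_similarity(english_doc, chinese_sport), 3))
```

With these toy vectors, the English economics keywords score much closer to the Chinese economics keywords than to the Chinese football keyword, which is the behavior a bilingual similarity feature needs.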
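
Using the cross-entropy between probability distributions as a list-wise loss is the idea behind ListNet. As a minimal sketch, assuming the ListNet "top-one" approximation (the abstract does not say which variant the thesis uses), a one-hidden-layer network with tanh units as the ranking function, and hypothetical feature values, labels, and untrained weights, the Python code below scores a list of candidate target-language documents and computes the list-wise loss for that list.

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log of the top-one probability of each list item."""
    m = max(xs)
    log_total = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_total for x in xs]


def softmax(xs: Sequence[float]) -> List[float]:
    """Top-one probability distribution over the items of one list."""
    return [math.exp(v) for v in log_softmax(xs)]


def listwise_cross_entropy(relevance: Sequence[float], scores: Sequence[float]) -> float:
    """ListNet-style loss: cross-entropy between the top-one distribution
    induced by the relevance labels and the one induced by the model scores."""
    p_true = softmax(relevance)
    log_p_pred = log_softmax(scores)
    return -sum(t * lp for t, lp in zip(p_true, log_p_pred))


def mlp_score(features: Sequence[float], w1: List[List[float]], b1: List[float],
              w2: List[float], b2: float) -> float:
    """One-hidden-layer feed-forward scorer: tanh hidden units, linear output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


if __name__ == "__main__":
    # HYPOTHETICAL concatenated similarity features (e.g. two different
    # bilingual similarity measures) for three candidate target-language
    # documents retrieved for one source-language document.
    candidates = [[0.91, 0.80], [0.45, 0.30], [0.12, 0.05]]
    relevance = [2.0, 1.0, 0.0]  # hypothetical graded relevance labels
    # HYPOTHETICAL untrained weights: 2 inputs -> 3 hidden units -> 1 score.
    # Real training would minimize the loss below by gradient descent.
    w1 = [[1.0, 0.5], [0.3, 1.2], [-0.4, 0.8]]
    b1 = [0.0, 0.1, -0.1]
    w2 = [0.7, 0.9, 0.2]
    b2 = 0.0
    scores = [mlp_score(f, w1, b1, w2, b2) for f in candidates]
    print("scores:", [round(s, 3) for s in scores])
    print("loss:", round(listwise_cross_entropy(relevance, scores), 4))
```

Because the loss depends only on the softmax of the scores within one list, adding a constant to every score leaves it unchanged; only the relative ordering and spacing of the scores matter, which is what distinguishes this list-wise loss from point-wise losses that score each document in isolation.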

Keywords/Search Tags:

information retrieval, bilingual document similarity, cross-language document ranking, word embedding, list-wise learning to rank

Related items

1	Applied Research Of Chinese-Korean Cross-Language Text Similarity Calculation
2	Scientific Research Document Retrieval And Recommendation System Based On Doc2Vec
3	Research On Technology Of Cross-language Similarity Evaluation Based On Deep Learning
4	An English Scientific Document Retrieval Method Based On Formula Description Structure And Word Embedding
5	Knowledge Transfer For Cross Domain Learning To Rank
6	Building Comparable Corpora Based On Cross-language Text Similarity Metrics
7	A Research Of Document Representation And Bilingual Word Embeddings
8	Chinese Word Semantic Similarity Measure And Its Application In Cross-language Information Retrieval
9	Research On Similarity Comparison Of Cross Language Texts Based On Multi-language Embedding
10	An Extended Research On Information Retrieval Model Based On Document Relation