Research On Technology Of Cross-language Similarity Evaluation Based On Deep Learning

Posted on:2019-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zuo

GTID:2428330548487374

Subject:Engineering

Abstract/Summary:

Traditional cross-language similarity evaluation techniques usually rely on theories of linguistics and pragmatics, and are therefore inevitably tied to the features of particular natural languages. In recent years, the rise of deep learning has driven progress in many areas of artificial intelligence research, such as image recognition, speech recognition, and natural language processing. This paper studies the application of deep learning to computing cross-language text similarity between Chinese and English at two levels: the word level and the sentence level.

The word-level study treats words as text units. It learns bilingual word representations and constructs a bilingual word embedding model. Based on this model, word embeddings in a space shared by both languages are produced, so the semantic similarity between two words can be measured by the spatial distance between their vectors. Building on word embedding theory and the Skip-Gram model, this paper first trains word embeddings on an artificially constructed pseudo-bilingual corpus. Second, to make the word embedding space as complete as possible, it uses monolingual corpora as a supplement to learn additional embedding knowledge. On top of the embedding model, this paper constructs three algorithms that combine the bilingual word representations with part-of-speech information, topic information, and TF-IDF information, respectively. All three algorithms can be used to calculate cross-language text similarity, and these combinations overcome shortcomings of the original method in representing text semantics.

The sentence-level study treats sentences as text units. By combining the semantic information of each word with its context information, the whole sentence is represented as a vector, which is then used to compute the similarity between texts in the two languages. To this end, this paper proposes SCLSE, a sentence-level cross-language similarity evaluation framework. SCLSE uses word embeddings as its underlying vector representation, integrates several neural network structures to learn semantic representations of sentences, and finally outputs a similarity score for each sentence pair. By segmenting short texts into paragraphs and treating each paragraph as a long sequence, this paper also computes similarity iteratively at a larger scale. For both lines of research, comparative experiments are set up to verify the effectiveness and practical value of the bilingual word embedding model and the SCLSE framework on cross-language text similarity evaluation tasks at different text-unit granularities.
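The abstract does not say how the pseudo-bilingual corpus is constructed. This sketch assumes a common approach, which the source does not confirm. Words in a monolingual sentence are randomly replaced with their dictionary translations, so Chinese and English words end up sharing contexts. Skip-Gram then learns from (center, context) pairs drawn from a sliding window. The helper names `make_pseudo_bilingual` and `skipgram_pairs`, the `swap_prob` parameter and the toy dictionary are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def make_pseudo_bilingual(
    sentence: List[str],
    dictionary: Dict[str, str],
    swap_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace each word that has a dictionary translation with that
    translation, with probability swap_prob (assumed construction scheme)."""
    rng = rng or random.Random(0)
    return [
        dictionary[w] if w in dictionary and rng.random() < swap_prob else w
        for w in sentence
    ]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) training pairs within a symmetric window,
    as consumed by a Skip-Gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy example: with swap_prob=1.0 every translatable word is swapped.
mixed = make_pseudo_bilingual(["我", "喜欢", "苹果"], {"喜欢": "like", "苹果": "apple"}, swap_prob=1.0)
print(mixed)                               # ['我', 'like', 'apple']
print(skipgram_pairs(mixed, window=1))     # [('我', 'like'), ('like', '我'), ('like', 'apple'), ('apple', 'like')]
```

Any Skip-Gram trainer could consume the resulting pairs. The training loop itself (negative sampling, gradient updates) is omitted here.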
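When both languages share one embedding space, word similarity reduces to a distance between vectors. One of the three text-level algorithms additionally mixes in TF-IDF information. The abstract does not give the exact formulas, so this sketch makes three assumptions. Similarity is measured as cosine similarity. A text vector is the TF-IDF-weighted average of its word vectors. IDF uses add-one smoothing. The function names and the toy 2-dimensional embeddings are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Embeddings = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def smoothed_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """IDF with add-one smoothing: log((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def text_vector(tokens: List[str], emb: Embeddings, idf: Dict[str, float], dim: int) -> List[float]:
    """TF-IDF-weighted average of word vectors (hypothetical weighting scheme)."""
    tf = Counter(tokens)
    acc, total = [0.0] * dim, 0.0
    for word, count in tf.items():
        if word not in emb:
            continue  # out-of-vocabulary words are skipped
        weight = count / len(tokens) * idf.get(word, 1.0)
        for k, x in enumerate(emb[word]):
            acc[k] += weight * x
        total += weight
    return [x / total for x in acc] if total else acc


def text_similarity(zh: List[str], en: List[str], emb: Embeddings,
                    idf: Dict[str, float], dim: int) -> float:
    """Cross-language similarity as cosine between the two text vectors."""
    return cosine(text_vector(zh, emb, idf, dim), text_vector(en, emb, idf, dim))


# Toy shared space: translation pairs point in nearly the same direction.
emb = {"猫": [1.0, 0.0], "cat": [0.9, 0.1], "狗": [0.0, 1.0], "dog": [0.1, 0.9]}
idf = smoothed_idf([["猫", "狗"], ["cat"], ["dog"]])
print(text_similarity(["猫"], ["cat"], emb, idf, 2) > text_similarity(["猫"], ["dog"], emb, idf, 2))  # True
```

The part-of-speech and topic variants would plausibly swap the TF-IDF weight for a different per-word weight or feature. The abstract does not specify how, so they are not sketched here.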
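In the larger-scale setting, texts are segmented into paragraphs and each paragraph is scored as a long sequence. The abstract does not specify how the paragraph scores are combined. This sketch assumes one plausible scheme, purely for illustration. Texts are split on blank lines, each source paragraph is matched to its best-scoring target paragraph, and the best scores are averaged. The `score` argument stands in for any paragraph-level scorer, such as a trained sentence-level model. `split_paragraphs`, `document_similarity` and the toy `exact_match` scorer are hypothetical names.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty segments (assumed segmentation rule)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def document_similarity(source: str, target: str, score: Scorer) -> float:
    """Average, over source paragraphs, of each one's best score against
    any target paragraph (hypothetical aggregation scheme)."""
    src, tgt = split_paragraphs(source), split_paragraphs(target)
    if not src or not tgt:
        return 0.0
    best = [max(score(p, q) for q in tgt) for p in src]
    return sum(best) / len(best)


def exact_match(p: str, q: str) -> float:
    """Toy scorer for demonstration only: 1.0 for identical strings, else 0.0."""
    return 1.0 if p == q else 0.0


print(document_similarity("a\n\nb", "b\n\nc", exact_match))  # 0.5
```

Best-match averaging is asymmetric: swapping `source` and `target` can change the result. Averaging both directions is one simple way to make it symmetric.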

Keywords/Search Tags:

deep learning, cross-language similarity, text unit, bilingual word embedding, semantic representation

Related items

1	Applied Research Of Chinese-Korean Cross-Language Text Similarity Calculation
2	Research On Cross-language Document Sorting Learning Method Based On Bilingual Document Similarity
3	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research
4	Research On Chinese-Korean Cross-lingual Text Classification Method Based On Bilingual Topical Word Embedding Model
5	Bilingual Word Representation Learning From Non-parallel Corpora
6	Research On Cross-language Information Extraction Based On Deep Learning
7	Research On The Application Of Chinese-Burmese Bilingual Sentence-level Embedding Semantic Representation Method Based On Neural Network
8	Research On Similarity Comparison Of Cross Language Texts Based On Multi-language Embedding
9	Deep Neural Networks For Text Representation And Application
10	Research On The Calculation Method Of Han-Thai Bilingual News Text Similarity With News Elements