
Research On The Calculation Method Of Chinese-Lao Bilingual Text Similarity

Posted on: 2022-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: X D Li
Full Text: PDF
GTID: 2518306524952269
Subject: Computer technology
Abstract/Summary:
Research on Chinese-Lao text similarity calculation is of great significance both to Lao natural language processing and to communication and development between China and Laos. Lao is a resource-scarce language, and Chinese and Lao have similar sentence structure features. By integrating language features into the model, more semantic information can be obtained from limited training data, which improves the performance of the similarity calculation model. To obtain a more accurate semantic representation of bilingual text, the text is divided, according to its composition, into paragraph-level short texts and sentences. By studying semantic representation methods at these different granularities, high-quality semantic representations of sentences and of paragraph short texts are obtained. From these, a more accurate semantic representation of Chinese-Lao bilingual texts is constructed, and the similarity calculation of Chinese-Lao bilingual texts is completed. This research has practical application value and theoretical significance. This paper mainly completes the following research work.

(1) Method for Computing the Text Similarity Between Chinese and Lao Sentences Based on the Structural Features of Sentences

This paper proposes a method for calculating the similarity between Chinese and Lao bilingual sentences that incorporates the structural characteristics of sentences. The purpose is to study how to obtain high-quality semantic representations of Chinese and Lao bilingual sentences and to construct a sentence similarity calculation model, providing technical and theoretical support for the subsequent research. Chinese and Lao sentences have similar sentence structure features. By constructing feature templates, the corresponding sentence structure features of the two languages are extracted, which provides more semantic information. The bilingual word vectors are mapped into a shared semantic space to reduce the differences between the languages. Finally, a similarity calculation model for Chinese and Lao bilingual sentences is built, and the effectiveness of the method is verified through experiments. The experimental results show that the proposed method outperforms the existing methods.

(2) Multi-task Similarity Computing Method for Chinese-Lao Short Texts Combined with Part-of-Speech and Position Features

This paper proposes a multi-task Chinese-Lao bilingual short text similarity calculation method that fuses part-of-speech and position features. The purpose is to carry the sentence semantic representation method and the language feature acquisition method from (1) over to the similarity calculation of paragraph short texts, and to construct a method for obtaining high-quality semantic representations of paragraph short texts. In view of the characteristics of short texts, the sentences of a short text are weighted by part-of-speech position feature weights and TF-IDF weights, and the semantic representation of the bilingual short text is formed from these weighted sentences. At the same time, the similarity of the core sentences of the bilingual short texts is calculated as an auxiliary task through multi-task learning. With the additional semantic information obtained in this way, a similarity calculation model for Chinese and Lao bilingual short texts is finally built. The experimental results show that the proposed method is more effective than the existing methods.

(3) A Weakly Interactive Chinese-Lao Bilingual Text Similarity Computation Method Fusing Multiple Features

Building on the research above, a weakly interactive Chinese-Lao bilingual text similarity calculation method that fuses multiple features is proposed. The bilingual text is segmented into short paragraphs. Within each Lao paragraph, attention interaction is performed between the core sentence, the sentence structure features, and the sentences of the paragraph, and the results are weighted to obtain the semantic representation of the Lao paragraph. This representation then interacts, again through attention, with the semantic representation of the Chinese text obtained from the Chinese BERT model, which yields more semantic matching information. Finally, the weighted semantic representation of the Lao text is computed, and the similarity score of the Chinese-Lao bilingual text is obtained. The experimental results show that the proposed method achieves better results than the existing methods.
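
The weighting step described in part (2) can be illustrated with a minimal sketch. It is not the thesis's model: it only shows how per-sentence weights could combine sentence vectors into one short-text vector, which is then compared by cosine similarity. The function names `weighted_text_vector` and `cosine_similarity`, the toy vectors, and the weight values are all assumptions made for illustration. The abstract does not state how the part-of-speech position weights and TF-IDF weights are computed or combined, nor which similarity function is used.

```python
import math


def weighted_text_vector(sentence_vectors, weights):
    """Combine sentence vectors into one text vector.

    Each weight stands in for a per-sentence score (hypothetically a
    TF-IDF weight combined with a part-of-speech position weight).
    Weights are normalized so that they sum to 1.
    """
    if not sentence_vectors or len(sentence_vectors) != len(weights):
        raise ValueError("need one weight per sentence vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * len(sentence_vectors[0])
    for vector, weight in zip(sentence_vectors, weights):
        for i, value in enumerate(vector):
            combined[i] += (weight / total) * value
    return combined


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sentence vectors, assumed to already lie in a shared semantic space.
chinese_text = weighted_text_vector([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
lao_text = weighted_text_vector([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
score = cosine_similarity(chinese_text, lao_text)
```

In this sketch a sentence with a larger weight pulls the text vector toward its own direction, which is the intuition behind weighting sentences before comparing short texts.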
Keywords/Search Tags:Chinese-Lao, text, language features, matching information, similarity computation