
Research on Chinese-Lao Bilingual Text and Sentence Similarity Calculation

Posted on: 2019-10-28  Degree: Master  Type: Thesis
Country: China  Candidate: W J Huo  Full Text: PDF
GTID: 2438330563457655  Subject: Computer technology
Abstract/Summary:
Text similarity computation and sentence similarity computation are important tasks in Natural Language Processing. They are widely applied in information retrieval, text mining, question answering systems, and related areas. For the Lao language, research on text and sentence similarity calculation is still at an early stage and remains weak. As exchanges between China and Laos become increasingly frequent, Lao information processing is becoming important and urgent for the economic and cultural exchanges between the two countries. To keep pace with these political, economic, and cultural developments, it is therefore necessary to study similarity computation between Chinese and Lao bilingual texts and sentences. Taking into account the grammatical and syntactic features that distinguish Lao from Chinese, as well as the current scarcity of Lao language resources, this thesis studies methods for calculating the similarity of Chinese-Lao bilingual texts and sentences. The main research results are as follows:

(1) Construction of an experimental-grade Lao semantic dictionary. Because no Lao version of WordNet is available for download on the Internet, our laboratory worked with Lao students to design a simple Lao semantic dictionary aligned with the WordNet developed at Princeton University. The Lao word is the basic unit, and entries are divided into nouns, verbs, adjectives, and adverbs, which are organized into a synonym network. After two years of work, the experimental-grade Lao semantic dictionary has reached a usable scale.

(2) Chinese-Lao cross-language text similarity computation based on a semantic dictionary. The text similarity proposed in this thesis measures only the semantic similarity of the bilingual texts, that is, the similarity of their subject content. The method relies mainly on the Chinese concept dictionary and on the experimental-grade Lao semantic dictionary described in (1), which our laboratory built and aligned with the Princeton WordNet. First, Chinese and Lao word segmentation tools are used to preprocess the Chinese and Lao texts, and the segmented words are filtered so that only nouns are kept. Next, the semantic distance between words is used to disambiguate the noun sequences, and WordNet is then used to map the unordered bilingual noun sequences into a numerical space, so that each noun sequence is converted into integers that are semantically independent. Finally, the similarity of the Chinese and Lao texts is calculated with the Dice coefficient. The experimental results show that this method improves the accuracy of bilingual text similarity calculation to some extent.

(3) Chinese-Lao cross-language sentence similarity computation based on a relation vector model. Building on the previous result, semantically similar bilingual texts are obtained, sentences are extracted from these texts, and the similarity of the bilingual sentences is calculated. Starting from the vector space model, this thesis presents a relation vector model that takes into account both the structure of the bilingual sentences and the semantic information provided by the semantic dictionary. A Chinese sentence and a Lao sentence are first segmented, and their keywords are then selected. Only these keywords are considered when calculating sentence similarity; they are mapped into the numerical space and converted into integers that are semantically independent. Finally, the cross-language sentence similarity is calculated. Because the model considers both the collocations between keywords and the synonym information of the keywords, it reflects the structure and semantic information of the sentences well. The experimental results show that the relation vector model improves the accuracy of bilingual sentence similarity calculation to some extent.
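The final step of the text similarity method in (2), comparing two texts once their nouns have been mapped to integers, can be illustrated with a minimal Python sketch. It assumes that each noun maps to a single shared concept ID and that the Dice coefficient is taken over sets of those IDs; the abstract does not specify these details. The lexicons `ZH_TO_ID` and `LO_TO_ID`, the ID values, and the helper `to_ids` are hypothetical toy data for illustration, not the thesis's dictionary, and the segmentation and disambiguation steps are omitted.

```python
# Toy, hypothetical lexicons: words in either language that share a concept
# map to the same integer ID (standing in for a WordNet-aligned dictionary).
ZH_TO_ID = {"经济": 101, "文化": 102, "交流": 103}
LO_TO_ID = {"ເສດຖະກິດ": 101, "ວັດທະນະທຳ": 102, "ພາສາ": 104}


def to_ids(nouns, lexicon):
    """Map a noun sequence to a set of concept IDs, skipping unknown words."""
    return {lexicon[n] for n in nouns if n in lexicon}


def dice_coefficient(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over two collections of IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(a & b) / (len(a) + len(b))


# Nouns assumed to be already segmented and filtered from each text.
zh_ids = to_ids(["经济", "文化", "交流"], ZH_TO_ID)
lo_ids = to_ids(["ເສດຖະກິດ", "ວັດທະນະທຳ", "ພາສາ"], LO_TO_ID)
similarity = dice_coefficient(zh_ids, lo_ids)
print(round(similarity, 3))
```

In this toy case the two texts share two of their three concepts, so the score is 2·2/(3+3). Working on sets ignores word order, which matches the abstract's description of the noun sequences as unordered.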
Keywords/Search Tags: cross-language text similarity, semantic dictionary, numerical space, cross-language sentence similarity, relation vector model, word segmentation, Lao