
Research And Application Of Sentence Similarity Calculation Based On Distributed System

Posted on: 2015-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2298330431493645
Subject: Computer technology
Abstract/Summary:
Sentence similarity computation plays an important role in natural language processing, and in Chinese information processing it has long been both a hot topic and a difficult problem. Progress on it directly affects the state of related fields: in information retrieval, machine translation, and automatic question answering systems, sentence similarity is a key technology, and in building a semantic knowledge base it plays an important role in distinguishing the similarity of words and sentences. This paper studies similarity calculation at three levels, the primitive, the word, and the sentence, with the main focus on sentence similarity. To address the deficiencies of current research on sentence similarity calculation, it proposes a sentence similarity calculation method based on a distributed system and validates the feasibility of the method through experiments. Finally, it gives an example of sentence similarity calculation based on a distributed system in semantic knowledge base processing. The main work of this paper includes:

(1) The primitive similarity calculation and the word similarity calculation are studied, and an improved word similarity calculation method based on HowNet is proposed. Word similarity is the basis of sentence similarity, and primitive similarity is in turn the basis of word similarity. This paper therefore studies both in detail, drawing on the rich information in HowNet, and improves the word similarity calculation method.

(2) A method of sentence similarity calculation based on a distributed system is proposed. Building on the above work, this paper improves sentence similarity calculation by using MapReduce on the Hadoop platform to compute the similarity of sentences concurrently. Because the information is structured, it is easy to divide into multiple parts that can be processed simultaneously. This paper gives a method for splitting the language information and proposes a similarity calculation method based on a distributed system.

(3) An example of sentence similarity calculation based on a distributed system is given in the context of building a semantic knowledge base. The method is applied to the construction of the semantic knowledge base, which demonstrates the implementation process of distributed sentence similarity calculation and verifies the practicality and effectiveness of the proposed method.

Finally, this paper summarizes the study, proposes further work, and points out future research directions.
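
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea in item (2): split the sentence pairs into independent parts, score each part in a map step, and combine the scores in a reduce step. Everything in it is an assumption made for illustration. The function names are hypothetical, the map and reduce steps are simulated locally in plain Python rather than run on Hadoop, and `word_similarity` is an exact-match placeholder standing in for the HowNet-based measure, which is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def word_similarity(w1, w2):
    # Placeholder (assumption): exact match only. A HowNet-based measure
    # would be substituted here; it is not reproduced in this sketch.
    return 1.0 if w1 == w2 else 0.0


def sentence_similarity(a, b):
    # Symmetric average of each word's best match in the other sentence.
    if not a or not b:
        return 0.0

    def directed(src, dst):
        return sum(max(word_similarity(x, y) for y in dst) for x in src) / len(src)

    return (directed(a, b) + directed(b, a)) / 2.0


def split_pairs(sentences, n_splits):
    # Cut the set of sentence pairs into independent input splits.
    pairs = list(combinations(sorted(sentences), 2))
    return [pairs[i::n_splits] for i in range(n_splits)]


def mapper(split, sentences):
    # Emit (sentence_id, (other_id, score)) for both members of each pair.
    for id_a, id_b in split:
        score = sentence_similarity(sentences[id_a], sentences[id_b])
        yield id_a, (id_b, score)
        yield id_b, (id_a, score)


def reducer(key, values):
    # Keep the most similar sentence; ties go to the smaller id.
    return key, min(values, key=lambda v: (-v[1], v[0]))


def run_job(sentences, n_splits=2):
    grouped = defaultdict(list)  # stands in for the shuffle step
    for split in split_pairs(sentences, n_splits):
        for key, value in mapper(split, sentences):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in grouped.items())


# Toy input: sentence id -> list of already segmented words.
sentences = {
    "s1": ["the", "cat", "sat"],
    "s2": ["the", "cat", "slept"],
    "s3": ["dogs", "bark"],
}
best = run_job(sentences)
print(best["s1"])  # the most similar sentence to s1 and its score
```

Each split depends only on its own pairs, which is why the work can be distributed: in a real MapReduce job the splits would be handled by separate map tasks and the grouping would be done by the framework's shuffle.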
Keywords/Search Tags: Natural Language Processing, Word Similarity Calculation, Sentence Similarity Calculation, MapReduce