
Naxi Sentence Similarity Computation Based on Improved Chunk Edit Distance

Posted on: 2014-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: H H Zhang
Full Text: PDF
GTID: 2268330401473426
Subject: Control theory and control engineering
Abstract/Summary:
Similarity computation at the level of Naxi words, chunks and sentences is an essential task that is widely used in Naxi information processing. Naxi places verbs at the end of the sentence, and its nouns and verbs appear in chunks. Taking these characteristics into account, this thesis addresses Naxi font extension, corpus annotation, and the similarity computation of Naxi words, chunks and sentences. A prototype Naxi-Chinese bilingual sentence retrieval system is built on these methods. Specifically, the thesis makes the following contributions to Naxi sentence similarity computation:

1. The Naxi font is extended, and a Naxi-Chinese bilingual dictionary and a word-aligned corpus are constructed. Over 2,000 new words are added to the dictionary, giving a Naxi-Chinese bilingual dictionary of about 6,000 entries, and a Naxi-Chinese bilingual corpus of about 30,000 sentence pairs is built. The bilingual corpus is then processed with reference to the annotation conventions of the Chinese corpus.

2. A similarity computation method for Naxi chunks is proposed. Based on the verb-final word order of Naxi and the fact that its nouns and verbs appear in chunks, Naxi NP and VP chunks are defined and chunking rules are extracted. NP and VP chunks are then extracted from Naxi sentences according to these rules. The semantic similarity of Naxi words is calculated through the Naxi-Chinese dictionary and Chinese word similarity. Weights are then assigned to the components of each chunk, with the head word and its modifiers weighted differently depending on the chunk type. Chunk similarity is obtained by combining these weights with the Chinese word similarity.

3. A sentence similarity computation method based on an improved chunk edit distance is proposed. In keeping with the characteristics of Naxi, chunk similarity is used to define the substitution cost of the chunk-level edit operation, and the edit distance between two Naxi sentences is computed from this cost. Finally, the edit distance is normalized to give the Naxi sentence similarity. The method captures both sentence structure and semantic information without requiring complex syntactic parsing.

4. A prototype Naxi-Chinese bilingual sentence retrieval system is designed.
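The chunk-level edit distance summarized above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The chunk layout `(type, head, modifiers)`, the head weight of 0.7, the substitution cost of `1 - chunk similarity`, the unit insertion and deletion costs, and normalization by the longer sentence are all assumptions made for illustration. The `toy_word_sim` function and its English placeholder words are hypothetical stand-ins for the dictionary-based Naxi word similarity.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical chunk layout: (chunk type, head word, modifier words).
Chunk = Tuple[str, str, Tuple[str, ...]]
WordSim = Callable[[str, str], float]

# Toy stand-in for word similarity obtained via a bilingual dictionary.
TOY_SIM = {frozenset(("dog", "cat")): 0.6}


def toy_word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table value or 0.0 otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def chunk_similarity(a: Chunk, b: Chunk, word_sim: WordSim,
                     head_weight: float = 0.7) -> float:
    """Weighted similarity of two chunks (the weighting scheme is assumed)."""
    type_a, head_a, mods_a = a
    type_b, head_b, mods_b = b
    if type_a != type_b:          # e.g. NP vs VP: treated as dissimilar
        return 0.0
    head = word_sim(head_a, head_b)
    if not mods_a and not mods_b:  # no modifiers: head word decides
        return head
    if not mods_a or not mods_b:   # only one side has modifiers
        mod = 0.0
    else:
        # Average best match of every modifier, in both directions.
        best_a = sum(max(word_sim(x, y) for y in mods_b) for x in mods_a)
        best_b = sum(max(word_sim(x, y) for x in mods_a) for y in mods_b)
        mod = (best_a + best_b) / (len(mods_a) + len(mods_b))
    return head_weight * head + (1.0 - head_weight) * mod


def chunk_edit_distance(s: Sequence[Chunk], t: Sequence[Chunk],
                        word_sim: WordSim) -> float:
    """Edit distance over chunks; substitution costs 1 - chunk similarity."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)         # delete i chunks
    for j in range(1, n + 1):
        d[0][j] = float(j)         # insert j chunks
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 - chunk_similarity(s[i - 1], t[j - 1], word_sim)
            d[i][j] = min(d[i - 1][j] + 1.0,      # deletion
                          d[i][j - 1] + 1.0,      # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[m][n]


def sentence_similarity(s: Sequence[Chunk], t: Sequence[Chunk],
                        word_sim: WordSim) -> float:
    """Normalize the distance by the longer sentence and map it to [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - chunk_edit_distance(s, t, word_sim) / longest


s1 = [("NP", "dog", ("big",)), ("VP", "run", ())]
s2 = [("NP", "cat", ("big",)), ("VP", "run", ())]
print(sentence_similarity(s1, s2, toy_word_sim))
```

Because substitution is priced by chunk similarity rather than a fixed cost, two sentences whose chunks line up in order and are semantically close get a small distance, which is how structure and semantics can both enter the score without a full parse.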
Keywords/Search Tags:Corpus, Naxi, Sentence similarity, Chunk, Edit-distance