
Research On The Chinese Semantic Word-formation Patterns Predicting Models Based On Annotated Corpus

Posted on: 2016-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: W N Hou
GTID: 2308330461977029
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Chinese information processing technology, problems such as ambiguity in text and dialogue and the handling of unknown words have become prominent, because computers cannot determine the meaning of words exactly. How can a computer be made to understand word meaning precisely? It needs to know the rules of semantic word formation. This thesis therefore proposes constructing predictive models of semantic word-formation patterns from a large-scale annotated corpus. This work can significantly improve the efficiency and accuracy of applications such as the interpretation of unregistered or new words, disambiguation, automatic lexicography, and machine translation. First, drawing on the literature and on consultation with experts in Chinese language and literature and in pattern recognition, the thesis proposes a solution that combines Chinese semantic word-formation rules with pattern recognition algorithms. For this solution, after comparing several pattern recognition algorithms, it adopts the binary logistic regression model and the naive Bayes model to predict Chinese semantic word-formation patterns. Before modeling, the words in the original annotated corpus are tagged with their parts of speech automatically, and the tags are then checked manually by the author to improve prediction accuracy. The thesis then constructs two kinds of predictive models of Chinese semantic word-formation patterns, one based on binary logistic regression and one based on naive Bayes, and evaluates them in simulation. At the start of the simulation, every two types of semantic word-formation pattern in the annotated corpus are paired into a group, and the samples in each group are divided into two parts: a training sample and a test sample.
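The pairing and splitting step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pattern labels (`P1`, `P2`, `P3`), the part-of-speech pairs used as features, the 75% training fraction, and the function names are all hypothetical and are not taken from the thesis.

```python
import random
from itertools import combinations


def pairwise_groups(samples):
    """Build one binary task per pair of pattern labels.

    `samples` is a list of (features, label) tuples. The result maps each
    label pair (a, b) to the samples carrying either label.
    """
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    groups = {}
    for a, b in combinations(sorted(by_label), 2):
        groups[(a, b)] = by_label[a] + by_label[b]
    return groups


def split_train_test(group, train_fraction=0.75, seed=0):
    """Shuffle one group reproducibly and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(group)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy corpus: features are part-of-speech pairs of the two
# morphemes, labels are placeholder pattern names (four words per pattern).
samples = (
    [(("N", "N"), "P1")] * 4
    + [(("V", "N"), "P2")] * 4
    + [(("A", "N"), "P3")] * 4
)

groups = pairwise_groups(samples)
train, test = split_train_test(groups[("P1", "P2")])
print(sorted(groups))          # the three label pairs
print(len(train), len(test))   # sizes of the two parts
```

Each group then becomes an independent two-class problem, which is the setting that a binary classifier such as logistic regression expects.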
The two kinds of predictive models are then trained on the training samples and used to predict the semantic word-formation patterns of the words in the test samples. Finally, the Kolmogorov-Smirnov test is used to assess goodness of fit, and confusion matrices are used to compare accuracy. In the simulations, the logistic regression models are built in SAS 9.1 and the naive Bayes models in Matlab 7.0. The simulations lead to the following conclusion: the predictive models based on logistic regression achieve better goodness of fit and higher accuracy than those based on naive Bayes. Logistic regression models are therefore the more promising of the two for helping computers understand the meaning of words.
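As a rough illustration of the confusion-matrix comparison mentioned above, the sketch below computes a binary confusion matrix and the resulting accuracy in Python. The thesis itself worked in SAS 9.1 and Matlab 7.0; the function names, the placeholder labels `P1` and `P2`, and the toy predictions here are assumptions made purely for the example and are not results from the thesis.

```python
def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for a two-class task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Share of correct predictions: (tp + tn) / total."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Hypothetical test-sample labels and one model's predictions.
actual = ["P1", "P1", "P2", "P2", "P1"]
predicted = ["P1", "P2", "P2", "P2", "P1"]

cm = confusion_matrix(actual, predicted, positive="P1")
print(cm)            # {'tp': 2, 'fp': 0, 'fn': 1, 'tn': 2}
print(accuracy(cm))  # 0.8
```

Running the same two functions on the predictions of each model for the same test sample gives directly comparable accuracy figures.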
Keywords/Search Tags:Annotated Corpus, Chinese Semantic Word-formation Patterns, Logistic Regression Model, Naive Bayes Model