Chinese Collocation Extraction Based On R-Value

Posted on: 2013-02-06 | Degree: Master | Type: Thesis
Country: China | Candidate: X C Lang | Full Text: PDF
GTID: 2248330371966970 | Subject: Computer Science and Technology
Abstract/Summary:
Collocations strongly influence how people understand sentences and how accurately and fluently they produce language. They also play an important role in improving the performance of many natural language processing tasks, such as machine translation and question answering.

Considering the compactness, combination, and substitutability of collocations, we divide them into three classes and propose a statistic, named the R value, for class 1 and class 2 collocations. By examining the irreplaceable part of a collocation, the R value reflects the probability that the collocation belongs to class 1 or class 2. In this paper, we use the R value together with 41 other statistics to test the performance of single-statistic collocation extraction on the manually constructed COLLOCATIONCIST2011 dataset. We use Recall, Precision, and MAP to assess and compare the extraction methods. The experimental results show that the R value performs well in extracting class 1 and class 2 collocations. Analysis of several representative statistics shows that each has its own advantages and disadvantages. Based on this finding, we integrate multiple statistics to extract collocations.

The multiple-statistics method treats each statistic as a feature for judging whether a candidate is a collocation, and uses machine learning to learn the relationship between the combined features and that judgment. We adopt Support Vector Machine (SVM) and artificial neural network (ANN) models to combine all 42 statistics, including the R value. The experimental results indicate that the multiple-statistics extraction method performs significantly better than the single-statistic method. Moreover, the R value contributes substantially to the multiple-statistics method, since its effectiveness makes up for the shortcomings of the other statistics.
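The abstract names Recall, Precision, and MAP as the evaluation measures but does not say how they are computed. The sketch below uses the standard information-retrieval definitions over a ranked list of collocation candidates, which may differ from the thesis's exact procedure. The function names, the toy candidate labels, and the choice to average over several ranked lists for MAP are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Sequence


def precision_recall_at_k(
    ranked: Sequence[str], gold: set[str], k: int
) -> tuple[float, float]:
    """Precision and recall of the top-k candidates against a gold set."""
    top = ranked[:k]
    hits = sum(1 for cand in top if cand in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def average_precision(ranked: Sequence[str], gold: set[str]) -> float:
    """Mean of the precision values at each rank where a true collocation occurs."""
    hits = 0
    total = 0.0
    for rank, cand in enumerate(ranked, start=1):
        if cand in gold:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(gold) if gold else 0.0


def mean_average_precision(
    rankings: Sequence[Sequence[str]], gold: set[str]
) -> float:
    """Average precision averaged over several ranked lists.

    Assumption: each list comes from a different run or data split.
    """
    if not rankings:
        return 0.0
    return sum(average_precision(r, gold) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical candidates, sorted by one statistic's score (best first).
    ranked = ["cand_a", "cand_b", "cand_c", "cand_d"]
    gold = {"cand_a", "cand_c"}  # hypothetical annotated true collocations
    print(precision_recall_at_k(ranked, gold, k=2))
    print(average_precision(ranked, gold))
```

Under this reading, each of the 42 statistics would produce its own ranking of the candidates, and the rankings could then be compared through these scores.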
Keywords/Search Tags: Collocation, R value, irreplaceable, multiple statistics