
Evaluation Of Collocation Identification Method Based On Co-training For Training CRF Model

Posted on: 2014-07-09  Degree: Master  Type: Thesis
Country: China  Candidate: C Q Zhang
GTID: 2268330401462544  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of the user-centered Web 2.0, the number of Internet users keeps growing, and a large amount of subjective text has appeared that mainly expresses views, attitudes, and opinions about products, events, people, and so on. These reviews are very valuable to both enterprises and individuals. However, relying on people alone to dig usable semantic information out of the massive amount of data on the network takes a great deal of time. To find the information users need quickly and accurately, sentiment (emotional tendency) analysis of comment text has become an urgent task. In this thesis, a CRF model is trained through Co-training to identify evaluation objects and evaluation phrases, and on this basis evaluation collocations are identified in Chinese comment text. The research is carried out in the following aspects:

(1) Training the CRF model based on Co-training
For a CRF model, feature selection is the most important factor: the feature templates directly affect the performance of the final labeling model, and the amount of labeled data also has an important influence on the model. This thesis therefore presents a new method that trains the CRF model through Co-training. It mainly uses the general features of CRF models, such as word features, POS features, and context features, and trains the CRF model through Co-training starting from initial labeled training sets of different proportions. With iterative training, the model performance tends to stabilize.

(2) Recognizing evaluation objects and evaluation phrases with the Co-training-trained CRF model
To identify evaluation objects and evaluation phrases in the comment text, this thesis uses the model trained in (1) to recognize the evaluation information in the text. As the proportion of labeled data increases, the recognition results keep improving.
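The abstract does not say how the two Co-training views are defined or which CRF toolkit is used. The sketch below is a minimal, standard-library-only illustration under stated assumptions: it splits the word, POS, and context features into two hypothetical views (a word-plus-context view and a POS-plus-context view) and replaces each CRF with a toy feature-voting tagger (`ToyTagger`, hypothetical, with no tag-transition model) so that the loop runs as written. In each round, each view labels the unlabeled sentences it is most confident about and adds them to the shared training set, and then both views are retrained. In practice, each view would be a real CRF trained on its own feature template.

```python
from collections import Counter, defaultdict

# A sentence is a list of (word, POS) tokens; its labeling is a list of tags:
# "OBJ" = evaluation object, "PHR" = evaluation phrase, "O" = other.
# Both views below are ASSUMPTIONS; the abstract does not define them.
PAD = "<PAD>"


def _field(sent, j, k):
    """Return field k (0 = word, 1 = POS) of token j, or PAD outside the sentence."""
    return sent[j][k] if 0 <= j < len(sent) else PAD


def word_view(sent, i):
    """Hypothetical view 1: the word itself plus a +/-1 word context window."""
    return [f"w0={_field(sent, i, 0)}", f"w-1={_field(sent, i - 1, 0)}",
            f"w+1={_field(sent, i + 1, 0)}"]


def pos_view(sent, i):
    """Hypothetical view 2: the POS tag plus a +/-1 POS context window."""
    return [f"p0={_field(sent, i, 1)}", f"p-1={_field(sent, i - 1, 1)}",
            f"p+1={_field(sent, i + 1, 1)}"]


class ToyTagger:
    """Stand-in for a CRF: per-token feature voting, with no tag-transition model."""

    def __init__(self, view):
        self.view = view
        self.counts = defaultdict(Counter)

    def fit(self, data):
        self.counts = defaultdict(Counter)
        for sent, tags in data:
            for i, tag in enumerate(tags):
                for feat in self.view(sent, i):
                    self.counts[feat][tag] += 1
        return self

    def predict(self, sent):
        """Return (tags, confidence); confidence = mean winning-vote share per token."""
        tags, shares = [], []
        for i in range(len(sent)):
            votes = Counter()
            for feat in self.view(sent, i):
                votes.update(self.counts.get(feat, Counter()))
            if votes:
                tag, n = votes.most_common(1)[0]
                tags.append(tag)
                shares.append(n / sum(votes.values()))
            else:  # no feature seen in training: default to "O" with zero confidence
                tags.append("O")
                shares.append(0.0)
        return tags, sum(shares) / max(len(shares), 1)


def co_train(labeled, unlabeled, rounds=10, per_round=1):
    """Each view labels its most confident unlabeled sentences and adds them
    to the shared training set; both views are then retrained."""
    labeled, pool = list(labeled), list(unlabeled)
    views = [ToyTagger(word_view), ToyTagger(pos_view)]
    for _ in range(rounds):
        if not pool:
            break
        for model in views:
            model.fit(labeled)
        for teacher in views:
            ranked = sorted(pool, key=lambda s: teacher.predict(s)[1], reverse=True)
            for sent in ranked[:per_round]:
                labeled.append((sent, teacher.predict(sent)[0]))
                pool.remove(sent)
    for model in views:
        model.fit(labeled)
    return views, labeled


seed = [
    ([("screen", "NN"), ("very", "AD"), ("clear", "VA")], ["OBJ", "O", "PHR"]),
    ([("engine", "NN"), ("quite", "AD"), ("quiet", "VA")], ["OBJ", "O", "PHR"]),
]
raw = [
    [("seat", "NN"), ("very", "AD"), ("comfortable", "VA")],
    [("price", "NN"), ("quite", "AD"), ("high", "VA")],
]
(word_model, pos_model), data = co_train(seed, raw, rounds=5)
print(len(data))  # 4
print(pos_model.predict([("battery", "NN"), ("very", "AD"), ("durable", "VA")])[0])
# ['OBJ', 'O', 'PHR']
```

The ratio of `seed` to `raw` plays the role of the "proportion of initial labeled training set" that the abstract varies. The loop stops when the unlabeled pool is exhausted or when `rounds` is reached. Stopping when dev-set performance stops changing would be an equally plausible reading of "tends to be stable".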
In the car domain, opinion-target (evaluation object) recognition reaches a precision of 67.483% and a recall of 67.832%. For evaluation-phrase recognition, compared with the results of template-based identification, the F value already exceeds the template results when the labeled proportion is 3%, and the results come close to the standard experimental results when the labeled proportion is 10%.

(3) Identifying evaluation collocations based on the nearest-neighbor method
An evaluation collocation is a combination of an evaluation object and an evaluation phrase, and extracting evaluation collocations is a basic task in sentiment analysis. This thesis trains the CRF model through Co-training and uses it to identify evaluation objects and evaluation phrases separately. On this basis, it applies the nearest-neighbor method to identify evaluation collocations in the comment text. Experimental results show that the method can effectively identify evaluation collocations in comment text.
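The abstract names the nearest-neighbor method but not its distance measure or tie-breaking rule. The sketch below is one possible reading under stated assumptions. It takes the tag sequence produced by the CRF (the tag names `OBJ`/`PHR` and all helper names are hypothetical) and measures distance in tokens within one sentence. It pairs each evaluation phrase with the closest evaluation object, and when two objects are equally close, it prefers the one on the left.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def extract_spans(tags: List[str], label: str) -> List[Span]:
    """Collect maximal runs of consecutive tokens carrying `label` (must not be "O")."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # the "O" sentinel closes a run at the end
        if tag == label and start is None:
            start = i
        elif tag != label and start is not None:
            spans.append((start, i))
            start = None
    return spans


def gap(a: Span, b: Span) -> int:
    """Number of tokens strictly between two spans (0 if adjacent or overlapping)."""
    if a[1] <= b[0]:
        return b[0] - a[1]
    if b[1] <= a[0]:
        return a[0] - b[1]
    return 0


def pair_nearest(objects: List[Span], phrases: List[Span],
                 max_gap: Optional[int] = None) -> List[Tuple[Span, Span]]:
    """Pair each evaluation phrase with its nearest evaluation object.

    min() returns the first of equally distant candidates; since candidates are
    sorted by start offset, ties go to the object on the left (an ASSUMPTION).
    """
    candidates = sorted(objects)
    pairs = []
    for ph in phrases:
        if not candidates:
            break
        best = min(candidates, key=lambda ob: gap(ob, ph))
        if max_gap is None or gap(best, ph) <= max_gap:
            pairs.append((best, ph))
    return pairs


tokens = ["screen", "very", "clear", ",", "but", "battery", "life", "short"]
tags = ["OBJ", "O", "PHR", "O", "O", "OBJ", "OBJ", "PHR"]
objs = extract_spans(tags, "OBJ")  # [(0, 1), (5, 7)]
phrs = extract_spans(tags, "PHR")  # [(2, 3), (7, 8)]
pairs = pair_nearest(objs, phrs)
words = [(" ".join(tokens[o[0]:o[1]]), " ".join(tokens[p[0]:p[1]])) for o, p in pairs]
print(words)  # [('screen', 'clear'), ('battery life', 'short')]
```

The optional `max_gap` window leaves a phrase unpaired when no object is close enough. The abstract does not say whether the thesis applies such a window, so it defaults to off.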
Keywords/Search Tags: CRF model, Co-training, Template features, Evaluation object, Evaluation phrase, Evaluation collocation