A Study On Chinese Noun Phrase Identification

Posted on: 2016-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Kong
GTID: 2308330473457031
Subject: Software engineering

Abstract/Summary:
Noun phrases are among the most important phrase types in Chinese, and noun phrase identification is a basic and important task in natural language processing. Correctly identifying noun phrases simplifies sentence structure and reduces the complexity of syntactic analysis. This thesis proposes a noun phrase identification method based on a word frequency statistics model and, for microblog text, a method based on a conditional random field model. Both combine statistics-based and rule-based techniques to improve identification.

After reviewing the current mainstream methods, this thesis adopts a combination of rules and statistics to identify Chinese noun phrases. The first method is based on a word frequency statistics model: it calculates the co-occurrence frequency and threshold of noun phrases, builds vocabularies according to the different roles that words play within a noun phrase, and adds unknown-word processing, rule templates, and error correction. Problems found in the results are analyzed and possible solutions are put forward.

Building on the analysis of these experimental results, the thesis also makes a special study of microblog text, which is widely used today. The second method uses a conditional random field model to identify noun phrases in Chinese microblogs after the microblog text has been standardized. Rule templates and stop lists of noun phrases are extracted from the microblog training corpus and applied as knowledge to the recognition results of the conditional random field model to improve accuracy.

Experiments show that the noun phrase identification methods are effective.
In the experiments on the corpus using the word frequency statistics model, the average accuracy, recall, and F value reach 91.28%, 93.22%, and 92.24%, respectively. In the microblog experiments, feature templates suited to microblog text are selected and post-processing optimizations such as rule templates are added; the average accuracy, recall, and F value reach 95.01%, 94.03%, and 94.52%, respectively.
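
The reported F values can be checked against the other two figures. The sketch below assumes that "accuracy" here denotes precision and that the F value is the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state either explicitly, although the numbers are consistent with that reading. The function name `f_measure` is illustrative and not taken from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "accuracy" is precision and its
    "F value" is the balanced F-measure.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, in percent.
word_frequency_f = f_measure(91.28, 93.22)
microblog_crf_f = f_measure(95.01, 94.03)

print(f"word frequency model: {word_frequency_f:.2f}")
print(f"microblog CRF model:  {microblog_crf_f:.2f}")
```

Under these assumptions both computed values round to the F values given in the abstract.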
Keywords/Search Tags: Natural language processing, Noun phrase identification, Word frequency statistics model, Conditional random field model, Rules processing