A Study On Chinese Noun Phrase Identification

Posted on:2016-12-26

Degree:Master

Type:Thesis

Country:China

Candidate:L Kong

GTID:2308330473457031

Subject:Software engineering

Abstract/Summary:

Noun phrases are among the most important phrase types in Chinese, and identifying them is a basic task in natural language processing. Correct noun phrase identification simplifies sentence structure and reduces the complexity of syntactic analysis. This thesis proposes a noun phrase identification method based on a word frequency statistics model and, for microblog text, a method based on a conditional random field (CRF) model. Both methods combine statistics-based and rule-based processing to complete the identification task more effectively.

After surveying the current mainstream methods, this thesis adopts a combination of rules and statistics to identify Chinese noun phrases. The first method is based on a word frequency statistics model: it calculates the co-occurrence frequencies and thresholds of noun phrases, builds vocabularies according to the different roles that words play within a noun phrase, and adds unknown-word processing, rule templates, and error correction. The problems found in the results are analyzed and possible solutions are put forward.

Building on the analysis of these experimental results, the thesis also makes a special study of microblog text, which is popular in today's society. It presents a CRF-based method for noun phrase identification in Chinese microblogs: the microblog text is first standardized, and a CRF model is then used to identify the noun phrases. Rule templates and stop lists of noun phrases are extracted from the microblog training corpus and applied as knowledge to the CRF recognition results to improve accuracy.

Experiments show that both noun phrase identification methods are effective. On the corpus, the method based on the word frequency statistics model achieves an average precision, recall, and F-value of 91.28%, 93.22%, and 92.24%, respectively. For microblog text, with a suitable feature template selected and post-processing optimizations such as rule templates added, the average precision, recall, and F-value reach 95.01%, 94.03%, and 94.52%, respectively.
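
The reported F-values can be checked against the other two figures. The sketch below assumes that the first figure in each triple is precision in the usual sense and that the F-value is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state either explicitly. The names `f_measure`, `reported`, and `check_reported` are illustrative and do not come from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract, in percent: (precision, recall, reported F).
# Treating the first figure as precision is an assumption.
reported = {
    "word frequency statistics model": (91.28, 93.22, 92.24),
    "conditional random field model (microblog)": (95.01, 94.03, 94.52),
}


def check_reported(results, tolerance=0.01):
    """Recompute F1 for each entry and compare it with the reported value."""
    checked = {}
    for name, (p, r, f_reported) in results.items():
        f = f_measure(p, r)
        checked[name] = (round(f, 2), abs(f - f_reported) <= tolerance)
    return checked


if __name__ == "__main__":
    for name, (f, ok) in check_reported(reported).items():
        print(f"{name}: F1 = {f:.2f} (matches reported value: {ok})")
```

Under these assumptions both recomputed values agree with the reported F-values to two decimal places, which suggests the three figures in each experiment are internally consistent.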

Keywords/Search Tags:

Natural language processing, Noun phrase identification, Word frequency statistics model, Conditional random field model, Rules processing

Related items

1	Automatic Recognition And Parsing Of Chinese Maximal-Length Noun Phrase
2	Research On Identification Of Kazakh Basic Noun Phrase Based On Maximum Entropy
3	Research And Application Of Chinese Word Segmentation Based On Conditional Random Fields
4	Automatic Identification Of Chinese Base Noun Phrase Model
5	Automatic Recognition Of Chinese Noun Phrase Based On Probabilistic Context-free Grammar
6	Research Of Chinese Word Segmentation With Conditional Random Fields
7	Qualifying The Verb Phrases And Noun Phrases In The Field Of Semantic Analysis
8	Research And Application Of Chinese Word Segmentation Based On English-Chinese Parallel Corpus
9	Automatic Identification Of Chinese Prepositional Phrase Based On CRF
10	Structural Research And System Implementation Of Medical CT Text