Automatic Recognition Research On Syntactic Category Of Common Words

Posted on:2013-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:J Xia

GTID:2248330371977228

Subject:Software engineering

Abstract/Summary:

As is well known, the quality of a Chinese corpus plays a decisive role in natural language processing research, and high-quality corpora are becoming increasingly important to scholars. It is therefore necessary to take multi-category words (words that can belong to more than one part of speech) into account in the study of modern Chinese. Although multi-category words are few in number in modern Chinese, the multi-category phenomenon is very complicated. It is a pervasive problem that causes great difficulties for part-of-speech tagging, and correctly identifying the category of multi-category words is one of the most critical problems in Chinese part-of-speech tagging. While researching the automatic identification of function-word usage, the author came to appreciate further how important this identification is.

This thesis mainly studies the recognition of the syntactic category of words with statistical methods, namely the conditional random field (CRF) model, the maximum entropy model, and the K-nearest neighbor (KNN) algorithm. The experimental results show that the statistical methods identify the part of speech of multi-category words well, perform well on commonly used multi-category words, and achieve high accuracy on the corpus. However, not all multi-category words can be recognized satisfactorily, because some words are not suitable for statistical methods. For these isolated cases, rule-based methods can be chosen instead. On the basis of the statistical results, the author applies rule-based methods to the multi-category words that are not suitable for statistical methods: according to the different characteristics of the different parts of speech, operable features are extracted to determine the category, and BNF (Backus-Naur Form) notation is used to describe the parts of speech of multi-category words.
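To make the statistical approach concrete, here is a minimal sketch of how a K-nearest neighbor classifier could decide the part of speech of a multi-category word from its context. It assumes that each occurrence is described only by the words in a two-token window on either side, that similarity is measured with the Jaccard overlap of those feature sets, and that the label is chosen by majority vote. The feature template, the similarity measure, the helper names, and the toy sentences for the word 研究 ("research" / "to study") are hypothetical illustrations, not the feature set or data used in the thesis.

```python
from collections import Counter

def context_features(tokens, i, window=2):
    """Collect positional word features around the target word tokens[i].

    Hypothetical feature template: the words at offsets -window..+window,
    with positions outside the sentence marked as <pad>."""
    feats = set()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the ambiguous word itself
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats.add(f"w[{offset}]={word}")
    return feats

def jaccard(a, b):
    """Overlap between two feature sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_predict(train, tokens, i, k=3):
    """Label tokens[i] by majority vote among the k most similar training contexts.

    train: list of (tokens, target_index, pos_label) triples."""
    query = context_features(tokens, i)
    scored = sorted(
        train,
        key=lambda ex: jaccard(context_features(ex[0], ex[1]), query),
        reverse=True,
    )
    votes = Counter(label for _, _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Toy training contexts for the multi-category word 研究 (hypothetical data).
TRAIN = [
    (["我们", "研究", "这个", "问题"], 1, "v"),
    (["他们", "研究", "语言", "现象"], 1, "v"),
    (["这", "项", "研究", "很", "重要"], 2, "n"),
    (["他", "的", "研究", "很", "深入"], 2, "n"),
]

print(knn_predict(TRAIN, ["这", "个", "研究", "很", "有趣"], 2))  # n
print(knn_predict(TRAIN, ["我们", "研究", "语言", "问题"], 1))    # v
```

Unweighted majority voting over a bag of positional word features is the simplest possible choice; a practical tagger would likely add richer features (for example the tags of neighboring words) and weight votes by similarity.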
First, a set of rules is built from the characteristics of the multi-category words and of their contexts. Second, the rules are tested on the corpus with annotation tools to find the problems in them, and they are modified and retested repeatedly to improve their recognition accuracy. The experimental results show that applying rule-based methods to the words that are not suitable for statistical methods yields better recognition accuracy. Finally, the thesis summarizes this study and outlines directions for the next stage of research.
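The rule-based cycle described above (write rules, test them against an annotated corpus, inspect the errors, revise) can be sketched as follows. The BNF-style rules, the trigger word lists, the default label, and the annotated sentences are all hypothetical illustrations for the word 研究; they are not the rule set from the thesis.

```python
# Hypothetical BNF-style rules for the multi-category word 研究 (illustrative only):
#   <noun-use>        ::= <nominal-marker> "研究"
#   <nominal-marker>  ::= "的" | "项" | "个"
#   <verb-use>        ::= <pronoun-subject> "研究"
#   <pronoun-subject> ::= "我们" | "他们" | "他"

NOMINAL_MARKERS = {"的", "项", "个"}
PRONOUN_SUBJECTS = {"我们", "他们", "他"}

def rule_noun(tokens, i):
    """<noun-use>: the previous token is a nominal marker."""
    return i > 0 and tokens[i - 1] in NOMINAL_MARKERS

def rule_verb(tokens, i):
    """<verb-use>: the previous token is a pronoun subject."""
    return i > 0 and tokens[i - 1] in PRONOUN_SUBJECTS

# Rules are tried in order; the first one that fires decides the category.
RULES = [(rule_noun, "n"), (rule_verb, "v")]

def apply_rules(tokens, i, default="v"):
    """Return the category of tokens[i]; the fallback label is a hypothetical choice."""
    for condition, label in RULES:
        if condition(tokens, i):
            return label
    return default

def evaluate(annotated):
    """annotated: list of (tokens, target_index, gold_label).

    Returns the accuracy and the misclassified examples, so that the rules
    can be inspected and revised before the next test round."""
    errors = []
    for tokens, i, gold in annotated:
        predicted = apply_rules(tokens, i)
        if predicted != gold:
            errors.append((tokens, i, gold, predicted))
    accuracy = 1 - len(errors) / len(annotated) if annotated else 0.0
    return accuracy, errors

# Hypothetical hand-annotated test sentences.
ANNOTATED = [
    (["我们", "研究", "这个", "问题"], 1, "v"),
    (["他", "的", "研究", "很", "深入"], 2, "n"),
    (["这", "项", "研究", "很", "重要"], 2, "n"),
    (["研究", "表明", "这", "是", "对", "的"], 0, "n"),
]

accuracy, errors = evaluate(ANNOTATED)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.75
for tokens, i, gold, predicted in errors:
    print("".join(tokens), "gold:", gold, "predicted:", predicted)
```

The single error here (sentence-initial 研究 followed by 表明) is the kind of case that would prompt a new rule in the next modify-and-retest round.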

Keywords/Search Tags:

Chinese information processing, multi-category word, Conditional Random Fields, Maximum Entropy, k-nearest neighbor
