| Part-of-speech (POS) tagging is the task of assigning a POS, or lexical category, to every word in a text. It is a basic step in Natural Language Processing (NLP), and its precision strongly affects later steps such as syntactic analysis and chunk analysis. Errors made in POS tagging propagate through the processing chain, so tagging correctly is of great significance in NLP. The main goal of this thesis is to implement Chinese POS tagging on top of word segmentation and to provide a basis for later syntactic parsing and other NLP tasks. In this thesis, we first introduce the significance and current research status of POS tagging. We then implement a Chinese POS tagging system based on Maximum Entropy (ME), building on a thorough understanding of ME theory. Finally, statistical rules and POS confinement are used to tag unlogged words. Different context information is introduced into the ME model through different templates: four ME POS tagging models are built, and the template with the highest tagging precision is selected as the final template. To simplify the model, three feature selection methods are used to reduce the ME model's candidate features. To further improve tagging precision, rules, POS confinement, and ME are combined. This thesis presents the algorithm of the ME tagging model and its results, together with the results of the additional unlogged-word tagging. POS tagging is comparatively complex. Because ME can make full use of a word's context at different levels to solve complex problems, we used ME for POS tagging and achieved good results. The experimental results show that ME is effective for Chinese POS tagging: the open-test tagging precision is 94.96%, and the precision for unlogged-word tagging is 63.32%. The POS tagging approaches introduced in this thesis can be used in actual MT systems and can provide a basis for further NLP tasks. 
Moreover, the research in this thesis can be applied to other NLP tasks, such as information retrieval and text classification. |
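
To illustrate how templates introduce context information into an ME model, the following is a minimal sketch and not the implementation used in this thesis. The template names (`w0`, `w-1`, `w+1`, `t-1`, `suffix1`), the boundary markers, and the function `extract_features` are all hypothetical; the abstract does not specify the four template sets. Each template turns the context of one word into a feature string that an ME classifier could weight.

```python
from typing import Callable, Dict, List

# A template maps (words, position, previous tag) to one feature string.
Template = Callable[[List[str], int, str], str]


def _word(words: List[str], i: int) -> str:
    """Return the word at position i, or a boundary marker outside the sentence."""
    if i < 0:
        return "<BOS>"
    if i >= len(words):
        return "<EOS>"
    return words[i]


# Hypothetical templates (an assumption for illustration only).
TEMPLATES: Dict[str, Template] = {
    "w0": lambda ws, i, t: "w0=" + _word(ws, i),        # current word
    "w-1": lambda ws, i, t: "w-1=" + _word(ws, i - 1),  # previous word
    "w+1": lambda ws, i, t: "w+1=" + _word(ws, i + 1),  # next word
    "t-1": lambda ws, i, t: "t-1=" + t,                 # previous tag
    "suffix1": lambda ws, i, t: "suf1=" + _word(ws, i)[-1],  # last character
}


def extract_features(words: List[str], i: int, prev_tag: str,
                     template_names: List[str]) -> List[str]:
    """Apply the selected templates to the word at position i (0 <= i < len(words))."""
    return [TEMPLATES[name](words, i, prev_tag) for name in template_names]


if __name__ == "__main__":
    sentence = ["我", "爱", "北京"]  # an already segmented sentence
    print(extract_features(sentence, 1, "r", ["w0", "w-1", "w+1", "t-1"]))
```

Changing the list of template names changes which context the model sees, so models built from different template sets can be compared on held-out data and the best-performing set kept.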