
Research On The Consistency Check And Auto-collation On POS Tagging Of Chinese Corpus

Posted on: 2006-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2168360155956976
Subject: Computer software and theory
Abstract/Summary:
Constructing high-quality, large-scale corpora has always been a fundamental research area in Chinese natural language processing. In recent years, rapid development in machine translation (MT), phonetic recognition (PR), information retrieval (IR), web text mining, and related fields has increased the demand for larger Chinese corpora of higher quality. We observe that POS tagging is what largely determines the value of a Chinese corpus as a research resource: the higher the precision of the POS tagging, the more valuable the corpus.

Ensuring the consistency of part-of-speech (POS) tagging plays an important role in constructing high-quality corpora. In particular, we focus on the consistency check of POS tagging for multi-category words, that is, words that consist of the same Chinese characters and are nearly synonymous but have different grammatical functions. No matter how many different POS tags a multi-category word may take, consistent POS tagging means assigning the word the same POS tag whenever it appears in similar contexts. So far, however, most work in the field has focused on novel algorithms or techniques for POS tagging itself, and only a limited number of studies have addressed the consistency check of POS tagging.

After analyzing the POS tagging problem in large-scale corpora, this paper puts forward a new method for checking the consistency of POS tagging. The method first builds a vector model of the context of each multi-category word, then classifies the POS tagging sequence vectors to judge their consistency, and finally presents an auto-collation method for inconsistent POS tags. With this method we can determine the consistency of POS tagging in each text and improve the precision of POS tagging. The main work of this paper includes the following parts:

a. After analyzing the 2M-word Chinese corpus published by Peking University and the "863" 1.5M-word Chinese corpus, we build the...
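
The three steps named in the abstract (build a context vector for each occurrence of a multi-category word, classify the vectors, and propose a corrected tag for inconsistent occurrences) can be illustrated with a minimal Python sketch. It is not the thesis's actual model, which the abstract does not specify. The function names, the window size of 2, the (offset, tag) features, and the leave-one-out nearest-centroid classifier with cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vector(tags, i, window=2):
    """Sparse vector of (offset, tag) features around position i.

    Assumption: the context is represented only by the POS tags of the
    neighbouring words within `window` positions.
    """
    feats = Counter()
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(tags):
            feats[(off, tags[j])] += 1
    return feats


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict-like)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def check_consistency(sentences, word, window=2):
    """Flag occurrences of `word` whose tag disagrees with similar contexts.

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    Returns (sentence_index, position, assigned_tag, suggested_tag) tuples;
    the suggested tag stands in for an auto-collation proposal.
    """
    occurrences = []
    for s, sent in enumerate(sentences):
        tags = [t for _, t in sent]
        for i, (w, t) in enumerate(sent):
            if w == word:
                occurrences.append((s, i, t, context_vector(tags, i, window)))

    flagged = []
    for idx, (s, i, tag, vec) in enumerate(occurrences):
        # Leave-one-out: build one summed context vector per tag from all
        # other occurrences, so an occurrence never votes for itself.
        centroids = defaultdict(Counter)
        for jdx, (_, _, other_tag, other_vec) in enumerate(occurrences):
            if jdx != idx:
                centroids[other_tag].update(other_vec)
        if not centroids:
            continue
        best = max(centroids, key=lambda t: cosine(vec, centroids[t]))
        own = cosine(vec, centroids.get(tag, Counter()))
        # Flag only when another tag's contexts are strictly more similar.
        if best != tag and cosine(vec, centroids[best]) > own:
            flagged.append((s, i, tag, best))
    return flagged
```

Calling `check_consistency(sentences, word)` on a tagged corpus returns the occurrences whose assigned tag disagrees with the tag used in the most similar contexts. A real system would need a richer context model and a threshold tuned on annotated data.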
Keywords/Search Tags: the Context Model of Multi-category Word, POS Tagging, Consistency Check, Auto-collation, Multi-category Word