| With the development of computers, it is an inevitable trend that natural languages will be used as human-computer interaction languages, which demands deeper and broader natural language processing. Part-of-speech tagging is a fundamental task in natural language processing and is significant for corpus-based tagging of Chinese text, machine translation, and information retrieval over large-scale text. In this paper, we study methods for Chinese part-of-speech tagging and analyze both rule-based and statistical approaches. The amount of contextual information and the degree of data smoothing are two important parameters for evaluating the performance of a statistical Chinese part-of-speech tagging model. This paper describes an extension of the hidden Markov model (HMM) for Chinese part-of-speech tagging that uses second-order approximations for both the contextual and the lexical probabilities, and extends the traditional Viterbi algorithm accordingly. The model uses more contextual information than standard statistical models. A smoothing algorithm based on linear interpolation is introduced to address the model's sparse-data problem. The new full second-order HMM is shown to improve both Chinese part-of-speech tagging accuracy and disambiguation accuracy over current models. |
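For orientation, one common way to write a full second-order HMM tagger with linearly interpolated estimates is sketched below. The exact conditioning of the lexical probability (on the current and the previous tag) and the padding tags $t_{-1} = t_0 = \langle s \rangle$ are assumptions of this sketch; the abstract states only that both the contextual and lexical probabilities are second-order. $\hat{P}$ denotes a relative-frequency (maximum-likelihood) estimate from the training corpus, and the weights $\lambda_k$, $\mu_k$ are non-negative.

$$
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} \tilde{P}(t_i \mid t_{i-2}, t_{i-1}) \, \tilde{P}(w_i \mid t_{i-1}, t_i)
$$

$$
\tilde{P}(t_i \mid t_{i-2}, t_{i-1}) = \lambda_3 \hat{P}(t_i \mid t_{i-2}, t_{i-1}) + \lambda_2 \hat{P}(t_i \mid t_{i-1}) + \lambda_1 \hat{P}(t_i), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
$$

$$
\tilde{P}(w_i \mid t_{i-1}, t_i) = \mu_2 \hat{P}(w_i \mid t_{i-1}, t_i) + \mu_1 \hat{P}(w_i \mid t_i), \qquad \mu_1 + \mu_2 = 1
$$

Because each interpolated distribution is a convex combination of proper distributions, it still sums to one over $t_i$ (or $w_i$) whenever the histories involved were observed in training, while falling back on lower-order estimates when the higher-order counts are sparse.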
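The linear-interpolation smoothing of the contextual (tag trigram) probabilities can be sketched as follows. The abstract does not say how the interpolation weights are chosen; this sketch assumes the deleted-interpolation procedure popularized by the TnT tagger, and the class name `InterpolatedTrigramTags` and the padding tag `START` are hypothetical. Lexical probabilities would be smoothed analogously.

```python
from collections import Counter

START = "<s>"  # hypothetical padding tag for the two positions before a sentence


class InterpolatedTrigramTags:
    """Linearly interpolated P(t3 | t1, t2); weights set by deleted interpolation (assumed)."""

    def __init__(self, tag_sequences):
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()  # counts of 1- and 2-tag histories
        for tags in tag_sequences:
            padded = [START, START] + list(tags)
            for i in range(2, len(padded)):
                t1, t2, t3 = padded[i - 2], padded[i - 1], padded[i]
                self.uni[t3] += 1
                self.bi[(t2, t3)] += 1
                self.tri[(t1, t2, t3)] += 1
                self.ctx1[t2] += 1
                self.ctx2[(t1, t2)] += 1
        self.n = sum(self.uni.values())
        self.lambdas = self._deleted_interpolation()  # [unigram, bigram, trigram]

    @staticmethod
    def _ratio(num, den):
        return num / den if den > 0 else 0.0

    def _deleted_interpolation(self):
        weights = [0.0, 0.0, 0.0]
        for (t1, t2, t3), f in self.tri.items():
            # Leave this trigram occurrence out of every estimate, then credit
            # the order whose estimate is largest (ties go to the lower order).
            cands = [
                self._ratio(self.uni[t3] - 1, self.n - 1),
                self._ratio(self.bi[(t2, t3)] - 1, self.ctx1[t2] - 1),
                self._ratio(f - 1, self.ctx2[(t1, t2)] - 1),
            ]
            weights[cands.index(max(cands))] += f
        total = sum(weights)
        return [w / total for w in weights]

    def prob(self, t1, t2, t3):
        """Smoothed P(t3 | t1, t2) as a convex mix of uni-, bi- and trigram estimates."""
        l1, l2, l3 = self.lambdas
        return (l1 * self._ratio(self.uni[t3], self.n)
                + l2 * self._ratio(self.bi[(t2, t3)], self.ctx1[t2])
                + l3 * self._ratio(self.tri[(t1, t2, t3)], self.ctx2[(t1, t2)]))
```

For a history seen in training, `prob` sums to one over all tags, and an unseen trigram still receives non-zero probability from its bigram and unigram components, which is the point of smoothing for sparse data.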
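Extending Viterbi decoding to a second-order model amounts to making each search state a pair of tags (previous, current) rather than a single tag. The sketch below assumes the caller supplies already-smoothed probability functions `trans(t1, t2, t3)` for $P(t_3 \mid t_1, t_2)$ and `emit(t2, t3, w)` for $P(w \mid t_2, t_3)$; the function name `viterbi2`, these signatures, and the `START` padding tag are hypothetical and not taken from the paper.

```python
import math

START = "<s>"  # hypothetical sentence-start padding tag


def viterbi2(words, tags, trans, emit):
    """Second-order Viterbi over (previous tag, current tag) states.

    trans(t1, t2, t3) ~ P(t3 | t1, t2); emit(t2, t3, w) ~ P(w | t2, t3).
    Returns the highest-scoring tag sequence, one tag per word.
    """
    if not words:
        return []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[(t2, t3)]: best log score of any tag prefix ending in t2, t3
    delta = {(START, START): 0.0}
    backptrs = []  # backptrs[i][(t_{i-1}, t_i)] = best t_{i-2}
    for w in words:
        new_delta, bp = {}, {}
        for (t1, t2), score in delta.items():
            for t3 in tags:
                s = score + log(trans(t1, t2, t3)) + log(emit(t2, t3, w))
                if (t2, t3) not in new_delta or s > new_delta[(t2, t3)]:
                    new_delta[(t2, t3)] = s
                    bp[(t2, t3)] = t1
        delta = new_delta
        backptrs.append(bp)

    # Pick the best final pair, then follow back-pointers to recover t_0 .. t_{n-1}.
    n = len(words)
    best = max(delta, key=delta.get)
    seq = [best[1], best[0]]  # built in reverse order
    for i in range(n - 1, 1, -1):
        seq.append(backptrs[i][(seq[-1], seq[-2])])
    seq.reverse()
    return seq[-n:]  # drops the START padding when n == 1
```

The number of states grows with the square of the tag-set size, so the decoder costs roughly $O(n \cdot |T|^3)$ for $n$ words and tag set $T$, compared with $O(n \cdot |T|^2)$ for a first-order tagger.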