
Research And Implementation Of Chinese Lexical Analysis Technology

Posted on: 2007-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: H P Zhang
Full Text: PDF
GTID: 2178360185485614
Subject: Computer Science and Technology
Abstract/Summary:
Chinese lexical analysis is the foundation of Chinese language processing, and its results affect the performance of higher-level applications. This paper studies three lexical analysis tasks in depth: Chinese word segmentation, part-of-speech tagging and verb subdivision. It also develops a practical lexical analysis system named IRLAS. Official evaluation and practical application show that IRLAS is a high-precision, high-quality and high-reliability lexical analysis system.

Segmentation disambiguation and unknown word identification are the two main difficulties in Chinese word segmentation. This paper adopts a word-class-based segmentation probability model, which groups words into word classes and brings these classes into a unified probabilistic framework. Choosing the segmentation path with the maximum probability eliminates most segmentation ambiguities. For unknown word identification, this paper adopts a role-based tagging method, which makes full use of context information and transforms unknown word identification into a role sequence tagging problem. After the role parameters of the HMM are trained, the optimal role sequence is found with the Viterbi algorithm, and unknown words are identified from that sequence.

Part-of-speech tagging and verb subdivision provide richer grammatical information for higher-level applications. For example, a parser can use part-of-speech information to distinguish different types of syntactic relationships. Part-of-speech tagging is a typical application of the HMM; this paper solves it with an HMM and reaches high precision. Verb subdivision is similar to part-of-speech tagging: it subdivides verbs into more detailed classes based on the part-of-speech tagging result.
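As an illustration of the HMM decoding step that the abstract relies on for both role tagging and part-of-speech tagging, the following is a minimal sketch of the Viterbi algorithm in Python. It is not the IRLAS implementation: the two-tag set, the English toy sentence and every probability value are assumptions made up for this example, and the function name `viterbi` and its `floor` smoothing parameter are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable hidden-state sequence for the observations.

    Probabilities are combined in log space; `floor` is an assumed crude
    smoothing value that stands in for unseen events.
    """
    if not observations:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[t][s]: best log-probability of any path that ends in state s at time t
    best = [{s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + lp(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lp(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (all numbers invented): "bark" is equally likely as noun or verb,
# so the transition probabilities decide its tag.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"dogs": 0.6, "bark": 0.2}, "V": {"dogs": 0.05, "bark": 0.2}}

tags = viterbi(["dogs", "bark"], STATES, START_P, TRANS_P, EMIT_P)
print(tags)
```

In the toy model the word "bark" has the same emission probability under both tags, so the tag sequence is settled by context (the transition from the preceding noun), which is the kind of contextual information the abstract says the role-based method exploits.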
Given the particular characteristics of verb subdivision, this paper introduces an improved HMM method to subdivide verbs. A comparison with the Maximum Entropy method shows that this method is very effective. This paper also applies the verb subdivision system to the parser and greatly enhances the precision of...
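The maximum-probability segmentation path described in the abstract can be sketched with a short dynamic program. The sketch below is a deliberate simplification: it scores words with independent (unigram) probabilities rather than the word-class-based model of the thesis, and the dictionary, the probability values, the function name `segment`, and the `max_len` and `unknown_logp` parameters are all assumptions chosen for illustration.

```python
import math


def segment(text, word_prob, max_len=4, unknown_logp=-20.0):
    """Return the segmentation of `text` with the highest total log-probability.

    Simplified unigram model: each dictionary word contributes log P(word);
    a single character missing from the dictionary gets the assumed penalty
    `unknown_logp`, so every input can still be segmented.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that path
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_prob:
                score = math.log(word_prob[word])
            elif len(word) == 1:
                score = unknown_logp
            else:
                continue  # multi-character strings must be in the dictionary
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Recover the words by walking the back-pointers from the end of the text.
    words = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    words.reverse()
    return words


# Toy dictionary with invented probabilities. The input is ambiguous between
# 研究/生命/起源 and 研究生/命/起源.
WORD_PROB = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001, "起源": 0.005}

result = segment("研究生命起源", WORD_PROB)
print(result)
```

Under these toy numbers the path 研究 / 生命 / 起源 has a higher product of word probabilities than 研究生 / 命 / 起源, so the ambiguity is resolved by picking the more probable path, without any hand-written disambiguation rule.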
Keywords/Search Tags:Chinese Word Segmentation, Part Of Speech Tagging, Verb Subdivision, Hidden Markov Model, Unknown Word Identification