Research And Implementation Of Chinese Lexical Analysis Technology

Posted on:2007-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:H P Zhang

Full Text:PDF

GTID:2178360185485614

Subject:Computer Science and Technology

Abstract/Summary:

Chinese lexical analysis is the foundation of Chinese language processing, and its results directly affect the performance of higher-level applications. This paper studies three lexical analysis tasks in depth (Chinese word segmentation, part-of-speech tagging, and verb subdivision) and develops a practical lexical analysis system named IRLAS. Official evaluation and practical application show that IRLAS is a high-precision, high-quality, and highly reliable lexical analysis system.

Segmentation disambiguation and unknown word identification are the two main difficulties in Chinese word segmentation. This paper adopts a word-class-based segmentation probability model, which groups words into word classes and brings these classes into a unified probabilistic framework. Choosing the segmentation path with the maximum probability eliminates most segmentation ambiguities. For unknown word identification, this paper adopts a role-based tagging method, which makes full use of context information and transforms unknown word identification into a role sequence tagging problem. After the role parameters of an HMM are trained, the optimal role sequence is found with the Viterbi algorithm, and unknown words are identified from that sequence.

Part-of-speech tagging and verb subdivision provide richer grammatical information for higher-level applications. For example, a parser can use part-of-speech information to distinguish different types of syntactic relationships. Part-of-speech tagging is a typical application of the HMM; this paper solves the tagging problem with an HMM and reaches high precision. Verb subdivision is similar to part-of-speech tagging: it subdivides verbs into more detailed classes based on the part-of-speech tagging result. To suit the particular characteristics of verb subdivision, this paper introduces an improved HMM method for subdividing verbs. A comparison with the Maximum Entropy method shows that this method is very effective. This paper also applies the verb subdivision system to the parser and greatly enhances the precision of...
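
The Viterbi decoding step named in the abstract can be illustrated with a minimal Python sketch. This is not the IRLAS implementation: the function name viterbi, the two-tag state set, and every probability below are hypothetical toy values, chosen only to show how the most probable hidden sequence (roles or part-of-speech tags) is recovered once HMM parameters have been trained.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_probability, state_sequence) of the best HMM path."""
    if not observations:
        return 0.0, []

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: two tags (N = noun, V = verb), made-up probabilities.
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.8}}

log_prob, tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(tags, math.exp(log_prob))
```

In a role-tagging setting the states would be roles rather than part-of-speech tags, and the transition and emission tables would be estimated from an annotated corpus; the decoding step itself is unchanged.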

Keywords/Search Tags:

Chinese Word Segmentation, Part Of Speech Tagging, Verb Subdivision, Hidden Markov Model, Unknown Word Identification

Related items

1	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
2	Study On Disambiguation Algorithm For Chinese Word Segmentation
3	The Effect Of Part Of Speech On Chinese Word Segmentation
4	Research And Implementation Of Chinese Word Segmentation Algorithm
5	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
6	Research On Laodian Participle And Part-of-speech Tagging Method
7	Chinese Lexical Analysis Method Based On Morpheme Studies
8	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging
9	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
10	The Research Of Part-of-speech Tagging Based On Hidden Markov Model