Word Segmentation And POS Tagging In Chinese

Posted on:2004-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:D X Liu

Full Text:PDF

GTID:2168360095960168

Subject:Computer software and theory

Abstract/Summary:


Word segmentation and part-of-speech (POS) tagging are foundational tasks in Natural Language Processing (NLP). Previous graduate students have done a great deal of work in this field. In this work I reused most of their results, corrected their shortcomings, and improved the system's performance. The modified system provides stronger support for future research.

To resolve maximal crossing ambiguities, their research proposed a new method that applies maximum matching (MM) and reverse maximum matching (RMM) simultaneously and compares the combination degree of the two results. However, this method can handle only some maximal crossing ambiguities. Based on statistics of maximal crossing ambiguities collected from a large-scale Chinese corpus, I divide them into three classes and handle each class with a different method. The modified algorithm greatly improves the handling of maximal crossing ambiguities.

Identifying Chinese personal names is another important part of Chinese text segmentation. Using a large-scale real-world corpus, I first examined the regularities of names: surnames, characters frequently used in given names, and characters that frequently appear before or after names. I then designed an algorithm that identifies Chinese names after segmentation; identification begins whenever a surname is recognized. A preliminary experiment shows that both the recall and the precision of the algorithm exceed 90%.

POS tagging is a difficult task in NLP. In English, a word's form usually changes when its part of speech changes, whereas a Chinese word's form does not change at all, so POS tagging is harder for Chinese than for English. In addition to determining a word's part of speech with conventional methods, I built a rule table for this purpose, in which each word is mapped to a corresponding object. When a word being tagged appears in the table, its object is retrieved, and the word's part of speech is determined from that object.

My last task was porting the program's back end from VC (Visual C++) to Java so that the NLP project can easily be published on the Internet.
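The MM/RMM step described above can be illustrated with a minimal Python sketch. It assumes a toy in-memory lexicon, and the function names (forward_max_match, reverse_max_match, disagreement_spans) are hypothetical. It runs both matchers and flags the substrings where their word boundaries differ as candidate crossing ambiguities. The abstract does not define the combination-degree measure or the three ambiguity classes, so the resolution step is deliberately not shown; this is not the thesis's implementation.

```python
from typing import List, Set


def _max_word_len(lexicon: Set[str]) -> int:
    # The longest dictionary entry bounds the matching window (at least 1).
    return max((len(w) for w in lexicon), default=1)


def forward_max_match(text: str, lexicon: Set[str]) -> List[str]:
    """MM: scan left to right, always taking the longest lexicon word."""
    max_len, words, i = _max_word_len(lexicon), [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text: str, lexicon: Set[str]) -> List[str]:
    """RMM: scan right to left, always taking the longest lexicon word."""
    max_len, words, j = _max_word_len(lexicon), [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected right to left
    return words


def disagreement_spans(text: str, mm: List[str], rmm: List[str]) -> List[str]:
    """Return substrings where MM and RMM place word boundaries differently."""
    def cuts(words: List[str]) -> Set[int]:
        pos, out = 0, {0}
        for w in words:
            pos += len(w)
            out.add(pos)
        return out

    a, b = cuts(mm), cuts(rmm)
    shared = sorted(a & b)
    spans = []
    for start, end in zip(shared, shared[1:]):
        # A cut strictly inside a shared interval belongs to only one of the outputs.
        if any(start < c < end for c in a | b):
            spans.append(text[start:end])
    return spans


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real system would load a full dictionary.
    lexicon = {"研究", "研究生", "生命", "起源"}
    text = "研究生命起源"
    mm = forward_max_match(text, lexicon)
    rmm = reverse_max_match(text, lexicon)
    print(mm, rmm, disagreement_spans(text, mm, rmm))
```

On the classic string 研究生命起源, MM yields 研究生/命/起源 and RMM yields 研究/生命/起源, so only 研究生命 is flagged (研究生 and 生命 overlap on 生), while 起源 is agreed on and left alone. A resolver such as the combination-degree comparison would then operate only on the flagged spans.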

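The surname-triggered name identification described in the abstract runs after segmentation, when an unknown name has typically been split into single-character tokens. A minimal Python sketch, assuming hypothetical hand-written lists (SURNAMES, GIVEN_NAME_CHARS, RIGHT_CONTEXT) in place of the corpus-derived statistics, and an illustrative acceptance rule that is not taken from the thesis: a two-character given name is merged outright, and a one-character given name is merged only when a right-context cue follows. Only right context is used here for brevity.

```python
from typing import List

# Hypothetical toy resources; the thesis derived its own statistics from a large corpus.
SURNAMES = {"王", "李", "张", "刘"}
GIVEN_NAME_CHARS = {"小", "明", "华", "伟", "丽"}
RIGHT_CONTEXT = {"说", "先生", "同志", "教授"}  # words that often follow a name


def merge_names(tokens: List[str]) -> List[str]:
    """Merge surname + given-name-character tokens into a single name token."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in SURNAMES:  # identification is triggered only by a surname
            # Collect up to two following tokens that look like given-name characters.
            j = i + 1
            while j < len(tokens) and j - i <= 2 and tokens[j] in GIVEN_NAME_CHARS:
                j += 1
            given = j - i - 1
            has_cue = j < len(tokens) and tokens[j] in RIGHT_CONTEXT
            # Illustrative rule: accept two given-name characters, or one plus a context cue.
            if given == 2 or (given == 1 and has_cue):
                out.append("".join(tokens[i:j]))
                i = j
                continue
        out.append(tok)
        i += 1
    return out
```

For example, merge_names(["王", "小", "明", "说"]) returns ["王小明", "说"], while ["张", "华", "来"] is left unchanged because the one-character given name has no supporting context cue. A real system would replace the fixed rule with scores estimated from the corpus.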
Keywords/Search Tags:

Natural Language Processing, word segmentation, max match, reverse max match, crossing ambiguity, combination degree, identifying Chinese names, part-of-speech tagging


Related items

1	Study On Disambiguation Algorithm For Chinese Word Segmentation
2	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
3	Chinese Word Found Its Part Of Speech Tagging
4	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
5	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging
6	Study On The Automatic Chinese Word Segmentation With Chinese Names Recognation Function
7	The Study And Analysis Of Oracle Bone Inscriptions Based On Statistical Natural Language Processing
8	Chemical Dictionary Of Structural Design And Development Of Chinese Word Segmentation System
9	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
10	Research On Mongolian Lexical Analysis Based On Combination Of Statistical And Rule Approaches