Word Segmentation And Pos Tagging In Chinese

Posted on: 2004-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: D X Liu
Full Text: PDF
GTID: 2168360095960168
Subject: Computer software and theory
Abstract/Summary:
Word segmentation and part-of-speech tagging are foundations of Natural Language Processing (NLP). Earlier graduate students have done a great deal of work in this field. In this work I build on their results, correct their shortcomings, and improve performance. The modified system provides stronger support for future research.

Their research proposed a method that applies maximum matching (MM) and reverse maximum matching (RMM) simultaneously and compares their combination degree in order to resolve maximal crossing ambiguities. This method has a shortcoming: it handles only some maximal crossing ambiguities. Based on statistics of maximal crossing ambiguities gathered from a large-scale Chinese corpus, I divide them into three classes and handle each class with a different method. The modified algorithm greatly improves the handling of maximal crossing ambiguities.

Identifying Chinese names is another important part of Chinese text segmentation. First, using a large-scale real corpus, I study the regularities of names: surnames, characters commonly used in names, and characters that commonly appear directly before or after names. I then design an algorithm that identifies Chinese names after text segmentation; identification starts when a surname is recognized. A preliminary experiment shows that both the recall and the accuracy of this algorithm exceed 90%.

Part-of-speech tagging is a difficult task in NLP. In English, a word usually changes form when its part of speech changes, but in Chinese the word form does not change, so part-of-speech tagging is more difficult in Chinese than in English. In addition to determining a word's part of speech by conventional methods, I build a rule table for this purpose. Each word in the table has a corresponding object. When a word to be tagged is in the table, its object is retrieved and used to determine the word's part of speech.

My final task is porting the program from VC to Java so that the Natural Language Processing project can easily be published on the Internet.
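
To make the MM/RMM step concrete, here is a minimal Python sketch of forward and reverse maximum matching over a toy lexicon. It is an illustration only, not the thesis's implementation: the function names, the `LEXICON` contents, the `max_len` limit, and the example sentence are all assumptions, and the combination-degree comparison and the three-class treatment of ambiguities described above are not reproduced. The sketch only shows how the two scans can disagree, which is what signals a crossing ambiguity.

```python
# Hypothetical toy lexicon; a real system would load a large dictionary.
LEXICON = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def has_crossing_ambiguity(text, lexicon):
    """Report whether the two scans disagree (an assumed, simplified test)."""
    return forward_max_match(text, lexicon) != reverse_max_match(text, lexicon)


if __name__ == "__main__":
    sentence = "研究生命起源"
    print(forward_max_match(sentence, LEXICON))
    print(reverse_max_match(sentence, LEXICON))
    print(has_crossing_ambiguity(sentence, LEXICON))
```

When the two scans return different segmentations, a disambiguation step has to choose between them; in the thesis that choice is made by comparing combination degrees, which this sketch leaves out.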
Keywords/Search Tags: Natural Language Processing, word segmentation, max match, reverse max match, crossing ambiguity, combination degree, identifying Chinese names, part-of-speech tagging