Research On Chinese Segmentation Method Based On Optimization Maximum Matching

Posted on: 2010-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: C H Liu
GTID: 2178360302959029
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the volume of Chinese text online keeps growing and Chinese information processing applications are spreading. Automatic word segmentation is a precondition for Chinese information processing. Unlike English, a Chinese document is a continuous stream of characters with no explicit delimiter between words, so segmentation is the most important problem in Chinese information processing and has become a cutting-edge research topic. This thesis gives a systematic investigation of Chinese text segmentation techniques.

First, the basic theory of the technology is introduced, the problems facing Chinese segmentation are set out, and the existing segmentation methods, including the Maximum Matching Method, are analyzed together with their strengths and weaknesses.

Second, to improve segmentation speed, an optimized Maximum Matching Method is proposed, following the idea of the Maximum Matching Method. The existing method compares every character of the candidate string against the words in the dictionary. The optimized method replaces this with a comparison based on the last character of the string being segmented, so that the string can be decided quickly. Because the optimized method is dictionary-based, a comprehensive dictionary is built and a segmentation algorithm for the optimized method is presented.

Third, ambiguous fields often arise during segmentation. The thesis proposes a method that improves the information-quantity statistics between words in Chinese. Because the dominant type of ambiguity accounts for 85% of ambiguous fields, the thesis studies how to resolve it and proposes using information-quantity statistics to segment ambiguous fields.

Finally, a prototype Chinese segmentation system based on the optimized Maximum Matching Method is implemented using an object-oriented approach. The architecture of the system is described and the basic functions of each module are given. Experiments validate the effectiveness and efficiency of the system.
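
For readers unfamiliar with the baseline, the following is a minimal sketch of plain forward maximum matching, the dictionary-based method the thesis starts from. It is not the optimized variant the thesis proposes, whose details are not given in this summary. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are all hypothetical and chosen only for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching over a set of known words.

    Illustrative baseline only; names and parameters are assumptions,
    not taken from the thesis.
    """
    if max_len is None:
        # Longest dictionary word bounds the matching window.
        max_len = max((len(word) for word in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (hypothetical), small enough to trace by hand.
DEMO_DICTIONARY = {"研究", "研究生", "生命", "的", "起源"}

print(forward_max_match("研究生命的起源", DEMO_DICTIONARY))
# ['研究生', '命', '的', '起源']
```

The greedy match takes "研究生" first and leaves "命" stranded, although "研究 / 生命 / 的 / 起源" is the intended reading. Ambiguous fields of this kind are what the statistical step described in the abstract is meant to resolve.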
Keywords/Search Tags: Chinese Information Processing, Chinese Word Segmentation, Dictionary, Optimization Maximum Matching, Information Quantity Statistics