The Research And Implementation Of Automatic Chinese Word Segmentation System

Posted on: 2011-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhou
GTID: 2178360302964552
Subject: Computer application technology
Abstract/Summary:
Chinese word segmentation is the process of splitting a sentence into its individual words. English text puts spaces between words, so its words are easy to separate. Written Chinese, by convention, has no spaces between words, so word boundaries in a Chinese sentence are only implicit and some technique is needed to segment it. Chinese word segmentation algorithms have been a research focus since the 1980s, but because of the complexity of the Chinese language the field is still developing. In recent years, researchers in China and abroad have done a great deal of work on Chinese word segmentation and obtained a number of results. Each current algorithm has its own merits and it is hard to rank them, so the choice of algorithm usually has to take the practical application into account.
So far, Chinese word segmentation methods fall into three categories: 1) string-matching-based segmentation, 2) understanding-based segmentation, and 3) statistics-based segmentation. Each has its own advantages and disadvantages, and none has been shown to be consistently more accurate than the others; they differ in their technical characteristics. Understanding-based segmentation is still immature. Chinese word segmentation has become both a research hotspot and a difficult problem in natural language processing. Segmentation is a basic and key step in natural language processing, and its quality directly affects the results of the subsequent processing steps. Since Chinese word segmentation is the first step in processing Chinese text, its importance cannot be overlooked.
In this thesis, after comparing and studying a variety of Chinese word segmentation algorithms, the author proposes a Chinese word segmentation algorithm that combines a dictionary with statistics.
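The first category, string-matching (mechanical) segmentation, can be illustrated with forward maximum matching, one of its most common variants. The sketch below is not taken from the thesis: the class name `ForwardMaxMatch`, the toy dictionary, and the maximum word length of 3 are hypothetical choices made for illustration. It scans the sentence left to right and greedily takes the longest dictionary word at each position, falling back to a single character when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of forward maximum matching (FMM); not the thesis's code.
final class ForwardMaxMatch {
    // Scans left to right; at each position takes the longest dictionary word
    // of at most maxLen characters, or a single character if nothing matches.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(i + maxLen, text.length());
            // Shrink the candidate window until it is a dictionary word or one character long.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Toy dictionary (hypothetical), chosen to expose a classic ambiguity.
        Set<String> dict = Set.of("研究", "研究生", "生命", "命", "的", "起源");
        System.out.println(segment("研究生命的起源", dict, 3));
    }
}
```

On the classic ambiguous phrase 研究生命的起源 ("to study the origin of life"), this sketch prints `[研究生, 命, 的, 起源]` ("graduate student / fate / of / origin") rather than the intended 研究 / 生命 / 的 / 起源. The greedy longest match is fast but cannot weigh competing segmentations, which is the kind of weakness that statistical information is meant to address.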
The author then uses this algorithm to design and develop a Chinese word segmentation system, which is used to demonstrate that the algorithm improves both efficiency and accuracy. The main content of the thesis is as follows. First, it introduces the concept of Chinese word segmentation, its application areas, and the challenges it faces, lists the segmentation algorithms in common use today, and briefly compares them. Second, building on existing algorithms and combining the respective strengths of traditional mechanical segmentation and statistics-based segmentation, it proposes a Chinese word segmentation algorithm based on a dictionary combined with statistics, which preserves the speed advantage while improving the accuracy of the results. Third, based on this dictionary-and-statistics algorithm, it designs and implements a Chinese word segmentation system using Java Web technology. Besides segmenting Chinese text, the system can compare the segmentation speed of different algorithms, which can be used to verify the superiority of the proposed algorithm. Finally, it summarizes the work of the thesis and outlines directions for further work.
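The abstract does not specify how the dictionary and the statistics are combined. One widely used way to combine them, shown here purely as an assumption and not as the author's algorithm, is to let the dictionary propose candidate words and let a unigram frequency model choose among the competing segmentations by dynamic programming over log-probabilities. The class `DictStatSegmenter` and all word counts below are hypothetical toy values, not data from the thesis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the dictionary proposes candidate words, a unigram
// frequency model picks the most probable segmentation (not the thesis's code).
final class DictStatSegmenter {
    private final Map<String, Integer> freq; // word -> corpus count (toy values)
    private final int maxLen;                // longest word length considered
    private final double logTotal;           // log of the summed counts

    DictStatSegmenter(Map<String, Integer> freq, int maxLen) {
        this.freq = freq;
        this.maxLen = maxLen;
        long total = 0;
        for (int c : freq.values()) total += c;
        this.logTotal = Math.log(total);
    }

    // Unigram log-probability; a character missing from the dictionary is
    // treated as if it had been seen once (a crude fallback, not real smoothing).
    private double logProb(String w) {
        Integer c = freq.get(w);
        return Math.log(c == null ? 1 : c) - logTotal;
    }

    List<String> segment(String text) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: highest log-prob for text[0, i)
        int[] back = new int[n + 1];       // back[i]: start of the last word on that path
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxLen); start < end; start++) {
                String w = text.substring(start, end);
                // Multi-character candidates must be dictionary words; single
                // characters are always allowed so every input can be segmented.
                if (end - start > 1 && !freq.containsKey(w)) continue;
                double score = best[start] + logProb(w);
                if (score > best[end]) {
                    best[end] = score;
                    back[end] = start;
                }
            }
        }
        // Follow back-pointers from the end of the text, then restore order.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            words.add(text.substring(back[end], end));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        // Toy counts (hypothetical, for illustration only).
        Map<String, Integer> freq = Map.of(
            "研究", 50, "研究生", 10, "生命", 40, "命", 5, "的", 100, "起源", 20);
        System.out.println(new DictStatSegmenter(freq, 3).segment("研究生命的起源"));
    }
}
```

With these toy counts the program prints `[研究, 生命, 的, 起源]`: the path 研究 / 生命 scores higher than 研究生 / 命, so the ambiguity that trips up greedy longest matching is resolved by the frequencies. Because candidates come only from dictionary lookups within a window of `maxLen` characters, the work grows with n × maxLen lookups, which is one way such a hybrid can stay close to the speed of pure dictionary matching.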
Keywords: Chinese word segmentation, segmentation dictionary, Chinese information processing