Chinese And English Mixed Segmentation Method And Applied Research

Posted on: 2010-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Tian
GTID: 2178360275965805
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, computers have come into ever wider use, and their role has evolved from data processing to knowledge processing. Automatic word segmentation was proposed as a problem in Chinese information processing in the early 1980s, and experts and scholars in the field have since made great progress. Segmentation algorithms are widely applied in information retrieval, automatic archiving, and other areas. As China's rapid economic development has tied the country more closely to the rest of the world, it has inevitably drawn on the experience of other countries. As a result, information is often expressed in a mixture of Chinese and foreign languages, especially Chinese and English, which places higher demands on information management systems.

At present, there is relatively little research on Chinese-English mixed word segmentation, and no mature theory has yet formed. Neither a segmentation standard nor an evaluation system for Chinese-English mixed text has been established. Against this background, this thesis studies the features of Chinese-English mixed text, including its forms, structure, and usage conventions, and presents a practical segmentation algorithm for such text.

Ambiguity resolution is one of the difficulties of segmentation. This thesis analyzes the problem in depth and proposes an implementation method for continuous ambiguity resolution. To determine the maximum word length for reverse maximum matching (RMM), a method is proposed that compares the length of the hash-dictionary entries beginning with the first two characters of the string to be segmented against the length of the text.

Experiments indicate that the proposed method segments Chinese-English mixed text effectively. It not only maintains a high level of ambiguity resolution but also performs well in unknown word identification. Finally, automatic classification of articles was achieved based on the segmentation results.
Keywords/Search Tags:Chinese and English mixed word segmentation, Hash, RMM, Removing Ambiguity, Unknown word