In this paper we design a new algorithm that combines two methods: a dictionary-based method and a statistical method. First, we split the text into shorter sentences at its punctuation marks. Second, we segment every sentence with both forward (positive) and reverse maximal matching. Third, we compare the two segmentation results to detect ambiguity: if the results differ, that part of the sentence is ambiguous, and we resolve the ambiguity with the statistical method and with rules derived from previous research. Throughout this process we give higher priority to the result of reverse maximal matching because of its higher accuracy.

This paper contains two improvements. First, we increase the number of dictionaries: besides the basic dictionary, we add special dictionaries that are used to resolve ambiguities and recognize new words during segmentation. At the same time, we reorganize the in-memory data structure of the basic dictionary as a hash table: the first two characters of every word serve as the keys of the main and sub hash tables, and the remaining characters are stored in an array ordered by length. With this structure, whenever the program meets a word it can locate it in the dictionary directly, so the cost of scanning the dictionary is greatly reduced and matching is significantly faster.

Second, we improve the statistical method. Our algorithm uses statistics to handle proper nouns, to recognize new words, and to resolve ambiguities. The main weakness of the statistical method is that it requires the characters in question to appear more than once during segmentation, so its accuracy suffers on short sentences. For this reason, we use rules obtained from an analysis of the Chinese language to compensate for this weakness. On this basis, we designed a new algorithm that combines linguistic rules with statistical results to resolve ambiguities and recognize new words. It incorporates many rules based on statistics of Chinese text and on linguistic knowledge, and it also takes into account the important influence of the linguistic environment and context on ambiguity resolution. With these measures, ambiguities can be resolved effectively and efficiently, and segmentation accuracy is improved under some conditions.
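To make the dictionary structure and the matching step concrete, the following is a minimal sketch of a two-level hash dictionary keyed by the first two characters of each word, together with forward and reverse maximal matching and the comparison that flags ambiguity. All identifiers here (TwoLevelDict, forward_max_match, backward_max_match, is_ambiguous) are illustrative assumptions, not the authors' implementation.

```python
class TwoLevelDict:
    """Two-level hashed dictionary: main table keyed by the first character,
    sub table keyed by the second; the remaining characters of each word
    are kept in a list ordered by length (longest first)."""

    def __init__(self):
        self.main = {}      # first char -> {second char -> [tails, longest first]}
        self.max_len = 1    # length of the longest stored word

    def add(self, word):
        # Single-character words need no entry: maximal matching falls back
        # to one character when nothing longer is found (an assumption here).
        if len(word) < 2:
            return
        tails = self.main.setdefault(word[0], {}).setdefault(word[1], [])
        tails.append(word[2:])
        tails.sort(key=len, reverse=True)
        self.max_len = max(self.max_len, len(word))

    def longest_match(self, text, start):
        """Longest dictionary word starting at text[start]; one char if none."""
        if start + 1 < len(text):
            sub = self.main.get(text[start])
            if sub is not None:
                for tail in sub.get(text[start + 1], []):
                    end = start + 2 + len(tail)
                    if end <= len(text) and text[start + 2:end] == tail:
                        return text[start:end]
        return text[start]

    def contains(self, word):
        if len(word) < 2:
            return False
        return word[2:] in self.main.get(word[0], {}).get(word[1], [])


def forward_max_match(text, d):
    """Positive (left-to-right) maximal matching."""
    out, i = [], 0
    while i < len(text):
        word = d.longest_match(text, i)
        out.append(word)
        i += len(word)
    return out


def backward_max_match(text, d):
    """Reverse (right-to-left) maximal matching."""
    out, j = [], len(text)
    while j > 0:
        for length in range(min(d.max_len, j), 1, -1):
            if d.contains(text[j - length:j]):
                out.append(text[j - length:j])
                j -= length
                break
        else:
            out.append(text[j - 1])   # no dictionary word ends here
            j -= 1
    out.reverse()
    return out


def is_ambiguous(sentence, d):
    # If the two segmentations differ, the sentence contains an ambiguity;
    # the paper then prefers the reverse result and applies statistics and rules.
    return forward_max_match(sentence, d) != backward_max_match(sentence, d)
```

In this sketch the two hash lookups jump directly to the small set of candidate words sharing the same first two characters, which is the source of the claimed reduction in dictionary-scanning cost; the statistical and rule-based disambiguation steps are not modeled here, since the paper does not specify them in detail.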