
Automatic Text Segmentation And Algorithm

Posted on: 2011-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: R R Xu
Full Text: PDF
GTID: 2178360332455319
Subject: Information Science
Abstract/Summary:
Automatic word segmentation is an important link in Chinese information processing, and the segmentation algorithm largely determines the performance of a Chinese word segmentation system. Current algorithms fall into three kinds: string-matching-based segmentation, statistics-based segmentation, and understanding-based segmentation.

The key difficulties of automatic Chinese word segmentation are ambiguity recognition and the identification of unknown (unregistered) words. This thesis describes in detail why these two problems arise and what regularities they follow, and proposes countermeasures and suggestions for segmentation. It then analyzes the main segmentation algorithms, including the maximum matching algorithm, the adjacent matching algorithm, the statistical segmentation algorithm, the expert system method, and the neural network method, in terms of their technical principles, their precision in recognizing ambiguities and unknown words, and their algorithmic complexity.

Building on a careful study of the maximum matching algorithm, this thesis proposes an improved algorithm. The method first preprocesses the whole text, splitting it automatically at its natural delimiter symbols; this effectively supports mixed Chinese, English, and numeric text and improves segmentation efficiency. It then uses forward matching, backtracking matching, and last-word matching to detect ambiguities effectively, and segments intersection-type ambiguous fields using the long-word-first principle together with a balance principle over two-word clusters, which effectively resolves the ambiguity. Experiments show that the improved algorithm is considerably better in both time complexity and segmentation precision.
Keywords/Search Tags: maximum matching method, forward matching method, backtracking matching method, last-word matching method
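
The preprocessing and matching steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify how backtracking matching or last-word matching work, so this sketch substitutes the standard comparison of forward and backward maximum matching to flag possibly ambiguous fields. The toy dictionary, the regular expression, and all function names are assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary; a real system would load a large lexicon.
DICTIONARY = {"研究", "研究生", "生命", "起源", "的"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)

# Preprocessing (assumed form): keep runs of CJK characters and runs of
# ASCII letters/digits; punctuation and spaces act as natural delimiters.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def forward_max_match(chunk, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Scan left to right, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(chunk):
        for size in range(min(max_len, len(chunk) - i), 0, -1):
            candidate = chunk[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(chunk, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Scan right to left, always taking the longest dictionary word."""
    words = []
    j = len(chunk)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = chunk[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()
    return words


def segment(text):
    """Return (words, ambiguous_chunks) for mixed Chinese/English/digit text.

    A chunk is flagged as ambiguous when the forward and backward results
    disagree. This is a stand-in for the thesis's own detection step.
    """
    words, ambiguous = [], []
    for match in TOKEN_RE.finditer(text):
        chunk = match.group()
        if chunk.isascii():
            # English words and numbers are kept whole.
            words.append(chunk)
            continue
        forward = forward_max_match(chunk)
        if forward != backward_max_match(chunk):
            ambiguous.append(chunk)
        words.extend(forward)
    return words, ambiguous


if __name__ == "__main__":
    print(segment("研究生命的起源, GPT 2011"))
```

In this sketch the flagged chunks are only reported, not resolved. A resolution step such as the long-word-first rule mentioned in the abstract would be applied to them afterwards.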