Reverse Backtracking Research Of Chinese Segmentation Based On Last Word Dictionary

Posted on: 2011-08-04  Degree: Master  Type: Thesis
Country: China  Candidate: Z Liang  Full Text: PDF
GTID: 2178330332476463  Subject: Mechanical and electrical engineering
Abstract/Summary:
Chinese word segmentation is the first step in Chinese information processing, and its precision and efficiency determine the performance of the processing built on it. Chinese information processing is applied in search engines, text proofreading, speech recognition, and machine translation, all of which take word segmentation as their basis, so advancing segmentation to meet the needs of these applications is of great practical significance.

This thesis introduces the significance, current status, and open difficulties of Chinese word segmentation and analyzes several commonly used segmentation techniques. Supported by the Hubei Province Department of Education research project "Semantic Web-based semi-structured information extraction technology and its application", it focuses on segmentation based on string matching. First, a last word dictionary with a Hash structure is designed to record word lengths; this saves storage space and reduces the number of invalid matches during matching, which increases segmentation efficiency. Second, a core algorithm built on this dictionary is designed to resolve ambiguity in Chinese word segmentation. Because, given the characteristics of modern Chinese sentences, reverse maximum matching is often more precise than forward maximum matching, a reverse backtracking algorithm is designed. The backtracking mechanism of the improved algorithm eliminates possible segmentation ambiguities.

To verify the design, a small dictionary with a Hash structure was built in Access, and the reverse backtracking maximum matching algorithm was implemented in Delphi to form an experimental system. Three original People's Daily articles from April 16 reporting on earthquake relief in Yushu, Qinghai Province, were selected as the test corpus. In a comparison with forward maximum matching and reverse maximum matching, the experimental results confirm that the design clearly improves segmentation efficiency and eliminates ambiguities. A detailed analysis of the test results shows that the system is most effective at eliminating overlapping ambiguities, while its effect on combination ambiguities is not obvious.
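
The following Python sketch illustrates the two ideas the abstract describes. It is not the thesis's Access/Delphi implementation, and several details are assumptions: that "last word" means the final character of each dictionary word, that the hash index stores the lengths of words ending in that character, and that backtracking is triggered when a choice leaves text that cannot be covered by dictionary words. The function names, the four-word dictionary, and the sample phrase are all hypothetical.

```python
from collections import defaultdict

def build_last_char_index(words):
    """Map each word's final character to the word lengths ending with it,
    longest first (assumed layout of the hash-structured dictionary)."""
    index = defaultdict(set)
    for w in words:
        index[w[-1]].add(len(w))
    return {ch: sorted(ns, reverse=True) for ch, ns in index.items()}

def reverse_max_match(text, words, index):
    """Plain reverse maximum matching: scan right to left, longest word first.
    Only lengths recorded for the current final character are tried."""
    tokens, end = [], len(text)
    while end > 0:
        for n in index.get(text[end - 1], []):
            if n <= end and text[end - n:end] in words:
                break
        else:
            n = 1  # no dictionary word ends here: emit a single character
        tokens.append(text[end - n:end])
        end -= n
    return tokens[::-1]

def reverse_backtrack(text, words, index):
    """Reverse matching with backtracking (assumed criterion): if the longest
    match leaves text that dictionary words cannot cover, retry a shorter one."""
    dead = set()  # end positions with no all-dictionary segmentation

    def solve(end):
        if end == 0:
            return []
        if end in dead:
            return None
        for n in index.get(text[end - 1], []):
            if n <= end and text[end - n:end] in words:
                rest = solve(end - n)
                if rest is not None:
                    return rest + [text[end - n:end]]
        dead.add(end)
        return None

    result = solve(len(text))
    # Fall back to plain reverse matching when no full cover exists.
    return result if result is not None else reverse_max_match(text, words, index)

WORDS = {"中外", "外科学", "科学", "名著"}  # hypothetical toy dictionary
INDEX = build_last_char_index(WORDS)
print(reverse_max_match("中外科学名著", WORDS, INDEX))  # ['中', '外科学', '名著']
print(reverse_backtrack("中外科学名著", WORDS, INDEX))  # ['中外', '科学', '名著']
```

In this toy case plain reverse matching greedily takes 外科学 and strands 中, an overlapping ambiguity; the backtracking variant drops back to 科学 and recovers 中外. The recursion is adequate for sentence-length input, and a production version would iterate instead.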
Keywords/Search Tags: Chinese word segmentation, segmentation algorithm, last word dictionary, hash structure, ambiguity elimination