The Realization of ImpFMMseg Based on the Forward Maximum Matching Method

Posted on: 2011-07-27 | Degree: Master | Type: Thesis
Country: China | Candidate: B L Liu
GTID: 2178330338478240 | Subject: Computer application technology
Abstract/Summary:
Because the volume of information on the network is enormous and its sources are complex, people have become accustomed to using search engines as a filtering tool to find the information they need. Traditional query methods, such as directory queries and keyword queries, simplify the search process to some extent, but they still have limitations. In recent years, researchers in natural language understanding, both in China and abroad, have been working to integrate natural language understanding more closely with search technology. Compared with traditional queries, natural language queries are more user-friendly and more accurate without sacrificing efficiency.

Chinese word segmentation is the fundamental first step of natural language understanding, and it is generally acknowledged to be a difficult problem in Chinese information processing. Its task is to process articles, paragraphs, and sentences written in natural language and output them word by word as a prerequisite for all follow-up processing. As the first procedure of natural language processing, its importance is beyond doubt.

This thesis, which comprises four chapters, surveys the background, current state, and significance of research on Chinese word segmentation, analyses the commonly used segmentation algorithms and compares their advantages and shortcomings, and outlines several commonly used Chinese character encoding standards. Finally, the author develops an improved algorithm named ImpFMMseg, based on Forward Maximum Matching (FMM), one of the string-matching segmentation algorithms. ImpFMMseg builds its lexicon as a trie and adds four rules for handling ambiguities, improving the accuracy and the recall rate of Chinese word segmentation by about 3% each. The thesis also compares the segmentation results obtained when each of the four rules is applied individually.
Keywords/Search Tags:Chinese word segmentation, Natural Language Processing, Forward Maximum Matching method, ImpFMMseg
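
To make the technique named in the abstract concrete, the following is a minimal sketch of plain forward maximum matching over a trie-based lexicon. It is not the ImpFMMseg implementation: the abstract does not describe the four ambiguity rules, so none are modelled here. The names `TrieNode`, `Trie`, `longest_match`, and `fmm_segment`, as well as the tiny sample lexicon, are assumptions made for illustration only.

```python
class TrieNode:
    """One node of the lexicon trie (hypothetical helper, not from the thesis)."""

    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if the path from the root spells a lexicon word


class Trie:
    """A lexicon stored as a trie, one character per edge."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text, start):
        """Return the length of the longest lexicon word beginning at text[start], or 0."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def fmm_segment(text, lexicon):
    """Plain forward maximum matching: scan left to right, always taking the longest word."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Fall back to a single character when no lexicon word starts here.
        length = lexicon.longest_match(text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens


# Illustrative lexicon (an assumption, not taken from the thesis).
lexicon = Trie(["研究", "研究生", "生命", "命", "的", "起源"])
print(fmm_segment("研究生命的起源", lexicon))
# → ['研究生', '命', '的', '起源']
```

The greedy result above splits the sentence as 研究生 / 命 / 的 / 起源 rather than the intended 研究 / 生命 / 的 / 起源. This is the kind of ambiguity that plain FMM cannot resolve on its own, and it is presumably the class of error that the ambiguity-handling rules mentioned in the abstract are meant to address.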