Chinese Word Auto-segmentation Design And Algorithm Realization For Chinese Network Information Retrieval

Posted on: 2008-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
GTID: 2178360212490838
Subject: Chinese language text
Abstract/Summary:
Chinese word segmentation is the foundation of Chinese information processing. It has attracted wide interest from computer scientists in China and abroad, and many Chinese word segmentation systems have appeared. Three main approaches are currently used: the character matching method, the statistical method, and the understanding method.

The author compares the current approaches to automatic Chinese word segmentation from a theoretical standpoint and summarizes the present state and development trends of the field. The segmentation approach and the expected functional criteria suited to this system are set out, and the concrete design of the automatic word segmentation system is described, covering both the overall design and the design of each module. Key program examples and the main conclusions of the program design are also given.

The author also studies algorithms for Chinese word segmentation. After analyzing existing algorithms, this thesis focuses on the character matching method: text is first segmented with the maximum and minimum matching methods, and a statistical method is then applied to resolve ambiguous segmentations and to recognize unknown words.

Building on this research, we designed and implemented an automatic Chinese word segmentation system for practical applications. Experiments show that, under the same conditions, the improved maximum matching method with word frequency statistics segments faster than the original algorithm. Tests with the Chinese Word Segmentation Evaluation Toolkit of Carnegie Mellon University show that the improved method raises precision by 3.75% and the F1 measure by 0.01. These results show that our system performs better than the original algorithm.
In addition, the system is stable. The work in this thesis is practical: the choice of specific technologies and the overall design were both driven by practical needs, and mature technologies were used to combine theory with practice.
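The abstract names the techniques but not their details, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It substitutes forward and backward maximum matching (a common dictionary-matching variant) for the thesis's maximum/minimum matching. A Python dict serves as the hash-based dictionary. An add-one-smoothed unigram word-frequency score picks between the two scans when they disagree, which is treated as an ambiguity. The dictionary, its frequencies, and all function names are hypothetical. `precision_recall_f1` is a simplified span-matching scorer and is not the Carnegie Mellon toolkit cited above.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary: word -> corpus frequency (not data from the thesis).
# A Python dict is a hash map, so each dictionary lookup is O(1) on average.
WORD_FREQ: Dict[str, int] = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5, "的": 100, "起源": 30,
}
MAX_LEN = max(len(w) for w in WORD_FREQ)


def forward_max_match(text: str, freq: Dict[str, int], max_len: int) -> List[str]:
    """Scan left to right, taking the longest dictionary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted even if unknown, so the scan always advances.
            if size == 1 or piece in freq:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, freq: Dict[str, int], max_len: int) -> List[str]:
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in freq:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def unigram_log_prob(words: List[str], freq: Dict[str, int]) -> float:
    """Score a segmentation by add-one-smoothed unigram word frequencies."""
    total = sum(freq.values())
    vocab = len(freq)
    return sum(math.log((freq.get(w, 0) + 1) / (total + vocab)) for w in words)


def segment(text: str, freq: Dict[str, int], max_len: int) -> List[str]:
    """Run both scans; if they disagree, keep the segmentation with the higher score."""
    fwd = forward_max_match(text, freq, max_len)
    bwd = backward_max_match(text, freq, max_len)
    if fwd == bwd:
        return fwd
    return max(fwd, bwd, key=lambda ws: unigram_log_prob(ws, freq))


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word list into (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level scores: a predicted word counts only if its span matches a gold word exactly."""
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "研究生命的起源"  # "study the origin of life"
    gold = ["研究", "生命", "的", "起源"]
    fmm = forward_max_match(sentence, WORD_FREQ, MAX_LEN)
    print(fmm)                                        # ['研究生', '命', '的', '起源']
    print(segment(sentence, WORD_FREQ, MAX_LEN))      # ['研究', '生命', '的', '起源']
    print(precision_recall_f1(gold, fmm))             # (0.5, 0.5, 0.5)
```

On the classic ambiguous phrase 研究生命的起源, forward matching greedily takes 研究生 ("graduate student"), while backward matching yields 研究 / 生命 / 的 / 起源. The toy frequency score prefers the latter. Unknown-word recognition, which the abstract also mentions, is not covered by this sketch.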
Keywords/Search Tags: Chinese Intelligent Search Engine, Chinese information processing, Chinese automatic word segmentation, Hash mapping, Mechanical Chinese Word Segmentation, Word frequency statistics