
Research Of Chinese Word Segment Based On Shortest Paths Of S-EK Figure

Posted on: 2012-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Han
Full Text: PDF
GTID: 2218330338955966
Subject: Computer application technology
Abstract/Summary:
Chinese word segmentation is the basis of Chinese information processing. It has become indispensable in natural language understanding, language research, automatic indexing of Chinese text, information retrieval, machine translation, and related fields, so research on it is both important and necessary. However, research on Chinese word segmentation has fallen far behind the level of related technologies. It faces the following problems: open linguistic questions, the appearance of new words, the difficulty of resolving ambiguities, differing segmentation standards, computational issues, the absence of a reasonable formal model of natural language, and the lack of a valid way to understand semantics and formalize it. These problems hold back the development of Chinese word segmentation.

Based on a comprehensive analysis of existing Chinese word segmentation techniques, this thesis focuses on the combined study of directed graphs and Chinese word segmentation. Its main contents are as follows.

Firstly, it reviews and summarizes the main Chinese word segmentation algorithms. It compares and analyzes three commonly used classes of algorithm, those based on string matching, on statistics, and on knowledge understanding, and comments on their respective advantages and disadvantages. It also puts forward standards for evaluating Chinese word segmentation and explains their significance.

Next, the in-depth combined study of directed graphs and Chinese word segmentation forms the core of the thesis. The N-shortest-path algorithm, which applies a directed graph to Chinese word segmentation, is improved. The thesis puts forward the S-EK figure, computes the probability that a word occurs in a given context using an N-gram statistical model, and then smooths the results.

Finally, it proposes an s-rough shortest path algorithm based on the S-EK figure. A comparison of this algorithm with two others, N-shortest-path and Dijkstra, shows through both experiments and theoretical derivation that it has certain advantages and practical significance.
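The abstract does not define the S-EK figure or the s-rough shortest path algorithm, so neither is reproduced here. As a rough illustration of the general idea they build on (a directed graph of candidate words, weighted by smoothed statistical probabilities and searched for a shortest path), the following minimal Python sketch may help. It assumes a unigram model with add-one smoothing and a single best path; the function names and the toy counts are hypothetical and are not taken from the thesis.

```python
import math


def build_word_graph(sentence, counts):
    """Build a directed graph over character positions.

    An edge i -> j labelled sentence[i:j] exists when that substring is a
    dictionary word. Single characters are always allowed, so the end
    node is always reachable.
    """
    max_len = max(len(word) for word in counts)
    edges = {i: [] for i in range(len(sentence))}
    for i in range(len(sentence)):
        for j in range(i + 1, min(len(sentence), i + max_len) + 1):
            word = sentence[i:j]
            if j == i + 1 or word in counts:
                edges[i].append((j, word))
    return edges


def edge_cost(word, counts, total, vocab_size):
    """Negative log of the add-one smoothed unigram probability."""
    probability = (counts.get(word, 0) + 1) / (total + vocab_size)
    return -math.log(probability)


def segment(sentence, counts):
    """Return the segmentation lying on the cheapest path from 0 to len(sentence)."""
    total = sum(counts.values())
    vocab_size = len(counts)
    edges = build_word_graph(sentence, counts)
    n = len(sentence)

    best = [math.inf] * (n + 1)   # best[i]: cheapest cost to reach position i
    back = [None] * (n + 1)       # back[j]: (previous position, word used)
    best[0] = 0.0

    # Every edge goes left to right, so visiting nodes in index order
    # is a topological order and one pass is enough.
    for i in range(n):
        for j, word in edges[i]:
            cost = best[i] + edge_cost(word, counts, total, vocab_size)
            if cost < best[j]:
                best[j] = cost
                back[j] = (i, word)

    # Walk the back-pointers from the end node to recover the words.
    words = []
    position = n
    while position > 0:
        position, word = back[position]
        words.append(word)
    return words[::-1]


# Toy frequency table (made-up numbers, for illustration only).
toy_counts = {"研究": 50, "研究生": 10, "生命": 40, "起源": 30, "的": 100}
print(segment("研究生命的起源", toy_counts))
```

With the toy counts above, the cheapest path splits the sentence as 研究 / 生命 / 的 / 起源 rather than 研究生 / 命 / 的 / 起源, because the unseen word 命 receives a high smoothed cost. An N-shortest-path variant would keep the N cheapest partial paths at each node instead of only one.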
Keywords/Search Tags: Chinese Word Segmentation, Information Processing, S-EK Figure, Shortest Path, Statistical Model