Font Size: a A A

Research And Implementation Of Chinese Word Segmentation Algorithm Based On K Shortest Paths

Posted on:2010-05-26Degree:MasterType:Thesis
Country:ChinaCandidate:Z F LiFull Text:PDF
GTID:2178360272479383Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Chinese word segmentation is at the lowest level of language levels including morphology, syntax and semantic.It is a foundational work of Chinese natural language processing, and a basic subject of Chinese information processing field. It is a very important component of technologies such as search engine, automatic translation ,speech recognition, information retrieval, automatic classification, automatic abstracting, text automatic proofreading, and data mining. Chinese word segmentation is a great technical bottleneck, which can directly affect the development of Chinese information processing technology.This thesis mainly described the background and significance of Chinese word segmentation, analyzed the research status at home and abroad of Chinese word segmentation, explained the three typical Chinese word segmentation methods, including string matching, statistics, and comprehension based on character, sumarized the advantage and disadvantage of every method in brief, and analyzed the related algorithm model of several Chinese word segmentations and the main technical problems, which contained the word segmentation regulation, ambiguity recognition, unknown word recognition of the development of Chinese word segmentation.After that, this thesis analyzed and researched the Model of Chinese Words Rough Segmentation Based on N-Shortest-Paths Method of ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) by Chinese Academy of Sciences.It discribed the feature and storage structure of the directed graph generated by ICTCLAS.On this basis, it put forward S-EK shortest route algorithm, analyzed its time complexity, and offered a simple example, which illustrated the specific process of solving K shortest route by S-EK shortest route algorithm.Finally, it realized the experimental results by S-EK shortest route algorithm, by comparing S-EK shortest route algorithm, N-shortest route algorithm and Dijkstra algorithm at the aspect of time complexity, and gave the specific solving process of S-EK shortest route algorithm by the example.This algorithm gets a plan of the K shorest route of the The whole digraph by the partial solution of the impornant node sets.It reduced the time complexity of the algorithm, and verified it through simulation experiments.
Keywords/Search Tags:word segmentation, ambiguity recognition, N-shortest route, S-EK shortest route
PDF Full Text Request
Related items