
Research and Implementation of a Chinese Word Segmentation System for a Chemistry Professional Search Engine

Posted on: 2009-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2178360245974721
Subject: Computer application technology
Abstract/Summary:
Chinese word segmentation belongs to the field of natural language processing. It is a major part of Chinese information processing and a foundation for Chinese language understanding, literature retrieval, machine translation, and speech synthesis systems. It is also a crucial technology for Chinese search engines. For a professional (domain-specific) search engine, optimizing segmentation for the terminology of the specific field is even more important.

Building on a study of current Chinese word segmentation techniques, this thesis implements a Chinese word segmentation system for the chemical field in Java, providing a basis for retrieving chemistry-related information on the Internet quickly and accurately.

The thesis describes the design and implementation of the system's external interfaces, system interface, and segmenter, with a focus on the segmenter. It covers, in turn: the physical and logical structure of a dictionary that contains a large number of chemical terms; the preprocessing that the segmentation algorithm applies to the input text; a dictionary lookup mechanism that combines first-character hash indexing with binary search; and an improved algorithm based on the level-pattern shortest-path method, complemented by a path selection mechanism. Finally, experiments on segmentation speed and accuracy show that the system performs as intended and resolves ambiguity to some extent. Overall, the system meets its design objectives and provides a good segmentation service for the professional search engine.
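The two core ideas named in the abstract, first-character hash indexing combined with binary search, and shortest-path segmentation, can be illustrated with a minimal Java sketch. This is not the thesis's implementation: the class name `ChemSegmenterSketch` and its methods are hypothetical, the dictionary is a plain in-memory word list, and the segmenter uses an ordinary fewest-words shortest path rather than the improved level-pattern method with path selection, whose details the abstract does not give.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not taken from the thesis.
class ChemSegmenterSketch {
    // First-character hash index: each first character maps to a
    // sorted array of the dictionary words that start with it.
    private final Map<Character, String[]> index = new HashMap<>();
    private int maxWordLength = 1;

    ChemSegmenterSketch(List<String> words) {
        Map<Character, List<String>> buckets = new HashMap<>();
        for (String w : words) {
            if (w.isEmpty()) continue;
            buckets.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
            maxWordLength = Math.max(maxWordLength, w.length());
        }
        for (Map.Entry<Character, List<String>> e : buckets.entrySet()) {
            String[] sorted = e.getValue().toArray(new String[0]);
            Arrays.sort(sorted); // sorted so that binary search is valid
            index.put(e.getKey(), sorted);
        }
    }

    // Lookup: hash on the first character, then binary search in the bucket.
    boolean contains(String word) {
        if (word.isEmpty()) return false;
        String[] bucket = index.get(word.charAt(0));
        return bucket != null && Arrays.binarySearch(bucket, word) >= 0;
    }

    // Fewest-words segmentation: positions 0..n are graph nodes, and an edge
    // i -> j exists when text[i, j) is a dictionary word or a single character.
    List<String> segment(String text) {
        int n = text.length();
        int[] cost = new int[n + 1]; // cost[j] = fewest words covering text[0, j)
        int[] prev = new int[n + 1]; // prev[j] = start of the last word on that path
        Arrays.fill(cost, Integer.MAX_VALUE);
        cost[0] = 0;
        for (int i = 0; i < n; i++) {
            int limit = Math.min(n, i + maxWordLength);
            for (int j = i + 1; j <= limit; j++) {
                // Single characters are always allowed, so every node is reachable.
                boolean edge = (j == i + 1) || contains(text.substring(i, j));
                if (edge && cost[i] + 1 < cost[j]) {
                    cost[j] = cost[i] + 1;
                    prev[j] = i;
                }
            }
        }
        // Walk the predecessor links backwards to recover the words in order.
        LinkedList<String> out = new LinkedList<>();
        for (int j = n; j > 0; j = prev[j]) {
            out.addFirst(text.substring(prev[j], j));
        }
        return out;
    }
}
```

With a dictionary containing 氯化, 氯化钠, and 溶液, calling `segment("氯化钠溶液")` on this sketch returns 氯化钠 followed by 溶液, because that two-word path is shorter than any path through 氯化. When several paths tie on word count, the sketch simply keeps the first one it finds; a path selection mechanism such as the one the abstract mentions is what would choose among such ties.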
Keywords/Search Tags:Chinese Word Segmentation, Search Engine, First Character Hash Indexing, Level-Pattern Shortest Paths, Paths Selection