Research And Implementation Of A Chinese Word Segmentation Services System Based On Grid

Posted on: 2007-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Guo
Full Text: PDF
GTID: 2178360185978456
Subject: Computer application technology
Abstract/Summary:
Chinese word segmentation is a fundamental research problem in Chinese NLP, underlying areas such as information retrieval, machine translation, and text correction. However, it hinders the development of Chinese NLP because of three difficulties: the lack of an agreed criterion for what constitutes a Chinese word, word ambiguity, and unregistered words. A thorough investigation of Chinese word segmentation is therefore of great significance. Meanwhile, existing online Chinese word segmentation systems provide only a test function and have defects such as handling only small-scale text, inconvenient usage, and no interface for calls from programs.

Grid is a novel technology that has followed the Internet and the WWW in recent years, and it can offer a distributed parallel environment. Grid services can be combined with one another, which raises the rate of code reuse. It is therefore meaningful to develop an integrated Grid-based word segmentation services system aimed at both users and programmers, so as to provide favorable conditions for NLP research.

Firstly, this thesis discusses the requirements of Chinese word segmentation. Based on these, seven algorithms with Grid characteristics are proposed. To keep modification of the algorithms to a minimum, the C WS Core of GT4 is investigated, and the principles and approaches for implementing Grid Services in C are discussed. The seven algorithms are then encapsulated as Grid Services, which can be combined in different ways to satisfy multiple requirements. At present, Java is the dominant language for developing Grid Services based on GT4. However, a great number of applications on existing platforms are implemented in C. Porting application programs implemented in C to the Grid platform therefore has a certain value.

Secondly, after studying the GRAM of Globus, Condor, and PVM, this thesis proposes a novel parallel computing mechanism that integrates Condor-PVM with GT4 in order to realize parallel computing in C in a Grid context. The experimental results show that it does speed up the processing of large-scale texts.

Finally, the Chinese word segmentation services system based on Grid is designed and developed. Through the system Grid portal, users could choose the service types of...
Keywords/Search Tags: Chinese word segmentation, Grid service, OGSA, Condor, PVM