
Technology And Implement Of General-purpose Word Segmentation System In Modern Chinese

Posted on: 2003-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Luo
GTID: 2168360062486141
Subject: Computer application technology
Abstract/Summary:
Word segmentation is the basis of Chinese information processing (NLP). Any natural language processing system that works above the character level needs a built-in word segmentation module. Disambiguation and the recognition of unknown words are the most important issues in the design of a word segmentation system. In this paper, we first introduce a practical strategy for disambiguation. We then put forward an integrated, fast strategy for recognizing proper nouns in a modern Chinese word segmentation system, covering Chinese person names, Chinese place names, translated foreign names, and corporation and organization names; this strategy resolves the conflicts among these proper nouns and ordinary words. Large-scale tests on a real corpus show that both strategies achieve high performance and precision in disambiguation and in the recognition of proper nouns. In the last part of the paper, we introduce the General-purpose Word Segmentation System in Modern Chinese (GPWS) and analyse a set of criteria for evaluating a general-purpose segmentation system: comprehensiveness, extensibility, adaptiveness, and interactiveness, in addition to precision. We also introduce an interactive strategy that provides alternative segmentations, giving applications more choices without compromise. Large-scale tests on a real corpus show that interaction between word segmentation and upper-level applications contributes substantially to reducing errors in the original system.
Keywords/Search Tags: Chinese information processing, general-purpose word segmentation, disambiguation, interactive strategy