Design and Implementation of an Understanding-Based Chinese Word Segmentation System

Posted on: 2012-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Su
GTID: 2218330368498383
Subject: Software engineering
Abstract/Summary:
In today's Internet era, the problem we face is no longer a lack of information or of access to it, but how to filter and process the flood of information available. Most information on the Internet is semi-structured or plain text, which naturally creates strong demand for text mining.

The first prerequisite for intelligent processing of Chinese is Chinese word segmentation. The segmentation problem stems from a property of the language itself: unlike Western text, Chinese text has no explicit delimiters between words. Chinese word segmentation technology arose to address this. Its goal is to split a string of Chinese characters into a sequence of words, and it is a basic step in computer-based intelligent processing of Chinese. After decades of research and development since the 1980s, Chinese word segmentation has made real progress, but there is still no effective solution for handling ambiguity and unknown words.

Chinese word segmentation methods fall into three categories: first, segmentation based on string matching; second, segmentation based on statistical probability; third, segmentation based on understanding. Each has advantages and disadvantages. The first and second are relatively mature and stable and have produced many good algorithms; the third is still in its infancy but has the most potential. Because the three approaches rest on different technical principles and models, each has its own shortcomings, but they complement one another.

The main objective of this study is to solve the ambiguity problem, especially pseudo-ambiguity. The thesis discusses the definition and causes of ambiguity, proposes solutions that target those causes, and attempts to build a Chinese word segmentation system that resolves pseudo-ambiguity. It also discusses unknown words: for unknown words that follow recognizable patterns, it uses a pattern-based identification scheme.

The thesis focuses on the design and implementation of the ambiguity solution, and in particular on how to use semantic information, such as formal models and semantic knowledge, in segmentation. It presents a segmentation system model that integrates dictionary matching, pattern processing, and semantic verification, describes the overall design and the representation and storage of data, and gives a detailed account of the algorithms involved in running the system. On this basis, the system was developed and implemented using a database, the C# language, and its development platform.

Finally, the thesis summarizes the experience gained and the results achieved, points out the system's deficiencies and areas for improvement, and sets out the next development goals.
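
As a rough illustration of the first category (string-matching segmentation) and of how ambiguity arises, the following Python sketch runs forward and backward maximum matching over a toy dictionary and flags a string as a candidate ambiguity when the two results disagree. This is a common textbook technique, not the method of the thesis (which was implemented in C#); the dictionary contents, the function names, and the `max_len` parameter are all assumptions made for the example.

```python
# Toy dictionary, assumed for illustration only.
DICTIONARY = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def is_ambiguous(text, dictionary, max_len=4):
    """Flag a candidate ambiguity when the two scan directions disagree."""
    forward = forward_max_match(text, dictionary, max_len)
    backward = backward_max_match(text, dictionary, max_len)
    return forward != backward


sentence = "研究生命起源"
print(forward_max_match(sentence, DICTIONARY))   # ['研究生', '命', '起源']
print(backward_max_match(sentence, DICTIONARY))  # ['研究', '生命', '起源']
print(is_ambiguous(sentence, DICTIONARY))        # True
```

Pure dictionary matching can only detect that the two scans disagree; choosing the correct reading requires additional information, which is where statistical or semantic methods of the kind the abstract describes come in.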
Keywords/Search Tags: Chinese word segmentation, word segmentation, semantic-based word segmentation, ambiguity string