
Chinese Word Segmentation Algorithm Based On Ontology Research And Implementation

Posted on: 2013-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Diao
Full Text: PDF
GTID: 2248330371992275
Subject: Education Technology
Abstract/Summary:
With the advent of the information age, people face ever larger quantities of information. 80 percent of the information stored on computers is carried and transmitted in the form of text. How to process this huge volume of text more effectively has become a new research field in computer technology, and research on Chinese information processing came into being under these circumstances. Chinese word segmentation, as its most basic component, has become a research hot spot.

Chinese word segmentation uses computer technology to convert a Chinese character string that has no delimiters into the sequence of words actually used in the language. In other words, it is the process of establishing word boundaries in written Chinese. Chinese word segmentation is widely applied and belongs to the field of natural language processing. It is the foundation and first step of advanced Chinese information processing tasks such as syntactic analysis and semantic understanding. Existing research on Chinese word segmentation can generally be divided into three classes: dictionary-based algorithms, statistics-based algorithms, and understanding-based algorithms. The first two are the current mainstream, and understanding-based segmentation is the research trend of the future.

Research on Chinese word segmentation mainly addresses the segmentation algorithm itself, ambiguity processing, and the identification of unknown words. This thesis takes segmentation algorithm design and ambiguity processing as the starting points for its research and practice.

(1) Ontology was introduced into the field of Chinese word segmentation. The traditional mechanical dictionary was replaced by a semantic ontology. An ontology-based bi-directional maximum matching word segmentation algorithm was designed by combining the forward and reverse maximum matching methods. Experimental results showed that this algorithm significantly improved accuracy and efficiency compared with the traditional dictionary-based Chinese word segmentation algorithm.

(2) This thesis uses semantic connection strength calculation to eliminate ambiguity in Chinese word segmentation. This approach makes full use of the advantages of semantics.

(3) The framework of an ontology-based Chinese word segmentation system was designed. The material to be segmented is first preprocessed using rules set in advance: named entities in the material are identified in a simple way, and then an initial segmentation of the material is carried out. This significantly reduces the complexity of word segmentation and saves time. Finally, the word segmentation module and the ambiguity processing module were designed in detail.

This thesis proposes a new ontology-based Chinese word segmentation algorithm. The disambiguation of Chinese word segmentation was improved by exploiting the advantages of ontology, and testing showed the algorithm to be effective.
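The bi-directional maximum matching idea named in point (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: a plain set of words (the hypothetical `lexicon` argument) stands in for the ontology lookup, and the tie-break between the forward and reverse results is a common heuristic (fewer tokens, then fewer single-character tokens) rather than the semantic connection strength calculation of point (2). All function names are assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len):
    """Scan left to right, always taking the longest lexicon word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len):
    """Scan right to left, always taking the longest lexicon word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_max_match(text, lexicon):
    """Run both scans and keep the result a simple heuristic prefers."""
    max_len = max((len(w) for w in lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if fwd == bwd:
        return fwd
    # Heuristic stand-in for semantic disambiguation (an assumption):
    # prefer fewer tokens, then fewer single-character tokens.
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(toks):
        return sum(1 for t in toks if len(t) == 1)

    return fwd if singles(fwd) < singles(bwd) else bwd
```

With a toy lexicon of 研究, 研究生, 生命, 的 and 起源, the forward scan of 研究生命的起源 greedily takes 研究生 and strands 命 as a single character, while the reverse scan recovers 研究 / 生命 / 的 / 起源; the heuristic keeps the reverse result because it has fewer single-character tokens. When the two scans disagree in this way, the thesis applies semantic information instead of such a counting rule.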
Keywords/Search Tags: Chinese word segmentation, domain ontology, ambiguity processing, semantic correlation calculation