Study Of Several Technology Of Chinese Word Automatic Segmentation

Posted on: 2008-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: B J Chai
Full Text: PDF
GTID: 2178360212495386
Subject: Computer software and theory
Abstract/Summary:
More and more Chinese information is now available in machine-readable form, owing to the rapid development of communication networks and inexpensive mass storage. Because Chinese text has no separators between words, automatic word segmentation plays a fundamental role in Chinese information processing. This thesis studies four questions: methods of Chinese word segmentation, mechanisms of Chinese word segmentation, ambiguity in word segmentation, and the identification of special words. Improvements to some of these techniques are presented, and a comparative experiment has been carried out. The following research has been done:

(1) The ambiguity that arises during word segmentation consists mainly of ambiguity specific to machine segmentation, true two-way ambiguity inherent in natural language, and ambiguity caused by the size of the segmentation dictionary. Ambiguous fields can be classified into three types according to the level of knowledge needed to segment them: syntactic ambiguity, semantic ambiguity, and pragmatic ambiguity (ambiguity of language use).

(2) Nouns are the most frequently used words in Chinese information systems, and special nouns are particularly difficult to handle in automatic Chinese word segmentation. First, the thesis analyses the characteristics of surnames and given names in Chinese personal names and proposes a technique for identifying Chinese names automatically. Second, it identifies place names by applying an inference mechanism to a knowledge base and a rule base. Third, it takes college names as an example of organization name identification and, based on their grammatical, semantic, and organizational characteristics, proposes rules for identifying college names. In addition, it briefly analyses the relationship among organization names, personal names, and place names.

(3) There are three typical dictionary mechanisms: binary search over whole words, the TRIE tree, and character-by-character binary search. A dictionary mechanism based on the PATRICIA tree is highly efficient. Some important properties of the PATRICIA tree are generalized, in particular the average path length from the root to keywords of different lengths in the sub-PATRICIA trees of an initial-hash PATRICIA tree.
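
To make the TRIE dictionary mechanism and the notion of segmentation ambiguity concrete, the following is a minimal sketch, not the thesis's implementation. It assumes a plain (uncompressed) trie and a greedy forward longest-match segmenter, a common baseline that the abstract does not name; the names `TrieNode`, `TrieDictionary`, `longest_match`, and `segment`, and the four-word dictionary in the usage note, are all hypothetical.

```python
class TrieNode:
    """One node of the dictionary trie; edges are single characters."""

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if the path from the root spells a word


class TrieDictionary:
    """Hypothetical trie-based word dictionary (illustrative only)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def segment(text, dictionary):
    """Greedy forward longest-match segmentation (an assumed baseline).

    Characters that start no dictionary word are emitted as single tokens.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        length = dictionary.longest_match(text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens
```

With the toy dictionary `["研究", "研究生", "生命", "起源"]`, `segment("研究生命起源", ...)` yields `["研究生", "命", "起源"]` rather than the intended 研究 / 生命 / 起源. Greedy matching alone cannot resolve this kind of overlapping ambiguity, which is why ambiguity handling is treated as a separate topic.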
Keywords/Search Tags: Automatic Word Segmentation, Dictionary Mechanism, PATRICIA Tree, Ambiguous Word Segmentation, Name Identification, Organization Name Identification, Placename Identification