Computer-Oriented Analysis On The Chinese Character "得" In Modern Chinese

Posted on:2008-04-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L Luo

GTID:1118360272966804

Subject:Linguistics and Applied Linguistics

Abstract/Summary:

Automatic processing of Chinese-language information is becoming increasingly important in the information age. The lack of a detailed, rule-oriented syntactic description of Chinese, however, has become a bottleneck for such automation. Syntactic parsing, a central task in natural language processing, has been researched and developed for decades. Nevertheless, because Chinese is complex and flexible, complete syntactic analysis remains a major challenge in terms of both space and time. Partial parsing, a technique that has become popular in recent years, focuses on recognizing and parsing chunks. Each chunk is a subgraph of the complete grammar tree, so once the attachments between chunks are taken into account the chunks together constitute that tree, although partial parsing does not in itself produce it directly. This simplifies syntactic parsing to a certain degree and makes it possible to apply parsing techniques in software that processes large-scale authentic text.

This dissertation, Computer-Oriented Analysis on the Chinese Character "得" (obtain) in Modern Chinese, aims at the automatic digital recognition of Chinese by studying the recognition of the "得" structure as a subgraph of the complete grammar tree. Because the study serves only the purpose of computer recognition and digital processing of natural language, every occurrence of the written form "得", regardless of its origin, pronunciation or part of speech, falls within the scope of our discussion. The research concentrates on the following three aspects.

First, the distribution of the "得" structure. After defining the character "得" syntactically and semantically, we describe in detail how the "得" structure is distributed across genres and analyze its evident tendencies. For the predicate-complement "得" structure, emphasis is placed on how the parts before "得" correlate with the various genres and on how the different kinds of complements after "得" are distributed across genres, and the causes of these correlations are analyzed.

Second, the combination features of the "得" structure. Based on statistics on the adjacency distribution of the various "得" structures and on their adjacency restrictions, we describe in detail the left and right adjacency features and restrictions, including implicit adjacency, of "得1", "得2", "得3" and "得4", and identify their adjacency rules. This in turn allows careful observation and description of the concurrence (co-occurrence) of the left and right explicit neighbors of "得". Entropy calculation and data operations further show that the character "得" is selective about its adjacent words and phrases.

Third, grammatical and semantic parsing of the predicate-complement "得" structure. Building on existing research and for the convenience of computer recognition and processing, we give a clear definition, in terms of the syntactic and semantic selection between the syntactic parts, of the structure types of the predicate-complement "得" structure, i.e., predicate-complement structures for probabilities and for improbabilities. The complement structure types within the predicate-complement structures for improbabilities are classified, and their structural forms and grammatical-semantic correlations are defined.

The innovative contributions of this dissertation are as follows:
(1) For the first time, "得" is studied with the aim of serving computer recognition. Based on the digital processing of natural language information, the modern Chinese character "得" is comprehensively observed and investigated from the angles of genre distribution, adjacency, grammatical structure and semantic relations. Formal markers are used to recognize the different types of "得" structure, enabling the computer to "understand" the meaning of different instances of "得".
(2) Lexical and grammatical theories are applied to the adjacency and concurrence relations between the internal parts of "得" structures, combining quantitative statistics with qualitative analysis. The explicit adjacency features and the concurrence of the parts before and after the character are described in detail, and predictions are made about its implicit adjacency features.
(3) The concept of entropy is introduced into the study of the "得" structure. Data operations further clarify the selectiveness of "得" toward its adjacent words, which can serve as data support for further statistical studies of collocation probabilities.
(4) With the aim of establishing the most formal and operable linguistic criteria for automatic computer recognition, the predicate-complement "得" structure is clearly defined in terms of the syntactic and semantic selectiveness of its syntactic parts. The complement structure types within the predicate-complement structures for improbabilities are classified, and their structural forms and grammatical-semantic correlations are defined.
(5) A large-scale corpus of authentic text is built with self-designed labels, which lays a solid foundation for this study and supports the reliability and validity of the results. For the first time, the uses of the modern Chinese character "得" in authentic language data are investigated exhaustively by means of statistics based on a large-scale corpus.
(6) Two software systems, WordParse and DataWord, were developed by the author to build the database and compute the statistics, supporting long-term, continuous observation. Also for the first time, XML is applied successfully to modern Chinese corpus processing and grammatical research, a novel attempt in computer-aided research on modern Chinese.
(7) This study is a successful attempt at a standardized syntactic description of Chinese. The results and the frameworks designed are helpful for further computer-based studies in similar areas and also provide linguistic support for the future development of practical Chinese information processing systems.
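
The adjacency statistics and entropy calculation mentioned in the abstract can be illustrated with a minimal sketch. It is not the WordParse or DataWord implementation, which the abstract does not describe. The XML layout (`<corpus>`, `<s>`, `<w>`, the `genre` attribute), the four sample sentences and the function names are all hypothetical, chosen only to show the idea: collect the words immediately to the left and right of "得" in a word-segmented corpus, then compute the Shannon entropy of each neighbor distribution.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical corpus layout: one <s> per sentence, one <w> per segmented word.
# The dissertation's actual label scheme is not given in the abstract.
SAMPLE_XML = """
<corpus>
  <s genre="fiction"><w>他</w><w>跑</w><w>得</w><w>很</w><w>快</w></s>
  <s genre="fiction"><w>她</w><w>说</w><w>得</w><w>很</w><w>好</w></s>
  <s genre="news"><w>我</w><w>听</w><w>得</w><w>懂</w></s>
  <s genre="news"><w>他</w><w>跑</w><w>得</w><w>慢</w></s>
</corpus>
"""

TARGET = "得"


def neighbour_counts(xml_text, target=TARGET):
    """Count the words immediately left and right of each target token."""
    left, right = Counter(), Counter()
    root = ET.fromstring(xml_text.strip())
    for sentence in root.iter("s"):
        words = [w.text for w in sentence.iter("w")]
        for i, word in enumerate(words):
            if word != target:
                continue
            if i > 0:
                left[words[i - 1]] += 1
            if i + 1 < len(words):
                right[words[i + 1]] += 1
    return left, right


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical distribution in `counts`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


left, right = neighbour_counts(SAMPLE_XML)
print("left neighbours:", dict(left), "entropy:", shannon_entropy(left))
print("right neighbours:", dict(right), "entropy:", shannon_entropy(right))
```

Under this reading, a lower entropy means the neighbors of "得" concentrate on fewer words, which is one way to quantify the claim that "得" is selective about its adjacent words. A real study would compute this per structure type ("得1" to "得4") and over a far larger corpus.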

Keywords/Search Tags:

"得", Structure, Distribution, Adjacency, Concurrence, Statistics

Related items

1	Research On The Theory And Techniques Of Self-organizational Intelligent Recognition Of Scanned Engineering Drawings Based On Primitive Regions Adjacency Graph
2	Image Filtering And Detection Based On Fractional Lower Order Statistics
3	Research And Implementation Of Slicing Algorithm Of 3D Printing Based On Adjacency Topolgy For STL Model
4	The Study On Novel Time Delay Estimation Methods Based On Stable Distribution
5	Research Of Image Completion Based On Structure Information And Patch Statistics
6	Some extensions of U- and V-statistics
7	Studies On Adjacency-based Signature Verification Algorithm
8	Constant false alarm rate detection techniques based on empirical distribution function statistics
9	Data mining for induction of adjacency grammars and application to terrain pattern recognition
10	The structure of introductory statistics knowledge: An investigation exploring students' and experts' cognitive organization using multidimensional scaling and pathfinder analyses