
The Study Of Formal Description Of Chinese Character Glyph And Application

Posted on: 2010-03-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Lin
Full Text: PDF
GTID: 1118360275951145
Subject: Computer application technology
Abstract/Summary:
In Chinese character information processing, current approaches to the formal description of Chinese character glyphs are mostly based on the structural analysis methods used to describe character shapes in Chinese character research and Chinese language teaching. These methods build their descriptions on human perceptual units, namely glyph formation units such as structure types, components, and strokes. They lead to ambiguity and incomplete description in glyph decomposition, structure classification, and the selection of descriptive elements. As a result, they cannot describe every possible glyph skeleton (including wrongly written characters, variant forms of characters in ancient literature, and combined characters), nor can they support automatic glyph comparison. Still less can they meet practical needs that depend on glyph comparison and analysis, such as describing wrongly written characters and quantitatively analyzing misused characters in Chinese character teaching and research, describing and analyzing variant forms of characters in ancient literature, or retrieving rare character glyphs in electronic books.

For special Chinese characters whose glyph samples cannot be collected in advance, such as wrongly written characters, variant forms in ancient literature, and combined characters, no sample training can be done, so comparative computation of their glyphs cannot be supported and their recognition and identification cannot be guaranteed. Moreover, the statistically generated glyph features adopted by recognition models are difficult to decompose logically and map onto the structure types, components, and strokes derived from human cognition. Such features behave like a black box and do not meet the demand for human-oriented comparison and analysis of different types of glyphs.

To address the core problem that there is no universally accepted, effective means of formally describing Chinese character glyphs and comparing them automatically, this dissertation, motivated by applications in glyph comparison and analysis, proposes a new approach to glyph description, together with a set of related glyph comparison algorithms and some practical tools. The main contributions are:

1) A method is proposed for formally describing Chinese characters with a stroke-segment-mesh, which uses line segments of predefined length and direction as the glyph description elements (stroke segments). Because this element has suitable granularity, is unambiguous, and is standardized, the method can describe the glyph skeleton of any Chinese character (including wrongly written characters, variant forms of characters in ancient literature, and combined characters). Experiments show that, compared with a dot-matrix glyph with the same number of elements, the stroke-segment-mesh description uses far fewer effective elements and yet achieves higher efficiency. In addition, the accuracy and reliability of computation improve because the stroke-segment-mesh glyphs of different Chinese characters differ from one another to a greater degree.

2) Based on the stroke-segment-mesh formal description method, a set of glyph comparison algorithms is presented. The algorithm that compares glyphs by stroke segment and its context uses the stroke segment as the comparison unit. Experiments on the GB2312 character set and on some wrongly written characters, variant forms of characters, and combined characters show that its similarity results are less affected by factors such as character structure type and stroke division. The algorithm compares character glyphs without training and has a high accuracy rate when the input character is roughly the same size as the character it is compared with.

The algorithm that compares glyphs by combinations of stroke segments automatically extracts simple strokes and compound strokes from the stroke-segment-mesh. It adaptively uses either simple strokes alone, or compound strokes together with simple strokes, as the comparison unit. Experiments on the same Chinese character set show that the algorithms based on simple strokes and compound strokes can also compute glyph similarity without training, and that the result is less sensitive to glyph size and to the various deformations of inclined strokes. The algorithms achieve a high accuracy rate (nearly 100%) for the first candidate when the input glyphs have normal structure. Because they use larger comparison units, they can be applied efficiently to large-scale Chinese character glyph search. The comparison units map easily onto the units of human cognition, which makes this a "white-box" approach to glyph similarity computation. The method can be applied to an entire Chinese character or to part of one. It can find the differences between characters of non-standard structure and characters of standard structure, and therefore meets the needs of glyph-analysis-oriented applications.

A method for describing and computing structural relationships, based on a relationship matrix of strokes, is also provided; it can be used to identify the structure types of Chinese characters automatically.

3) Given the importance of components in research on the physical structure of Chinese characters, a component description method and an algorithm for automatically detecting components are built on the simple strokes of the stroke-segment-mesh glyph. Experiments show that the algorithm accurately detects the Chinese characters that contain a specific component, regardless of the location and size of the component within the glyph.

4) This dissertation also improves the Chinese character structure description system of the "Chinese character information dictionary" and offers an algorithm for calculating the glyph similarity of Chinese characters based on structure description. Experimental results show that the lists of similar characters found by this algorithm are highly consistent in structure and conform to human cognition. The algorithm is therefore suitable for similarity calculation among Chinese characters of definite structure classes.

5) An application software system, the Toolkit of Chinese Character Glyph Description and Automatic Comparison and Analysis, is designed and implemented. The tool creates a stroke-segment-mesh glyph description through familiar handwriting and drawing input. Any imaginable Chinese character can be entered, including wrongly written characters, variant forms of characters in ancient literature, and combined characters, along with other related information. A stroke-segment-mesh glyph can be automatically converted to a corresponding TrueType font and processed just like a character in the standard Chinese character set. The tool can compare stroke-segment-mesh glyphs, find their similarities and differences as a whole or in part, and produce a list of similar characters sorted by similarity. Using this tool, stroke-segment-mesh glyph descriptions have been created for the 20,902 Chinese characters in the GBK character set and for wrongly written characters produced by foreign students studying at Beijing Language and Culture University. The resulting Chinese character glyph database has been applied to the analysis of writing errors made by foreign students.

This work will benefit the standardization of Chinese character glyph description and will find wide application in fields that rely on Chinese character glyph computing, such as the input of Chinese characters outside the standard character set, the construction of digital libraries in China, the research, teaching, and international promotion of Chinese, research into the history of Chinese characters and culture, and information-based social management.
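
To make the idea of a stroke-segment-mesh more concrete, the following is a minimal illustrative sketch, not the algorithm of the dissertation. It assumes that a glyph can be modeled as a set of unit segments, each given by a mesh position and one of four directions, and it substitutes a plain Jaccard overlap for the context-aware comparison described in the abstract. The names `Segment`, `similarity`, `difference`, and `rank`, the direction codes, and the toy glyphs are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Assumed encoding: a stroke segment is (row, col, direction), a unit-length
# line starting at mesh point (row, col). Direction codes are hypothetical:
# "H" horizontal, "V" vertical, "D" down-right diagonal, "U" up-right diagonal.
Segment = Tuple[int, int, str]
Glyph = FrozenSet[Segment]


def make_glyph(segments: Iterable[Segment]) -> Glyph:
    """Build an immutable glyph from an iterable of stroke segments."""
    return frozenset(segments)


def similarity(a: Glyph, b: Glyph) -> float:
    """Jaccard overlap of two glyphs (a stand-in for the real comparison)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def difference(written: Glyph, standard: Glyph) -> Dict[str, List[Segment]]:
    """List segments missing from, or extra in, a written glyph."""
    return {
        "missing": sorted(standard - written),
        "extra": sorted(written - standard),
    }


def rank(query: Glyph, candidates: Dict[str, Glyph]) -> List[Tuple[str, float]]:
    """Return candidate names sorted by descending similarity to the query."""
    scored = [(name, similarity(query, g)) for name, g in candidates.items()]
    return sorted(scored, key=lambda item: -item[1])


# Toy glyphs on a mesh with 3 x 3 grid points (coordinates 0..2).
ONE = make_glyph([(1, 0, "H"), (1, 1, "H")])                  # 一
TEN = make_glyph(ONE | {(0, 1, "V"), (1, 1, "V")})            # 十
EARTH = make_glyph(TEN | {(2, 0, "H"), (2, 1, "H")})          # 土

if __name__ == "__main__":
    # Rank two standard glyphs against the query glyph for 十.
    print(rank(TEN, {"一": ONE, "土": EARTH}))
    # A "wrongly written" 土 whose bottom stroke is too short.
    wrong_earth = EARTH - {(2, 1, "H")}
    print(difference(wrong_earth, EARTH))
```

Because no training is involved, a glyph that exists nowhere in a standard character set can still be compared with any other glyph, and the reported differences are expressed in segments that a reader can locate on the mesh. A set-overlap score like this one is sensitive to position and size, which is the kind of limitation the stroke-based algorithms in the abstract are reported to reduce.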
Keywords/Search Tags: Chinese character glyph, formal description, stroke-segment-mesh, glyph comparison, components