A Study On Sorting Algorithm Based On ISO/IEC 10646 International Standards For Tibetan Coded Character Set

Posted on:2010-05-26

Degree:Master

Type:Thesis

Country:China

Candidate:B W D Bian

Full Text:PDF

GTID:2155360278963172

Subject:Chinese Ethnic Language and Literature

Abstract/Summary:

Sorting is one of the key components of Tibetan information technology and one of the most important indicators of how far that technology has developed. It not only reflects the path of that development but, more importantly, provides technical support for tasks such as file searching, information searching, and text sorting.

Taking the characteristics of the Tibetan language into account, the paper designs a Tibetan sorting algorithm by analyzing Tibetan grammar rules and the basic sorting rules of major Tibetan dictionaries. The algorithm addresses Tibetan sorting through four key modules: the Root-letter Recognition Algorithm, the Priority Algorithm, the Sorting-related Digital Code String Access Algorithm, and the Quicksort Algorithm. Considering the complexity of the Tibetan language and the needs of Tibetan sorting, the paper divides the Priority Algorithm into three modules: structure priority, component priority, and character priority. Because the consonant characters of Tibetan have an inherent order, the paper proposes the Root-letter Recognition Algorithm and the three Priority Algorithms on the basis of the basic rules of Tibetan sorting.

The Root-letter Recognition Algorithm first extracts the root letter from each Tibetan syllable and assigns the syllable to the corresponding sorting group according to that root letter. Structure priority then orders words whose syllables share a root letter but differ in structure; component priority orders words whose syllables share a structure but differ in components; and character priority orders words whose syllables share a structure and components but differ in component elements. In this way the algorithm not only resolves the fundamental issues of Tibetan sorting but also reduces time and space complexity.

In Tibetan, different phrases contain different numbers of syllables, and each syllable may include up to seven component elements. The digital code strings generated from these component elements typically exceed 28 bits, and their length grows rapidly as the number of syllables increases. For these reasons, and given the complexity of the Tibetan language, the algorithm fixes a maximum syllable length at design time. As the number of syllables grows, storing the digital code strings becomes a new problem: a 32-bit PC cannot handle the numerical sequence directly, because it exceeds the machine's numeric capacity. To overcome this, the paper converts the code strings from numeric format into text format, which solves the problem.
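
The idea of storing sort codes as text rather than as machine integers can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the two-digit code width, the integer codes, and the names `syllable_key`, `phrase_key`, and `quicksort` are all hypothetical, chosen only to show that fixed-width digit strings compare in the intended order no matter how long they grow.

```python
# Illustrative sketch only. The code values and widths below are assumptions,
# not the actual priority codes defined in the thesis.
CODE_WIDTH = 2       # digits reserved per component code (assumed)
MAX_COMPONENTS = 7   # the abstract mentions seven component elements per syllable


def syllable_key(codes):
    """Encode one syllable's component codes as a fixed-width digit string."""
    if len(codes) > MAX_COMPONENTS:
        raise ValueError("too many component codes for one syllable")
    for c in codes:
        if not 0 <= c < 10 ** CODE_WIDTH:
            raise ValueError("component code does not fit the fixed width")
    # Pad missing components with 0 so every syllable has the same length.
    padded = list(codes) + [0] * (MAX_COMPONENTS - len(codes))
    return "".join(f"{c:0{CODE_WIDTH}d}" for c in padded)


def phrase_key(syllables):
    """Concatenate syllable keys; the result is text, so length is unbounded."""
    return "".join(syllable_key(s) for s in syllables)


def quicksort(items, key):
    """Plain recursive quicksort comparing items by their text keys."""
    if len(items) <= 1:
        return list(items)
    pivot = key(items[len(items) // 2])
    less = [x for x in items if key(x) < pivot]
    equal = [x for x in items if key(x) == pivot]
    greater = [x for x in items if key(x) > pivot]
    return quicksort(less, key) + equal + quicksort(greater, key)


# Each phrase is a list of syllables; each syllable is a list of made-up codes.
phrases = [[[3]], [[1, 2], [3]], [[1, 2]], [[1]]]
ordered = quicksort(phrases, phrase_key)
```

Python integers do not overflow, so the sketch does not reproduce the 32-bit limit itself. What it shows is that a two-syllable key is already a 28-digit string, far beyond a 32-bit integer, while plain string comparison still orders the keys correctly and places a shorter phrase before any longer phrase that begins with it.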

Keywords/Search Tags:

Tibetan Sorting, Root-letter, Structure Priority, Component Priority, Character Priority, Algorithm

Related items

1	An OT Account Of Priority "V+ON" Construct In Chinese
2	Effects Of Interference On Items Of Different Priority In Visual Working Memory
3	The Study Of Jonathan Schaffer’s Priority Monism
4	A Study On Representational And Morpheme Position Priority In Auditory Chinese Compound Word Processing
5	Chinese-English Longer Sentence Associated Tag Type And Model Of Commonality And Divergence
6	On The Priority Of Basic Liberty
7	A Study On The Priority Sequences Of Multiple-attributive Phrases In Modern Chinese
8	A Study On Priority Sequences Of Adjective-predicated Clauses In Modern Chinese
9	The Study Of He Xuntian’s Rupa Dance And Sunyata Dance
10	A Study On Priority Sequences Of Time Adverbs "Zheng", "Zhengzai", "Zai"