
Southwest Jiaotong University Research And Realization Of Tibetan Encoding Recognition And Conversion

Posted on: 2011-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chun
Full Text: PDF
GTID: 2178360305961025
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technology, Tibetan information processing has made great progress. However, because international and national standards for Tibetan coded character sets lagged behind, software vendors on the current market use different Tibetan character encodings. As a result, Tibetan information resources and website resources are mutually incompatible and cannot be shared, which has seriously hindered the development of Tibetan information processing. At present, most documents are still encoded with Tibetan coded character sets based on the GB2312 system.
This thesis first discusses the critical problems of Tibetan encoding identification and conversion. Based on the structure of Tibetan characters and their statistical characteristics, it introduces various possible recognition rules and analyzes and compares their results. The regular spacing and high frequency of the syllable dot between Tibetan syllables are used to identify the encodings FOUNDER Windows, FOUNDER Dos, Tonguer, HURGURNG Windows, HURGURNG Dos, Pandita, the ISO/IEC 10646 Basic set, and Tibetan coded character sets-Extension A, and to correctly distinguish Tibetan text from text in other languages.
To convert nonstandard Tibetan encodings to the national or international standards, the work has two parts. First, mapping tables were designed from the nonstandard GB2312-based Tibetan encodings to Tibetan coded character sets-Extension A. Drawing on the analysis and comparison of the characteristics of the different encodings, a conversion program was implemented from FOUNDER Windows, FOUNDER Dos, Tonguer, HURGURNG Windows, HURGURNG Dos, and Pandita to the ISO/IEC 10646 Basic set and Extension A of Tibetan coded character sets. Second, conversion from Extension A of Tibetan coded character sets to the ISO/IEC 10646 Basic set of Tibetan was implemented.
Finally, a Tibetan encoding identification and conversion system was designed and tested on a large amount of data, achieving the desired results for Tibetan encoding identification and conversion.
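The two ideas in the abstract, identification by syllable-dot statistics and conversion through a mapping table, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The encoding names, the dot byte values, the thresholds, and the mapping table are all hypothetical placeholders, and the sketch assumes for simplicity that each legacy code unit is a single byte. The only fixed fact used is that the Unicode (ISO/IEC 10646) Tibetan syllable dot is U+0F0B.

```python
from statistics import mean

TSHEG = "\u0f0b"  # Unicode Tibetan syllable dot (intersyllabic tsheg)

# Hypothetical profiles: the byte each legacy encoding is assumed to use
# for the syllable dot. Placeholder values, not real code points.
HYPOTHETICAL_DOT_BYTES = {"encoding_a": 0xA1, "encoding_b": 0xB7}

# Hypothetical mapping table from legacy code units to Unicode.
HYPOTHETICAL_TABLE = {0xA1: TSHEG, 0x41: "\u0f56", 0x42: "\u0f51"}

MIN_DOT_RATIO = 0.10  # assumed minimum share of units that are dots
MAX_MEAN_GAP = 8      # assumed maximum mean distance between dots


def dot_statistics(units, dot):
    """Return (dot ratio, mean gap between consecutive dots)."""
    positions = [i for i, u in enumerate(units) if u == dot]
    if len(positions) < 2:
        return 0.0, float("inf")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return len(positions) / len(units), mean(gaps)


def looks_tibetan(units, dot):
    """True when dots are both frequent and regularly close together."""
    ratio, gap = dot_statistics(units, dot)
    return ratio >= MIN_DOT_RATIO and gap <= MAX_MEAN_GAP


def identify(data):
    """Return the best-matching hypothetical encoding name, or None."""
    best, best_ratio = None, 0.0
    for name, dot in HYPOTHETICAL_DOT_BYTES.items():
        ratio, _ = dot_statistics(data, dot)
        if looks_tibetan(data, dot) and ratio > best_ratio:
            best, best_ratio = name, ratio
    return best


def convert(data, table, unknown="\ufffd"):
    """Map each legacy code unit to Unicode; unmapped units become U+FFFD."""
    return "".join(table.get(unit, unknown) for unit in data)


# Three Unicode Tibetan syllables, each followed by a syllable dot.
unicode_sample = ("\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0b"
                  "\u0f61\u0f72\u0f42\u0f0b")
# Toy legacy byte stream using the hypothetical "encoding_a" dot byte.
legacy_sample = bytes([0x41, 0x42, 0xA1, 0x41, 0x42, 0xA1, 0x41, 0xA1])
```

With these placeholder profiles, `looks_tibetan(unicode_sample, TSHEG)` accepts the Unicode sample and rejects plain English text, `identify(legacy_sample)` selects the profile whose assumed dot byte occurs frequently and at short intervals, and `convert(legacy_sample, HYPOTHETICAL_TABLE)` performs the table lookup. A real system would need the actual code tables of each vendor encoding, including any multi-byte code units.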
Keywords/Search Tags:Tibetan encoding, Tibetan encoding identification, Tibetan encoding conversion, syllable dot