
Research On Several Key Techniques Of Tibetan Information Processing

Posted on: 2017-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhu
Full Text: PDF
GTID: 1318330518499283
Subject: Computer application technology
Abstract/Summary:
As Tibetan information technology has developed, Tibetan word processing technology has become increasingly mature. With the publication of the international Unicode standard for Tibetan encoding and the successful application of OpenType technology to Tibetan typeface design, the long-standing problem of non-unified encodings has been effectively resolved, greatly promoting the further development of Tibetan information processing technology. With the maturing of Tibetan word processing technology and the spread of the Internet in Tibetan areas, a wide variety of Tibetan electronic resources have appeared online, laying a foundation for the development of Tibetan information processing technology. A growing number of researchers now work on Tibetan word, phrase, and syntactic processing, and they increasingly apply machine learning techniques to Tibetan natural language processing tasks. Owing to various constraints, however, Tibetan information processing has not yet reached the level achieved for major languages such as English and Chinese, and many problems remain to be solved. In this dissertation, we study several important current issues in Tibetan language information processing and propose corresponding solutions and algorithms. The main contents are as follows:

1. For text quality, we study algorithms for Tibetan syllable spell checking and methods for automatic proofreading of Tibetan text.
(1) For Tibetan syllable spelling errors, we analyze the types of Tibetan spelling errors, study the implicit rules by which Tibetan text is organized, establish a Tibetan syllable rule model, and present a Tibetan syllable recognition algorithm and a Tibetan syllable spell checking algorithm; (2) For errors in Tibetan text such as errors in the Tibetan transliteration of Sanskrit, connective relation errors, word errors, and grammar errors, we study a method for automatic proofreading of Tibetan text, design an automatic proofreading framework, propose methods for spell checking, Sanskrit transliteration checking, and word checking, and present an algorithm for checking Tibetan connective relations.

2. For the problem of preprocessing stop words in Tibetan text, we investigate methods for automatically selecting Tibetan stop words by term frequency (TF), document frequency (DF), and entropy, and present an approach that selects stop words by combining Tibetan function words, special verbs, and automatic processing, which yields a reasonable Tibetan stop word list. We show that the distribution of Tibetan words also satisfies Zipf's law, and we analyze the distribution of Tibetan function words, special verbs, and high-frequency words.

3. For the problem of named entity recognition in Tibetan text, we study Tibetan personal name recognition using conditional random fields (CRF), taking trigger words, function words, a dictionary of names, and personal noun suffixes as features. We present a method of personal name recognition based on words (syllables) and word-position information, and investigate how recognition is affected by different feature combinations, feature optimization, and the refinement of different function words. In addition, we propose an approach to personal name recognition in Tibetan text based on deep learning: Tibetan word embeddings are first trained with word2vec, and personal names are then recognized by a deep neural network model.
This method not only produces better word embeddings but also achieves high efficiency in Tibetan personal name recognition by adjusting the parameters of the neural network.

All of the proposed methods and models are verified by experiments. The experimental results show their feasibility and effectiveness, and lay a foundation for the subsequent development of Tibetan automatic proofreading and information extraction.
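To make the stop word selection in item 2 more concrete, the following is a minimal Python sketch of scoring words by TF, DF, and entropy. It is not the dissertation's implementation: the function names (stopword_scores, candidate_stopwords), the entropy definition (Shannon entropy of a word's occurrences across documents), and the combination rule (intersect the top-k words under each measure, then add a hand-curated function word list) are all assumptions chosen for illustration. Input documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def stopword_scores(docs):
    """Return {word: (tf, df, entropy)} for a list of tokenized documents.

    tf      = total occurrences of the word in the corpus
    df      = number of documents containing the word
    entropy = Shannon entropy (bits) of the word's distribution over documents
              (an assumed definition; the source does not spell one out)
    """
    tf = Counter()
    df = Counter()
    per_doc = []
    for tokens in docs:
        counts = Counter(tokens)
        per_doc.append(counts)
        tf.update(counts)         # add per-document counts to corpus totals
        df.update(counts.keys())  # each document counts once per word

    scores = {}
    for word, total in tf.items():
        entropy = 0.0
        for counts in per_doc:
            c = counts.get(word, 0)
            if c:
                p = c / total
                entropy -= p * math.log2(p)
        scores[word] = (total, df[word], entropy)
    return scores


def candidate_stopwords(docs, function_words=(), top_k=50):
    """Hypothetical combination rule: words ranked in the top_k by TF, DF,
    and entropy alike, plus a manually supplied function word list."""
    scores = stopword_scores(docs)

    def top(index):
        # Sort by descending score, breaking ties alphabetically.
        ordered = sorted(scores, key=lambda w: (-scores[w][index], w))
        return set(ordered[:top_k])

    automatic = top(0) & top(1) & top(2)
    return automatic | set(function_words)
```

Words that are both frequent and spread evenly across documents score high on all three measures, which is why the sketch intersects the three rankings before adding the manual list. The tokens passed in can be any strings, so segmented Tibetan words or syllables would work the same way as the placeholder tokens used for testing.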
Keywords/Search Tags: Tibetan information processing, spelling check, automatic proofreading, stop words, personal name recognition