
Research On Tibetan Word Representation Techniques

Posted on: 2020-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: M Z X La  Full Text: PDF
GTID: 2428330578964435  Subject: Computer application technology
Abstract/Summary:
Representing language units is fundamental work in machine learning. Its goal is to encode a language unit in an optimized numerical form so that a computer can better understand natural language. Words are the most basic semantic units of text and the basis for understanding natural language. In recent years, with the development of neural network technology, word representation has played an important role in natural language processing. The representation of words, sentences and documents in English and Chinese has achieved promising results and has been widely used. Tibetan word representation technology, by contrast, is still at an exploratory, early stage. Research on it has theoretical significance and broad application value for Tibetan lexical, syntactic and semantic analysis and for applying deep learning to the Tibetan language.

Inspired by word representation techniques for English and Chinese, the work presented in this thesis analyzes the lexical distribution rules and grammatical features of Tibetan texts and studies the key technologies of Tibetan word representation from three aspects: blend recognition, stop word selection, and optimization of the Tibetan word representation model. The main work includes:

(1) Blend recognition. Tibetan text segmentation is one of the key techniques for Tibetan word representation. Blend recognition is a difficult problem in Tibetan word segmentation, and its accuracy has a great influence on segmentation performance. After analyzing the current state of Tibetan text segmentation and its existing problems, this thesis designs a Tibetan blend recognition algorithm that combines rules and statistics, and verifies the effectiveness of the method.

(2) Stop word selection. Stop words carry little semantic information and contribute little to contextual meaning. Every language contains many stop words, and because they occur so frequently they have a negative effect on the system model. Stop word selection is therefore not only an important technology for Tibetan natural language processing but also one of the key technologies for Tibetan word representation. This thesis designs a Tibetan stop word recognition algorithm by establishing a Tibetan stop word table, which lays a foundation for optimizing the Tibetan word representation model.

(3) Tibetan word representation model optimization. A word representation model aims to capture the semantic information in lexical sequences; its core is the technique for revealing the semantics of a target word from its context. After analyzing traditional word representation methods, this thesis improves on the traditional model and designs a Tibetan word representation model that combines the word representations generated from the original texts with those generated from the texts with stop words removed. The resulting model outperforms the traditional word representation model.
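The idea in item (3), combining representations learned from the original texts with representations learned from the stop-word-filtered texts, can be illustrated with a minimal sketch. The abstract does not say how the two representations are combined, so concatenation is an assumption made here for illustration only. The function names `remove_stop_words` and `combine_embeddings`, the placeholder tokens, and the toy vectors are all hypothetical; the embeddings themselves would come from whatever word representation model is trained on each version of the corpus.

```python
from typing import Dict, List, Set

Vector = List[float]


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop every token that appears in the stop word table."""
    return [t for t in tokens if t not in stop_words]


def combine_embeddings(
    original: Dict[str, Vector],
    filtered: Dict[str, Vector],
    dim: int,
) -> Dict[str, Vector]:
    """Concatenate each word's vector from the original-text model with its
    vector from the filtered-text model (ASSUMPTION: concatenation; the
    thesis abstract does not specify the combination method).

    Words missing from the filtered model, such as the stop words
    themselves, are padded with a zero vector of length `dim`.
    """
    zero = [0.0] * dim
    return {word: vec + filtered.get(word, zero) for word, vec in original.items()}


# Placeholder tokens stand in for segmented Tibetan words.
stop_table = {"s1", "s2"}
sentence = ["w1", "s1", "w2", "s2"]
content_words = remove_stop_words(sentence, stop_table)

# Toy 2-dimensional vectors standing in for trained embeddings.
emb_original = {"w1": [1.0, 2.0], "s1": [0.5, 0.5]}
emb_filtered = {"w1": [3.0, 4.0]}
combined = combine_embeddings(emb_original, emb_filtered, dim=2)
```

Concatenation keeps both views of a word intact and leaves it to a downstream model to weigh them; averaging or a learned weighting would be equally plausible readings of the abstract.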
Keywords/Search Tags: Natural Language Processing, Tibetan, blend words, stop words, word representation