Research On Tibetan Word Representation Techniques

Posted on:2020-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:M Z X La

GTID:2428330578964435

Subject:Computer application technology

Abstract/Summary:

Representing language units is fundamental work in machine learning. Its goal is to encode a language unit in an optimized numerical form so that a computer can better understand natural language. Words are the most basic semantic units of text and the basis for understanding natural language. In recent years, with the development of neural network technology, word representation has come to play an important role in natural language processing. Representations of words, sentences, and documents in English and Chinese have achieved promising results and are widely used. Tibetan word representation, by contrast, is still at an early, exploratory stage. Research on it has theoretical significance and broad application value for Tibetan lexical, syntactic, and semantic analysis, and for applying deep learning to the Tibetan language.

Inspired by word representation techniques for English and Chinese, this thesis analyzes the lexical distribution rules and grammatical features of Tibetan texts and studies three key technologies for Tibetan word representation: blend recognition, stop word selection, and optimization of the Tibetan word representation model. The main work includes:

(1) Blend recognition. Text segmentation is one of the key techniques for Tibetan word representation, and blend recognition is a difficult problem within Tibetan word segmentation: how well blends are recognized strongly affects segmentation performance. After analyzing the current state and existing problems of Tibetan text segmentation, this thesis designs a Tibetan blend recognition algorithm that combines rules and statistics, and verifies the effectiveness of the method.

(2) Stop word selection. Stop words carry little semantic information and contribute little to contextual meaning. Every language contains many stop words, and because they occur so frequently they have a negative effect on the system model. Stop word selection is therefore an important technology for Tibetan natural language processing in general and one of the key technologies for Tibetan word representation. This thesis builds a Tibetan stop word table and designs a Tibetan stop word recognition algorithm on top of it, which lays the foundation for optimizing the Tibetan word representation model.

(3) Tibetan word representation model optimization. A word representation model aims to capture the semantic information in lexical sequences; its core is the technique used to reveal the semantics of a target word from its context. After analyzing traditional word representation methods, this thesis improves on the traditional model and designs a Tibetan word representation model that combines the word representations generated from the original texts with those generated from the same texts after stop words are removed. The resulting model performs better than the traditional word representation model.
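Items (2) and (3) can be illustrated with a minimal sketch. The abstract does not say how the two representations are combined, so the concatenation used here is an assumption, as are the function names, the zero-vector fallback for words missing from the filtered corpus, and the romanized toy tokens. In practice both vector tables would come from a trained embedding model rather than hand-written values.

```python
from typing import Dict, List, Set

Vector = List[float]


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop every token that appears in the stop word table."""
    return [t for t in tokens if t not in stop_words]


def combine_representations(
    word: str,
    full_text_vectors: Dict[str, Vector],
    filtered_text_vectors: Dict[str, Vector],
) -> Vector:
    """Concatenate the vector learned from the original text with the
    vector learned from the stop-word-free text.

    Concatenation is an assumed combination rule, not the thesis's.
    Words absent from the filtered corpus (the stop words themselves)
    fall back to a zero vector of the same dimension.
    """
    dim = len(next(iter(filtered_text_vectors.values())))
    fallback = [0.0] * dim
    return full_text_vectors[word] + filtered_text_vectors.get(word, fallback)


# Toy data: romanized placeholder tokens and hand-written 2-d vectors.
tokens = ["bod", "kyi", "skad", "yig"]
stop_words = {"kyi"}
filtered = remove_stop_words(tokens, stop_words)

full_vecs = {"skad": [0.1, 0.2], "kyi": [0.3, 0.4]}
filtered_vecs = {"skad": [0.5, 0.6]}
combined = combine_representations("skad", full_vecs, filtered_vecs)
```

Concatenation keeps both views of a word intact at the cost of doubling the dimension; averaging or a learned weighting would be equally plausible choices under the same description.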

Keywords/Search Tags:

Natural Language Processing, Tibetan, Blend words, stop words, word representation

Related items

1	Natural Language Processing, Words Related To Knowledge No Guide For Build And Balanced Classifier
2	Research On Words Segmentation Algorithm And Word Variant Extraction Method Of Message Variety Based
3	The Key Technologies Of Representation Of Tibetan Word Vector
4	A Representation Method Of Chinese Characters And Words Based On Word-Character Alignment
5	Natural Language Processing-A Study Of Vectorization Of Chinese Words And Short Texts
6	Research On Tibetan Text Classification Technology Based On TWC-CNN
7	Research On Several Key Techniques Of Tibetan Information Processing
8	The Study Of Comparison Between Mongolian Stop Words And English Stop Words
9	Chinese Automatic Segmentation Technology And Its Intelligent Interface In Robot-assisted Education In Applied Research
10	Research And Application Of Mining Algorithm For Online Chinese Comments