Research On Neural Networks Based Uyghur Word Vectors Representation And Its Application

Posted on: 2019-10-11 | Degree: Master | Type: Thesis
Country: China | Candidate: L H R L Ai
GTID: 2428330566467005 | Subject: Computer application technology
Abstract/Summary:
Data representation is a fundamental task in natural language processing. Traditionally, data representation meant manually organizing feature information. In recent years, with the widespread adoption of deep learning and representation learning, neural-network-based data representation has performed outstandingly in many fields. In common natural language processing tasks, the bag-of-words model has been the main semantic representation method, but because the available data are limited, it suffers from data sparsity. Consequently, early methods were generally designed for one particular type of problem, and their range of application was limited. This thesis summarizes and analyzes neural-network word representation techniques and applies them to Uyghur morphology induction and text sentiment classification.

In the study of word vector representation methods, existing word representation techniques were analyzed both theoretically and experimentally. Theoretically, the frameworks of the Skip-gram and CBOW models were studied and their experimental results compared. Experimentally, word representation was analyzed with respect to model, corpus, and parameters. After word vectors were generated with both models, the two models were evaluated on semantic, morphological, and neural-network classification tasks. Because the corpus is limited in size, the experiments in this thesis show CBOW outperforming Skip-gram.

The morphology induction method is based on unsupervised learning: the whole process requires only training on a corpus, with no additional morphological or linguistic knowledge. Word vectors are used to evaluate candidate difference rules in terms of semantic similarity and morphological difference; that is, semantic association is used to assess the morphological transformation rules learned in training, and these rules are then used to build a morphological analyzer. Evaluated on 1,000 manually segmented test samples, the morphological analysis rules achieved an accuracy of 81%.

For neural-network sentiment classification, the CNN, LSTM, and BiLSTM models were analyzed theoretically and evaluated experimentally. In the sentiment classification task, the corpus is first preprocessed with stemming, noise reduction, and dimensionality reduction. Second, pre-trained word vectors are introduced so that the model can capture semantic information between words, supplementing and enriching the sentiment features contained in the corpus. Experiments on the same sentiment classification corpus show that applying morphology induction during preprocessing and initializing the model input with pre-trained word vectors improves the CNN model by 1.8%, the LSTM model by 3.7%, and the BiLSTM model by 3.9%, which demonstrates the overall effectiveness of the classification method.
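To make the CBOW versus Skip-gram comparison in the abstract concrete, the sketch below shows how each model turns the same sentence into training examples. It is a simplified illustration, not the thesis's training code: the three-word Uyghur sentence ("men kitab oquymen", "I read a book") and the function names are hypothetical, and it ignores the subsampling, negative sampling, and dynamic window sizes used by real word2vec implementations.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int) -> List[Tuple[Tuple[str, ...], str]]:
    """CBOW: one example per position, predicting the center word from its context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        if context:
            pairs.append((context, center))
    return pairs


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one example per (center, context) pair, predicting each context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "men kitab oquymen".split()  # hypothetical example sentence
    print(cbow_pairs(sentence, window=1))
    print(skipgram_pairs(sentence, window=1))
```

With a window of 1, CBOW yields one example per position (three here), while Skip-gram yields one example per (center, context) pair (four here), so Skip-gram extracts more, finer-grained training signal from the same text.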
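The abstract's rule-evaluation step can be read as an analogy test over word vectors, similar in spirit to the vector-offset idea: if adding a suffix is a genuine morphological rule, the offset v(w + suffix) - v(w) learned from one word pair should carry over to other pairs. The following is a minimal sketch under that assumption; the toy vectors, the plural suffix -lar, and the hit-rate score are illustrative and not taken from the thesis.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]

# Hypothetical toy embeddings (not from the thesis); a real run would load
# vectors trained with CBOW or Skip-gram on a Uyghur corpus.
TOY_VECTORS: Dict[str, Vector] = {
    "kitab": [0.9, 0.1, 0.0],     # book
    "kitablar": [0.9, 0.1, 1.0],  # books
    "at": [0.1, 0.9, 0.0],        # horse
    "atlar": [0.1, 0.9, 1.0],     # horses
    "dost": [0.5, 0.5, 0.0],      # friend
    "dostlar": [0.5, 0.5, 1.0],   # friends
}


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rule_pairs(vectors: Dict[str, Vector], suffix: str) -> List[Tuple[str, str]]:
    """Surface-form evidence for a hypothetical suffix-addition rule:
    every (w, w + suffix) pair where both words are in the vocabulary."""
    return [(w, w + suffix) for w in vectors if w + suffix in vectors]


def nearest(vectors: Dict[str, Vector], query: Vector, exclude: Set[str]) -> str:
    """Vocabulary word with the highest cosine similarity to `query`."""
    candidates = (w for w in vectors if w not in exclude)
    return max(candidates, key=lambda w: cosine(vectors[w], query))


def rule_hit_rate(vectors: Dict[str, Vector], pairs: List[Tuple[str, str]]) -> float:
    """Semantic check: apply the offset of each supporting pair to every other
    pair and count how often the nearest neighbour is the expected word."""
    hits = total = 0
    for src, dst in pairs:
        offset = [y - x for x, y in zip(vectors[src], vectors[dst])]
        for base, target in pairs:
            if (base, target) == (src, dst):
                continue
            predicted = [x + y for x, y in zip(vectors[base], offset)]
            total += 1
            if nearest(vectors, predicted, exclude={base}) == target:
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    pairs = rule_pairs(TOY_VECTORS, "lar")
    print(pairs)
    print(rule_hit_rate(TOY_VECTORS, pairs))  # 1.0 on these toy vectors
```

In practice, candidate rules whose hit rate falls below some threshold could be discarded; the abstract does not state how the thesis makes this decision.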
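The abstract reports 81% accuracy for the learned rules on 1,000 manually segmented test samples. As a hedged sketch of how accepted suffix rules might drive a simple segmenter, and how exact-match accuracy over a gold set can be computed: the greedy longest-suffix strategy, the lexicon check, the suffix list, and the six-word gold set below are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical suffixes assumed to have passed the vector-based rule evaluation.
SUFFIXES: List[str] = ["lar", "ler", "da", "de"]
# Hypothetical lexicon of known stems.
LEXICON: Set[str] = {"kitab", "at", "dost", "öy", "qelem"}
# Hypothetical hand-segmented gold data: word -> expected segmentation.
GOLD: Dict[str, Tuple[str, ...]] = {
    "kitablar": ("kitab", "lar"),  # books
    "qelemler": ("qelem", "ler"),  # pens
    "öyde": ("öy", "de"),          # in the house
    "atlar": ("at", "lar"),        # horses
    "bala": ("bala",),             # child (no suffix)
    "qizlar": ("qiz", "lar"),      # girls
}


def segment(word: str, suffixes: List[str], lexicon: Set[str]) -> Tuple[str, ...]:
    """Strip the longest accepted suffix whose remaining stem is a known word;
    return the word unsegmented if no rule applies."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if stem in lexicon:
                return (stem, suffix)
    return (word,)


def accuracy(gold: Dict[str, Tuple[str, ...]], suffixes: List[str], lexicon: Set[str]) -> float:
    """Exact-match accuracy: share of words whose segmentation equals the gold one."""
    correct = sum(segment(w, suffixes, lexicon) == seg for w, seg in gold.items())
    return correct / len(gold)


if __name__ == "__main__":
    for word in GOLD:
        print(word, segment(word, SUFFIXES, LEXICON))
    print(accuracy(GOLD, SUFFIXES, LEXICON))
```

Here the segmenter gets 5 of 6 words right; it fails on qizlar because the stem qiz is missing from the toy lexicon, which illustrates how vocabulary coverage limits a rule-based analyzer.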
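Introducing pre-trained word vectors into the CNN, LSTM, and BiLSTM classifiers usually means initializing their embedding layer from those vectors. A minimal sketch, assuming a vocabulary list and a dict of pre-trained vectors: words with a pre-trained vector keep it, unseen words get small random values in [-0.25, 0.25] (a common convention, not one stated in the thesis), and a hypothetical <pad> token gets a zero row.

```python
import random
from typing import Dict, List

PAD = "<pad>"  # hypothetical padding token


def build_embedding_matrix(
    vocab: List[str],
    pretrained: Dict[str, List[float]],
    dim: int,
    scale: float = 0.25,
    seed: int = 0,
) -> List[List[float]]:
    """Row i is the initial vector for vocab[i]."""
    rng = random.Random(seed)  # fixed seed so the random rows are reproducible
    matrix: List[List[float]] = []
    for word in vocab:
        if word == PAD:
            matrix.append([0.0] * dim)  # padding row stays zero
            continue
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append([float(x) for x in vec])  # reuse the pre-trained vector
        else:
            # Assumed fallback for out-of-vocabulary words: small uniform noise.
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return matrix


if __name__ == "__main__":
    vocab = [PAD, "kitab", "yaxshi", "yaman"]  # book, good, bad
    pretrained = {"kitab": [0.1, 0.2, 0.3], "yaxshi": [0.4, 0.5, 0.6]}  # toy vectors
    for word, row in zip(vocab, build_embedding_matrix(vocab, pretrained, dim=3)):
        print(word, row)
```

The resulting rows can then be loaded into a classifier's embedding layer; in PyTorch, for example, `torch.nn.Embedding.from_pretrained(torch.tensor(matrix), freeze=False)` does this and lets the vectors be fine-tuned during training.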
Keywords/Search Tags: Word representation, Morphological induction, Neural networks, Sentiment classification