
Chinese Word Segmentation Training Optimization Based on TensorFlow

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Dong
Full Text: PDF
GTID: 2428330605469232
Subject: Electronic and communication engineering
Abstract/Summary:
The development of speech recognition, image processing, and other fields is inseparable from deep learning technology; language models built on deep learning structures can learn feature representations automatically, and because of this property such models are used on a large scale in natural language processing. Statistical language models are natural language models built with statistics-based methods such as log-linear and n-gram models. In recent years, Chinese word segmentation technology has developed rapidly, laying a foundation for related problems in natural language processing, and it has been widely applied in data mining, precision recommendation, and other areas. This paper applies the recurrent neural network to the word vector training task through four stages: building the corpus, constructing the skip-gram learning model, initial training, and training optimization.

Building the corpus involves downloading the raw text, preprocessing it, and establishing a dictionary. News articles are chosen as the corpus because news is semantically richer than other types of articles. Preprocessing of the corpus includes removing punctuation, converting traditional Chinese characters to simplified ones, removing stop words, removing low-frequency words, and performing Chinese word segmentation with the jieba segmenter. The purpose of creating a dictionary is to let the learning model know how many distinct words the corpus contains.

Within the established deep learning framework, the skip-gram model is constructed with the word2vec tool, word vector training is carried out under this learning model, the training results are visualized, and the training effect is judged by computing the average accuracy on the test set. The skip-gram learning model has only three layers, corresponding to the input layer, hidden layer, and output layer of the recurrent neural network structure. Constructing the skip-gram model includes creating the word vector variables, defining the weights and biases of the logistic regression used in negative sampling, feeding in the training data, and minimizing the loss value.

Two optimization schemes are developed in this paper: parameter optimization and algorithm optimization. Parameter optimization uses the control-variable method on seven parameters; in each optimization run one parameter is varied while the other six remain unchanged. Algorithm optimization improves the quality of word vector training by building a hierarchical lexicon. Throughout the optimization process, guided by the training results, the aim is to mine more accurately the semantic information contained between words in Chinese text by continuously tuning the parameters of the learning model and refining the algorithm. After the training model was successfully constructed, many rounds of optimization were carried out, raising the average accuracy from 0.467 at initial training to 0.768 after optimized training, an improvement of 64.5% over the initial training; the optimization effect is remarkable.
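To make the preprocessing steps concrete, here is a minimal Python sketch of the corpus-cleaning stage, assuming a plain-text news corpus in news.txt and a stop-word list in stopwords.txt (both hypothetical file names); traditional-to-simplified conversion uses the OpenCC library, a common choice that the abstract itself does not name.

    # Corpus preprocessing sketch: traditional-to-simplified conversion,
    # punctuation removal, jieba segmentation, stop-word and
    # low-frequency-word filtering. File names are illustrative.
    import re
    from collections import Counter

    import jieba
    from opencc import OpenCC  # assumes the opencc Python package

    cc = OpenCC('t2s')  # traditional Chinese -> simplified Chinese

    with open('stopwords.txt', encoding='utf-8') as f:
        stopwords = set(line.strip() for line in f)

    with open('news.txt', encoding='utf-8') as f:
        text = cc.convert(f.read())

    # Keep only Chinese characters, which also strips punctuation.
    text = re.sub(r'[^\u4e00-\u9fa5]', ' ', text)

    words = [w for w in jieba.lcut(text)
             if w.strip() and w not in stopwords]

    # Drop low-frequency words (a threshold of 5 is an assumed value).
    freq = Counter(words)
    words = [w for w in words if freq[w] >= 5]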
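The dictionary step can follow the convention of classic word2vec examples: keep the most frequent words, reserve index 0 for rare words ('UNK'), and map the corpus to integer ids. The vocabulary size of 50,000 below is an illustrative assumption, not a figure from the thesis.

    # Build a dictionary of the most frequent words; the rest map to 'UNK',
    # so the model knows exactly how many distinct words it must handle.
    from collections import Counter

    def build_dictionary(words, vocabulary_size=50000):  # size is assumed
        counts = [('UNK', -1)]
        counts.extend(Counter(words).most_common(vocabulary_size - 1))
        dictionary = {word: i for i, (word, _) in enumerate(counts)}
        data = [dictionary.get(word, 0) for word in words]  # 0 == 'UNK'
        reverse_dictionary = {i: word for word, i in dictionary.items()}
        return data, dictionary, reverse_dictionary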
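The skip-gram construction (word vector variables, the weights and biases of the logistic regression used in negative sampling, and loss minimization) can be sketched in TensorFlow's 1.x graph style, which was typical for word2vec examples; the hyperparameter values are illustrative, and tf.nn.nce_loss stands in here for the negative-sampling objective.

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    vocabulary_size = 50000  # must match the dictionary; assumed value
    embedding_size = 128     # illustrative hyperparameters
    num_sampled = 64
    batch_size = 128

    train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

    # Word vector variable: one embedding row per dictionary entry.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # Weights and biases of the logistic regression in negative sampling.
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / embedding_size ** 0.5))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

    # Loss value to be minimized, averaged over the batch.
    loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                       labels=train_labels, inputs=embed,
                       num_sampled=num_sampled,
                       num_classes=vocabulary_size))
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)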
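The control-variable parameter search reduces to a simple loop: vary one parameter over a candidate grid while the other six stay at their baseline values. The parameter names and values below are hypothetical placeholders, since the abstract does not list the seven parameters.

    # Control-variable search sketch: one parameter varies per run, the
    # rest stay at baseline. Names and values here are illustrative only.
    baseline = {'embedding_size': 128, 'batch_size': 128, 'window_size': 2,
                'num_sampled': 64, 'learning_rate': 1.0,
                'num_steps': 100000, 'vocabulary_size': 50000}
    grid = {'embedding_size': [64, 128, 256],
            'learning_rate': [0.5, 1.0, 2.0]}

    def train_and_evaluate(params):
        # Placeholder: in the real experiment this would train the
        # skip-gram model with `params` and return the average accuracy
        # on the test set.
        return 0.0

    best = {}
    for name, candidates in grid.items():
        scores = {}
        for value in candidates:
            params = dict(baseline, **{name: value})  # six stay unchanged
            scores[value] = train_and_evaluate(params)
        best[name] = max(scores, key=scores.get)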
Keywords/Search Tags:Deep Learning, Word Vector, Recurrent Neural Network, Language Model