Research On Chinese Lexical Analysis Model Algorithm Based On Deep Learning

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: S P Wang
Full Text: PDF
GTID: 2428330572468593
Subject: Computer Science and Technology
Abstract/Summary:
In natural language processing, Chinese lexical analysis is a key area of basic research. Its results directly affect the accuracy of Chinese syntactic and semantic analysis, and the efficiency of higher-level processing such as machine translation and intelligent question answering. Chinese lexical analysis consists of two core tasks: Chinese word segmentation and Chinese part-of-speech tagging. Existing research on Chinese lexical analysis is mainly based on statistical methods. With the rise of deep learning, using it to address the shortcomings of existing methods has become a hot topic in natural language processing.

Firstly, to address the shortcomings of existing Chinese word segmentation methods, such as long training time and the inability to make effective use of long-distance information, this paper proposes a deep-learning model that combines a Bi-directional Gated Recurrent Unit model with a Linear Chain Conditional Random Field model. The model exploits the strong modeling ability of the Gated Recurrent Unit neural network and obtains the score matrix quickly through a forward and a backward pass. The Linear Chain Conditional Random Field model then weighs the local features over the whole sentence to produce the final segmentation result. The combined model is not limited by the fixed window of traditional methods, has a simple structure, is easy to operate, learns features automatically, reduces the need to learn task-specific knowledge, makes effective use of context information, and achieves true end-to-end processing. Experiments on the MSRA and PKU corpora show that the proposed word segmentation model not only obtains the best segmentation results but also greatly reduces training time while maintaining segmentation speed.

Secondly, existing methods for Chinese part-of-speech tagging still depend on hand-crafted features. This paper proposes a pre-training algorithm for the combined model. The combined model based on the pre-training algorithm not only acquires features automatically but also has a smaller error; as the depth of the neural network increases, its robustness improves and its average variance decreases. Experimental analysis on the PRF corpus shows that the combined model based on the pre-training algorithm effectively improves the accuracy and speed of part-of-speech tagging.

Finally, we study the problem of ambiguous word recognition. Adding external weights to the Bi-directional Gated Recurrent Unit model improves the recognition of common ambiguous words. On this basis, this paper proposes an integrated model and a training algorithm for it. Experimental comparison shows that the integrated model achieves better results compared with the individual word segmentation model and the individual part-of-speech tagging model.
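
The decoding step of the segmentation model described above (a per-character score matrix followed by linear-chain CRF decoding over the whole sentence) can be illustrated with a minimal sketch. This is not the thesis's implementation: the B/M/E/S tag set, the hand-written `emissions` matrix (standing in for Bi-GRU output), the `transitions` values, and the function names `viterbi_decode` and `tags_to_words` are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumed tag set: Begin, Middle, End of a multi-character word, Single-character word.
TAGS = ["B", "M", "E", "S"]
NEG = -1e4  # large penalty used to forbid invalid tag transitions


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag-index path for one sentence.

    emissions[t][j]   : score of tag j at character position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        backptr.append(ptr)
        score = new_score
    # Follow back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptr):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Group characters into words: a word closes on an E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep any unfinished trailing word
        words.append(current)
    return words


# Hypothetical scores for the 4-character sentence below (columns: B, M, E, S).
# At the last position the local score slightly prefers S, which would be
# invalid after B; sentence-level decoding selects E instead.
sentence = "我爱北京"
emissions = [
    [0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.5, 2.0],
]
transitions = [
    # to:  B    M    E    S
    [NEG, 0.0, 0.0, NEG],  # from B
    [NEG, 0.0, 0.0, NEG],  # from M
    [0.0, NEG, NEG, 0.0],  # from E
    [0.0, NEG, NEG, 0.0],  # from S
]

tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
words = tags_to_words(sentence, tags)
print(" ".join(tags))
print(" / ".join(words))
```

In a trained model the emission scores would come from the network and the transition scores would be learned rather than fixed by hand; the sketch only shows why decoding over the whole sentence can override a locally preferred but inconsistent tag.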
Keywords/Search Tags: Chinese word segmentation, Chinese part-of-speech tagging, Bi-directional Gated Recurrent Unit, Linear Chain Conditional Random Field