Research On Chinese Lexical Analysis Model Algorithm Based On Deep Learning

Posted on:2020-10-26

Degree:Master

Type:Thesis

Country:China

Candidate:S P Wang

GTID:2428330572468593

Subject:Computer Science and Technology

Abstract/Summary:

In natural language processing, Chinese lexical analysis is a fundamental research area. Its results directly affect the accuracy of Chinese syntactic and semantic analysis, and in turn the effectiveness of downstream applications such as machine translation and intelligent question answering. Chinese lexical analysis consists of two tasks: Chinese word segmentation and Chinese part-of-speech tagging. Existing research on Chinese lexical analysis is mainly based on statistical methods. With the rise of deep learning, using it to address the shortcomings of existing methods has become an active topic in natural language processing.

First, to address the shortcomings of existing Chinese word segmentation methods, such as long training time and the inability to use long-distance information effectively, this thesis proposes a deep learning model that combines a Bi-directional Gated Recurrent Unit (GRU) network with a Linear Chain Conditional Random Field (CRF). The model uses the modeling capacity of the GRU network to compute a score matrix quickly through forward and backward passes. The Linear Chain CRF then weighs the local features over the whole sentence to produce the final segmentation. The combined model is not limited by the fixed context window of traditional methods, has a simple structure, is easy to operate, learns features automatically, reduces reliance on task-specific knowledge, uses context information effectively, and processes text end to end. Experiments on the MSRA and PKU corpora show that the proposed word segmentation model not only obtains the best segmentation results but also greatly reduces training time while maintaining segmentation speed.

Second, existing methods for Chinese part-of-speech tagging still depend on hand-crafted features. This thesis proposes a pre-training algorithm for the combined model. The combined model with pre-training not only acquires features automatically but also has smaller error; as the depth of the neural network increases, its robustness improves and its average variance decreases. Experiments on the PRF corpus show that the combined model with pre-training effectively improves the accuracy and speed of part-of-speech tagging.

Finally, the thesis studies the recognition of ambiguous words. Adding external weights to the Bi-directional GRU model improves the recognition of common ambiguous words. On this basis, the thesis proposes an integrated model and a training algorithm for it. In experimental comparisons, the integrated model achieves better results than the separate word segmentation and part-of-speech tagging models.
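
The decoding step described in the abstract, where a per-character score matrix is combined with a Linear Chain CRF to select the best tag sequence for the whole sentence, can be illustrated with a minimal Viterbi sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the B/M/E/S tag set, the hand-written emission and transition scores (which stand in for the outputs of a trained Bi-directional GRU and CRF), and the helper names `viterbi` and `tags_to_words`.

```python
# Illustrative sketch only: the tag set, the scores and the function names
# are assumptions, not the thesis's actual model or parameters.
TAGS = ["B", "M", "E", "S"]  # Begin, Middle, End, Single-character word
NEG = -1e4                   # large penalty used to forbid illegal transitions

def viterbi(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j for character t (e.g. from a BiGRU)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each tag
    backptr = []                 # backptr[t][j]: best previous tag for tag j
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path

def tags_to_words(chars, tags):
    """Group characters into words: a word closes on an E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:                  # keep any unfinished trailing word
        words.append(current)
    return words

# Rows are the "from" tag and columns the "to" tag, in TAGS order.
transitions = [
    [NEG, 0.0, 0.0, NEG],  # B -> M or E
    [NEG, 0.0, 0.0, NEG],  # M -> M or E
    [0.0, NEG, NEG, 0.0],  # E -> B or S
    [0.0, NEG, NEG, 0.0],  # S -> B or S
]
# Made-up per-character scores for the four characters of "我爱北京".
emissions = [
    [0.1, 0.0, 0.0, 2.0],  # 我
    [0.5, 0.0, 1.6, 1.5],  # 爱: E scores slightly higher than S in isolation
    [2.0, 0.1, 0.0, 0.3],  # 北
    [0.0, 0.2, 2.0, 0.4],  # 京
]

tags = [TAGS[i] for i in viterbi(emissions, transitions)]
words = tags_to_words("我爱北京", tags)
print(tags, words)  # ['S', 'S', 'B', 'E'] ['我', '爱', '北京']
```

In this toy input, choosing the highest-scoring tag for each character independently would yield the illegal sequence S, E, B, E. Scoring the sentence as a whole with transition scores rules that out, which is the role the abstract attributes to the CRF layer.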

Keywords/Search Tags:

Chinese word segmentation, Chinese part-of-speech tagging, Bi-directional Gated Recurrent Unit, Linear Chain Conditional Random Field

Related items

1	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
2	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
3	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
4	The Effect Of Part Of Speech On Chinese Word Segmentation
5	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging
6	Word Segmentation And Pos Tagging In Chinese
7	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging
8	Research And Implementation Of Chinese Word Segmentation Based On Character Tagging Method
9	Research And System Implementation Of Chinese Word Segmentation In Specialized Fields Based On Conditional Random Fields
10	Research And Application Of Chinese Word Segmentation Method Based On Conditional Random Field