
Research Of Chinese Word Segmentation Based On Deep Learning

Posted on: 2019-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhao
Full Text: PDF
GTID: 2428330578971915
Subject: Engineering
Abstract/Summary:
Chinese word segmentation (CWS), a basic task and key research topic in Natural Language Processing, has attracted increasing attention. Segmentation quality directly affects downstream tasks such as Part-of-Speech tagging, Named Entity Recognition and Semantic Analysis, so the study of Chinese word segmentation has both theoretical and practical significance. Many algorithms exist for Chinese word segmentation. The most common approach treats it as a sequence labeling problem, but this approach is strongly influenced by the size of the context window. In this paper, we study a Chinese word segmentation algorithm based on the Gated Combination Neural Network (GCNN) and the Gated Recurrent Unit (GRU) neural network. The main research content is as follows:

1. The development of Chinese word segmentation technology is summarized, and the segmentation methods based on the Long Short-Term Memory (LSTM) neural network and the Gated Recursive Neural Network (GRNN) are introduced. Both are sequence labeling methods, which transform Chinese word segmentation into the task of learning the position of each character within a word.

2. The deficiencies of the LSTM and GRNN networks are analyzed, and an improved Chinese word segmentation method based on a GCNN-GRU neural network is proposed. This method does not make tagging decisions on individual characters. Instead, it directly evaluates the relative likelihood of different segmentations of a sentence and then searches for the segmentation with the highest score. First, the GCNN network builds each word vector from character embeddings, so that uncommon and out-of-vocabulary words are not ignored, and computes a word score at the same time. Then the word vectors are fed into the GRU neural network to capture contextual information, and the score of a segmentation result is the sum of the word score and the sentence score. Finally, the beam search algorithm is used to decode and obtain the highest-scoring segmentation of the sentence.

3. Under the same experimental settings, this paper compares the GCNN-GRU neural network with traditional neural network segmentation algorithms. Experimental results show that, compared with LSTM and GRNN, the word segmentation accuracy (F-value) of the improved algorithm is higher by 0.7% and 0.8%, respectively, on the PKU data set, and by 1.1% and 1.2%, respectively, on the MSR data set.

4. This paper uses the Dropout method to avoid over-fitting and pre-trains the character embeddings with word2vec to obtain better segmentation results. With pre-training and Dropout, the GCNN-GRU network improves word segmentation accuracy by 0.4% and 0.1%, respectively, over the GCNN-GRU network without them.
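The decoding step described in point 2 (score whole segmentations and keep only the best few with beam search) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the `beam_size` and `max_word_len` parameters, and the toy lexicon scorer are all hypothetical. In the thesis the score would come from the GCNN word score plus the GRU sentence score; here a single hand-written function stands in for both.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon standing in for a learned scorer.
TOY_LEXICON = {"研究": 2.0, "生命": 2.0, "研究生": 2.5, "起源": 2.0}


def toy_score(word: str, history: List[str]) -> float:
    """Stand-in for (word score + context score); ignores history."""
    if word in TOY_LEXICON:
        return TOY_LEXICON[word]
    if len(word) == 1:
        return 0.0  # unknown single character: neutral
    return -1.0 * len(word)  # unknown multi-character string: penalized


def beam_search_segment(
    sentence: str,
    score_fn: Callable[[str, List[str]], float],
    beam_size: int = 4,
    max_word_len: int = 4,
) -> Tuple[float, List[str]]:
    """Return the (score, words) of the best segmentation found."""
    # beams[i] holds up to beam_size hypotheses covering sentence[:i].
    beams: List[List[Tuple[float, List[str]]]] = [
        [] for _ in range(len(sentence) + 1)
    ]
    beams[0] = [(0.0, [])]
    for end in range(1, len(sentence) + 1):
        candidates = []
        # Try every candidate word that ends at position `end`.
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            for score, words in beams[start]:
                candidates.append(
                    (score + score_fn(word, words), words + [word])
                )
        # Keep only the highest-scoring partial segmentations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[end] = candidates[:beam_size]
    return beams[-1][0]


if __name__ == "__main__":
    best_score, best_words = beam_search_segment("研究生命起源", toy_score)
    print(best_score, "/".join(best_words))
```

Because each hypothesis carries its full word history, a context-aware scorer (such as a recurrent network over the words so far) could be plugged in as `score_fn` without changing the search loop.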
Keywords/Search Tags:Chinese word segmentation, Natural Language Processing, Deep learning, Word vector