Research And Implementation Of Chinese Word Segmentation Based On Neural Networks

Posted on: 2018-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Li
GTID: 2428330515960113
Subject: Computer Science and Technology
Abstract/Summary:
In Chinese, words are the smallest semantic units, and many Chinese natural language processing tasks are based on words. In alphabetic scripts such as English, words are separated by spaces. Chinese, however, is written in an ideographic script in which punctuation separates only phrases and sentences, not words. Chinese word segmentation is therefore one of the basic tasks of Chinese natural language processing, and the first difficulty to overcome. In recent years, neural networks have become a research hotspot and have been widely applied in many fields, because they have strong nonlinear mapping ability and can learn features automatically, avoiding traditional feature engineering.

In this thesis, we first introduce the basic concepts of Chinese word segmentation, its significance, and its common algorithms. We then focus on Chinese word segmentation based on neural networks, covering both traditional feedforward neural networks and recurrent neural networks. For the feedforward approach, we mainly describe the architecture of the algorithm and the details of each step. Because a feedforward network is limited by its input window size and cannot capture long-distance dependencies, we then introduce recurrent neural networks for Chinese word segmentation, focusing on the long short-term memory (LSTM) network. Since LSTM-based segmentation requires long training and prediction times, we propose a Chinese word segmentation method based on the gated recurrent unit (GRU) neural network. To optimize this model, we introduce pre-trained character embeddings, which make the model converge faster and achieve higher precision. To avoid overfitting, we also apply dropout.

Finally, we implement Chinese word segmentation based on the GRU neural network and on the feedforward neural network, and compare them with segmentation based on the LSTM neural network and on a traditional conditional random field (CRF). The GRU-based segmenter achieves segmentation precision comparable to the best method while training and predicting significantly faster. The experiments also show that pre-trained character embeddings make the model converge faster and reach higher accuracy, and that dropout helps avoid overfitting.
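A feedforward segmenter of the kind described above classifies each character from a fixed window of its neighbors, which is why it cannot capture long-distance dependencies. The following is a minimal sketch of that window extraction, assuming two characters of context on each side and a hypothetical `<PAD>` symbol at sentence edges; `char_windows` is an illustrative name, not taken from the thesis. In a full model, each window's characters would typically be looked up in an embedding table and concatenated to form the network's input.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def char_windows(sentence: str, half_width: int = 2) -> List[List[str]]:
    """For every character, return the fixed window a feedforward segmenter
    would see: `half_width` characters on each side plus the character itself."""
    chars = list(sentence)
    padded = [PAD] * half_width + chars + [PAD] * half_width
    size = 2 * half_width + 1
    # Window i is centred on chars[i], which sits at padded[i + half_width].
    return [padded[i:i + size] for i in range(len(chars))]


if __name__ == "__main__":
    for window in char_windows("我爱北京天安门"):
        print(window)
```

With `half_width=2`, a character more than two positions away never reaches the classifier, however relevant it is; widening the window enlarges the input layer without removing the limit.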
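The abstract does not give the GRU equations used in the thesis. This sketch follows one common formulation: update gate z = σ(W_z x + U_z h + b_z), reset gate r = σ(W_r x + U_r h + b_r), candidate h̃ = tanh(W_h x + U_h (r ⊙ h) + b_h), and new state h' = (1 − z) ⊙ h + z ⊙ h̃. Some libraries apply the reset gate after the recurrent matrix product or swap the roles of z and 1 − z. It uses plain Python lists so it runs without dependencies; `gru_step`, `init_params` and the parameter names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for plain nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h: Vector, params: Dict[str, list]) -> Vector:
    """One GRU time step: x is the current character vector, h the previous
    hidden state. params holds W_* (hidden x input), U_* (hidden x hidden)
    and b_* (hidden) for the update (z), reset (r) and candidate (h) parts."""

    def affine(name: str, state: Vector, act: Callable[[float], float]) -> Vector:
        wx = matvec(params["W_" + name], x)
        uh = matvec(params["U_" + name], state)
        return [act(a + b + c) for a, b, c in zip(wx, uh, params["b_" + name])]

    z = affine("z", h, sigmoid)                 # update gate: how much to overwrite
    r = affine("r", h, sigmoid)                 # reset gate: how much history to use
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = affine("h", reset_h, math.tanh)   # candidate hidden state
    # Interpolate between the previous state and the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> Dict[str, list]:
    """Small random weights and zero biases (illustrative initialization only)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params: Dict[str, list] = {}
    for name in ("z", "r", "h"):
        params["W_" + name] = mat(hidden_dim, input_dim)
        params["U_" + name] = mat(hidden_dim, hidden_dim)
        params["b_" + name] = [0.0] * hidden_dim
    return params


if __name__ == "__main__":
    params = init_params(input_dim=4, hidden_dim=3)
    h = [0.0, 0.0, 0.0]
    # Two toy one-hot "character vectors" fed in sequence.
    for x in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
        h = gru_step(x, h, params)
    print(h)
```

The GRU has three weight groups (update, reset, candidate) where a standard LSTM has four (input, forget, output, candidate), so a GRU layer of the same size has three quarters of the LSTM's recurrent parameters. This is one commonly cited reason for faster training and prediction; the thesis's own speed measurements are not reproduced here.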
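The abstract states that dropout is used against overfitting but not the rate or where it is applied. A minimal sketch of the widely used "inverted dropout" variant follows; the function name `dropout` and the rate p = 0.5 in the example are illustrative assumptions.

```python
import random
from typing import List


def dropout(values: List[float], p: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1/(1-p) so the expected activation is unchanged;
    at prediction time, return the input untouched."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    hidden = [0.3, -1.2, 0.8, 0.5]
    print(dropout(hidden, p=0.5, training=True, rng=rng))   # each entry zeroed or doubled
    print(dropout(hidden, p=0.5, training=False, rng=rng))  # unchanged at prediction time
```

Scaling at training time, rather than at prediction time, keeps the prediction path identical to a network without dropout, which matters when prediction speed is a goal.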
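The abstract does not say how the networks' per-character outputs become words. Neural segmenters of this kind are commonly framed as character tagging with the B/M/E/S scheme (begin, middle, end of a word, or single-character word). Assuming that scheme, which the thesis may or may not use, a sketch of turning predicted tags back into words could look like this; `tags_to_words` is a hypothetical helper.

```python
from typing import List


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from per-character B/M/E/S tags (assumed scheme).
    A word ends after an 'E' or 'S' tag; a 'B' or 'S' also closes any
    unfinished word, which keeps the decoder tolerant of ill-formed tags."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # close a word that never received an 'E'
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)  # flush a trailing unfinished word
    return words


if __name__ == "__main__":
    print(tags_to_words("我爱北京天安门", ["S", "S", "B", "E", "B", "M", "E"]))
    # → ['我', '爱', '北京', '天安门']
```

A CRF output layer or constrained decoding would normally prevent ill-formed sequences such as B followed by B; the lenient rules here simply avoid losing characters if one slips through.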
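The comparison above is reported in terms of segmentation precision without defining the metric. Word segmentation is commonly scored with word-level precision, recall and F1, where a predicted word counts as correct only if a gold word covers exactly the same character span. A sketch of that scoring follows; the function names are hypothetical and the example sentences are illustrative, not data from the thesis.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 of `predicted` against `gold`."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted must segment the same text")
    g, p = word_spans(gold), word_spans(predicted)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = prf(["我", "爱", "北京", "天安门"], ["我", "爱", "北", "京", "天安门"])
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # → P=0.60 R=0.75 F1=0.67
```

Comparing spans rather than word strings ensures that a word predicted in the wrong position is not counted as correct just because the same string appears elsewhere in the gold segmentation.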
Keywords/Search Tags: Chinese word segmentation, neural networks, gated recurrent unit