Applied Study On Chinese Word Segmentation Based On Deep Learning

Posted on: 2018-10-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y H Xie  Full Text: PDF
GTID: 2348330533460836  Subject: Applied Statistics
Abstract/Summary:
Natural Language Processing (NLP) is a technology that enables computers to understand human language, and word segmentation is one of its basic tasks. Because commonly used NLP algorithms, as well as deep semantic analysis, usually take the word as the basic unit, word segmentation is usually the first step in NLP. Building a model in the NLP field usually requires researchers to have some linguistic knowledge in order to extract suitable features. Deep learning generalizes well and can extract features from unlabeled data. It learns contextual features directly from the training data, so researchers only need to design the structure of the neural network and provide high-quality training data. In this thesis, a Chinese word segmentation model based on word embeddings and a bidirectional Long Short-Term Memory (LSTM) network is constructed, and the parameter settings and experimental process of the model are described in detail. The segmentation performance of a Hidden Markov Model (HMM), a Conditional Random Field (CRF), and the bidirectional LSTM is evaluated in terms of segmentation accuracy. The Microsoft Research corpus from Bakeoff 2005 is used as the test corpus. In the closed test, the F-measures of the models are CRF: 0.965, bidirectional LSTM: 0.931, and HMM: 0.759. In the open test, the 2014 People's Daily corpus is used as training data, and the F-measures are CRF: 0.854, bidirectional LSTM: 0.853, and HMM: 0.762. Finally, on the same test corpus, the open-source Chinese word segmentation framework jieba achieves an F-measure of 0.815.
Keywords/Search Tags: Natural Language Processing (NLP), Chinese Word Segmentation, Deep Learning, Long Short-Term Memory (LSTM)
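
The abstract compares models by F-measure but does not say how the score was computed. As a minimal sketch, assuming the usual word-level scoring in which a predicted word counts as correct only when both of its boundaries match the gold segmentation, the calculation could look like the following. The function names `to_spans` and `segmentation_prf` and the example sentence are hypothetical and are not taken from the thesis.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_words, pred_words):
    """Return word-level (precision, recall, F-measure) for one sentence.

    Assumption: a predicted word is correct only if its exact span
    also appears in the gold segmentation.
    """
    gold = to_spans(gold_words)
    pred = to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative sentence (not from the thesis corpora).
gold = ["我", "喜欢", "自然", "语言", "处理"]
pred = ["我", "喜欢", "自然语言", "处理"]
precision, recall, f_measure = segmentation_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

In this example three of the four predicted words match gold spans, so precision is 3/4, recall is 3/5, and the F-measure is their harmonic mean. A corpus-level score would normally sum the correct, predicted, and gold counts over all sentences before dividing, rather than averaging per-sentence scores.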