Research On Chinese Word Segmentation Based On Deep Learning

Posted on: 2020-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Shi
GTID: 2428330590995686
Subject: Computer technology
Abstract/Summary:
Chinese word segmentation is a basic task in Chinese natural language processing. Its quality directly affects the final performance of downstream tasks such as machine translation and information retrieval. With the wide application of deep learning in natural language processing, neural network models have also achieved excellent results in word segmentation, but several aspects can still be improved. Based on an analysis of CNN, RNN and LSTM, this thesis proposes two word segmentation methods that improve the model architecture from different perspectives. The first method treats word segmentation as a sequence labeling problem, uses a Bi-LSTM-CRF to label the text, and introduces an attention mechanism to improve on the traditional LSTM. A GCNN fuses the context block vectors within the target word window, and the attention weights are computed with the help of a named entity discovery dictionary and the idea of PMI (pointwise mutual information). This strengthens the LSTM model's handling of close-range context and thereby improves the extraction of feature relationships between words. The second method addresses a limitation of the sequence labeling model, namely the restricted window used when the sequence is labeled. It introduces the beam search algorithm so that the complete segmentation history can be used to segment words dynamically. Drawing on the modeling capacity of deep learning, the model scores both the likelihood that a sequence of characters forms a word and the plausibility of the connection between consecutive words. Compared with traditional segmentation methods, this method can learn rich features at three levels (characters, words and sentences) and can use the complete segmentation history to build the model. The model therefore has sequence-level segmentation ability, which yields better segmentation performance. Finally, this thesis examines how the proposed improvements affect word segmentation performance, in order to verify that both deep learning architectures have a positive effect. Because the methods described here have much in common with mainstream deep learning methods, they can also be applied to the post-processing of speech recognition and extended to other NLP sequence labeling tasks.
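The two views of segmentation described in the abstract can be illustrated with a small, self-contained sketch. It is not the thesis's implementation: the neural scorer is replaced by a hypothetical lookup table of made-up word probabilities (`TOY_WORD_PROB`), the function names and the `beam_size` and `max_word_len` parameters are assumptions, and the B/M/E/S tag set is a commonly used labeling scheme that the abstract itself does not name. The sketch only shows how beam search keeps a few complete segmentation histories per position, and how a finished segmentation maps to per-character labels.

```python
import math

# Made-up word probabilities, for illustration only. In the thesis a neural
# network scores word candidates; this table merely stands in for it.
TOY_WORD_PROB = {"研究": 0.05, "研究生": 0.03, "生命": 0.04, "命": 0.005, "起源": 0.03}
UNKNOWN_CHAR_LOGP = math.log(1e-6)  # assumed penalty per out-of-lexicon character


def toy_score(history, word):
    """Hypothetical scorer: log-score of appending `word` after `history`.

    `history` is the full list of words segmented so far. This toy version
    ignores it, but a learned model could condition on all of it.
    """
    if word in TOY_WORD_PROB:
        return math.log(TOY_WORD_PROB[word])
    return UNKNOWN_CHAR_LOGP * len(word)


def beam_search_segment(text, score_fn=toy_score, beam_size=3, max_word_len=4):
    """Segment `text` by keeping the `beam_size` best histories per position."""
    # beams[i] holds (score, words) pairs that cover exactly text[:i].
    beams = [[(0.0, [])]] + [[] for _ in text]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            for score, words in beams[start]:
                candidates.append((score + score_fn(words, word), words + [word]))
        # Prune: keep only the highest-scoring partial segmentations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[end] = candidates[:beam_size]
    return beams[-1][0][1]


def to_bmes(words):
    """Convert a segmentation into per-character B/M/E/S labels."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


words = beam_search_segment("研究生命起源")
print(words)                    # ['研究', '生命', '起源']
print("".join(to_bmes(words)))  # BEBEBE
```

With this toy table, the hypothesis "研究生 / 命" stays in the beam for a while but ends up scoring lower than "研究 / 生命". Because `score_fn` receives the whole history, a learned scorer could in principle judge how well each new word connects to everything before it, which is what makes beam search (rather than exact dynamic programming) the natural decoder in that setting.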
Keywords/Search Tags: natural language processing, deep learning, Chinese word segmentation, attention mechanism, long short-term memory, beam search