
Joint Chinese Word Segmentation And Punctuation Prediction Based On Improved BLSTM Multilayer Network

Posted on: 2019-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Li
GTID: 2428330566982891
Subject: Electronic and communication engineering
Abstract/Summary:
Chinese word segmentation is currently most often treated as a sequence labeling problem, and punctuation prediction can be cast as a sequence labeling problem as well. Traditional machine learning models such as the hidden Markov model, conditional random field, support vector machine, and maximum entropy model have achieved good results on these tasks. Deep learning methods, however, achieve better results than traditional machine learning methods on sequence labeling and other natural language processing (NLP) tasks. The recurrent neural network (RNN) is widely used in NLP, for example in word tagging, machine translation, and named entity recognition. Because the long short-term memory (LSTM) network effectively mitigates the vanishing gradient problem of the original RNN, it has been widely adopted in many NLP tasks.

The original LSTM network is unidirectional, so it can only capture context from one side of the sequence. The bidirectional LSTM (BLSTM) network was introduced to overcome this limitation. In addition, to obtain more abstract semantic information, some researchers use multilayer LSTM networks. The existing multilayer BLSTM structure consists of two multilayer LSTM networks, one forward and one backward. Their information is fused only at the output of the last layer, and the fused output contains information from both directions of the text sequence.

This thesis studies this network structure and proposes an improved BLSTM network in which the forward and backward information is fused at every layer, so that the output carries richer context information. We also present a joint method that performs Chinese word segmentation and punctuation prediction at the same time. Compared with the original pipeline scheme, which performs the tasks one after another, the method described here can greatly reduce system complexity. The method can be used to process irregular social network data and can be applied to the post-processing of speech recognition output, and the approach can be extended to other NLP sequence labeling tasks.
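The joint formulation can be illustrated with a small labeling sketch. The abstract does not state the tag set, so everything here is an assumption for illustration only: segmentation uses BMES tags, each character also carries the punctuation mark that follows it (or `O` for none), and the two are merged into one joint tag such as `S|，`. The names `PUNCT`, `NO_PUNCT`, `encode`, and `decode` are hypothetical. A single sequence labeler predicting these joint tags would then replace the two-stage pipeline.

```python
from typing import List, Tuple

# Assumed punctuation inventory; the thesis may use a different set.
PUNCT = {"，", "。", "？", "！"}
NO_PUNCT = "O"  # label for "no punctuation after this character"


def encode(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Map segmented, punctuated tokens to characters and joint tags.

    Each tag has the form "<BMES>|<punct>", so one label sequence
    carries both the word boundaries and the punctuation marks.
    """
    chars: List[str] = []
    tags: List[str] = []
    for token in tokens:
        if token in PUNCT:
            # Attach the mark to the preceding character. Punctuation at
            # the very start of the input is dropped in this sketch.
            if tags:
                seg, _ = tags[-1].split("|")
                tags[-1] = f"{seg}|{token}"
            continue
        if len(token) == 1:
            seg_tags = ["S"]
        else:
            seg_tags = ["B"] + ["M"] * (len(token) - 2) + ["E"]
        for ch, seg in zip(token, seg_tags):
            chars.append(ch)
            tags.append(f"{seg}|{NO_PUNCT}")
    return chars, tags


def decode(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words and punctuation from characters and joint tags."""
    tokens: List[str] = []
    word = ""
    for ch, tag in zip(chars, tags):
        seg, punct = tag.split("|")
        word += ch
        # A word ends on E or S, or whenever a punctuation mark follows.
        if seg in ("E", "S") or punct != NO_PUNCT:
            tokens.append(word)
            word = ""
        if punct != NO_PUNCT:
            tokens.append(punct)
    if word:  # flush an unfinished word from an inconsistent tag sequence
        tokens.append(word)
    return tokens


sentence = ["今天", "天气", "好", "，", "我们", "出去", "吧", "。"]
chars, tags = encode(sentence)
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

Because every character receives exactly one joint tag, decoding a predicted tag sequence recovers both the segmentation and the punctuation in a single pass. The cost of this design is a larger label set (segmentation tags times punctuation labels).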
Keywords/Search Tags: Chinese word segmentation, punctuation prediction, sequence labeling, bidirectional LSTM (long short-term memory network)