
Complex Text Sequence Labeling With BILSTM And CRF Algorithm Based On Peephole

Posted on: 2019-02-21; Degree: Master; Type: Thesis
Country: China; Candidate: L R Zhang; Full Text: PDF
GTID: 2428330551458114; Subject: Wireless communications
Abstract/Summary:
In recent years, deep learning has made great progress in both academia and industry, and its application to natural language tasks has developed rapidly. Chinese word segmentation, part-of-speech (POS) tagging, and named entity recognition (NER) form the basis for syntactic and semantic analysis and are an important branch of natural language processing (NLP). In current text information extraction, feature-mining algorithms based on machine learning struggle to cover all features and often depend on the skill of domain experts, so their accuracy is limited. In segmentation, POS tagging, and NER, uncommon and ambiguous words remain hard to identify, results on longer sentences are poor, and grammatically incorrect sentences are hard to handle.

Segmentation, POS tagging, and NER are all treated as seq2seq problems, so they can be processed in very similar ways. With the development of deep learning for seq2seq problems, the most popular approach in recent years has been a combined model of a bidirectional long short-term memory network and a conditional random field (BILSTM+CRF). This method provides an excellent framework for seq2seq problems, but recognizing complex named entities and POS tagging long sentences remain very difficult. This thesis therefore proposes a combined model of a bidirectional long short-term memory network with peephole connections and a conditional random field. Combined with Batch Normalization for RNNs and dropout, it handles long sentences, irregular grammar, and complex named entities. The algorithm was implemented and applied to the brief case analysis section of the Sichuan Public Security Bureau, where it provided a high-quality solution for the public security case handling process.

This thesis describes the specific details of the project, which uses the TensorFlow deep learning framework to accomplish the following tasks:
(1) Convert the high-dimensional sparse matrices that represent ordinary words into word embedding matrices. The embedding step is placed at the front of the neural network so that the network learns the embedding representation by itself, eliminating the need to pre-train the embedding matrix.
(2) Construct a bidirectional long short-term memory network with peephole connections as the hidden layer of the neural network. The peephole connections feed long-term memory information (the cell state) into the gate decisions, so that the meaning of the text can be analyzed well even in sentences with irregular grammar.
(3) Add a conditional random field as the output layer of the neural network, compensating for the fact that the network alone cannot consider, from a statistical point of view, the probability characteristics of the generated sequence as a whole.
(4) Describe how the first three algorithm modules are fused, and use a multi-language data augmentation strategy to process domain-specific corpora.
(5) Describe the tasks of the brief case analysis module in the auxiliary case handling system of the Sichuan Public Security Bureau, and explain how and why the algorithm was applied to them.
(6) Describe the construction of the engineering environment, the configuration, model deployment, and the process of going online.

Finally, on the People's Daily 2014 corpus, the algorithm reached an accuracy of 97% for word segmentation and 99% for POS tagging. For named entity recognition, it can accurately identify cases in the brief case texts provided by the Sichuan Public Security Bureau. It can also accurately identify the location of the incident, the persons involved, and the time of the crime, while ruling out interference from other times and other addresses.
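Tasks (2) and (3) above can be illustrated with a minimal Python sketch. It is not the thesis implementation (which uses TensorFlow, vector-valued states, and two directions). It assumes the standard peephole LSTM formulation with a scalar hidden state and plain Viterbi decoding for a linear-chain conditional random field. The function names, the parameter dictionary `p`, and its keys (`wci`, `wcf`, `wco` for the peephole weights, and so on) are hypothetical names chosen for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def peephole_lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar peephole LSTM cell (hidden size 1 for readability).

    The peephole terms (wci, wcf, wco) let the gates look at the cell state,
    i.e. the long-term memory, in addition to the input and previous output.
    """
    # Input and forget gates peek at the previous cell state.
    i = sigmoid(p["wxi"] * x + p["whi"] * h_prev + p["wci"] * c_prev + p["bi"])
    f = sigmoid(p["wxf"] * x + p["whf"] * h_prev + p["wcf"] * c_prev + p["bf"])
    # Candidate value and new cell state.
    g = math.tanh(p["wxg"] * x + p["whg"] * h_prev + p["bg"])
    c = f * c_prev + i * g
    # The output gate peeks at the updated cell state.
    o = sigmoid(p["wxo"] * x + p["who"] * h_prev + p["wco"] * c + p["bo"])
    h = o * math.tanh(c)
    return h, c


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence and its score.

    emissions[t][k]   : network score for tag k at position t
    transitions[j][k] : CRF score for moving from tag j to tag k
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for k in range(n_tags):
            # Best previous tag for ending in tag k at this position.
            j = max(range(n_tags), key=lambda a: score[a] + transitions[a][k])
            prev_best.append(j)
            new_score.append(score[j] + transitions[j][k] + emit[k])
        backptr.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda a: score[a])
    path = [best]
    for prev_best in reversed(backptr):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path, score[best]
```

In a BILSTM+CRF model, the emission scores would come from the forward and backward recurrent layers, and the transition scores would be learned jointly with them. Decoding over the whole sequence is what allows the output layer to reject a tag that looks best at one position but is implausible given its neighbors.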
Keywords/Search Tags: Word Segmentation, POS tagging, Named entity recognition, BILSTM (with peephole), Conditional Random Field, Batch Normalization