Research On Punctuation Prediction Method In Real-Time Scenario

Posted on: 2020-06-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Su
GTID: 2428330575455163 | Subject: Computer technology
Abstract/Summary:
When people type text manually, they usually insert punctuation spontaneously, following the conventions of the language. Some automatic text generation processes, however, can only produce text without sentence boundaries or punctuation (unpunctuated text). In such text, readers cannot find where a sentence starts and ends or distinguish clear reading units in the recognition result, which causes serious ambiguity. Unpunctuated text also makes downstream natural language processing tasks difficult: models for these tasks are usually trained with supervision on punctuated text, and applying them to unpunctuated text seriously degrades their quality. It is therefore necessary to annotate unpunctuated text with punctuation, both to improve its readability and to ease downstream processing.

In real-time scenarios, delay also matters. Punctuated text that arrives late places a heavy reading burden on the reader, and in real-time speech recognition in particular it can also seriously distract the audience. A real-time system should therefore provide streaming punctuated text with high annotation quality and low delay.

Regarding annotation quality, many works have predicted punctuation for text of a given length. Current research focuses on using neural networks to extract features from text automatically and then using those features for annotation. In domains with sufficient training data, existing methods achieve relatively high quality; in domains where data is scarce, however, they do not give good results. Regarding annotation delay, existing methods usually annotate long texts, which causes a large delay in real-time scenarios, yet annotating short texts directly seriously harms annotation quality. The challenge of simultaneous punctuation prediction is how to separate relatively short segments from the dynamic text stream without significantly degrading annotation quality; in other words, how to determine the appropriate moments for annotation within the streaming text.

To address these problems, this paper conducts the following research:
· For the problem of insufficient training data, this paper augments the training data and finds a more effective way to select data from multiple domains. To make more effective use of the semantic information in the expanded data, it uses the parameters of a pre-trained language model to strengthen the model's sentence representations. On this basis, to make effective use of the syntactic information in the expanded data, it adds part-of-speech tagging as an auxiliary task through multi-task learning, improving the model's ability to capture syntactic structure. Experiments show that these methods effectively improve the quality of punctuation prediction in the spoken English domain.
· For the problem of delay, this paper designs a simultaneous annotation model based on the existing punctuation prediction model. First, a decision model is trained with reinforcement learning to decide when annotation should be performed, and the punctuation prediction model and the decision model are combined into a simultaneous annotation model. Second, to improve annotation quality without increasing delay, this paper proposes fine-tuning the parameters of the punctuation prediction model using the action sequences generated by the decision model. Finally, this paper also tries a pre-training method that uses short texts. Experimental results show that the proposed methods give relatively stable results under short delays.
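Punctuation prediction over unpunctuated text is commonly framed as sequence labeling: each word receives a label naming the punctuation mark (if any) that follows it, so punctuated corpora can be turned into supervised training pairs automatically. The abstract does not specify the thesis's label set or preprocessing. The minimal sketch below assumes a hypothetical three-mark label set (`COMMA`, `PERIOD`, `QUESTION`) plus `O` for "no punctuation", whitespace tokenization, and lowercased tokens; the names `make_example` and `restore` are illustrative only.

```python
# Hypothetical label set: the thesis's actual punctuation classes are not
# given in the abstract.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
NO_PUNCT = "O"


def make_example(punctuated: str) -> tuple[list[str], list[str]]:
    """Turn punctuated text into (tokens, labels) for sequence labeling.

    labels[i] names the punctuation mark that follows tokens[i], or "O"
    if none does. Tokens are lowercased to mimic raw recognizer output.
    """
    tokens: list[str] = []
    labels: list[str] = []
    for raw in punctuated.split():
        word, label = raw, NO_PUNCT
        # Strip one trailing mark that belongs to the label set.
        if word[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[word[-1]]
            word = word[:-1]
        if word:
            tokens.append(word.lower())
            labels.append(label)
        elif tokens:
            # A free-standing mark ("word ,") labels the preceding token.
            labels[-1] = label
    return tokens, labels


def restore(tokens: list[str], labels: list[str]) -> str:
    """Inverse step: attach (predicted) labels back onto the words."""
    marks = {v: k for k, v in PUNCT_LABELS.items()}
    return " ".join(tok + marks.get(lab, "") for tok, lab in zip(tokens, labels))


tokens, labels = make_example("Hello, how are you? I am fine.")
print(tokens)  # ['hello', 'how', 'are', 'you', 'i', 'am', 'fine']
print(labels)  # ['COMMA', 'O', 'O', 'QUESTION', 'O', 'O', 'PERIOD']
```

A model trained on such pairs sees only `tokens` at inference time and must predict `labels`; `restore` then produces the readable punctuated text.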
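The simultaneous annotation model described in the abstract pairs a punctuation predictor with a decision model that chooses, as each token arrives, whether to wait for more input or to annotate now. The following is a minimal sketch of that control loop under stated assumptions, not the thesis's implementation: `predictor` and `policy` are hypothetical callables standing in for the trained models, `chunk_policy` (annotate every `size` tokens) is a simple stand-in for the reinforcement-learning decision model, and `toy_predictor` is a rule-based placeholder for the neural predictor. Delay is measured here, as an assumption, by how many later tokens had arrived before a token was emitted.

```python
from typing import Callable, Iterable, List, Tuple

READ, WRITE = "READ", "WRITE"

# Hypothetical interfaces standing in for the trained models:
#   Predictor: tokens -> one punctuation label per token ("O", "PERIOD", ...)
#   Policy:    current buffer -> READ (wait for more input) or WRITE (annotate now)
Predictor = Callable[[List[str]], List[str]]
Policy = Callable[[List[str]], str]


def simultaneous_annotate(
    stream: Iterable[str], predictor: Predictor, policy: Policy
) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Annotate a token stream incrementally.

    Returns the emitted (token, label) pairs and, for each token, its delay:
    how many later tokens had arrived by the time it was emitted.
    """
    output: List[Tuple[str, str]] = []
    delays: List[int] = []
    buffer: List[str] = []
    arrivals: List[int] = []  # stream position of each buffered token
    now = -1

    def flush() -> None:
        # Label everything buffered so far and commit it to the output.
        for token, label, arrived in zip(buffer, predictor(buffer), arrivals):
            output.append((token, label))
            delays.append(now - arrived)
        buffer.clear()
        arrivals.clear()

    for now, token in enumerate(stream):
        buffer.append(token)
        arrivals.append(now)
        if policy(buffer) == WRITE:
            flush()
    if buffer:  # end of stream: annotate whatever is still waiting
        flush()
    return output, delays


def chunk_policy(size: int) -> Policy:
    """Stand-in for the learned decision model: annotate every `size` tokens."""
    return lambda buffer: WRITE if len(buffer) >= size else READ


def toy_predictor(tokens: List[str]) -> List[str]:
    """Rule-based placeholder: put a PERIOD before "i" or "we".

    It looks only at the next token inside the chunk, so the last token of
    each chunk always gets "O", mimicking missing right context.
    """
    starters = {"i", "we"}
    return [
        "PERIOD" if i + 1 < len(tokens) and tokens[i + 1] in starters else "O"
        for i in range(len(tokens))
    ]


words = "hello i am fine we are".split()
out, delay = simultaneous_annotate(words, toy_predictor, chunk_policy(3))
print(out)    # [('hello', 'PERIOD'), ('i', 'O'), ('am', 'O'), ('fine', 'PERIOD'), ('we', 'O'), ('are', 'O')]
print(delay)  # [2, 1, 0, 2, 1, 0]
```

With `chunk_policy(1)` every delay drops to 0 but no period is predicted at all, and with `chunk_policy(4)` the boundary after "fine" is missed because "we" lands in the next chunk. This is the quality/delay tension the abstract describes; a learned policy aims to choose emission points better than any fixed size.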
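To train a decision model for annotation timing with reinforcement learning, each annotation episode needs a scalar reward that trades annotation quality against delay. The abstract does not state the reward used in the thesis. One plausible form, assumed here purely for illustration, is the F1 score over punctuation labels minus a weighted mean delay, where `delay_weight` is a hypothetical hyperparameter and the function names are illustrative. A policy-gradient method such as REINFORCE could then scale the log-probabilities of the episode's READ/WRITE actions by this reward.

```python
from typing import List


def punctuation_f1(pred: List[str], gold: List[str], none_label: str = "O") -> float:
    """Micro-averaged F1 over punctuation labels, ignoring the no-punctuation class."""
    tp = sum(1 for p, g in zip(pred, gold) if p == g and g != none_label)
    n_pred = sum(1 for p in pred if p != none_label)
    n_gold = sum(1 for g in gold if g != none_label)
    if tp == 0:
        # Nothing to find and nothing predicted counts as perfect agreement.
        return 1.0 if n_pred == 0 and n_gold == 0 else 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def episode_reward(
    pred: List[str], gold: List[str], delays: List[int], delay_weight: float = 0.1
) -> float:
    """Hypothetical RL reward: annotation quality minus a penalty on mean delay."""
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return punctuation_f1(pred, gold) - delay_weight * mean_delay


gold = ["PERIOD", "O", "O", "PERIOD", "O", "PERIOD"]
slow = episode_reward(gold, gold, [5, 4, 3, 2, 1, 0])       # perfect labels, waits for everything
mid = episode_reward(["PERIOD", "O", "O", "PERIOD", "O", "O"], gold, [2, 1, 0, 2, 1, 0])
fast = episode_reward(["O"] * 6, gold, [0] * 6)             # no delay, no punctuation
print(round(slow, 2), round(mid, 2), round(fast, 2))        # 0.75 0.7 0.0
```

The weight decides where a trained policy settles: a larger `delay_weight` pushes it toward emitting earlier, at the cost of the right context the predictor needs.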
Keywords/Search Tags: Simultaneous Punctuation Prediction, Deep Neural Network, Attention Mechanism, Reinforcement Learning