
Research On Abstractive Short Text Automatic Summarization Method In Speech Recognition Scenarios

Posted on: 2021-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhu
Full Text: PDF
GTID: 2518306194975939
Subject: Computer system architecture
Abstract/Summary:
With the continuous development of the Internet and mobile communication technology, the problem of information overload is becoming increasingly serious. Automatic text summarization, one of the effective ways to address this problem, has long been a research hotspot in natural language processing (NLP). Speech recognition and NLP are now closely linked, and the text produced by an automatic speech recognition (ASR) system also faces a variety of NLP tasks. This thesis therefore studies an automatic text summarization method for the speech recognition scenario, exploring the feasibility of summarization technology in such complex settings.

The research consists of two parts: ASR-oriented text preprocessing and ASR-oriented abstractive text summarization.

Part I preprocesses the text output by speech recognition, making it a general preprocessing step usable by many NLP tasks, including automatic summarization. It focuses on three subtasks, Chinese word segmentation, part-of-speech tagging, and punctuation prediction, proposes a method that completes all three simultaneously, and compares several popular sequence-labeling models on them.

Part II uses an abstractive method to generate high-quality summaries from the preprocessed text and verifies the necessity of the preprocessing performed in Part I. On top of the Transformer model, a Pointer-Generator network and the large-vocabulary trick (LVT) are added; part-of-speech features are then incorporated, and all models are compared with both characters and words as the basic encoding units.

The experimental results show that, in the preprocessing part, the method that jointly performs Chinese word segmentation, part-of-speech tagging, and punctuation prediction suffers a relatively large accuracy loss on the punctuation prediction task but has little impact on the other two tasks. On these three tasks, a bidirectional LSTM combined with a self-attention mechanism outperforms the classic BiLSTM-CRF sequence-labeling model. In the summarization part, with the Transformer as the baseline, all word-based models outperform their character-based counterparts, and the Transformer combined with the Pointer-Generator network and the LVT mechanism achieves the best performance reported on the LCSTS dataset to date.
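The joint approach of Part I can be illustrated with a composite-tag encoding. The sketch below is a minimal illustration, not the thesis's exact tagging scheme: each character receives one label that simultaneously encodes its word-boundary position (BMES), the word's part of speech, and any punctuation to insert after it, so a single sequence-labeling model can predict all three at once. The tuple format and tag layout are assumptions for this example.

```python
# Composite character-level tags of the form <BMES>-<POS>-<punct or O>,
# merging segmentation, POS tagging, and punctuation prediction into one task.

def encode(tokens):
    """tokens: list of (word, pos, punct_after) tuples; punct_after may be ''."""
    chars, tags = [], []
    for word, pos, punct in tokens:
        for i, ch in enumerate(word):
            if len(word) == 1:
                boundary = "S"          # single-character word
            elif i == 0:
                boundary = "B"          # beginning of a word
            elif i == len(word) - 1:
                boundary = "E"          # end of a word
            else:
                boundary = "M"          # middle of a word
            # Punctuation is predicted only at the last character of a word.
            p = punct if (i == len(word) - 1 and punct) else "O"
            chars.append(ch)
            tags.append(f"{boundary}-{pos}-{p}")
    return chars, tags

def decode(chars, tags):
    """Invert encode(): rebuild (word, pos, punct_after) tuples from tag output."""
    tokens, buf = [], ""
    for ch, tag in zip(chars, tags):
        boundary, pos, p = tag.split("-")
        buf += ch
        if boundary in ("S", "E"):      # word is complete
            tokens.append((buf, pos, "" if p == "O" else p))
            buf = ""
    return tokens

tokens = [("今天", "NT", ""), ("天气", "NN", ""), ("很", "AD", ""), ("好", "VA", "。")]
chars, tags = encode(tokens)
assert decode(chars, tags) == tokens    # encoding is lossless
```

A model trained on such composite labels solves all three subtasks in one pass, at the cost of a larger tag set; the thesis's observation that punctuation prediction loses the most accuracy is consistent with punctuation labels being the rarest component of such joint tags.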
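The Pointer-Generator output layer used in Part II can be sketched as follows. This is a simplified illustration following the standard pointer-generator formulation (a generation/copy mixture), not the thesis's actual implementation: the final distribution blends the decoder's vocabulary distribution with the attention distribution over source tokens, so out-of-vocabulary source words can be copied into the summary. All names (`p_gen`, `attn`, `src_ids`) are assumptions for this example.

```python
# Final output distribution of a Pointer-Generator decoder step:
# p_final(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on copies of w.

def final_distribution(p_gen, vocab_dist, attn, src_ids, extended_size):
    """
    p_gen: scalar in [0, 1], probability of generating from the fixed vocabulary.
    vocab_dist: probabilities over the fixed vocabulary (sums to 1).
    attn: attention weights over source positions (sums to 1).
    src_ids: extended-vocabulary id of each source token
             (OOV source words get ids >= len(vocab_dist)).
    extended_size: fixed vocabulary size plus number of source OOV words.
    """
    # Generation part: scale the vocabulary distribution, pad OOV slots with 0.
    dist = [p_gen * p for p in vocab_dist] + [0.0] * (extended_size - len(vocab_dist))
    # Copy part: route attention mass to each source token's id.
    for a, i in zip(attn, src_ids):
        dist[i] += (1.0 - p_gen) * a
    return dist

# Fixed vocabulary of 4 words; the source has one OOV word with extended id 4.
vocab_dist = [0.5, 0.2, 0.2, 0.1]
attn = [0.6, 0.4]
src_ids = [2, 4]                    # second source token is out-of-vocabulary
d = final_distribution(0.8, vocab_dist, attn, src_ids, extended_size=5)
assert abs(sum(d) - 1.0) < 1e-9     # still a valid probability distribution
assert d[4] > 0.0                   # the OOV source word can now be emitted
```

In the full model, `p_gen` is computed per decoding step from the decoder state and context vector, and the LVT mechanism further restricts the softmax to a smaller per-batch vocabulary to speed up training.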
Keywords/Search Tags:speech recognition, text preprocessing, automatic text summarization, Transformer, Pointer-Generator network