
Research On Chinese Single Document Automatic Summarization Based On Deep Learning

Posted on: 2024-03-01    Degree: Master    Type: Thesis
Country: China    Candidate: S Yang    Full Text: PDF
GTID: 2568306926484824    Subject: Computer Science and Technology
Abstract/Summary:
Text summarization is an important task in natural language processing, and deep-learning-based summarization has achieved strong research results and is widely applied in intelligent meetings, search engines, and other fields. However, the quality of generated summaries for Chinese remains unsatisfactory, mainly because Chinese has no explicit word boundaries and many Chinese words are polysemous. As a result, models cannot obtain text vectors with sufficiently rich semantic features, which ultimately degrades the quality of the generated summaries. With the explosive growth of Internet text data, the practical demand for Chinese automatic summarization keeps increasing. How to improve the quality of Chinese summaries and extract the essence and main intent of a text more accurately has become an urgent problem, so the study has significant research and application value. This thesis therefore investigates the problems of poor readability, weak extraction of the source text's key content, and high redundancy in Chinese summary generation, and shows through theoretical analysis, model design, and experimental verification on the LCSTS dataset that the proposed methods effectively improve Chinese summary generation. The main work of this thesis covers the following two aspects:

First, to address the non-standard and poorly readable language of generated summaries, this thesis proposes a Chinese summary generation model based on dual-stream feature encoding under the Seq2Seq framework, combining character features from the Chinese pre-trained model NeZha with Word2Vec vectors to obtain high-quality Chinese word representations. To handle out-of-vocabulary (OOV) words in the encoding stage, a pointer mechanism is introduced that lets the model copy words from the input text to extend the vocabulary and obtain a more accurate semantic description. With these improvements, the summaries generated by the model are more standard and more readable, and achieve better results on both the automatic ROUGE metric and manual evaluation.

Second, to address duplicated words and confusable related words in generated summaries, which lead to factual errors, this thesis introduces a noise collaborative attention mechanism on top of the RoBERTa model. A special MASK is constructed so that the model accurately distinguishes the input body text from the output summary text, jointly performing bidirectional training in the encoding stage and unidirectional training in the decoding stage. In addition, the noise collaborative attention mechanism denoises the summary generation stage by filtering out repeated or erroneous words, alleviating the high redundancy and poor consistency of Chinese generation. With these improvements, more accurate summary generation results are obtained.
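The following is a minimal PyTorch sketch of the dual-stream feature encoding described in the first contribution, assuming contextual character features come from a pretrained Chinese encoder such as NeZha (loaded, e.g., via Hugging Face transformers) and that Word2Vec vectors have already been aligned to character positions. The class name, the concatenate-and-project fusion, and the BiLSTM encoder are illustrative assumptions, not necessarily the exact design used in the thesis.

import torch
import torch.nn as nn

class DualStreamEncoder(nn.Module):
    """Illustrative fusion of two feature streams:
    (1) contextual character vectors from a pretrained Chinese encoder (e.g. NeZha),
    (2) static Word2Vec vectors aligned to the same character positions
        (the alignment step is assumed to happen in preprocessing)."""

    def __init__(self, char_encoder, w2v_weights, hidden_size=512):
        super().__init__()
        self.char_encoder = char_encoder  # e.g. a NeZha model from transformers
        self.word_emb = nn.Embedding.from_pretrained(
            torch.tensor(w2v_weights, dtype=torch.float), freeze=False)
        fused_dim = char_encoder.config.hidden_size + self.word_emb.embedding_dim
        self.proj = nn.Linear(fused_dim, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size,
                           batch_first=True, bidirectional=True)

    def forward(self, char_ids, attention_mask, word_ids):
        # Stream 1: contextual character features from the pretrained model
        char_feats = self.char_encoder(
            input_ids=char_ids, attention_mask=attention_mask).last_hidden_state
        # Stream 2: static Word2Vec features broadcast to character positions
        word_feats = self.word_emb(word_ids)
        # Concatenate both streams and project before the Seq2Seq encoder
        fused = self.proj(torch.cat([char_feats, word_feats], dim=-1))
        enc_out, _ = self.rnn(fused)
        return enc_out  # fed to the decoder / pointer mechanism downstream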
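The special MASK in the second contribution, which gives the source text bidirectional visibility and the summary text left-to-right visibility, can be sketched as follows. This assumes a UniLM-style seq-to-seq mask; the function name and the exact layout are hypothetical and only illustrate the idea.

import torch

def build_seq2seq_mask(source_len, summary_len):
    """Source tokens attend to each other bidirectionally; summary tokens attend
    to the full source plus only earlier summary tokens (causal part)."""
    total = source_len + summary_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Source block: full bidirectional visibility, no peeking at the summary
    mask[:source_len, :source_len] = True
    # Summary block: sees the whole source ...
    mask[source_len:, :source_len] = True
    # ... and only itself plus previous summary tokens
    causal = torch.tril(torch.ones(summary_len, summary_len)).bool()
    mask[source_len:, source_len:] = causal
    return mask  # True = attention allowed

# Example: 5 source characters, 3 summary characters
attn_mask = build_seq2seq_mask(5, 3)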
Keywords/Search Tags: Summary generation, Dual-stream features, RoBERTa, Seq2Seq, Attention mechanism