
Research On Chinese Single Document Automatic Summarization Based On Deep Learning

Posted on: 2024-03-01    Degree: Master    Type: Thesis
Country: China    Candidate: S Yang    Full Text: PDF
GTID: 2568306926484824    Subject: Computer Science and Technology
Abstract/Summary:
Text summarization is an important task in natural language processing, and deep-learning-based summarization has achieved strong research results and is widely applied in intelligent meetings, search engines, and other fields. However, the quality of generated summaries for Chinese remains unsatisfactory, mainly because Chinese has no explicit word boundaries and many Chinese words are polysemous. As a result, models cannot obtain text vectors with sufficiently rich semantic features, which ultimately degrades the quality of the generated summaries. With the explosive growth of Internet text data, the practical demand for Chinese automatic summarization keeps increasing. How to improve the quality of Chinese summaries and extract the essence and main intent of a text more accurately has become an urgent problem, so the study has significant research and application value. This thesis therefore investigates the problems of poor readability, weak extraction of the source text's key content, and high redundancy in Chinese summary generation, and shows through theoretical analysis, model design, and experimental verification on the LCSTS dataset that the proposed methods effectively improve Chinese summary generation. The main work of this thesis covers the following two aspects:

First, to address the non-standard and poorly readable language of generated summaries, this thesis proposes a Chinese summary generation model based on dual-stream feature encoding under the Seq2Seq framework, combining character features from the Chinese pre-trained model NeZha with Word2Vec vectors to obtain high-quality Chinese word representations. To handle out-of-vocabulary (OOV) words in the encoding stage, a pointer mechanism is introduced that lets the model copy words from the input text to extend the vocabulary and obtain a more accurate semantic description. With these improvements, the summaries generated by the model are more standard and more readable, and achieve better results on both the automatic ROUGE metric and manual evaluation.

Second, to address duplicated words and confusable related words in generated summaries, which lead to factual errors, this thesis introduces a noise collaborative attention mechanism on top of the RoBERTa model. A special MASK is constructed so that the model accurately distinguishes the input body text from the output summary text, jointly performing bidirectional training in the encoding stage and unidirectional training in the decoding stage. In addition, the noise collaborative attention mechanism denoises the summary generation stage by filtering out repeated or erroneous words, alleviating the high redundancy and poor consistency of Chinese generation. With these improvements, more accurate summary generation results are obtained.
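The following is a minimal PyTorch sketch of the dual-stream feature encoding described in the first contribution, assuming contextual character features come from a pretrained Chinese encoder such as NeZha (loaded, e.g., via Hugging Face transformers) and that Word2Vec vectors have already been aligned to character positions. The class name, the concatenate-and-project fusion, and the BiLSTM encoder are illustrative assumptions, not necessarily the exact design used in the thesis.

import torch
import torch.nn as nn

class DualStreamEncoder(nn.Module):
    """Illustrative fusion of two feature streams:
    (1) contextual character vectors from a pretrained Chinese encoder (e.g. NeZha),
    (2) static Word2Vec vectors aligned to the same character positions
        (the alignment step is assumed to happen in preprocessing)."""

    def __init__(self, char_encoder, w2v_weights, hidden_size=512):
        super().__init__()
        self.char_encoder = char_encoder  # e.g. a NeZha model from transformers
        self.word_emb = nn.Embedding.from_pretrained(
            torch.tensor(w2v_weights, dtype=torch.float), freeze=False)
        fused_dim = char_encoder.config.hidden_size + self.word_emb.embedding_dim
        self.proj = nn.Linear(fused_dim, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size,
                           batch_first=True, bidirectional=True)

    def forward(self, char_ids, attention_mask, word_ids):
        # Stream 1: contextual character features from the pretrained model
        char_feats = self.char_encoder(
            input_ids=char_ids, attention_mask=attention_mask).last_hidden_state
        # Stream 2: static Word2Vec features broadcast to character positions
        word_feats = self.word_emb(word_ids)
        # Concatenate both streams and project before the Seq2Seq encoder
        fused = self.proj(torch.cat([char_feats, word_feats], dim=-1))
        enc_out, _ = self.rnn(fused)
        return enc_out  # fed to the decoder / pointer mechanism downstream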
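The special MASK in the second contribution, which gives the source text bidirectional visibility and the summary text left-to-right visibility, can be sketched as follows. This assumes a UniLM-style seq-to-seq mask; the function name and the exact layout are hypothetical and only illustrate the idea.

import torch

def build_seq2seq_mask(source_len, summary_len):
    """Source tokens attend to each other bidirectionally; summary tokens attend
    to the full source plus only earlier summary tokens (causal part)."""
    total = source_len + summary_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Source block: full bidirectional visibility, no peeking at the summary
    mask[:source_len, :source_len] = True
    # Summary block: sees the whole source ...
    mask[source_len:, :source_len] = True
    # ... and only itself plus previous summary tokens
    causal = torch.tril(torch.ones(summary_len, summary_len)).bool()
    mask[source_len:, source_len:] = causal
    return mask  # True = attention allowed

# Example: 5 source characters, 3 summary characters
attn_mask = build_seq2seq_mask(5, 3)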
Keywords/Search Tags: Summary generation, Dual-stream features, RoBERTa, Seq2Seq, Attention mechanism