
The Application Of Automatic Extraction Technology Of Text Abstract In Digital Printing

Posted on: 2021-03-29  Degree: Master  Type: Thesis
Country: China  Candidate: X X Guan
GTID: 2428330623981251  Subject: Electronics and Communications Engineering
Abstract/Summary:
In the era of the digital information explosion, "fast reading" has become fashionable, and automatic abstract generation epitomizes it. Automatic extraction of text abstracts has long been a research focus in academia. Applied to digital publishing, it can quickly extract the main content of an article and ensure the reading efficiency of users. Digital publishing places strict requirements on text structure. Accordingly, this paper divides automatic abstract extraction into three parts: Chinese text segmentation, abstract title generation, and abstract extraction. The three parts are then combined into an automatic text abstract extraction system that is applied to digital publishing.

Firstly, a text segmentation model based on a Bidirectional Long Short-Term Memory neural network with a Self-Attention mechanism (SAt-BiLSTM) is proposed. The Self-Attention mechanism processes the word vectors so that the text corpus is simplified sentence by sentence. BiLSTM then represents the simplified sentences as feature sequences, which are synthesized into the feature vector of the text to complete the segmentation.

Secondly, this paper designs an automatic title generation model based on the dependency syntax tree. The model uses the TF-IDF algorithm and Stanford CoreNLP to build the title generation algorithm, and applies a set of pruning rules to the syntax tree to maximize the compression rate. The study finds that the model preserves readability and extracts the main information of the text more effectively.

Thirdly, to ensure the accuracy of automatic abstract extraction, a model that combines the above Chinese word segmentation model with reader reviews is designed. The model introduces reader reviews as a comment factor, which addresses the weakness of focusing too heavily on the original text while neglecting reader reviews. It also introduces a topic factor and a sentence position factor to ensure the readability of the generated abstract. Experiments show that, compared with the traditional method, this method increases precision (P), recall (R), and F-measure (F), and that the extracted abstracts improve in consistency and coherence, which helps ensure the standardization of digital publishing language.

Finally, based on the practical application of digital publishing, the paper proposes a plan and design for an automatic text abstract extraction system. The system can effectively perform text segmentation, review keyword extraction, title generation, and automatic abstract extraction, while remaining simple in design and convenient to operate.
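
The idea of scoring sentences with a comment factor, a topic factor, and a sentence position factor can be illustrated with a minimal Python sketch. It is not the thesis's model: the abstract does not give the factor definitions or weights, so the token-overlap measures, the linear weighting, the weights `w_topic`, `w_comment`, `w_position`, and all function names below are assumptions chosen for illustration. A simple regex tokenizer on English text stands in for the Chinese segmentation model.

```python
import re


def tokenize(text):
    # Stand-in tokenizer (assumption); the thesis uses a Chinese segmentation model.
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(tokens, vocabulary):
    # Fraction of the sentence's tokens that also appear in the reference vocabulary.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


def score_sentences(sentences, title, reviews,
                    w_topic=0.4, w_comment=0.4, w_position=0.2):
    # Weights are hypothetical; the abstract does not state how factors are combined.
    topic_vocab = set(tokenize(title))
    comment_vocab = set(tokenize(" ".join(reviews)))
    n = len(sentences)
    scores = []
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        topic = overlap(tokens, topic_vocab)        # topic factor (assumed form)
        comment = overlap(tokens, comment_vocab)    # comment factor (assumed form)
        position = 1.0 - i / n                      # earlier sentences score higher
        scores.append(w_topic * topic + w_comment * comment + w_position * position)
    return scores


def extract_abstract(sentences, title, reviews, k=2):
    # Keep the k best-scoring sentences, restored to their original order.
    scores = score_sentences(sentences, title, reviews)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


def precision_recall_f(selected, reference):
    # Sentence-level P, R, and F against a reference abstract.
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    p = hits / len(selected) if selected else 0.0
    r = hits / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    sentences = [
        "Digital publishing needs short abstracts.",
        "The weather was nice yesterday.",
        "Readers say the abstracts save time.",
    ]
    title = "Abstracts for digital publishing"
    reviews = ["The abstracts save time", "save time for readers"]
    summary = extract_abstract(sentences, title, reviews, k=2)
    print(summary)
    print(precision_recall_f(summary, [sentences[0]]))
```

In this toy input the third sentence is selected mainly because it shares vocabulary with the reader reviews, which is the effect the comment factor is meant to capture; a score based on the original text alone would rank it lower.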
Keywords/Search Tags:Digital publishing, SAt-BiLSTM, Title Generation, Abstract generation, Text review