The Research And Implementation Of Automatic Text Summarization System For New Media

Posted on: 2020-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zhao
GTID: 2428330575957132
Subject: Computer Science and Technology
Abstract/Summary:
With the rise of new media such as Weibo and WeChat (Weixin) official accounts, digital media on the Internet has replaced traditional paper media such as newspapers as the main channel through which people publish and obtain information. The rapid development of the Internet and the popularity of mobile devices have brought explosive growth in online electronic text. How to quickly extract useful information from massive amounts of text has become a widely shared concern, and automatic text summarization is a core technology for addressing it. Automatic text summarization not only improves the efficiency of information acquisition but also supports many higher-level applications, such as dialogue systems and public opinion analysis, so it has broad application prospects.

Current automatic text summarization technology has the following problems:
(1) In abstractive summarization of short text, the generated words are disordered and repetitive, out-of-vocabulary (OOV) words cannot be generated, and the model degrades during training.
(2) In extractive summarization of long text, the extracted sentences are not coherent enough.
(3) In terms of putting models into practice, there are few high-precision open systems for automatic summarization, and there is no production deployment of a distributed summarization model.

In view of these problems, this thesis studies summarization techniques for both long and short text, and explores, studies and implements text summarization models in a big-data environment. The main contributions are as follows:
(1) A new abstractive summarization model for short text, the HCRPGN (Highway Condition Random Pointer-Generator Network), is designed and implemented. It uses a CRF (Conditional Random Field) layer to alleviate the disordered repetition of generated words, a pointer-generator mechanism to largely solve the OOV (Out Of Vocabulary) problem, and an information path based on the Highway architecture to avoid degradation of the deep network. As a result, the HCRPGN model improves by 3% on the three basic ROUGE metrics.
(2) A new model for extractive summarization of long text, a TextRank+CNN+VAE fusion model, is explored, studied and implemented. By applying the idea of the variational autoencoder (VAE) from the image domain to text summarization, a text rewriting model is constructed and linked seamlessly with TextRank, which addresses the incoherence of extracted summary sentences.
(3) The deployment and optimization of an automatic text summarization system in a distributed, massive-data environment are studied. Redis is used to build a distributed crawler that speeds up data acquisition, a parameter server is used to build a distributed GPU cluster that speeds up model computation, and a CPU cluster is used for data pre-processing and post-processing to improve the parallelism of the system.

The system developed in this thesis research will eventually run in Alibaba's "Shenma Search" department.
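The abstract credits a pointer-generator mechanism with handling OOV words. As a rough illustration only, the sketch below shows the standard pointer-generator mixing step, in which the final word distribution is p_gen times the vocabulary distribution plus (1 - p_gen) times the attention mass copied from source tokens. It is an assumption that HCRPGN follows this standard form; the function name `final_distribution`, the toy vocabulary and the numbers are hypothetical and do not come from the thesis.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities (standard pointer-generator form).

    p_gen         -- probability of generating from the fixed vocabulary
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights over the source tokens (sum to 1)
    source_tokens -- source tokens, which may include OOV words
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("need one attention weight per source token")

    mixed = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        mixed[word] += p_gen * prob
    # Copy path: add attention mass to each source token, so a word that is
    # missing from the vocabulary can still receive probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy example (hypothetical values): "Shenma" is not in the vocabulary.
vocab_dist = {"the": 0.5, "search": 0.3, "<unk>": 0.2}
source_tokens = ["Shenma", "search", "Shenma"]
attention = [0.4, 0.2, 0.4]
dist = final_distribution(0.6, vocab_dist, attention, source_tokens)
```

In this toy case the OOV token "Shenma" ends up with more probability than the `<unk>` placeholder, which is the behavior the copy path is meant to provide.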
Keywords/Search Tags:Automatic Text Summarization, Neural Network, Extractive Abstract, Abstractive Abstract, Distributed Machine Learning