
Research And Application Of Semantic-based Automatic Text Summarization Generation Technology

Posted on: 2021-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
Full Text: PDF
GTID: 2518306512487764
Subject: Computer technology
Abstract/Summary:
With the growth of the Internet and the proliferation of text data, natural language processing technology has advanced considerably. Within this field, automatic text summary generation is an important research direction. It can greatly reduce the time needed to write summaries manually, help relevant personnel quickly follow domestic and foreign news, and support timely emergency response. In addition, data mining, machine learning and related techniques make it possible to dig deeper into the semantics behind a text and produce summaries of higher quality and accuracy.

This paper proposes a semantic-based automatic text summary generation algorithm that takes overseas news documents as its research object and applies extractive summary generation to multi-document summary extraction. Traditional extractive techniques, however, rarely take into account the timeliness and clear themes that characterize news texts, and few methods make effective use of the semantic and structural information in a text to extract its important content. To address these shortcomings, an automatic text summary generation system for news text is designed with a text network as its core model, aiming to generate summaries of higher quality, with clearer themes, less redundancy and stronger semantics. Specifically, the contributions of this paper are the following four points:

1. This paper proposes an improved TF-IDF text feature vectorization algorithm that incorporates news popularity into the term weight formula, so that term weights decrease as news popularity fades. This weighting method matches the characteristics of news itself.

2. This paper designs a semantic network construction algorithm, based on the similarity between documents and between sentences, that builds a document-level network and a sentence-level network. For document similarity, it adopts the idea of LSA: starting from the traditional "term-document" matrix, the matrix is reduced in rank by singular value decomposition, and document similarity is computed as the cosine similarity on this low-rank matrix. The rank reduction removes noise from the text and highlights its subject, making the distance between related documents smaller and the distance between unrelated documents larger, which helps to form a "high cohesion, low coupling" clustering effect. For sentence similarity, it considers the contribution of both sentence structure and semantics, and designs a sentence similarity formula suited to the application scenario of this research.

3. This paper proposes a two-stage density clustering method that clusters the document-level network and the sentence-level network separately. The algorithm is adaptive: the number of clusters does not need to be set manually, because it is determined automatically through the "power law". In addition, the two-stage clustering process improves clustering efficiency.

4. This paper proposes a feature abstraction-based algorithm for extracting text summary units. Drawing on the curve characteristics of the exponential and cosine functions and incorporating the release time of news texts, it defines a sentence importance formula suited to news text, making the generated summaries more time-sensitive.

Finally, experiments were carried out on a manually constructed data set and a standard data set. Analysis from multiple perspectives shows that the proposed semantic-based automatic text summary generation model can effectively generate high-quality summaries and performs better on news texts.
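
The LSA-based document similarity described in the second contribution can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `lsa_document_similarity`, the `rank` parameter, and the toy term counts are assumptions made here for illustration, and raw counts stand in for the improved TF-IDF weights, whose formula the abstract does not give.

```python
import numpy as np


def lsa_document_similarity(term_doc: np.ndarray, rank: int) -> np.ndarray:
    """Cosine similarity between documents after rank reduction by SVD.

    term_doc has one row per term and one column per document.
    """
    # Thin SVD: term_doc = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(rank, len(s))
    # Each row is one document expressed in the k strongest latent dimensions.
    # Because the columns of u are orthonormal, cosines between these rows equal
    # cosines between the columns of the rank-k approximation of term_doc.
    doc_vectors = (s[:k, None] * vt[:k]).T
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero documents
    unit = doc_vectors / norms
    return unit @ unit.T


# Hypothetical toy data: rows are terms, columns are four documents.
# Documents 0 and 1 share one vocabulary; documents 2 and 3 share another.
toy_term_doc = np.array(
    [
        [2.0, 1.0, 0.0, 0.0],
        [1.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 2.0, 1.0],
        [0.0, 0.0, 1.0, 2.0],
    ]
)
similarity = lsa_document_similarity(toy_term_doc, rank=2)
print(np.round(similarity, 3))
```

On this toy matrix the raw cosine similarity between documents 0 and 1 is 0.8. After reduction to rank 2 it rises to approximately 1, while the similarity between documents from different groups stays at approximately 0, which is the effect the abstract describes as pulling related documents together and keeping unrelated ones apart.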
Keywords/Search Tags: Extractive Abstract, Text Clustering, Multi-feature Fusion, Semantic Similarity