
Automatic Summary Extraction Of News Documents Incorporating Sentence Sentiment

Posted on: 2024-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Huang
GTID: 2568306920986879
Subject: Electronic information
Abstract/Summary:
In modern society, the abundance of Internet resources makes it quick and easy for people to access many kinds of news. However, because news writers hold different positions, among other reasons, news often carries a certain emotional tendency. News with positive or negative sentiment has a profound effect on people's perceptions and subtly shapes their views on current events, thereby directly or indirectly influencing the direction of news and public opinion. In particular, the spread of negative news may have a negative impact on society. The huge volume of information therefore needs to be collected, classified, condensed and refined into a sentiment-aware overview of the document content. Such an overview helps readers quickly and effectively grasp the key information, content and ideas of an article, as well as current social dynamics and new trends in public opinion. It also reduces the browsing pressure that information overload places on users, and allows the relevant media to better guide public opinion and reduce some negative news.

To this end, taking Chinese news text as the research object, this research proposes a sentiment summarization algorithm that incorporates sentiment tendency and topic similarity, based on a graph ranking model and sentence features. The specific research contents and main work are as follows:

(1) A text word segmentation model based on a bidirectional long short-term memory network with a self-attention mechanism (Self-Attention-BiLSTM) is proposed. Because few open-source datasets of Chinese news summaries are available, four datasets are first merged. Statistics such as the number of sentences per text, text length and frequency distribution are analyzed, and unsuitable texts are condensed or removed. The proposed Self-Attention-BiLSTM model applies self-attention to the word vectors and uses a BiLSTM to convert the condensed sentences into feature sequences that capture bidirectional semantic dependencies. The improved Self-Attention-BiLSTM word segmentation model raises word segmentation accuracy to about 77.12%.

(2) The LexRank algorithm is improved to calculate sentence sentiment feature weights. Using the Chinese sentiment dictionary of Dalian University of Technology, sentiment markers are added to the sentences in the news so that sentiment information can be incorporated, and sentiment feature vectors are constructed. Building on an in-depth study of the LexRank graph model algorithm, sentiment information is introduced to model the relationships between nodes in the graph and to calculate the edge weights, which yields the sentiment weights of sentences. After experiments to determine the specific sentiment feature values, a comparison with the traditional LexRank algorithm shows that ROUGE-1 improves by about 4% and ROUGE-2 improves by about 7%.

(3) Text topics are extracted and sentence topic weights are calculated. A topic model extracts a certain number of topics from the text in order to find the topic sentences, which reflect the main content of the news article. During topic extraction, the similarity between the extracted topics and the sentences of the text is calculated to obtain a ranking of topic weights. The importance of a sentence in the news text is measured by its similarity to the topics, so that topic sentences are identified more accurately. The experimental results show that, compared with the traditional LexRank algorithm, ROUGE-1 improves by about 9% and ROUGE-2 improves by about 6%.

(4) A text summarization method that fuses sentence sentiment and topic similarity is proposed to calculate integrated sentence feature weights. By combining the sentiment feature weights and the topic weights into an integrated weight, sentiment summaries of news texts are extracted efficiently. After training on the data, fixed values for each weight are determined, and three sets of comparison experiments are set up. Compared with the traditional LexRank algorithm, ROUGE-1 and ROUGE-2 of the improved method both improve by about 10%. Compared with the summarization method that integrates only sentence sentiment, ROUGE-1 improves by about 12% and ROUGE-2 improves by about 3%. Compared with the summarization method that integrates only text topics, ROUGE-1 improves by about 3.5% and ROUGE-2 improves by about 4%.
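
The fusion described in item (4) can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes sentences are already segmented into tokens, uses plain bag-of-words cosine similarity for the LexRank graph, and combines the three scores linearly at the sentence level rather than through the edge weights as the abstract describes. The function names, the toy lexicon format and the weights alpha, beta and gamma are all hypothetical; the abstract does not state the fixed weight values that were determined.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(bags, damping=0.85, iters=50):
    """Power iteration over a sentence-similarity graph (LexRank-style)."""
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Fall back to 1.0 for isolated sentences to avoid division by zero.
    row_sums = [sum(row) or 1.0 for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / row_sums[j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


def normalise(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if not total:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def fused_ranking(sentences, sentiment_lexicon, topic_terms,
                  alpha=0.5, beta=0.3, gamma=0.2):
    """Rank sentence indices by a weighted sum of three scores.

    alpha, beta and gamma are illustrative placeholders, not values
    taken from the thesis.
    """
    bags = [Counter(tokens) for tokens in sentences]
    graph = normalise(lexrank_scores(bags))
    # Sentiment weight: total absolute polarity of lexicon words in the sentence.
    sentiment = normalise(
        [sum(abs(sentiment_lexicon.get(t, 0.0)) for t in tokens)
         for tokens in sentences]
    )
    # Topic weight: similarity between the sentence and the topic terms.
    topic_bag = Counter(topic_terms)
    topic = normalise([cosine(bag, topic_bag) for bag in bags])
    fused = [alpha * g + beta * s + gamma * t
             for g, s, t in zip(graph, sentiment, topic)]
    return sorted(range(len(sentences)), key=lambda i: fused[i], reverse=True)
```

Calling fused_ranking with a list of token lists, a dictionary mapping words to polarity scores, and a list of topic terms returns the sentence indices from most to least important; taking the first few indices gives an extractive summary. In practice the topic terms would come from a topic model and the polarity scores from a sentiment dictionary, as the abstract describes.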
Keywords: Text splitting, LexRank algorithm, Sentiment similarity, Topic similarity, Text summarization