Research On Extracting Topic Sentences From News Based On Text Features And Correlation Analysis

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2428330629488461
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, news information has grown and spread rapidly. The generation and rapid dissemination of massive amounts of news has enriched people's lives, but it has also brought information overload, and people waste a lot of time obtaining information. With the development of artificial intelligence and natural language processing, research on news topic sentence extraction offers a way to address this problem. News topic sentence extraction is foundational to automatic text processing applications such as automatic text summarization, and it is an important research topic in natural language processing. It aims to extract sentences that concisely and accurately describe the main content of a news document.
Most existing research extracts topic sentences through feature analysis of sentences or words. These methods consider only the statistical or positional features of the text, ignore its semantic and topic information, and do not fully account for context, which degrades extraction quality. Other methods use a graph model to analyze the relationships between sentences and words in a document and rank sentence importance iteratively. An ordinary graph model, however, captures only binary relationships between sentences and words and ignores the multi-way relationships among sentences, among words, and between words and sentences.
A news topic sentence is not only a sentence that accurately represents the topic of the news, but also an important sentence in the news document. Given the limitations of existing research and the characteristics of news topic sentences, this thesis proposes a news topic sentence extraction method based on text features and association analysis, studied from the following two aspects:
1) Text feature extraction. Starting from the vector representation of news text, statistical features, semantic features, and topic information features are extracted. Together they represent the statistical characteristics of the news content, its semantic information, its context information, and its global topic relationships comprehensively and accurately. Similarity is then used to compute the relationship between each sentence and the topic of the text.
2) Internal correlation analysis of text. The thesis constructs a hypergraph model of the news text in which sentences are hyperedges and words are vertices, and uses it to analyze the higher-order relationships between sentences and words, among words, and between words and sentences. Edge weights are designed from the relationship between each sentence, the topic, and the title, in order to distinguish descriptive from non-descriptive sentences. A random walk over the hypergraph, guided by the edge weights, ranks the sentences by importance. Finally, the maximal marginal relevance (MMR) algorithm is used to control redundancy among the topic sentences, so that the extracted topic sentences comprehensively represent the main information of the news document.
By combining statistical, semantic, and topic information features, the thesis represents the feature relationship between sentences and topics in a text. The feature extraction method achieved good results in text classification experiments. The hypergraph model is used to analyze the relationships among multiple objects in the text, the extracted text features are used to design the edge weights, and topic sentences are extracted by a random walk over the edges. The method achieved good results in news topic sentence extraction.
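The ranking and redundancy-control steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and everything specific in it is an assumption: the function names `rank_sentences` and `mmr_select`, the damping factor, the uniform choice of a word within a sentence, the use of Jaccard word overlap as the redundancy measure, and the reading of the redundancy step as maximal marginal relevance (MMR). The edge weights, which the thesis derives from sentence, topic, and title features, are simply passed in as positive numbers here.

```python
from collections import defaultdict


def rank_sentences(sentences, weights, damping=0.85, iters=100):
    """Score sentences (hyperedges) with a weighted random walk over shared words.

    Assumes every sentence is a non-empty list of words and every weight is positive.
    """
    n = len(sentences)
    edges = [set(s) for s in sentences]      # one hyperedge per sentence
    containing = defaultdict(list)           # word -> indices of sentences that contain it
    for i, words in enumerate(edges):
        for w in words:
            containing[w].append(i)

    # From sentence i: pick one of its words uniformly, then move to a sentence
    # containing that word with probability proportional to that sentence's weight.
    trans = [[0.0] * n for _ in range(n)]
    for i, words in enumerate(edges):
        for w in words:
            total = sum(weights[j] for j in containing[w])
            for j in containing[w]:
                trans[i][j] += weights[j] / (len(words) * total)

    # Power iteration with a uniform restart term (hypothetical damping value).
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


def jaccard(a, b):
    """Word-overlap similarity, used here as a stand-in redundancy measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def mmr_select(sentences, scores, k=2, lam=0.7):
    """Greedy MMR: prefer high-scoring sentences that differ from those already chosen."""
    top = max(scores)
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * scores[i] / top - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    docs = [
        ["flood", "city", "rescue"],
        ["flood", "city", "rescue", "team"],
        ["market", "stocks", "city"],
    ]
    ranks = rank_sentences(docs, weights=[2.0, 1.0, 1.0])
    print(mmr_select(docs, ranks, k=2))
```

In this sketch a larger `lam` favors the random-walk score and a smaller one favors diversity. A similarity computed from the thesis's sentence vectors could replace `jaccard` without changing the selection loop.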
Keywords/Search Tags: topic sentence extraction, text feature extraction, hypergraph, random walk