
Research And Implementation Of Multi-document Automatic Summary System Of Network News

Posted on: 2020-06-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Huang
GTID: 2428330578450895 | Subject: Computer technology
Abstract/Summary:
In the context of the information explosion on the web, the reports on any single news topic contain a large amount of redundant information. Different editors cover the same story and describe it from many different perspectives; although each report carries some distinct key information, much of the content overlaps. As a result, it has become difficult for users to make use of fragmented spare moments to obtain a concise picture of a news story in a short time. To meet users' need for targeted information, multi-document automatic summarization has attracted a growing number of researchers. A multi-document summary is a condensed document generated from multiple news documents on the same topic: it extracts the key information about that topic and removes most of the redundant information. By reading the summary generated by the system, users can quickly and comprehensively grasp the key information of the news without wasting time on redundant content. If a user is interested in a particular news item or topic, the original articles are still available to read in full.

The system implements four main functional modules: news acquisition and preprocessing, news retrieval, summary document generation, and data analysis reporting. The news acquisition and preprocessing module uses web crawlers to collect news data and converts it into the format required by the system. The news retrieval module supports both single-condition and compound searches over news content, publication time, and channel source.

The summary generation module first uses NLPIR, developed by the Chinese Academy of Sciences, for word segmentation. It then performs word sense disambiguation based on a semantic dictionary to determine the single intended sense of each word, and mines new words according to the characteristics of online news. Next, word-to-word and sentence-to-sentence similarity is computed using the semantic dictionary, which supports the subsequent density-based clustering of words and sentences. After clustering, the content-determination step scores each sentence according to its content richness and importance. Finally, the summary sentences are ordered by a syntax-recognition scoring method based on dependency parsing, and the summary document is output.

The data analysis report module uses the D3 library to provide interactive data visualization: it shows the trend of news popularity as line charts and the distribution of news channel sources as pie charts, and combines these charts with the summary text to form a news topic report. Through the design and implementation of these four modules, the system is largely able to generate both a summary of an individual news story and a summary of a news topic covering the main news content, and it supports conditional retrieval and visualization of the related news data.
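The single-condition and compound search offered by the news retrieval module can be pictured as a simple filter over news records. This is a minimal sketch, not the thesis's implementation: the NewsItem record, its field names, and the search_news function are hypothetical, compound conditions are assumed to be combined with AND, and keyword matching is a plain substring test rather than a full-text index.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class NewsItem:
    # Hypothetical record; field names are illustrative only.
    title: str
    content: str
    published: date
    channel: str


def search_news(
    items: Iterable[NewsItem],
    keyword: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
    channel: Optional[str] = None,
) -> List[NewsItem]:
    """Return the items that satisfy every condition supplied.

    Supplying one condition gives a single-condition search; supplying
    several gives a compound search (conditions are ANDed). Conditions
    left as None are ignored.
    """
    results = []
    for item in items:
        # Content condition: plain substring match on title and body.
        if keyword is not None and keyword not in f"{item.title} {item.content}":
            continue
        # Release-time condition: inclusive date window.
        if start is not None and item.published < start:
            continue
        if end is not None and item.published > end:
            continue
        # Channel-source condition: exact match.
        if channel is not None and item.channel != channel:
            continue
        results.append(item)
    return results
```

For example, passing both keyword and channel performs a compound search, while passing only start and end restricts results to a publication-date window.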
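The abstract does not describe the disambiguation algorithm itself. A common dictionary-based approach, substituted here as an assumption in the spirit of the Lesk algorithm, picks the sense whose semantic descriptors (sememes) overlap most with those of the surrounding words. The lexicon format (each word maps sense ids to sets of sememe labels) and the disambiguate function are hypothetical; a real system would load the senses from its semantic dictionary.

```python
from typing import Dict, List, Optional, Set

# Hypothetical lexicon format: word -> {sense id -> set of sememe labels}.
Lexicon = Dict[str, Dict[str, Set[str]]]


def disambiguate(word: str, context: List[str], lexicon: Lexicon) -> Optional[str]:
    """Return the sense of `word` sharing the most sememes with its context.

    The context signature is the union of the sememes of every sense of
    every other known word in the context. Ties go to the sense listed
    first in the lexicon; None is returned if `word` has no senses.
    """
    context_sememes: Set[str] = set()
    for other in context:
        if other != word and other in lexicon:
            for sememes in lexicon[other].values():
                context_sememes |= sememes

    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, sememes in lexicon.get(word, {}).items():
        overlap = len(sememes & context_sememes)
        if overlap > best_overlap:  # strict '>' keeps the first sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense
```

With a toy lexicon in which "apple" has a fruit sense and a company sense, a context containing "eat" selects the fruit sense, while a context containing "phone" selects the company sense.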
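The abstract says new words are mined according to the characteristics of online news but gives no formula. One widely used unsupervised heuristic, swapped in here as an assumption, keeps frequent character n-grams that are internally cohesive (high pointwise mutual information at their weakest split point) and that appear next to varied characters on both sides (high left and right boundary entropy). The function names and all thresholds (max_len, min_freq, min_pmi, min_entropy) are hypothetical and would need tuning on real news text.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def _entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def mine_new_words(
    text: str,
    max_len: int = 4,
    min_freq: int = 3,
    min_pmi: float = 1.0,
    min_entropy: float = 0.5,
) -> List[Tuple[str, int]]:
    """Return candidate words as (ngram, frequency), most frequent first."""
    n_chars = len(text)
    counts: Counter = Counter()
    left: defaultdict = defaultdict(Counter)
    right: defaultdict = defaultdict(Counter)
    for n in range(1, max_len + 1):
        for i in range(n_chars - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if n > 1:
                # Record the characters immediately before and after the n-gram.
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + n < n_chars:
                    right[gram][text[i + n]] += 1

    candidates = []
    for gram, freq in counts.items():
        if len(gram) < 2 or freq < min_freq:
            continue
        # Probabilities are approximated as count / text length for every n.
        p_gram = freq / n_chars
        # Cohesion: PMI at the weakest split point of the n-gram.
        pmi = min(
            math.log(p_gram / ((counts[gram[:k]] / n_chars) * (counts[gram[k:]] / n_chars)))
            for k in range(1, len(gram))
        )
        # Freedom: the n-gram should have varied neighbours on both sides.
        freedom = min(_entropy(left[gram]), _entropy(right[gram]))
        if pmi >= min_pmi and freedom >= min_entropy:
            candidates.append((gram, freq))
    return sorted(candidates, key=lambda item: (-item[1], item[0]))
```

Because it works on raw characters, this heuristic needs no segmenter, which is why it is often used to find terms that a general-purpose dictionary has not yet recorded.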
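As a rough illustration of dictionary-based similarity, the sketch below represents each word by a set of sememes from a hypothetical toy dictionary, scores word pairs by the Jaccard overlap of those sets, and scores sentence pairs by averaging each word's best match in the other sentence, in both directions. These formulas are assumptions chosen for brevity; the thesis's own word and sentence similarity measures are not given in the abstract.

```python
from typing import Dict, List, Set

# Hypothetical dictionary format: word -> set of sememe labels.
Sememes = Dict[str, Set[str]]


def word_similarity(w1: str, w2: str, sememes: Sememes) -> float:
    """Jaccard overlap of two words' sememe sets; identical words score 1.0."""
    if w1 == w2:
        return 1.0
    s1, s2 = sememes.get(w1, set()), sememes.get(w2, set())
    if not s1 or not s2:
        return 0.0  # out-of-dictionary words get no credit
    return len(s1 & s2) / len(s1 | s2)


def sentence_similarity(a: List[str], b: List[str], sememes: Sememes) -> float:
    """Symmetric best-match average over two segmented sentences."""
    if not a or not b:
        return 0.0

    def directed(x: List[str], y: List[str]) -> float:
        # Align each word of x with its most similar word in y, then average.
        return sum(max(word_similarity(w, v, sememes) for v in y) for w in x) / len(x)

    return (directed(a, b) + directed(b, a)) / 2
```

Averaging both directions keeps the measure symmetric, so a short sentence fully contained in a long one is not scored as a perfect match.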
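The abstract does not name its density-based clustering algorithm, so DBSCAN, the standard example of the family, is used here as an assumption. This minimal version works on a precomputed sentence-similarity matrix and treats 1 - similarity as the distance; the dbscan function name and its eps and min_pts parameters are illustrative.

```python
from typing import List, Optional

NOISE = -1


def dbscan(sim: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """DBSCAN over a precomputed similarity matrix.

    Items i and j are neighbours when 1 - sim[i][j] <= eps (each item is its
    own neighbour). An item with at least min_pts neighbours is a core point.
    Returns one cluster id per item, with NOISE (-1) for outliers.
    """
    n = len(sim)
    labels: List[Optional[int]] = [None] * n

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if 1.0 - sim[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # core point: keep growing the cluster
    return [label if label is not None else NOISE for label in labels]
```

Unlike k-means, this kind of clustering does not need the number of clusters in advance, which suits news topics where the number of distinct sub-events is unknown.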
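The content-determination step scores sentences by content richness and importance, but the abstract gives no formulas. The sketch below assumes simple, hypothetical definitions: richness is the share of the cluster's vocabulary that a sentence covers, importance is the average in-cluster frequency of the sentence's words (scaled by the most frequent word), and the two are mixed with a weight alpha. It keeps the highest-scoring sentence of each cluster and skips items labelled -1 (noise). The function name select_by_content is illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def select_by_content(
    sentences: List[List[str]],
    labels: List[int],
    alpha: float = 0.5,
) -> Dict[int, Tuple[int, float]]:
    """Return {cluster_id: (best_sentence_index, score)}; label -1 is skipped."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        if label != -1:
            clusters.setdefault(label, []).append(idx)

    best: Dict[int, Tuple[int, float]] = {}
    for label, members in clusters.items():
        # Word frequencies pooled over every sentence in the cluster.
        freq = Counter(w for i in members for w in sentences[i])
        if not freq:
            continue
        vocab_size = len(freq)
        top = max(freq.values())
        for i in members:
            words = set(sentences[i])
            if not words:
                continue
            # Richness: share of the cluster vocabulary this sentence covers.
            richness = len(words) / vocab_size
            # Importance: mean cluster frequency of its words, scaled to [0, 1].
            importance = sum(freq[w] for w in words) / (len(words) * top)
            score = alpha * richness + (1 - alpha) * importance
            if label not in best or score > best[label][1]:
                best[label] = (i, score)
    return best
```

Choosing one sentence per cluster is one plausible way to keep a topic's main points while dropping near-duplicate sentences; the selected sentences would still need ordering before output.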
Keywords/Search Tags: Multi-document Summary, New Word Mining, Cluster Analysis, Content Determination, Syntactic Recognition