| With the rapid development of the Internet, the volume of online news is growing quickly, and people facing this mass of information find it difficult to obtain the key information they need accurately and quickly. This paper therefore designs an automatic summarization system for news events. Given keywords entered by the user, the system collects news reports on the relevant event and produces the evolution of the event together with a text summary of each sub-topic. The main work and innovations of this paper are as follows. First, an improved Single-Pass clustering algorithm is proposed. The algorithm represents news text with the Doc2vec model, which better captures the semantic information of the text. A composite similarity measure for news reports is designed around the characteristics of news event sub-topics. Because headlines carry much of the information in a news report, the text similarity of two reports combines the similarity of their headlines with the similarity of their body content. In addition, because release time is a key factor in clustering news sub-topics, a time similarity measure for news reports is introduced. The text similarity and time similarity are then combined into the composite similarity on which the improved clustering algorithm is based. Second, an automatic text summarization algorithm based on TextRank is proposed. A sentence representation method based on the Word2vec model is designed, and summarization proceeds in three steps: (1) each sentence is vectorized using the proposed sentence representation method; (2) the influence weight between sentences is calculated from the similarity between sentences, the coverage of keywords, and the similarity between each sentence and the title; (3) the TextRank iterative algorithm is used to calculate the final weight of each sentence, and the highest-ranked sentences are selected, polished, and reordered to form the text summary of the news report. Finally, a clustering module and a summarization module are designed and implemented on the basis of these methods to complete the automatic summarization system for news events. The system summarizes the evolutionary stages of a news event through news data collection, text preprocessing, sub-topic clustering, automatic text summarization, and Web presentation. The system follows the MVC design pattern, so each module has its own function and does not interfere with the others, which makes future updates and extensions easier. |
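
The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: the `Report` type, the mixing weights `title_weight` and `text_weight`, the exponential time decay with `decay_days`, the threshold, and the choice to compare a report against a cluster by average similarity to its members. The vectors stand in for Doc2vec embeddings that would be computed elsewhere.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Report:
    title_vec: Sequence[float]  # stand-in for a Doc2vec embedding of the headline
    body_vec: Sequence[float]   # stand-in for a Doc2vec embedding of the body
    day: float                  # release time, in days from an arbitrary origin


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def composite_similarity(r: Report, s: Report,
                         title_weight: float = 0.4,
                         text_weight: float = 0.7,
                         decay_days: float = 3.0) -> float:
    # Text similarity: weighted mix of headline and body similarity (assumed weights).
    text_sim = (title_weight * cosine(r.title_vec, s.title_vec)
                + (1 - title_weight) * cosine(r.body_vec, s.body_vec))
    # Time similarity: assumed exponential decay in the release-time gap.
    time_sim = math.exp(-abs(r.day - s.day) / decay_days)
    return text_weight * text_sim + (1 - text_weight) * time_sim


def single_pass(reports: List[Report], threshold: float = 0.75) -> List[List[int]]:
    """Assign each report, in arrival order, to the most similar existing
    cluster, or open a new cluster if no cluster reaches the threshold."""
    clusters: List[List[int]] = []
    for i, r in enumerate(reports):
        best, best_sim = None, threshold
        for members in clusters:
            # Assumed cluster score: mean composite similarity to its members.
            sim = sum(composite_similarity(r, reports[j]) for j in members) / len(members)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

Under these assumptions, two reports with near-identical content published a day apart fall into one sub-topic, while an unrelated report published ten days later opens a new one. Single-Pass results depend on input order, so reports would normally be processed in order of release time.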
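
The three summarization steps can likewise be sketched as a weighted TextRank over sentences. This is an illustration under stated assumptions, not the paper's formulation: the coefficients `a`, `b`, `c`, the linear combination of the three factors, the clipping of negative cosine values, and the `keyword_cov` input (a per-sentence keyword coverage score in [0, 1]) are all hypothetical. Sentence and title vectors stand in for the Word2vec-based representations, and "reordering" is interpreted here as restoring original sentence order.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Clip negatives so that edge weights stay non-negative.
    return max(0.0, dot / (na * nb)) if na and nb else 0.0


def edge_weights(sent_vecs: List[Sequence[float]],
                 keyword_cov: Sequence[float],
                 title_vec: Sequence[float],
                 a: float = 0.6, b: float = 0.2, c: float = 0.2) -> List[List[float]]:
    """w[i][j]: influence of sentence i on sentence j, mixing sentence
    similarity, keyword coverage of j, and similarity of j to the title."""
    n = len(sent_vecs)
    title_sim = [cosine(v, title_vec) for v in sent_vecs]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = (a * cosine(sent_vecs[i], sent_vecs[j])
                           + b * keyword_cov[j] + c * title_sim[j])
    return w


def textrank(w: List[List[float]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-8) -> List[float]:
    """Standard weighted TextRank iteration over the edge-weight matrix."""
    n = len(w)
    if n == 0:
        return []
    out = [sum(row) for row in w]  # total outgoing weight per sentence
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            incoming = sum(w[j][i] / out[j] * scores[j]
                           for j in range(n) if out[j] > 0)
            new.append((1 - damping) + damping * incoming)
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def select_summary(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring sentences, returned in original order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

A call such as `select_summary(textrank(edge_weights(vecs, cov, title)), k)` returns the indices of the sentences to keep. The polishing step mentioned in the abstract is not modeled here.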