
The Research And Implementation Of Network Hotspot Analysis Based On Hierarchical Topic Model

Posted on: 2020-09-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Z C Liu
GTID: 2428330575457078 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet technology and the popularization of mobile devices, text data has grown explosively, and more information is accumulated and stored every day. Faced with such a huge amount of data, finding the information that users actually need and care about among the mass of disordered content has become extremely difficult. Finding an effective way to avoid information overload and to quickly obtain useful information from massive data has therefore become one of the central problems of today's information explosion. Search engines have greatly helped people find useful information, but they usually retrieve information by keyword matching. Their query results are discrete and fragmented and do not reflect the temporal ordering or the relevance of the retrieved information. Applying topic models to information extraction is therefore of great practical significance for personalized and vertical (domain-specific) analysis of news information. This thesis studies and implements keyword extraction, hierarchical topic discovery, visual analysis of topic evolution, and an integrated network hotspot analysis system.

1) Keyword extraction is studied on the basis of the TextRank model. After analyzing the characteristics of keyword extraction with the current TextRank model, a TextRank model weighted by word spacing and by the location distribution of words in documents is proposed and applied to keyword extraction from online news text. The word-spacing and location weighting extends the Markov chain underlying TextRank: the weights of the words in a document are calculated iteratively and used to generate the probability transition matrix. Compared with other models, the improved method increases the F-value by 1.29%, 3.14%, 5.43% and 5.88% when the number of keywords is 3, 5, 7 and 10, respectively, which verifies the effectiveness of the proposed method.

2) For hierarchical topic discovery, an improved PWA-PEM-HLTA model based on the PEM-HLTA model is proposed for web news text. The original model is improved in two ways: part-of-speech information is added to the term-selection preprocessing, and Aitken acceleration is applied to the model's computation. Three data sets were used for the experimental comparison: the NIPS paper data set, the Reuters data set, and a collected network text data set. On the standard NIPS and Reuters data sets, the average running efficiency of the improved model is improved fivefold; on the network text data set, it is improved 4.7-fold. These results verify the validity of the improved model.

3) For visual analysis of topic evolution, the static and dynamic presentation methods of existing text visualization models are analyzed, and a static visualization model for presenting the dynamic data of text topics is designed. The topic content of news text is visualized simultaneously in three dimensions: hierarchy, sequence, and detail. The visualization is demonstrated and interpreted using the "Kunshan hacking" incident of August 27, 2018.

4) Building on these three lines of research and following a layered architecture design, this thesis implements a low-coupling, high-cohesion network hotspot analysis system. The system integrates keyword extraction, hierarchical topic discovery, and topic evolution analysis, covering data acquisition, preprocessing, and analysis on the back end and information display on the front end.
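The weighted TextRank idea in item 1) can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formulas, so the choices below are assumptions: the edge weight between two co-occurring words is the sum of 1/distance over their co-occurrences inside a sliding window (standing in for the word-spacing weight), and words in the first sentence receive a larger prior (standing in for the location-distribution weight), which is used as the restart distribution of the random walk. The function `weighted_textrank` and all of its parameters are hypothetical, not the thesis's implementation.

```python
from collections import defaultdict


def weighted_textrank(sentences, window=3, first_sentence_boost=2.0,
                      damping=0.85, iters=100, tol=1e-8):
    """Rank words with a PageRank-style walk over a weighted co-occurrence graph.

    sentences: list of token lists (already segmented and stop-word filtered).
    Assumed word-spacing weight: each co-occurrence at distance d (1 <= d < window)
    adds 1/d to the undirected edge between the two words.
    Assumed location weight: every occurrence adds 1 to a word's prior, or
    `first_sentence_boost` if it occurs in the first sentence.
    """
    edge = defaultdict(float)   # (u, v) -> accumulated spacing weight
    prior = defaultdict(float)  # word -> accumulated location weight
    for s_idx, tokens in enumerate(sentences):
        for i, w in enumerate(tokens):
            prior[w] += first_sentence_boost if s_idx == 0 else 1.0
            for j in range(i + 1, min(i + window, len(tokens))):
                v = tokens[j]
                if v != w:
                    edge[(w, v)] += 1.0 / (j - i)
                    edge[(v, w)] += 1.0 / (j - i)

    vocab = sorted(prior)
    out = defaultdict(float)  # total outgoing weight per word
    for (u, _), x in edge.items():
        out[u] += x
    # Incoming transition probabilities: row-normalised edge weights.
    incoming = defaultdict(list)
    for (u, v), x in edge.items():
        incoming[v].append((u, x / out[u]))

    total = sum(prior.values())
    restart = {w: prior[w] / total for w in vocab}  # location-biased restart
    score = dict(restart)
    for _ in range(iters):
        # Words with no edges spread their mass according to the prior.
        dangling = sum(score[w] for w in vocab if out[w] == 0.0)
        new = {}
        for v in vocab:
            walk = sum(score[u] * p for u, p in incoming[v])
            new[v] = (1 - damping) * restart[v] + damping * (walk + dangling * restart[v])
        delta = sum(abs(new[w] - score[w]) for w in vocab)
        score = new
        if delta < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


sentences = [["network", "hotspot", "topic"],
             ["topic", "model", "hotspot"],
             ["news", "topic"]]
print(weighted_textrank(sentences)[:3])
```

Because each edge weight is row-normalized into a transition probability, the iteration is a PageRank-style Markov chain whose scores always sum to 1. The location prior only changes where the walk restarts, so a different positional scheme (title words, paragraph-initial words) could be tried by changing how `prior` is filled.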
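Item 2) mentions Aitken acceleration of the PEM-HLTA computation. As an illustration only, the sketch below applies Aitken's Δ² extrapolation (in its Steffensen form) to a toy EM problem: estimating the mixing weight of a two-component Gaussian mixture whose components are known. It is not HLTA, and the names `normal_pdf`, `em_step`, `plain_em`, and `aitken_em`, as well as the sample data, are hypothetical. EM converges linearly, which is the regime in which Aitken extrapolation helps.

```python
import math


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_step(pi, f1, f2):
    """One EM update of the mixing weight pi; f1[i], f2[i] are the known
    component densities at data point i."""
    resp = [pi * a / (pi * a + (1.0 - pi) * b) for a, b in zip(f1, f2)]
    return sum(resp) / len(resp)


def plain_em(pi, f1, f2, tol=1e-10, max_evals=10_000):
    """Iterate the EM map until successive values differ by less than tol.
    Returns (estimate, number of EM-map evaluations)."""
    for n in range(1, max_evals + 1):
        new = em_step(pi, f1, f2)
        if abs(new - pi) < tol:
            return new, n
        pi = new
    return pi, max_evals


def aitken_em(pi, f1, f2, tol=1e-10, max_evals=10_000):
    """Aitken delta-squared (Steffensen) acceleration of the EM map.
    Returns (estimate, number of EM-map evaluations)."""
    evals = 0
    while evals < max_evals:
        p1 = em_step(pi, f1, f2)
        p2 = em_step(p1, f1, f2)
        evals += 2
        denom = p2 - 2.0 * p1 + pi
        if abs(denom) < 1e-15:  # sequence already numerically converged
            return p2, evals
        acc = pi - (p1 - pi) ** 2 / denom
        if not 0.0 < acc < 1.0:  # extrapolation left the valid range; use EM iterate
            acc = p2
        if abs(acc - pi) < tol:
            return acc, evals
        pi = acc
    return pi, evals


xs = [-1.2, -0.5, 0.1, 0.3, -0.8, 0.6, -0.2, 2.1, 2.8, 3.3, 2.4, 1.9]
f1 = [normal_pdf(x, 0.0) for x in xs]
f2 = [normal_pdf(x, 1.5) for x in xs]
print(plain_em(0.5, f1, f2))
print(aitken_em(0.5, f1, f2))
```

Both functions count EM-map evaluations, so their step counts are directly comparable. Applying the same idea to a full latent tree model would mean extrapolating whole parameter vectors (or the log-likelihood sequence) rather than a single scalar, which this toy example does not attempt.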
Keywords/Search Tags: Keyword extraction, Word location distribution, Hierarchical topic discovery, Topic visualization