
Design And Construction Of The Network Public Opinion Analysis System Based On The Identification Of Topics

Posted on: 2014-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2248330395984169
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the network has provided a broad platform for more and more people to express their emotions and opinions, and it has gradually become a gathering place for public opinion. Against this background, negative and destructive Internet public opinion may deceive and mislead the public, posing a growing threat to public safety. It is therefore necessary to analyze topics and remarks on the network effectively and to capture the dynamics of Internet public opinion in a timely manner, which has practical significance for maintaining social stability and building a harmonious society. The study of Internet public opinion analysis technology has thus become an urgent and important topic.

This thesis studies the key technologies in an Internet public opinion analysis system: information acquisition, information preprocessing, hotspot discovery, and topic tracking.

1. Information acquisition and information preprocessing refer to the process in which a web crawler fetches a specified range of web pages, and the pages are turned into weight vectors through web page purification and Chinese word segmentation.
2. Hotspot discovery aggregates related news coverage of an event into a collection, and discovers new events and forms new topics as new coverage arrives. Hotspot discovery is essentially a text clustering process. After a comprehensive comparison of several existing clustering methods, the Single-pass clustering algorithm was chosen for this system.
3. Topic tracking helps users obtain news on the topics they are interested in and follow those topics in a timely manner. The task is for the system to determine a topic from a small number of samples and then identify related news coverage of that topic in information obtained afterwards. Topic tracking is essentially a text classification problem. After a comprehensive comparison of several existing text classification methods and of performance improvements to the SVM algorithm, an improved SVM classification algorithm, the PCA-GA-SVM model, was chosen as the topic tracking algorithm in this system.

Finally, the thesis describes the concrete implementation of the Internet public opinion analysis system. After the overall framework is designed, the implementation of each module is specified. The system implements hot topic detection and tracking using the MyEclipse 6.0 development environment and the MySQL 4.1.20 database management system under the Windows operating system, to provide auxiliary support for the relevant departments.
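
The hotspot discovery step in item 2 can be illustrated with a minimal sketch of single-pass clustering. This is not the thesis's implementation: the whitespace tokenizer, raw term counts as weights, cosine similarity, the summed-vector centroid, and the `threshold` value are all assumptions made for the example, and the names `cosine` and `single_pass` are hypothetical. The thesis itself builds weight vectors through Chinese word segmentation, which is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Process documents once, in arrival order.

    Each document joins the most similar existing topic if the similarity
    reaches `threshold` (an assumed value); otherwise it opens a new topic.
    """
    topics = []  # each topic: {"centroid": Counter, "members": [doc indices]}
    for i, text in enumerate(docs):
        # Assumption: whitespace tokens with raw counts as term weights.
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Centroid kept as a summed vector; cosine ignores its scale.
            best["centroid"].update(vec)
        else:
            topics.append({"centroid": Counter(vec), "members": [i]})
    return topics


docs = [
    "earthquake hits city",
    "city earthquake rescue",
    "stock market falls",
    "market stock rally",
]
print([t["members"] for t in single_pass(docs)])  # [[0, 1], [2, 3]]
```

Because each document is compared only against the topics that exist when it arrives, the result depends on arrival order and on the threshold: a higher threshold yields more, smaller topics.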
Keywords/Search Tags: Internet public opinion, Web crawler, Single-pass clustering algorithm, PCA-GA-SVM classifier