The Research And Design Of The Network Public Opinion Analysis System

Posted on: 2016-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z M He
Full Text: PDF
GTID: 2348330476455752
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the Internet has become one of the most important ways for people to express their thoughts and sentiments. Information on the Internet spreads quickly, widely and anonymously, so large-scale network public opinion forms easily. Network public opinion refers to the comments and sentiments that Internet users express about political policies, hot news and social issues through social network services such as Weibo and BBS. By analyzing network public opinion, governments and enterprises can keep track of its current state and trends and make decisions that avoid potential crises, which helps ensure the stable and harmonious development of enterprises and society.

To address the efficiency and performance shortcomings of traditional public opinion data acquisition, this thesis designs an efficient and stable distributed web crawler architecture for acquiring public opinion data. It also studies text clustering and text classification, two core technologies in network public opinion analysis, improves on their deficiencies, and applies the improved methods to the system. The main work includes:

1. Distributed web crawler architecture. The Hadoop distributed cloud platform and the Nutch distributed crawler are analyzed, and a distributed crawler architecture that integrates Nutch into a Hadoop cluster is designed to obtain public opinion data. The design aims to ensure the efficiency and accuracy of the system's data acquisition.

2. Text clustering method. The thesis studies text clustering, compares several algorithms, and selects BIRCH as the algorithm suited to discovering network public opinion topics. The selection of the threshold T is explored and optimized through experiments to ensure clustering accuracy, and the algorithm is used as the method for public opinion topic discovery.

3. Text classification method. The thesis studies text classification, compares several algorithms, and selects SVM as the base classification algorithm. Because a basic SVM only handles two-class problems, a binary tree model is added to reduce and optimize the selection of classifiers. The efficiency and accuracy of the classification are verified through experiments, and the algorithm is used as the method for public opinion topic tracking.

4. System design and implementation. Building on these technical studies, the thesis designs the system around the MVC framework pattern, including the logic architecture, the software architecture, each functional module and the UML class diagrams, and completes the implementation and deployment of the system.
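The role of the BIRCH threshold T in item 2 can be illustrated with a minimal sketch. This is not the thesis's implementation: it keeps only a flat list of clustering features (point count, linear sum, sum of squared norms) and omits the CF-tree branching and node splitting of full BIRCH. The names `ClusteringFeature`, `birch_leaf_clusters` and the toy `POINTS` are hypothetical, and real input would be document vectors rather than 2-D points.

```python
import math
from typing import List, Sequence


class ClusteringFeature:
    """BIRCH clustering feature: point count, linear sum, sum of squared norms."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, p: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss += sum(v * v for v in p)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius_if_added(self, p: Sequence[float]) -> float:
        """Radius the cluster would have after absorbing p (nothing is modified)."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, p)]
        ss = self.ss + sum(v * v for v in p)
        squared = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


def birch_leaf_clusters(points: Sequence[Sequence[float]],
                        threshold: float) -> List[ClusteringFeature]:
    """Insert points one by one; absorb into the nearest CF only if radius <= T."""
    cfs: List[ClusteringFeature] = []
    for p in points:
        if cfs:
            nearest = min(cfs, key=lambda cf: math.dist(cf.centroid(), p))
            if nearest.radius_if_added(p) <= threshold:
                nearest.add(p)
                continue
        fresh = ClusteringFeature(len(p))
        fresh.add(p)
        cfs.append(fresh)
    return cfs


# Hypothetical toy data: two tight groups and one isolated point.
POINTS = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]

for t in (0.05, 0.5, 10.0):
    print(t, len(birch_leaf_clusters(POINTS, t)))
```

A small T splits the data into many fine-grained topics, while a large T merges unrelated points into one cluster, which is why the threshold has to be tuned experimentally.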
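The binary tree extension of SVM in item 3 can be sketched as follows, under stated assumptions. The abstract does not say how the tree is built, so this sketch simply halves the sorted label list at each node, which only works when each split happens to be separable. The binary classifier is a small linear SVM trained by subgradient descent on the hinge loss, used here as a stand-in for whatever SVM solver the thesis uses. The function names, hyperparameters, topic labels and 2-D `SAMPLES` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Vector = Sequence[float]
Tree = Union[str, tuple]  # a leaf label, or (w, b, left_subtree, right_subtree)


def train_linear_svm(xs: List[Vector], ys: List[int], epochs: int = 50,
                     lr: float = 0.1, lam: float = 0.01) -> Tuple[List[float], float]:
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation step
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def build_tree(samples: Dict[str, List[Vector]], labels: List[str]) -> Tree:
    """Train one binary SVM per internal node; N classes need N - 1 classifiers."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    left, right = labels[:mid], labels[mid:]
    xs = [x for lab in labels for x in samples[lab]]
    ys = [1 if lab in left else -1 for lab in labels for _ in samples[lab]]
    w, b = train_linear_svm(xs, ys)
    return (w, b, build_tree(samples, left), build_tree(samples, right))


def predict(tree: Tree, x: Vector) -> str:
    """Walk from the root to a leaf, evaluating one classifier per level."""
    while isinstance(tree, tuple):
        w, b, left, right = tree
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        tree = left if score >= 0 else right
    return tree


# Hypothetical toy topics with 2-D feature vectors.
SAMPLES = {
    "economy": [(-1.0, -1.0), (-0.8, -1.1)],
    "education": [(1.0, -0.9), (1.2, -1.0)],
    "health": [(-1.0, 1.0), (-0.9, 1.2)],
    "sports": [(1.0, 1.0), (1.1, 0.8)],
}

tree = build_tree(SAMPLES, sorted(SAMPLES))
print(predict(tree, (1.0, 0.9)))
```

With four classes the tree holds three classifiers and a prediction evaluates two of them, compared with six pairwise classifiers in a one-vs-one scheme. This is one reading of how a binary tree reduces the number of classifiers that must be selected.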
Keywords/Search Tags: Network Public Opinion, Distributed Web Crawler, Text Clustering, Text Classification