Research And Implementation Of Network Intelligent Analysis System For Business Public Opinion

Posted on: 2017-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2308330485988160
Subject: Computer application technology
Abstract/Summary:
The internet is fast, interactive, and has a low barrier to entry, which has made free expression widely available. Internet public opinion has become an important factor affecting the decisions of the ruling party. Domestic and international business competition is growing more intense, and consumers are increasingly aware of their rights and of how to protect themselves, yet the crisis response ability of commercial institutions remains weak. This highlights the importance of a guidance strategy for business public opinion on the network. Public opinion systems in foreign countries are more complete, whereas domestic network public opinion systems mostly serve government applications and military regulation, and few address commercial applications. Constructing a public opinion system is also complex: it requires many technologies, including algorithms that directly affect the accuracy of information extraction. This thesis examines several key technologies in public opinion analysis, focusing on the accuracy of clustering and information extraction. The main work is as follows:

1. Several existing public opinion corpora are adopted and merged, with a focus on extracting commercial content. A Boolean model produces a rough classification, and a vector space model expresses the texts in matrix form. This reduces the number of texts passed to subsequent clustering and makes it possible to improve clustering accuracy.

2. A new algorithm called EM-NWTF is proposed to address the shortcomings of the original TF-IDF text representation method. The emphasis is on how IDF is calculated: the inverse document frequency among different classes is computed using the results of the Boolean model, which addresses the impact of rare words in the original algorithm and its low discrimination among similar texts. A position weight and an emphasis factor are also proposed. The position weight takes into account the importance of feature values in the first and last paragraphs. The emphasis factor considers whether the feature values in the middle part of the text are evenly distributed. A simulation experiment compares the new algorithm with the original and analyzes the reasons for its accuracy.

3. An improved K-means algorithm is proposed to address the shortcomings of the original algorithm: the number of clusters K must be fixed before clustering, the algorithm is sensitive to noise and isolated points, and the choice of initial centers strongly affects the result. Similarity is used to measure the distance between texts, a threshold is set to filter noise and isolated points, and an influence coefficient and a new method for calculating the cluster centroid are introduced. A simulation experiment compares the improved algorithm with the original, and robustness, accuracy, and other related parameters are analyzed.

4. A simulation experiment is designed to test the public opinion system, which is built on the Hadoop platform, the MapReduce programming model, and the HDFS storage mechanism. Related data are used to test the system's functions, and the results are analyzed.
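For readers unfamiliar with the baseline that items 2 and 3 start from, the following is a minimal Python sketch of the original TF-IDF weighting, cosine similarity between texts, and a similarity threshold used to flag isolated points. It is not the thesis's EM-NWTF or its improved K-means, neither of which is specified in this abstract. The function names (`tfidf_vectors`, `cosine`, `isolated_points`), the toy documents, and the threshold value 0.1 are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Baseline TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def isolated_points(vectors, threshold):
    """Indices of documents whose best similarity to any other document
    falls below the threshold (hypothetical filtering rule)."""
    isolated = []
    for i, u in enumerate(vectors):
        best = max(
            (cosine(u, v) for j, v in enumerate(vectors) if j != i),
            default=0.0,
        )
        if best < threshold:
            isolated.append(i)
    return isolated


# Toy corpus (invented for illustration): two related texts and one outlier.
docs = [
    "brand recall product safety".split(),
    "product recall safety complaint".split(),
    "weather forecast rain".split(),
]
vectors = tfidf_vectors(docs)
print(isolated_points(vectors, threshold=0.1))  # prints [2]
```

In this toy corpus the third text shares no terms with the other two, so its best similarity is 0 and it is flagged, while the first two texts share three terms and stay above the threshold. A term that appears in every document gets an IDF of zero under this baseline, which is one reason variants of the IDF term are studied.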
Keywords/Search Tags: public opinion, TF-IDF, K-means, Hadoop