Research On Mining Of Internet Public Opinion Based On Semantic And Statistic Analysis

Posted on: 2013-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wan
Full Text: PDF
GTID: 1118330374971201
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology and network applications, the Internet has become an important source of information for the public, and often the main one. It has also become an important venue where people exchange information and express their views. Following Internet public opinion online and tracking its trends are of real practical significance for maintaining social harmony and stability, and for promoting social democracy and the rule of law.

Information on the Internet is so voluminous that manual identification and assessment are impractical. Using computer network technology, artificial intelligence, and data mining techniques to mine and analyze Internet public opinion has therefore become a new research focus. Several urgent and important problems arise in this domain: how to identify and categorize hot topics in public opinion information on the Web; how to determine whether public attitudes toward a social event are positive or negative; and how to analyze the fluctuation trends of hot social events. These problems have important scientific and practical significance for understanding and guiding Internet public opinion.

This dissertation studies the mining and analysis of Internet public opinion information: it applies Web document classification techniques to classify emergencies reported on the Web, adopts machine learning methods to analyze the sentiment orientation of Internet public opinion, and applies statistical analysis of fluctuations to investigate public opinion trends. The specific contents and innovations are as follows:

1. This dissertation introduces Fisher discriminant analysis into the text classification of Internet public opinion and then applies it to the classification of emergencies.
Internet public opinion triggered by unexpected events takes the form of documents, so classifying it reduces to a text classification problem. The Fisher criterion is an effective approach to dimensionality reduction, but few studies have applied it to text classification. Here the Fisher discriminant criterion is used as a feature extraction method for text classification and is then applied to emergency classification. In Internet public opinion research, emergencies are usually divided, following public safety practice, into four types: natural disasters, accidents and disasters, public health events, and social security events. The experiments show that the Fisher criterion performs slightly worse than information gain but better than the other feature extraction methods compared.

2. Based on an analysis of latent semantic analysis (LSA) theory, including singular value decomposition and the computation of similarity between documents, this dissertation proposes a new algorithm, LR-LSA, for Web text classification. To overcome the limitations of LSA, LR-LSA uses an SVM classifier to compute each document's relevance to each category, and then uses these relevance scores to build local regions for the different categories. Two classification experiments on Chinese corpora show that LR-LSA is more effective than LSA.

3. Machine-learning-based sentiment analysis methods pay little attention to sentiment feature extraction, so this dissertation presents PMML, a method that combines feature extraction with machine learning for sentiment analysis. Related research is first reviewed, including classification at different granularities and machine-learning-based methods.
Web comment texts are first word-segmented and split into sets of keywords. For these keywords, a set of patterns commonly used in emotional expression is designed. Each successful pattern match yields an emotional feature, and the features form a sequence. The sentiment orientation of each feature pattern is computed separately, and a machine learning method then determines the overall sentiment orientation of the Web comment. Experiments comparing PMML with a purely machine-learning-based method demonstrate its effectiveness in classification performance.

4. This dissertation analyzes the evolution of fluctuations in Internet public opinion information using GARCH models. Fluctuation is an important feature in the propagation of hot events: strong fluctuations often mean that the information is being copied, reposted, and updated rapidly. Analysis of these fluctuation characteristics shows that the rate-of-change series is heteroscedastic, with a sharp peak and fat tails, much like financial variables; GARCH models are therefore introduced to describe such fluctuations. Using the numbers of Web pages about a hot event collected from the major search engines as a quantitative measure of public opinion, the trend of the event's public opinion evolution is investigated. Taking the hot social event "Wenzhou Train Accident" as an example, the study covers data collection and the calculation of the rate of change, and establishes GARCH, EGARCH, and TGARCH models, respectively. The empirical analysis shows that these GARCH-family models are feasible for analyzing fluctuations in the evolution of public opinion.
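The Fisher criterion used for feature extraction in item 1 can be illustrated with a small sketch. This is not the dissertation's implementation: documents are assumed to be already word-segmented into token lists, a term's raw frequency in a document is its feature value, and the univariate form of the Fisher criterion (between-class scatter divided by within-class scatter) is used to rank terms for selection. The dissertation may instead use a multivariate Fisher projection; the function name `fisher_scores` and the toy class labels are illustrative.

```python
from collections import defaultdict
from statistics import mean


def fisher_scores(docs, labels, vocab):
    """Score each term by the univariate Fisher criterion of its term frequency.

    Assumption (illustrative only): docs are pre-segmented token lists and the
    feature for a term is its raw count in each document.
    """
    # Group documents by their class label (e.g. an emergency type)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)

    scores = {}
    for term in vocab:
        overall = mean(doc.count(term) for doc in docs)
        between = 0.0  # sum_c n_c * (mu_c - mu)^2
        within = 0.0   # sum_c sum_{d in c} (x_d - mu_c)^2
        for class_docs in by_class.values():
            tf = [doc.count(term) for doc in class_docs]
            mu_c = mean(tf)
            between += len(tf) * (mu_c - overall) ** 2
            within += sum((x - mu_c) ** 2 for x in tf)
        # Small epsilon avoids division by zero when a term has no within-class spread
        scores[term] = between / (within + 1e-9)
    return scores
```

Terms with the highest scores separate the classes best, so a feature subset of size k could be taken as `sorted(scores, key=scores.get, reverse=True)[:k]` before training any classifier.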
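The LSA machinery behind item 2 (singular value decomposition and document similarity) can be sketched in dependency-free Python. This is a hedged sketch, not LR-LSA itself: the truncated SVD is computed by power iteration with deflation on A^T A, each document j is represented in the latent space by the values sigma_i * v_i[j], and `local_region` is a hypothetical stand-in for the LR step that keeps only documents whose relevance score meets a threshold. Per the abstract those scores would come from an SVM classifier; here they are simply supplied as input. All function names and the threshold rule are assumptions.

```python
import math


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def truncated_svd(a, k, iters=300):
    """Top-k singular triples (sigma, u, v) of a term-document matrix `a`
    (rows = terms, columns = documents), via power iteration on A^T A."""
    cols = list(zip(*a))  # document (column) vectors
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    triples = []
    for _ in range(k):
        # Non-uniform start vector so it is unlikely to be orthogonal to the target
        v = [1.0 + 0.1 * i for i in range(len(gram))]
        for _ in range(iters):
            w = _matvec(gram, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(gram, v)))  # Rayleigh quotient
        if lam < 1e-12:
            break  # remaining singular values are (numerically) zero
        sigma = math.sqrt(lam)
        u = [x / sigma for x in _matvec(a, v)]
        triples.append((sigma, u, v))
        # Deflate so the next pass converges to the next singular component
        gram = [[g - lam * vi * vj for g, vj in zip(row, v)]
                for row, vi in zip(gram, v)]
    return triples


def doc_vectors(a, k):
    """Latent-space coordinates of each document: (sigma_i * v_i[j]) for i < k."""
    triples = truncated_svd(a, k)
    return [[s * v[j] for s, _, v in triples] for j in range(len(a[0]))]


def cosine(x, y):
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return sum(p * q for p, q in zip(x, y)) / (nx * ny) if nx and ny else 0.0


def local_region(a, relevance, category, threshold):
    """Hypothetical LR step: keep the columns (documents) whose classifier
    relevance to `category` is at least `threshold`."""
    keep = [j for j, scores in enumerate(relevance) if scores[category] >= threshold]
    return [[row[j] for j in keep] for row in a], keep
```

Presumably the point of the local region is to run the decomposition on each category's relevant documents rather than on the whole corpus. For realistic corpus sizes, a library routine such as `numpy.linalg.svd` would replace the power iteration.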
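The pattern-matching stage of PMML in item 3 can be sketched as follows. Everything concrete here is hypothetical, because the abstract does not give the dissertation's pattern set or lexicon: three illustrative patterns (a bare sentiment word, negator + sentiment word, intensifier + sentiment word) and tiny English word lists stand in for them, the input is assumed to be already word-segmented, each pattern type becomes one feature dimension, and a plain perceptron stands in for the unspecified machine learning method.

```python
# Hypothetical mini-lexicons; the dissertation's actual patterns are not given
POSITIVE = {"good", "great", "support"}
NEGATIVE = {"bad", "terrible", "angry"}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very", "extremely"}
PATTERNS = ["SENT", "NEG+SENT", "INT+SENT"]


def polarity(word):
    return 1 if word in POSITIVE else -1 if word in NEGATIVE else 0


def extract_patterns(tokens):
    """Match the hypothetical patterns left to right; return (pattern, polarity) pairs."""
    feats, i = [], 0
    while i < len(tokens):
        w = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if w in NEGATORS and nxt is not None and polarity(nxt):
            feats.append(("NEG+SENT", -polarity(nxt)))  # negation flips polarity
            i += 2
        elif w in INTENSIFIERS and nxt is not None and polarity(nxt):
            feats.append(("INT+SENT", 2 * polarity(nxt)))  # intensifier doubles it
            i += 2
        elif polarity(w):
            feats.append(("SENT", polarity(w)))
            i += 1
        else:
            i += 1
    return feats


def to_vector(feats):
    """Sum the polarity contributed by each pattern type (one dimension per pattern)."""
    return [sum(p for name, p in feats if name == pat) for pat in PATTERNS]


def train_perceptron(vectors, labels, epochs=20):
    """Plain perceptron as a stand-in learner; labels are +1 / -1."""
    w, b = [0.0] * len(PATTERNS), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

Keeping per-pattern polarities as separate dimensions lets the learner weight, say, negated expressions differently from bare sentiment words, which is one way to read the abstract's "calculated separately the emotional tendency" for each feature pattern.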
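The GARCH(1,1) model used in item 4 can be sketched in dependency-free Python. Several choices here are assumptions: the rate of change is taken as the log difference r_t = ln N_t - ln N_{t-1} of daily page counts N_t (the abstract does not give the exact formula), the mean is constant, and the conditional variance follows sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2. Parameters are estimated by a crude grid search over the Gaussian likelihood with variance targeting, a simplification chosen to keep the sketch self-contained; the EGARCH and TGARCH variants also used in the dissertation are not shown.

```python
import math


def log_change_rates(counts):
    """Assumed rate of change: r_t = ln(N_t) - ln(N_{t-1}) for positive counts."""
    return [math.log(b) - math.log(a) for a, b in zip(counts, counts[1:])]


def garch11_variances(eps, omega, alpha, beta):
    """sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}."""
    sig2 = [sum(e * e for e in eps) / len(eps)]  # start from the sample variance
    for e in eps[:-1]:
        sig2.append(omega + alpha * e * e + beta * sig2[-1])
    return sig2


def neg_log_likelihood(eps, omega, alpha, beta):
    """Gaussian negative log-likelihood of the residuals under GARCH(1,1)."""
    sig2 = garch11_variances(eps, omega, alpha, beta)
    return 0.5 * sum(math.log(2 * math.pi * s) + e * e / s for e, s in zip(eps, sig2))


def fit_garch11(returns, step=0.05):
    """Grid-search MLE with variance targeting: omega = var * (1 - alpha - beta),
    restricted to alpha + beta < 1 so the process is covariance-stationary."""
    mu = sum(returns) / len(returns)
    eps = [r - mu for r in returns]  # constant-mean residuals
    var = sum(e * e for e in eps) / len(eps)
    n = int(round(1 / step))
    best = None
    for i in range(n):
        for j in range(n):
            alpha, beta = i * step, j * step
            if alpha + beta >= 0.999:
                continue
            omega = var * (1 - alpha - beta)
            nll = neg_log_likelihood(eps, omega, alpha, beta)
            if best is None or nll < best[0]:
                best = (nll, omega, alpha, beta)
    return best[1:]
```

A large fitted alpha + beta indicates persistent volatility clustering in the page-count series. For real analysis, a dedicated package such as Python's `arch` library, whose `arch_model` function supports GARCH and EGARCH volatility specifications, would be more reliable than this grid search.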
Keywords/Search Tags: Internet public opinion mining, Fisher discriminant analysis, Classification of Emergencies, Local Latent Semantic Analysis, GARCH models