Research And Development Of Network Public Opinion Text Classification System

Posted on: 2015-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zeng
GTID: 2308330473950551
Subject: Computer technology
Abstract/Summary:
With the continuous development of the Internet, news commentary, microblogs, and forums have become popular. More and more people express their views and opinions online, so network public opinion has become very important. Because information on the Internet is tangled and takes an overabundance of forms, the government cannot conveniently collect the network consensus. To help the government view the information and opinions it is interested in, the text needs to be classified. This system is a subsystem of a network public opinion monitoring system, and it aims to classify the public opinion information collected by a crawler.

The public opinion information collected by the crawler can be divided into long text and short text. Information from news, blogs, and BBS topics is called long text, while information from microblogs and BBS posts is called short text. So far, common classification algorithms perform well on long text but unsatisfactorily on short text. This thesis first studies long text classification algorithms, then concentrates on the difficulties of short text classification, with an emphasis on how to improve existing techniques based on research and analysis of short text classification. The main work is as follows:

1. A study of feature selection algorithms and classification algorithms for long text. Based on the results, the system ultimately uses CHI statistics as the feature selection algorithm and the SVM algorithm with an RBF kernel function as the classification algorithm.
2. A short text classification method based on feature expansion is proposed. In this method, the feature items are expanded through word2vec, and the expanded results then go through feature selection and classification. The test results show that, with appropriate parameters, this algorithm can significantly improve the performance of short text classification.
3. On the basis of the related technologies, this thesis designs and implements the network public opinion text classification system in detail. The system consists of four modules: a pre-processing module, a feature selection module, a text classification module, and a communication module. The pre-processing module implements text segmentation, stop word filtering, and word frequency statistics. The feature selection module implements commonly used feature selection algorithms and feature expansion. The text classification module implements the Naive Bayes and SVM algorithms. The communication module implements the display of classification results in the web terminal.
4. Finally, testing consists of functional testing and performance testing, and demonstrates the effectiveness and practicality of the system. The test results show that, using short text classification based on feature expansion in this system with a reasonable number of feature words, the accuracy on the short text classification test set was 73.98%, the recall was 74.61%, and the F1 value was 74.29%.
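
The CHI statistic chosen for feature selection in the abstract above can be illustrated with a minimal sketch. It uses the standard chi-square formula for a term and a class, N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), computed from document counts. The function names `chi_square` and `select_features`, the rule of keeping each term's maximum score over all classes, and the toy corpus are assumptions made for illustration only; the abstract does not describe how the thesis implements this step.

```python
def chi_square(term, label, docs):
    """CHI statistic of `term` for class `label`.

    docs: list of (set_of_tokens, class_label) pairs.
    """
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in the class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in the class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class never (or always) occurs
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, k):
    """Keep the k terms with the highest CHI score over any class.

    Taking the maximum over classes is an assumed aggregation rule.
    """
    labels = {label for _, label in docs}
    vocab = set().union(*(tokens for tokens, _ in docs))
    scores = {t: max(chi_square(t, c, docs) for c in labels) for t in vocab}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus (not data from the thesis).
docs = [
    ({"match", "goal", "team"}, "sports"),
    ({"goal", "league", "today"}, "sports"),
    ({"stock", "market", "today"}, "finance"),
    ({"market", "bank", "team"}, "finance"),
]
print(select_features(docs, 2))  # ['goal', 'market']
```

In this toy corpus, "goal" and "market" each appear in every document of one class and in none of the other, so they receive the highest score, while "today" and "team" are spread evenly across both classes and score zero.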
Keywords/Search Tags:Feature selection, Short text, Text classification, Network public opinion