Research And Development Of Opinion Mining Sub-system Based On Topic Model

Posted on: 2014-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Gao
GTID: 2268330401965868
Subject: Computer application technology
Abstract/Summary:
As the number of web users continues to grow, people have become accustomed to expressing their views and attitudes on trending events online. However, some netizens publish untrue statements about sensitive events or try to incite public discontent, which can threaten social stability. Governments at all levels and the relevant departments have therefore begun to use online public opinion monitoring systems to address this problem.

Most existing opinion monitoring systems rely on statistical and keyword-based approaches and analyze text only at the word level. To keep processing fast, they must apply feature extraction to reduce the dimension of the text vector, which discards many features and much semantic information and leads to inaccurate results. To solve this problem, this thesis introduces the topic model to the field of opinion monitoring, replacing the vector space model used in classification and opinion judging. Compared with traditional systems, using a topic model further reduces the dimension of the feature matrix while preserving the accuracy of the related algorithms. The main algorithms and system design proposed are as follows:

1. A sub-category classification algorithm based on the support vector machine (SVM) and the latent Dirichlet allocation (LDA) model is proposed after a study of the performance of common classification algorithms. According to the experimental results, with proper parameters this method achieves acceptable performance with a 99% reduction in the number of features. The classification result can also serve as an important reference for the final decision.

2. A public opinion judging model for a single page is proposed. The model is based on topic features and also takes into account traditional features such as keywords, writing style, and author, so it can be used for opinion judging on various kinds of web pages. A set of features for opinion judging is selected on the basis of this model and validated through a decision tree experiment.

3. A topic-based algorithm for keyword and shortest-abstract extraction is proposed by improving traditional algorithms. The algorithm uses a trained model as the "related fields", so that keywords and abstracts can be extracted from a single text.

4. A detailed design of the analysis subsystem of the overall opinion monitoring system is given, based on research into the related key technologies. The system provides early warning, opinion search, keyword extraction, and report generation. Finally, the system is tested for false positive rate, false negative rate, and analysis speed.

The tests show that by replacing the vector space model with the LDA topic model and using the classification and opinion judging models proposed in this thesis, the system reduces the dimension of the feature space while maintaining low false positive and false negative rates.
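The two quantities the abstract reports, the reduction in feature dimension and the false positive and false negative rates, can be illustrated with a short sketch. This is not the thesis's evaluation code: the function names, the label convention (1 = page flagged as sensitive, 0 = normal), and all numbers are assumptions chosen only for illustration.

```python
def feature_reduction(original_dim, reduced_dim):
    """Fraction of features removed when moving from a vector space
    model (one feature per term) to topic features (one per topic)."""
    return 1 - reduced_dim / original_dim


def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for binary labels.

    Assumed convention: 1 = flagged as sensitive, 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    # Guard against division by zero when a class is absent.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


if __name__ == "__main__":
    # Hypothetical sizes: a 10,000-term vocabulary replaced by 100 topics.
    print(feature_reduction(10_000, 100))

    # Hypothetical ground truth and system output for eight pages.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]
    print(error_rates(truth, predicted))
```

With the hypothetical sizes above, 100 topics in place of 10,000 terms corresponds to the 99% reduction the abstract mentions. The actual vocabulary size and topic count used in the thesis are not given here.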
Keywords/Search Tags: opinion monitoring, topic model, feature extraction, text processing