Research On Text Classification System For The Internet Public Opinion Analysis

Posted on: 2018-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhang
GTID: 2348330512988988
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet and information technology, more and more people are accustomed to obtaining information online and to using network platforms to express their ideas and emotions. Because the Internet has wide coverage, spreads information quickly, and operates in near real time, people can often learn about controversial social events through the network as soon as they occur. The Internet has therefore gradually become a global repository of knowledge that affects and changes human lifestyles and behavioral habits.

Internet public opinion refers to the influential, broadly shared views that Internet users form when social hot spots or focal issues emerge online. Network public opinion is sudden, diverse, and prone to bias, and it exerts a guiding influence on people's views and attitudes. Network public opinion data are organized mainly as text, voice, and video, of which text accounts for the largest share. Analyzing public opinion with text as the main data type is therefore very important.

To meet the urgent needs of Internet public opinion analysis, this thesis proposes a new text classification model, the Vitid Possibility (FP) classification model. Its core idea is as follows: for each category of text in the training corpus, a statistical probability method is used to obtain the weights of that category's features; the weighted statistics of the test text's features are then computed to give a similarity value for each category, and the category with the maximum similarity is taken as the predicted category. The FP model is thus a statistical text classification method that computes feature weights from absolute word frequency, the number of documents, and the number of feature items. It requires little or no participation from domain experts to classify text well and can be applied effectively to the text analysis of network public opinion.

Based on the Java platform, this thesis implements a text classification system that integrates two classification models: the FP classification model and the Naive Bayes classification model. Experiments comparing the FP model with the traditional Naive Bayes model under the same conditions, and on different corpora, verify that the proposed algorithm effectively improves the overall performance of text classification, achieves good classification results, greatly shortens classification time, and shows stable classification ability.
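The abstract describes the FP model only at the level of "per-category feature weights from word frequency, document count, and number of feature items, then maximum similarity". The Java sketch below illustrates that general pipeline under stated assumptions; it is not the thesis's FP formula. The weight w(t, c) = (tf(t, c) / |F_c|) * (df(t, c) / N_c) is a hypothetical choice, where tf is the absolute frequency of term t in category c, |F_c| the number of distinct feature items in c, df the number of training documents in c containing t, and N_c the number of documents in c. The class name `FpStyleClassifier` and its methods are likewise hypothetical, and texts are assumed to be already segmented into feature terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a per-category, probability-weighted feature scorer.
 * The weight formula is an ASSUMPTION for illustration, not the thesis's FP formula.
 */
class FpStyleClassifier {
    // category -> (feature -> absolute frequency of the feature in that category)
    private final Map<String, Map<String, Integer>> termFreq = new HashMap<>();
    // category -> (feature -> number of training documents in the category containing it)
    private final Map<String, Map<String, Integer>> docFreq = new HashMap<>();
    // category -> number of training documents
    private final Map<String, Integer> docCount = new HashMap<>();
    // category -> (feature -> learned weight)
    private final Map<String, Map<String, Double>> weights = new HashMap<>();

    /** Adds one already-segmented training document to its category's statistics. */
    void addDocument(String category, List<String> tokens) {
        Map<String, Integer> tf = termFreq.computeIfAbsent(category, k -> new HashMap<>());
        Map<String, Integer> df = docFreq.computeIfAbsent(category, k -> new HashMap<>());
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        for (String t : new HashSet<>(tokens)) df.merge(t, 1, Integer::sum); // count each doc once
        docCount.merge(category, 1, Integer::sum);
    }

    /** Assumed weight: w(t, c) = (tf(t, c) / |F_c|) * (df(t, c) / N_c). */
    void train() {
        weights.clear();
        for (String c : termFreq.keySet()) {
            Map<String, Integer> tf = termFreq.get(c);
            Map<String, Integer> df = docFreq.get(c);
            double featureItems = tf.size();   // |F_c|: distinct feature items in c
            double docs = docCount.get(c);     // N_c: training documents in c
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                String t = e.getKey();
                w.put(t, (e.getValue() / featureItems) * (df.get(t) / docs));
            }
            weights.put(c, w);
        }
    }

    /** Similarity: sum of the category's weights over the test text's tokens (repeats counted). */
    double similarity(String category, List<String> tokens) {
        Map<String, Double> w = weights.getOrDefault(category, Map.of());
        double s = 0.0;
        for (String t : tokens) s += w.getOrDefault(t, 0.0); // unseen terms contribute 0
        return s;
    }

    /** Returns the category with maximum similarity (ties: first in map iteration order). */
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : weights.keySet()) {
            double s = similarity(c, tokens);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        FpStyleClassifier clf = new FpStyleClassifier();
        clf.addDocument("sports", List.of("match", "goal", "team"));
        clf.addDocument("sports", List.of("goal", "score", "team"));
        clf.addDocument("finance", List.of("stock", "market", "price"));
        clf.addDocument("finance", List.of("market", "bank", "stock"));
        clf.train();
        System.out.println(clf.classify(List.of("goal", "team", "win"))); // prints "sports"
    }
}
```

In this toy run, "goal" and "team" each get weight (2/4) * (2/2) = 0.5 in the sports category, so the test text scores 1.0 there and 0 for finance. A Naive Bayes baseline, by contrast, would typically sum smoothed log-probabilities of the terms plus a log prior for each category, rather than a plain weighted sum.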
Keywords/Search Tags: text classification, Internet public opinion, feature term, similarity