
Text Mining And Visualization On Hot Event-based Of Microblog

Posted on: 2019-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
GTID: 2428330575492234
Subject: Engineering
Abstract/Summary:
With the advent of the Web 2.0 era, Sina Weibo has become a leading social platform in China and an important channel for people to obtain news and express their opinions. The number of posts has grown exponentially and covers a wide range of fields. These posts carry users' emotions and opinions, which are valuable to government agencies monitoring public opinion and to enterprises formulating strategies based on user behavior. This thesis takes the posts and user information associated with hot events as its research subjects and studies three aspects: data acquisition, text mining, and information visualization.

In the data acquisition part, a web crawler based on simulated login is implemented. Using the hot-event keyword and the event's start and end times as filter conditions, it collects data efficiently and provides comprehensive, sufficient data for the subsequent research.

In the text mining part, traditional text categorization methods are applied to the sentiment classification of Weibo posts, and the effects of different feature extraction methods, of feature selection, and of different classification algorithms on sentiment classification are analyzed. The unigram and bigram models of the N-gram approach are used for feature extraction, the information gain algorithm is used for feature selection, and five classification algorithms, including Support Vector Machine and Naive Bayes, are used for sentiment classification. The experiments compare the different feature extraction methods, compare results before and after feature selection, and compare the five classification algorithms under different feature dimensions. The results show that, at a certain feature dimension, combining the unigram model, information gain, and the Bernoulli Naive Bayes algorithm gives the best sentiment classification, with an accuracy of 86% and an AUC of 0.93.

In the information visualization part, the ECharts framework is used to display the sentiment analysis results, user information, and basic post information in a web browser, in the form of bar charts, pie charts, and maps.
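Unigram and bigram feature extraction can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's code: it assumes each post has already been segmented into a list of tokens (Chinese Weibo text needs a word segmenter first, which is not shown), and the function names `ngrams` and `extract_features` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, use_bigrams=False):
    """Count unigram features, optionally adding bigram features."""
    features = Counter(ngrams(tokens, 1))
    if use_bigrams:
        features.update(ngrams(tokens, 2))
    return features


# Hypothetical pre-segmented post
tokens = ["very", "happy", "today", "very", "happy"]
print(extract_features(tokens, use_bigrams=True))
```

Adding bigrams enlarges the feature space considerably, which is one reason a feature selection step usually follows.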
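Information gain scores a term by how much knowing whether it is present reduces the entropy of the class label: IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]. The sketch below assumes binary presence features, with each post represented as a set of terms; the function names are hypothetical and the alphabetical tie-break is an arbitrary choice made only to keep the ranking deterministic.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a binary term-presence feature; docs is a list of term sets."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus
docs = [{"happy", "good"}, {"happy"}, {"sad", "bad"}, {"sad"}]
labels = ["pos", "pos", "neg", "neg"]
print(select_top_k(docs, labels, 2))
```

Here "happy" and "sad" each separate the two classes perfectly, so they receive the maximum gain of 1 bit and are kept.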
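Bernoulli Naive Bayes models each vocabulary term as either present or absent in a post, so absent terms also contribute to a class's score (unlike the multinomial variant). The from-scratch sketch below is for illustration only: add-one (Laplace) smoothing and the class name `BernoulliNaiveBayes` are assumptions, not details from the thesis, and in practice a library implementation such as scikit-learn's `BernoulliNB` is the usual choice.

```python
import math
from collections import Counter, defaultdict


class BernoulliNaiveBayes:
    """Bernoulli Naive Bayes over binary term-presence features (add-one smoothing)."""

    def fit(self, docs, labels):
        # docs: list of sets of terms; labels: list of class labels
        self.vocab = set().union(*docs)
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        doc_freq = defaultdict(Counter)  # doc_freq[c][t]: class-c docs containing t
        for doc, label in zip(docs, labels):
            for term in doc:
                doc_freq[label][term] += 1
        # P(t present | c) = (df_c(t) + 1) / (N_c + 2)
        self.p_present = {
            c: {t: (doc_freq[c][t] + 1) / (n_c + 2) for t in self.vocab}
            for c, n_c in self.class_counts.items()
        }
        return self

    def log_scores(self, doc):
        """Unnormalized log posterior per class; unseen terms are ignored."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior
            for t in self.vocab:
                p = self.p_present[c][t]
                # Present terms contribute log p, absent terms log(1 - p)
                score += math.log(p) if t in doc else math.log(1.0 - p)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training data
docs = [{"happy", "good"}, {"happy"}, {"sad", "bad"}, {"sad"}]
labels = ["pos", "pos", "neg", "neg"]
model = BernoulliNaiveBayes().fit(docs, labels)
print(model.predict({"happy"}))  # pos
```

Working in log space avoids numerical underflow when the vocabulary is large.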
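The two reported metrics, accuracy and AUC, can be computed as shown below. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. This quadratic-time version is a sketch chosen for clarity. It assumes labels encoded as 1 (positive) and 0 (negative) and uses hypothetical example scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via the rank interpretation; labels are 1 (positive) / 0 (negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred))  # 0.5
print(roc_auc(y_true, scores))   # 0.75
```

Accuracy depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.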
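In the browser, an ECharts chart is configured by passing an option object to `setOption` on an instance created with `echarts.init`. The following minimal sketch assumes that a Python back end aggregates predicted sentiment labels and serializes the option to JSON for the front end. The function name and labels are hypothetical, and only the standard `title`, `tooltip`, and `series` fields (with `type: "pie"`) are used.

```python
import json
from collections import Counter


def sentiment_pie_option(predicted_labels, title="Sentiment distribution"):
    """Build an ECharts pie-chart option from a list of predicted labels."""
    counts = Counter(predicted_labels)
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "item"},
        "series": [{
            "type": "pie",
            # One slice per sentiment class, sorted by name for stable output
            "data": [{"name": name, "value": value}
                     for name, value in sorted(counts.items())],
        }],
    }


option = sentiment_pie_option(["pos", "neg", "pos"])
print(json.dumps(option, ensure_ascii=False))
```

Generating bar charts or maps follows the same pattern with a different `series` type and data layout.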
Keywords/Search Tags: Weibo, Crawler, Text mining, Machine learning, Sentiment analysis, Visualization